Run the checker
Paste your draft and confirm the important crawler tokens are allowed, blocked, or restricted exactly as intended.
Open the robots.txt checker
Use these examples as policy starting points for OpenAI, Google, Apple, Perplexity, and Common Crawl crawlers. The safest pattern is to separate search visibility crawlers from model-training or broad data-use controls instead of blocking every AI-related token at once.
First decide which search and answer crawlers to allow, such as Googlebot, OAI-SearchBot, Applebot, or PerplexityBot. Decide separately on training-use controls such as GPTBot, Google-Extended, and Applebot-Extended, and on broad crawlers such as CCBot.
Best for marketing sites, SaaS docs, and ecommerce stores that want eligibility for search and answer surfaces, but do not want broad model-training use.
User-agent: Googlebot
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: Applebot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Applebot-Extended
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml
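One way to keep search crawlers and training-use tokens separate is to generate the file from two explicit lists. The sketch below is only an illustration: `render_robots`, `SEARCH_CRAWLERS`, and `TRAINING_TOKENS` are hypothetical names, and the token lists simply mirror the example policy above.

```python
# Hypothetical helper: the names and lists are illustrative assumptions
# that mirror the example policy, not part of any tool.
SEARCH_CRAWLERS = ["Googlebot", "OAI-SearchBot", "Applebot", "PerplexityBot"]
TRAINING_TOKENS = ["GPTBot", "Google-Extended", "Applebot-Extended", "CCBot"]


def render_robots(allow: list[str], block: list[str], sitemap: str) -> str:
    """Render one group per token, then a catch-all group and the sitemap."""
    groups = [f"User-agent: {token}\nAllow: /" for token in allow]
    groups += [f"User-agent: {token}\nDisallow: /" for token in block]
    groups.append("User-agent: *\nAllow: /")
    groups.append(f"Sitemap: {sitemap}")
    return "\n\n".join(groups) + "\n"


robots_txt = render_robots(
    SEARCH_CRAWLERS, TRAINING_TOKENS, "https://example.com/sitemap.xml"
)
print(robots_txt)
```

Keeping the two lists apart makes it harder to block a search crawler by accident when a new training-use token is added.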
Best for public open-source docs, public research pages, and sites whose business model benefits from maximum discoverability and citation opportunities.
User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml
Best when public marketing pages should remain discoverable, but account, app, admin, billing, or internal docs paths should stay out of automated crawling.
User-agent: Googlebot
Allow: /
Disallow: /app/
Disallow: /admin/
Disallow: /account/
Disallow: /billing/
Disallow: /internal-docs/

User-agent: OAI-SearchBot
Allow: /
Disallow: /app/
Disallow: /admin/
Disallow: /account/
Disallow: /billing/
Disallow: /internal-docs/

User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
Disallow: /app/
Disallow: /admin/
Disallow: /account/
Disallow: /billing/
Disallow: /internal-docs/

Sitemap: https://example.com/sitemap.xml
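To sanity-check a draft like this before publishing, a small evaluator can help. The following is a minimal sketch, not a full parser: it assumes exact, case-insensitive token matching with a fallback to `*`, applies the longest-match rule from RFC 9309 (an `Allow` wins a tie), and ignores `*` and `$` wildcards in paths. `parse_groups`, `is_allowed`, and the shortened `DRAFT` are hypothetical names for illustration.

```python
# Shortened, illustrative draft based on the policy above.
DRAFT = """\
User-agent: Googlebot
Allow: /
Disallow: /app/
Disallow: /billing/

User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
Disallow: /admin/

Sitemap: https://example.com/sitemap.xml
"""


def parse_groups(robots_txt: str) -> dict[str, list[tuple[str, str]]]:
    """Map each lowercase user-agent token to its (directive, path) rules."""
    groups: dict[str, list[tuple[str, str]]] = {}
    current: list[str] = []  # tokens of the group being read
    in_rules = False         # True once the group has seen a rule line
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a user-agent line after rules starts a new group
                current, in_rules = [], False
            current.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif field in ("allow", "disallow"):
            in_rules = True
            for token in current:
                groups[token].append((field, value))
    return groups


def is_allowed(robots_txt: str, token: str, path: str) -> bool:
    """Apply longest-match precedence; no matching rule means allowed."""
    groups = parse_groups(robots_txt)
    rules = groups.get(token.lower(), groups.get("*", []))
    best_len, allowed = -1, True
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue  # an empty rule value matches nothing
        longer = len(prefix) > best_len
        tie_allow = len(prefix) == best_len and directive == "allow"
        if longer or tie_allow:
            best_len, allowed = len(prefix), directive == "allow"
    return allowed


print(is_allowed(DRAFT, "Googlebot", "/pricing"))
print(is_allowed(DRAFT, "Googlebot", "/app/settings"))
print(is_allowed(DRAFT, "GPTBot", "/pricing"))
```

Under longest-match precedence, the order of `Allow: /` and `Disallow: /app/` within a group does not matter: the more specific `/app/` rule wins for paths under it. Individual crawlers may differ in details, so treat this as a pre-publish check and not as proof of crawler behavior.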
Robots.txt controls crawling. Snippet and answer-display behavior may need page-level meta robots or platform-specific controls. Use this when articles can be indexed but selected sections should not be summarized.
User-agent: Googlebot
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: Applebot
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml
Then add page-level controls only where needed:
<meta name="robots" content="max-snippet:160">
<meta name="applebot" content="nosnippet">
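Because page-level controls live in the HTML and not in robots.txt, it can be useful to confirm which directives a rendered page actually carries. The sketch below uses Python's standard `html.parser`; `RobotsMetaParser` and `robots_meta` are hypothetical names, and the sample `page` string is an assumption built from the two tags above.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta> tags whose name is in `names`."""

    def __init__(self, names: set[str]):
        super().__init__()
        self.names = names
        self.directives: dict[str, list[str]] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        if name in self.names:
            content = attributes.get("content") or ""
            # Directives are comma-separated, e.g. "noindex, nosnippet".
            parts = [p.strip().lower() for p in content.split(",") if p.strip()]
            self.directives.setdefault(name, []).extend(parts)


def robots_meta(html: str, names=("robots", "applebot")) -> dict[str, list[str]]:
    """Return the robots-style meta directives found in an HTML string."""
    parser = RobotsMetaParser(set(names))
    parser.feed(html)
    return parser.directives


# Illustrative page fragment (an assumption, not taken from a real site).
page = (
    '<head><meta name="robots" content="max-snippet:160">'
    '<meta name="applebot" content="nosnippet"></head>'
)
print(robots_meta(page))
```

This only reports what the page declares; whether a given platform honors each directive depends on that platform's own documentation.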
Best for preview domains, staging environments, and test builds that should not be indexed. Do not use this on production unless you intentionally want to disappear from search.
User-agent: *
Disallow: /
Use the source-backed list when you need to explain why search crawlers and training-use controls should be handled separately.
Open the user-agent list