AI crawler robots.txt examples you can copy safely

Use these examples as policy starting points for OpenAI, Google, Apple, Perplexity, and Common Crawl crawlers. The safest pattern is to separate search visibility crawlers from model-training or broad data-use controls instead of blocking every AI-related token at once.

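Quick rule: if AI-search visibility matters, do not accidentally block search crawlers such as Googlebot, OAI-SearchBot, Applebot, or PerplexityBot. Decide separately on training-use controls such as GPTBot, Google-Extended, and Applebot-Extended, and on broad crawlers such as CCBot. Google-Extended and Applebot-Extended are control tokens rather than separate crawlers: Googlebot and Applebot still do the fetching, and the token only governs how the content they fetch may be used.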

Example 1: Search visible, no model training

Best for marketing sites, SaaS docs, and ecommerce stores that want eligibility for search and answer surfaces, but do not want broad model-training use.

User-agent: Googlebot
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: Applebot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Applebot-Extended
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml

Example 2: Maximum discovery

Best for public open-source docs, public research pages, and sites whose business model benefits from maximum discoverability and citation opportunities.

User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml

Example 3: Private docs or customer portal

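Best when public marketing pages should remain discoverable, but account, app, admin, billing, or internal docs paths should stay out of automated crawling. A crawler follows only the most specific group that names it and ignores the * group once it has its own, so the private paths are repeated in every group that is otherwise allowed. robots.txt is a request to well-behaved crawlers, not access control, so keep these paths behind authentication as well.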

User-agent: Googlebot
Allow: /
Disallow: /app/
Disallow: /admin/
Disallow: /account/
Disallow: /billing/
Disallow: /internal-docs/

User-agent: OAI-SearchBot
Allow: /
Disallow: /app/
Disallow: /admin/
Disallow: /account/
Disallow: /billing/
Disallow: /internal-docs/

User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
Disallow: /app/
Disallow: /admin/
Disallow: /account/
Disallow: /billing/
Disallow: /internal-docs/

Sitemap: https://example.com/sitemap.xml
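
Because Example 3 repeats the same private paths in every allowed group, the lists can drift apart when one of them is edited by hand. A minimal Python sketch, assuming the file is generated at build time, renders every group from a single path list. The function name build_robots and its parameters are hypothetical and not part of any library.

```python
def build_robots(private_paths, search_bots, blocked_tokens, sitemap_url):
    """Render robots.txt text with identical private paths in every allowed group."""

    def open_group(agent):
        # Allowed groups share one rule set so they cannot drift apart.
        rules = ["Allow: /"] + [f"Disallow: {path}" for path in private_paths]
        return [f"User-agent: {agent}", *rules, ""]

    lines = []
    for agent in search_bots:
        lines += open_group(agent)
    for token in blocked_tokens:
        # Training-use tokens and broad crawlers are blocked outright.
        lines += [f"User-agent: {token}", "Disallow: /", ""]
    lines += open_group("*")
    lines.append(f"Sitemap: {sitemap_url}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_robots(
        private_paths=["/app/", "/admin/", "/account/", "/billing/", "/internal-docs/"],
        search_bots=["Googlebot", "OAI-SearchBot"],
        blocked_tokens=["GPTBot", "Google-Extended", "CCBot"],
        sitemap_url="https://example.com/sitemap.xml",
    ))
```

Called with the values above, the sketch produces the same groups as Example 3. Adding a new private path then means changing one list instead of three.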

Example 4: Publisher with snippet controls

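robots.txt controls crawling, not how a page is displayed once it is indexed. Snippet and answer-display behavior is set with page-level robots meta tags or platform-specific controls, and a crawler can only read those tags on pages it is allowed to fetch. Use this when articles can be indexed but selected pages should show shortened snippets or none at all.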

User-agent: Googlebot
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: Applebot
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml

Then add page-level controls only where needed:

<meta name="robots" content="max-snippet:160">
<meta name="applebot" content="nosnippet">

Example 5: Staging or test site

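Best for preview domains, staging environments, and test builds that should not be crawled. Blocking crawling does not guarantee a URL stays out of search results if other sites link to it, so put anything sensitive behind authentication too. Do not use this on production unless you intentionally want to disappear from search.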

User-agent: *
Disallow: /
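
One way to avoid shipping the staging file to production is a pre-deploy guard. The Python sketch below is a simplified heuristic: it only looks for a bare Disallow: / inside the * group and ignores path wildcards and Allow overrides. The names blocks_all_crawlers and check_deploy, and the environment labels, are hypothetical.

```python
def blocks_all_crawlers(robots_txt):
    """Return True if the `*` group contains a bare `Disallow: /` rule."""
    in_wildcard_group = False
    previous_was_agent = False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            # Consecutive User-agent lines belong to the same group.
            if not previous_was_agent:
                in_wildcard_group = False
            in_wildcard_group = in_wildcard_group or value == "*"
            previous_was_agent = True
            continue
        previous_was_agent = False
        if in_wildcard_group and field == "disallow" and value == "/":
            return True
    return False


def check_deploy(environment, robots_txt):
    """Raise if a production deploy would ship a block-everything robots.txt."""
    if environment == "production" and blocks_all_crawlers(robots_txt):
        raise ValueError("robots.txt blocks all crawlers on production")
```

Calling check_deploy from a build script with the target environment and the rendered file would stop a production release that carries the staging rules, while leaving staging deploys untouched.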

How to verify before publishing

Run the checker

Paste your draft and confirm the important crawler tokens are allowed, blocked, or restricted exactly as intended.

Open the robots.txt checker
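
For a quick local check alongside the checker, the Python sketch below evaluates a draft using two rules from RFC 9309: a crawler uses the group that names it and falls back to * only when no group does, and within a group the longest matching path wins, with Allow winning ties. It is a simplification under stated assumptions: tokens are compared exactly, and * and $ wildcards in paths are not handled. parse_groups, is_allowed, and DRAFT are hypothetical names, and the draft is a shortened Example 3.

```python
DRAFT = """\
User-agent: Googlebot
Allow: /
Disallow: /app/
Disallow: /billing/

User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
Disallow: /app/
Disallow: /billing/
"""


def parse_groups(robots_txt):
    """Return a list of (agents, rules); rules are (directive, path) pairs."""
    groups = []
    agents, rules = [], []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if rules:  # a User-agent line after rules starts a new group
                groups.append((agents, rules))
                agents, rules = [], []
            agents.append(value.lower())
        elif field in ("allow", "disallow") and agents:
            rules.append((field, value))
    if agents:
        groups.append((agents, rules))
    return groups


def is_allowed(robots_txt, user_agent, path):
    """Apply group selection, then longest-match precedence, to one path."""
    groups = parse_groups(robots_txt)
    token = user_agent.lower()
    matched = [rules for agents, rules in groups if token in agents]
    if not matched:  # fall back to `*` only when no group names the crawler
        matched = [rules for agents, rules in groups if "*" in agents]
    best = ("allow", "")  # no matching rule means the path is allowed
    for directive, prefix in (rule for group in matched for rule in group):
        if not prefix or not path.startswith(prefix):
            continue  # an empty pattern matches nothing
        longer = len(prefix) > len(best[1])
        tie_for_allow = len(prefix) == len(best[1]) and directive == "allow"
        if longer or tie_for_allow:
            best = (directive, prefix)
    return best[0] == "allow"


if __name__ == "__main__":
    checks = [("Googlebot", "/app/settings"), ("GPTBot", "/pricing"), ("PerplexityBot", "/pricing")]
    for agent, path in checks:
        print(agent, path, "allowed" if is_allowed(DRAFT, agent, path) else "blocked")
```

The matcher is written by hand because Python's standard urllib.robotparser applies rules in file order rather than by longest match, so it would treat /app/ as allowed whenever Allow: / is listed first, as it is in these examples.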

Compare user agents

Use the source-backed list when you need to explain why search crawlers and training-use controls should be handled separately.

Open the user-agent list

Source notes