AI crawler robots.txt rules: what to allow, block, and measure

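AI crawler policy is now part of technical SEO. The tricky part is that different bots do different jobs: some fetch pages for search features, while others collect content that may be used for model training. A rule meant to block training can also cut search visibility if you apply it too broadly, for example as a blanket Disallow under User-agent: *.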

OpenAI: OAI-SearchBot vs GPTBot

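OpenAI documents OAI-SearchBot as the crawler used for ChatGPT search features. It documents GPTBot separately for crawling that may be used to improve generative AI foundation models. That distinction matters because each token can get its own robots.txt group, so a training opt-out does not have to cost you search visibility.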

User-agent: OAI-SearchBot
Allow: /

User-agent: GPTBot
Disallow: /private-research/
Allow: /

Sitemap: https://example.com/sitemap.xml

If your goal is ChatGPT search visibility, avoid accidentally blocking OAI-SearchBot. If your concern is training use, define a separate policy for GPTBot and document the business reason.
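
One way to sanity-check rules like these before deploying is to parse them locally. The sketch below uses Python's standard-library urllib.robotparser. The is_allowed helper and the sample paths are assumptions for illustration. Note that this parser applies the first matching rule in a group, which may differ from how a particular crawler resolves conflicts.

```python
from urllib.robotparser import RobotFileParser

# The OpenAI rules from the example above.
ROBOTS_TXT = """\
User-agent: OAI-SearchBot
Allow: /

User-agent: GPTBot
Disallow: /private-research/
Allow: /

Sitemap: https://example.com/sitemap.xml
"""


def is_allowed(robots_txt: str, user_agent: str, path: str) -> bool:
    """Return True if user_agent may fetch path under the given rules.

    Hypothetical helper: it parses the text in memory instead of
    fetching a live /robots.txt.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # Sample paths are made up for the demonstration.
    for agent in ("OAI-SearchBot", "GPTBot"):
        for path in ("/blog/post", "/private-research/report.html"):
            print(agent, path, is_allowed(ROBOTS_TXT, agent, path))
```

A check like this can run in CI so that an edit to robots.txt cannot silently block the search crawler.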

Google: Googlebot and Google-Extended

Google says AI Overviews and AI Mode rely on the same foundational SEO requirements as Google Search: to be eligible as a supporting link, a page needs to be indexed and eligible for a snippet. Google also provides controls for limiting use. nosnippet and noindex are page-level directives, set in a robots meta tag or an X-Robots-Tag header rather than in robots.txt. Google-Extended is a robots.txt token for other AI use cases.

User-agent: Googlebot
Allow: /

User-agent: Google-Extended
Disallow: /

Sitemap: https://example.com/sitemap.xml

This example keeps normal Google Search crawling open while opting out of Google-Extended. Confirm current policy in Google's documentation before publishing because crawler behavior and product naming can change.
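
Because supporting-link eligibility depends on the page being indexed and snippet-eligible, it can be worth checking page-level directives as well as robots.txt. The following is a minimal sketch using Python's standard-library html.parser. The blocking_directives function and the sample markup are assumptions for illustration. It deliberately ignores details such as the "none" directive and user-agent prefixes in X-Robots-Tag values.

```python
from html.parser import HTMLParser

# Directives that would remove a page from the index or suppress its snippet.
BLOCKING = {"noindex", "nosnippet"}


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None when the attribute has no value.
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in {"robots", "googlebot"}:
            for token in attr.get("content", "").split(","):
                token = token.strip().lower()
                if token:
                    self.directives.add(token)


def blocking_directives(html: str, x_robots_tag: str = "") -> set:
    """Return the noindex/nosnippet directives found in the page or header.

    Hypothetical helper: x_robots_tag is the raw header value, if any.
    """
    parser = RobotsMetaParser()
    parser.feed(html)
    header = {t.strip().lower() for t in x_robots_tag.split(",") if t.strip()}
    return (parser.directives | header) & BLOCKING


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
    print(blocking_directives(page))
```

An empty result suggests that neither directive is present. It does not confirm that the page is actually indexed.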

Default open policy

For a public marketing site that wants maximum search visibility, a simple open policy is usually enough:

User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml

Default cautious policy

For sites with public documentation but sensitive unpublished sections, disallow the private paths explicitly. Do not rely on robots.txt for security: it is a public file that compliant crawlers treat as a policy signal, and it does nothing to stop a client that ignores it. Put authentication in front of anything that is truly private.

User-agent: *
Disallow: /admin/
Disallow: /staging/
Disallow: /customer-files/
Allow: /

Sitemap: https://example.com/sitemap.xml
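
The cautious policy mixes a broad Allow with narrower Disallow lines. RFC 9309 resolves such conflicts by the most specific (longest) matching rule, with Allow winning a tie. The sketch below illustrates that evaluation for the rules above. It is a simplification that handles plain path prefixes only (no * or $ patterns), and the RULES list and is_allowed function are names invented for this example.

```python
# Rules from the cautious policy as (kind, path prefix) pairs.
# Their order does not affect the result under longest-match evaluation.
RULES = [
    ("disallow", "/admin/"),
    ("disallow", "/staging/"),
    ("disallow", "/customer-files/"),
    ("allow", "/"),
]


def is_allowed(path: str, rules) -> bool:
    """Longest-match evaluation: the most specific matching prefix wins.

    On equal length an allow rule wins; with no match the path is allowed.
    Simplified sketch: wildcard (*) and end-anchor ($) patterns are ignored.
    """
    best_len = -1
    allowed = True
    for kind, prefix in rules:
        if not path.startswith(prefix):
            continue
        is_allow = kind == "allow"
        if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
            best_len = len(prefix)
            allowed = is_allow
    return allowed


if __name__ == "__main__":
    for path in ("/docs/intro", "/admin/users", "/administrator"):
        print(path, is_allowed(path, RULES))
```

Note that /administrator stays allowed because the rule prefix is /admin/ with a trailing slash. Prefix choices like this are worth checking before publishing.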

Measurement checklist

Log requests to /robots.txt, /llms.txt, and core guide pages.
Verify Google indexing in Search Console after launch.
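Check server logs for crawler user agents and for spoofing patterns, such as a crawler user-agent string arriving from an IP address outside the ranges the operator publishes.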
Keep a dated changelog of crawler rules so traffic changes can be interpreted later.
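
The log checks above can be started with a small script. The sketch below assumes an access log in the common "combined" format and counts requests per crawler token, plus crawler requests to /robots.txt and /llms.txt. The sample log lines, IP addresses, and user-agent strings are made up for illustration. Matching on the user-agent string alone does not prove a request is genuine. Google-Extended is omitted because Google describes it as a robots.txt control token, not a separate request user agent.

```python
import re
from collections import Counter

# Tokens to look for in the User-Agent header (case-insensitive).
CRAWLER_TOKENS = ("OAI-SearchBot", "GPTBot", "Googlebot")

# Combined log format: ip - - [time] "METHOD path proto" status size "referer" "ua"
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def summarize(lines, watched_paths=("/robots.txt", "/llms.txt")):
    """Count hits per crawler token and crawler hits on watched paths."""
    by_crawler = Counter()
    by_path = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # skip lines that are not in the expected format
        ua = match.group("ua").lower()
        for token in CRAWLER_TOKENS:
            if token.lower() in ua:
                by_crawler[token] += 1
                if match.group("path") in watched_paths:
                    by_path[(token, match.group("path"))] += 1
                break
    return by_crawler, by_path


# Illustrative sample lines only; not real traffic or real user-agent strings.
SAMPLE_LOG = [
    '203.0.113.5 - - [01/Jan/2025:00:00:01 +0000] "GET /robots.txt HTTP/1.1" 200 120 "-" "ExampleUA (compatible; GPTBot)"',
    '203.0.113.6 - - [01/Jan/2025:00:00:02 +0000] "GET /guides/crawlers HTTP/1.1" 200 5120 "-" "ExampleUA (compatible; OAI-SearchBot)"',
    '203.0.113.5 - - [01/Jan/2025:00:00:03 +0000] "GET /guides/crawlers HTTP/1.1" 200 5120 "-" "ExampleUA (compatible; GPTBot)"',
    '203.0.113.9 - - [01/Jan/2025:00:00:04 +0000] "GET /llms.txt HTTP/1.1" 404 0 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
]

if __name__ == "__main__":
    crawlers, paths = summarize(SAMPLE_LOG)
    print(dict(crawlers))
    print(dict(paths))
```

Running a summary like this on the same schedule as the changelog entries makes it easier to line up traffic changes with rule changes.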

Practical rule: decide crawler policy by use case. Search visibility, model training, paid access, and private content are separate decisions.