AI crawler robots.txt rules: what to allow, block, and measure
AI crawler policy is now part of technical SEO. The tricky part is that different bots do different jobs. A rule that blocks model training may also affect search visibility if you apply it too broadly.
OpenAI: OAI-SearchBot vs GPTBot
OpenAI documents OAI-SearchBot as the crawler used for ChatGPT search features. It documents GPTBot separately for crawling that may be used to improve generative AI foundation models. That distinction matters.
User-agent: OAI-SearchBot
Allow: /

User-agent: GPTBot
Disallow: /private-research/
Allow: /

Sitemap: https://example.com/sitemap.xml
If your goal is ChatGPT search visibility, avoid accidentally blocking OAI-SearchBot. If your concern is training use, define a separate policy for GPTBot and document the business reason.
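To sanity-check a policy before publishing it, Python's standard-library `urllib.robotparser` can parse robots.txt text offline and answer `can_fetch` queries. This is a minimal sketch that assumes the OpenAI rules shown above; the helper names `build_parser` and `check` and the sample paths are hypothetical. One caveat: this parser applies the first matching rule in file order, whereas RFC 9309 uses the longest matching path. The two agree here only because the specific `Disallow` is listed before `Allow: /`.

```python
from urllib.robotparser import RobotFileParser

# The OpenAI example policy, one directive per line.
ROBOTS_TXT = """\
User-agent: OAI-SearchBot
Allow: /

User-agent: GPTBot
Disallow: /private-research/
Allow: /

Sitemap: https://example.com/sitemap.xml
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def check(parser: RobotFileParser, token: str, path: str) -> bool:
    """Return True if the crawler `token` may fetch `path`.

    Pass the product token (e.g. "GPTBot"), not the full User-Agent header:
    robotparser only looks at the text before the first "/" of the name.
    """
    return parser.can_fetch(token, "https://example.com" + path)


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    # Hypothetical sample paths.
    for token, path in [
        ("OAI-SearchBot", "/private-research/q3.html"),
        ("GPTBot", "/private-research/q3.html"),
        ("GPTBot", "/guides/robots-txt/"),
    ]:
        print(f"{token:14} {path:26} allowed={check(rp, token, path)}")
```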
Google: Googlebot and Google-Extended
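Google says AI Overviews and AI Mode follow the same foundational SEO requirements as Google Search: to be eligible as a supporting link, a page must be indexed and eligible to be shown with a snippet. Page-level controls such as noindex and nosnippet therefore also limit how a page can appear in these features. Google-Extended is a separate control. It is a robots.txt product token rather than a distinct crawler, and it governs whether content Google crawls may be used for training and grounding Gemini models. Google states that it does not affect inclusion or ranking in Google Search.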
User-agent: Googlebot
Allow: /

User-agent: Google-Extended
Disallow: /

Sitemap: https://example.com/sitemap.xml
This example keeps normal Google Search crawling open while opting out of Google-Extended. Confirm current policy in Google's documentation before publishing because crawler behavior and product naming can change.
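Because eligibility depends on a page being indexable and snippet-eligible, a pre-launch check can scan key pages for `noindex`, `nosnippet`, and `none` in robots meta tags and the `X-Robots-Tag` header. This is a minimal sketch using Python's standard `html.parser`. The class and function names are hypothetical, and the sketch reads only the `robots` and `googlebot` meta names. It also assumes the header value has no user-agent prefix such as `googlebot:`. It complements, rather than replaces, URL Inspection in Search Console.

```python
from __future__ import annotations

from html.parser import HTMLParser

# Directives that make a page ineligible as a supporting link:
# noindex (not indexed), nosnippet (no snippet), none (= noindex, nofollow).
BLOCKING = {"noindex", "nosnippet", "none"}


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in {"robots", "googlebot"}:
            self.add_directives(attr.get("content", ""))

    def add_directives(self, content: str) -> None:
        # Directives are comma-separated and case-insensitive.
        for token in content.split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def blocking_directives(html: str, x_robots_tag: str = "") -> set[str]:
    """Return the directives on a page that block indexing or snippets.

    Assumes the X-Robots-Tag value has no user-agent prefix (e.g. "googlebot:").
    """
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    parser.add_directives(x_robots_tag)
    return parser.directives & BLOCKING


if __name__ == "__main__":
    page = '<head><meta name="robots" content="index, NoSnippet"></head>'
    print(sorted(blocking_directives(page)))  # ['nosnippet']
```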
Default open policy
For a public marketing site that wants maximum search visibility, a simple open policy is usually enough:
User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml
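Each directive must sit on its own line, and groups are easier to review when separated by blank lines. Generating the file from data can prevent formatting slips and makes each change easy to diff. This is a minimal sketch; `render_robots_txt` and its group structure are hypothetical, not a standard tool.

```python
from __future__ import annotations


def render_robots_txt(groups: list[tuple[list[str], list[tuple[str, str]]]],
                      sitemaps: list[str]) -> str:
    """Render (user agents, rules) groups into robots.txt text.

    Each rule is a (directive, path) pair such as ("Disallow", "/admin/").
    Groups are separated by a blank line; Sitemap lines go at the end.
    """
    blocks = []
    for agents, rules in groups:
        lines = [f"User-agent: {agent}" for agent in agents]
        lines += [f"{directive}: {path}" for directive, path in rules]
        blocks.append("\n".join(lines))
    if sitemaps:
        blocks.append("\n".join(f"Sitemap: {url}" for url in sitemaps))
    return "\n\n".join(blocks) + "\n"


# The default open policy expressed as data.
OPEN_POLICY = render_robots_txt(
    groups=[(["*"], [("Allow", "/")])],
    sitemaps=["https://example.com/sitemap.xml"],
)

if __name__ == "__main__":
    print(OPEN_POLICY, end="")
```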
Default cautious policy
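For sites with public documentation but sensitive unpublished sections, disallow the private paths explicitly. Do not rely on robots.txt for security. It is publicly readable, so every path you list there is visible to anyone who fetches the file. Only compliant crawlers honor it, and they treat it as a policy signal, not an authentication wall. Protect sensitive sections with authentication.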
User-agent: *
Allow: /
Disallow: /admin/
Disallow: /staging/
Disallow: /customer-files/

Sitemap: https://example.com/sitemap.xml
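This policy relies on precedence. RFC 9309 (and Google's documentation) uses the most specific, longest matching path, so `Disallow: /admin/` beats `Allow: /` for `/admin/...` URLs. Python's `urllib.robotparser` instead applies the first matching rule in file order, so it would report `/admin/` as allowed for this file. This is a minimal sketch of the RFC behavior. The function names are hypothetical. It assumes exact product-token matching, does not support the `*` and `$` wildcards, and does no percent-encoding normalization.

```python
from __future__ import annotations

CAUTIOUS_POLICY = """\
User-agent: *
Allow: /
Disallow: /admin/
Disallow: /staging/
Disallow: /customer-files/

Sitemap: https://example.com/sitemap.xml
"""


def parse_groups(robots_txt: str) -> dict[str, list[tuple[bool, str]]]:
    """Map each lowercased user-agent token to its (is_allow, path) rules.

    Consecutive User-agent lines share one group; repeated groups for the
    same agent are merged. Other directives (e.g. Sitemap) are ignored.
    """
    groups: dict[str, list[tuple[bool, str]]] = {}
    agents: list[str] = []
    seen_rule = False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            if seen_rule:  # a User-agent line after rules starts a new group
                agents, seen_rule = [], False
            agents.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif key in ("allow", "disallow") and agents:
            seen_rule = True
            if value:  # an empty path matches nothing
                for agent in agents:
                    groups[agent].append((key == "allow", value))
    return groups


def is_allowed(groups: dict[str, list[tuple[bool, str]]], agent: str, path: str) -> bool:
    """Use the agent's own group if it has one, else '*'; longest match wins, Allow on ties."""
    rules = groups.get(agent.lower(), groups.get("*", []))
    best_len, allowed = -1, True  # no matching rule means the path is allowed
    for is_allow, prefix in rules:
        if path.startswith(prefix):
            longer = len(prefix) > best_len
            tie_allow = len(prefix) == best_len and is_allow
            if longer or tie_allow:
                best_len, allowed = len(prefix), is_allow
    return allowed


if __name__ == "__main__":
    groups = parse_groups(CAUTIOUS_POLICY)
    for path in ("/docs/setup/", "/admin/users", "/customer-files/a.pdf"):
        print(path, is_allowed(groups, "GPTBot", path))
```

One consequence worth testing: a crawler with its own group ignores the `*` group entirely. Adding `User-agent: GPTBot` with only `Disallow: /private-research/` would therefore leave `/admin/` open to GPTBot unless those lines are repeated in its group.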
Measurement checklist
- Log requests to /robots.txt, /llms.txt, and core guide pages.
- Verify Google indexing in Search Console after launch.
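- Check server logs for crawler user agents, and treat those strings as claims that are easy to spoof. Confirm suspicious traffic with the operator's verification method, such as a reverse-and-forward DNS check for Googlebot or OpenAI's published IP ranges for its crawlers.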
- Keep a dated changelog of crawler rules so traffic changes can be interpreted later.
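The logging items above can start with a simple tally of which crawlers request `/robots.txt` and `/llms.txt`, and with what status codes. This is a minimal sketch that assumes the Apache/Nginx "combined" access-log format. The token list and function name are hypothetical, and you can pass core guide pages through `watched_paths`. It matches the claimed User-Agent only, so spoofed requests are counted until you verify them separately.

```python
import re
from collections import Counter

# Hypothetical watch list of crawler product tokens; extend as needed.
CRAWLER_TOKENS = ("OAI-SearchBot", "GPTBot", "Googlebot", "bingbot")

# Assumes the "combined" log format:
# ... "METHOD /path HTTP/1.1" status bytes "referer" "user-agent"
LINE_RE = re.compile(
    r'"[A-Z]+ (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def crawler_hits(log_lines, watched_paths=("/robots.txt", "/llms.txt")):
    """Count requests per (crawler token, path, status) for the watched paths.

    Matching is on the claimed User-Agent only, so spoofed requests are
    counted too; verify suspicious IPs separately.
    """
    counts = Counter()
    for line in log_lines:
        match = LINE_RE.search(line)
        if not match or match["path"] not in watched_paths:
            continue
        user_agent = match["ua"].lower()
        for token in CRAWLER_TOKENS:
            if token.lower() in user_agent:
                counts[(token, match["path"], match["status"])] += 1
                break
    return counts


if __name__ == "__main__":
    # Hypothetical log line using a documentation IP address (203.0.113.0/24).
    sample = [
        '203.0.113.5 - - [10/May/2025:10:00:00 +0000] "GET /robots.txt HTTP/1.1" '
        '200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.2)"'
    ]
    for (token, path, status), n in sorted(crawler_hits(sample).items()):
        print(f"{token:14} {path:12} {status} {n}")
```

Keeping these counts alongside the dated changelog makes it easier to tell whether a traffic change followed a rule change.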