Benchmark page
Read the complete crawler policy matrix and copy-ready robots.txt examples.
This page lists the AI and search crawler user-agent tokens most small teams ask about first. Use it to separate search visibility, training-use controls, user-requested fetches, and broad web dataset crawlers.
This list is a human-readable landing page for the benchmark dataset. Use the files below for audits, spreadsheets, and support replies.
JSON file: Machine-readable crawler categories, strategy notes, verification methods, and source URLs.
CSV file: Spreadsheet-ready crawler token table for lightweight policy reviews.
| Operator | User-agent token | Category | Robots.txt? | Practical policy note |
|---|---|---|---|---|
| OpenAI | OAI-SearchBot | Search/discovery | Yes | Allow if ChatGPT search eligibility matters. |
| OpenAI | GPTBot | Training-use crawler | Yes | Decide separately from OAI-SearchBot; block only if model-training use is not allowed. |
| OpenAI | ChatGPT-User | User-requested fetch | No / limited | Monitor separately in logs; do not treat it like automatic crawling. |
| Google | Googlebot | Search/discovery | Yes | Allow for Google Search visibility unless a page should not be indexed. |
| Google | Google-Extended | AI-use control | Yes | Use when allowing Google Search while opting out of certain AI uses. |
| Apple | Applebot | Search/discovery | Yes | Allow for Apple ecosystem discovery if that visibility matters. |
| Apple | Applebot-Extended | Training-use control | Yes | Use to opt out of Apple foundation-model training while keeping Applebot discovery. |
| Perplexity | PerplexityBot | Search/answer discovery | Yes | Allow if Perplexity answer visibility matters; verify with provider guidance for WAF rules. |
| Perplexity | Perplexity-User | User-requested fetch | No / limited | Monitor separately from PerplexityBot because the use case is user-triggered. |
| Common Crawl | CCBot | Open web dataset | Yes | Allow for open data participation; block if broad dataset reuse is outside policy. |
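For log reviews, the token-to-category mapping in the table can be turned into a small lookup. The sketch below is one possible approach, not part of the benchmark files: the names `CRAWLER_CATEGORIES` and `classify_user_agent` are hypothetical, and the sample user-agent strings are made up for illustration. It matches tokens by case-insensitive substring and prefers the longest match, so `Applebot-Extended` is not mistaken for `Applebot`. A user-agent header can be spoofed, so treat the result as a first-pass label rather than verification.

```python
from collections import Counter

# Token -> category, copied from the table above.
CRAWLER_CATEGORIES = {
    "OAI-SearchBot": "Search/discovery",
    "GPTBot": "Training-use crawler",
    "ChatGPT-User": "User-requested fetch",
    "Googlebot": "Search/discovery",
    "Google-Extended": "AI-use control",
    "Applebot": "Search/discovery",
    "Applebot-Extended": "Training-use control",
    "PerplexityBot": "Search/answer discovery",
    "Perplexity-User": "User-requested fetch",
    "CCBot": "Open web dataset",
}


def classify_user_agent(user_agent):
    """Return (token, category) for the longest matching token, or None."""
    ua = user_agent.lower()
    matches = [token for token in CRAWLER_CATEGORIES if token.lower() in ua]
    if not matches:
        return None
    token = max(matches, key=len)  # longest token wins over its prefix
    return token, CRAWLER_CATEGORIES[token]


# Illustrative (made-up) user-agent strings, not real crawler headers.
sample_user_agents = [
    "Mozilla/5.0 (compatible; GPTBot/1.0)",
    "Mozilla/5.0 (compatible; ChatGPT-User/1.0)",
    "Mozilla/5.0 (compatible; PerplexityBot/1.0)",
    "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0",
]

# Count requests per category; unmatched agents are grouped as "Other".
counts = Counter(
    (classify_user_agent(ua) or ("", "Other"))[1] for ua in sample_user_agents
)
for category, count in counts.items():
    print(category, count)
```

Keeping user-requested fetches in their own category makes it easier to follow the table's advice to monitor them separately from automatic crawling.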
This starter keeps search/discovery open while blocking broad training or dataset reuse. Adjust for your own content policy.
User-agent: *
Allow: /

User-agent: Googlebot
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: Applebot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Applebot-Extended
Disallow: /

User-agent: CCBot
Disallow: /

Sitemap: https://example.com/sitemap.xml
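Before deploying a policy like the starter above, it can help to check that each token resolves to the intended rule. The following is a minimal sketch under stated assumptions: `parse_groups` and `is_allowed` are hypothetical helpers written for this example, and they cover only a simplified subset of robots.txt matching (exact, case-insensitive token match with fallback to `*`, plain path prefixes, longest rule wins, no wildcard support). Real crawlers may apply additional rules.

```python
STARTER_ROBOTS_TXT = """\
User-agent: *
Allow: /

User-agent: Googlebot
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: Applebot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Applebot-Extended
Disallow: /

User-agent: CCBot
Disallow: /

Sitemap: https://example.com/sitemap.xml
"""


def parse_groups(text):
    """Return {lowercased token: [(directive, path), ...]} for each group."""
    groups = {}
    agents = []           # tokens of the group currently being read
    in_agent_run = False  # True while reading consecutive User-agent lines
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if not in_agent_run:
                agents = []  # a new group starts here
            agents.append(value.lower())
            groups.setdefault(value.lower(), [])
            in_agent_run = True
        elif field in ("allow", "disallow"):
            for agent in agents:
                groups[agent].append((field, value))
            in_agent_run = False
        # Other fields (for example Sitemap) are ignored in this sketch.
    return groups


def is_allowed(groups, token, path="/"):
    """Apply the longest matching prefix rule; Allow wins ties."""
    rules = groups.get(token.lower(), groups.get("*", []))
    best = ("allow", "")  # no matching rule means allowed
    for directive, rule_path in rules:
        if not rule_path or not path.startswith(rule_path):
            continue
        longer = len(rule_path) > len(best[1])
        tie_allow = len(rule_path) == len(best[1]) and directive == "allow"
        if longer or tie_allow:
            best = (directive, rule_path)
    return best[0] == "allow"


groups = parse_groups(STARTER_ROBOTS_TXT)
for token in ("Googlebot", "OAI-SearchBot", "GPTBot", "CCBot"):
    print(token, "allowed" if is_allowed(groups, token) else "blocked")
```

Exact token matching is a deliberate choice here: a substring match would let the `Applebot` group answer for `Applebot-Extended`, which would hide the opt-out the starter is meant to express.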