OpenAI / search discovery
OAI-SearchBot
Automatic search crawler for ChatGPT search surfaces.
Default: Allow when ChatGPT search visibility matters; decide separately from GPTBot.
Verify: Match the OAI-SearchBot token, then verify against the published OpenAI searchbot IP JSON before using it as identity proof.
Copy-ready note
User-agent: OAI-SearchBot
Allow: /
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36; compatible; OAI-SearchBot/1.3; +https://openai.com/searchbot
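A minimal Python sketch of the IP check. It assumes the published file lists CIDR blocks under a "prefixes" array with "ipv4Prefix" or "ipv6Prefix" keys, so confirm the real file's layout first. load_prefixes, ip_in_published_ranges, and the sample ranges are hypothetical, and the ranges are documentation prefixes, not OpenAI's.

```python
import ipaddress
import json

# Hypothetical sample in the ASSUMED layout; the real file comes from OpenAI and may differ.
# The ranges are documentation prefixes, not OpenAI's actual addresses.
SAMPLE_JSON = """
{
  "prefixes": [
    {"ipv4Prefix": "192.0.2.0/24"},
    {"ipv6Prefix": "2001:db8::/32"}
  ]
}
"""


def load_prefixes(raw):
    """Parse CIDR strings from the assumed {"prefixes": [...]} layout into network objects."""
    networks = []
    for entry in json.loads(raw).get("prefixes", []):
        cidr = entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
        if cidr:
            networks.append(ipaddress.ip_network(cidr))
    return networks


def ip_in_published_ranges(ip, networks):
    """Return True if the request IP falls inside any published CIDR block."""
    addr = ipaddress.ip_address(ip)
    # An IPv4 address tested against an IPv6 network (or vice versa) is simply False.
    return any(addr in net for net in networks)


if __name__ == "__main__":
    nets = load_prefixes(SAMPLE_JSON)
    print(ip_in_published_ranges("192.0.2.10", nets))    # True
    print(ip_in_published_ranges("198.51.100.7", nets))  # False
```

Matching OAI-SearchBot in the user-agent string is only a claim; the IP check is what ties the request to OpenAI. If the GPTBot and OAI-AdsBot files use the same layout, the same helpers apply to them.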
OpenAI / training use crawler
GPTBot
Crawler for content that may be used to improve OpenAI generative AI foundation models.
Default: Decide from training-use policy; do not block OAI-SearchBot just because GPTBot is blocked.
Verify: Match GPTBot, then verify against OpenAI's published GPTBot IP JSON where identity matters.
Copy-ready note
User-agent: GPTBot
Disallow: /
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; GPTBot/1.3; +https://openai.com/gptbot
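To check that blocking GPTBot leaves OAI-SearchBot untouched, Python's standard urllib.robotparser can evaluate the two copy-ready notes together. This is a sketch with two caveats: robotparser only looks at the agent text before the first "/", so pass the product token rather than the full UA string, and it applies the first matching rule in a group rather than RFC 9309's longest match, which makes no difference for these one-rule groups.

```python
from urllib.robotparser import RobotFileParser

# The OAI-SearchBot and GPTBot copy-ready notes combined into one robots.txt.
ROBOTS_TXT = """\
User-agent: OAI-SearchBot
Allow: /

User-agent: GPTBot
Disallow: /
"""


def build_parser(text):
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    url = "https://example.com/article"
    # Pass product tokens: robotparser ignores everything after the first "/",
    # so a full UA string would be reduced to "Mozilla" and match neither group.
    print("GPTBot:", rp.can_fetch("GPTBot", url))                # GPTBot: False
    print("OAI-SearchBot:", rp.can_fetch("OAI-SearchBot", url))  # OAI-SearchBot: True
```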
OpenAI / user triggered fetch
ChatGPT-User
User-requested fetcher for certain ChatGPT and Custom GPT actions.
Default: Monitor separately in logs; use OAI-SearchBot for Search opt-outs and automatic crawl policy.
Verify: Treat as user-triggered evidence; verify source IPs against OpenAI's published ranges before using it as proof of bot identity.
Copy-ready note
User-agent: OAI-SearchBot
Allow: /
# ChatGPT-User is user-triggered; monitor logs separately.
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; ChatGPT-User/1.0; +https://openai.com/bot
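A minimal sketch of keeping ChatGPT-User in its own bucket when summarizing access logs. It reads only the user-agent string, so every count is a claimed identity until the source IP is verified. The token list comes from the OpenAI entries above; the helper names and sample values are hypothetical.

```python
import re
from collections import Counter

# Product tokens from the OpenAI entries; each gets its own bucket.
OPENAI_TOKENS = ("OAI-SearchBot", "GPTBot", "ChatGPT-User", "OAI-AdsBot")
TOKEN_RE = re.compile(r"\b(" + "|".join(map(re.escape, OPENAI_TOKENS)) + r")/[\d.]+")


def classify(user_agent):
    """Return the OpenAI token claimed in a UA string, or "other" if none is present."""
    match = TOKEN_RE.search(user_agent)
    return match.group(1) if match else "other"


def count_by_agent(user_agents):
    """Count requests per claimed agent; these are claims, not verified identities."""
    return Counter(classify(ua) for ua in user_agents)


if __name__ == "__main__":
    sample_uas = [
        "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; ChatGPT-User/1.0; +https://openai.com/bot",
        "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; GPTBot/1.3; +https://openai.com/gptbot",
        "curl/8.5.0",
    ]
    print(count_by_agent(sample_uas))
```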
OpenAI / ads landing page validation
OAI-AdsBot
Validates landing pages of OpenAI ads for safety and relevance.
Default: Only relevant if submitting ads on ChatGPT; do not confuse with organic search crawling.
Verify: Verify against the published OAI-AdsBot IP JSON when ad review traffic matters.
Copy-ready note
User-agent: OAI-AdsBot
Allow: /
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; OAI-AdsBot/1.0; +https://openai.com/adsbot
Google / search discovery
Googlebot
Google Search crawler.
Default: Allow for public pages that should be eligible for Google Search.
Verify: Use reverse DNS or Google's documented verification flow before trusting a Googlebot user-agent string.
Copy-ready note
User-agent: Googlebot
Allow: /
Match the Googlebot token or Google's documented Googlebot UA patterns; do not pin matching to a single Chrome version.
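Google's documented flow is a reverse DNS lookup on the requesting IP, a check that the hostname belongs to a Google crawler domain, then a forward lookup confirming that hostname resolves back to the same IP. A minimal Python sketch with the resolvers passed in so it can be tested offline. The suffix list is an assumption to check against Google's current documentation, and socket.gethostbyname_ex returns IPv4 addresses only, so IPv6 crawler IPs would need socket.getaddrinfo instead.

```python
import socket

# ASSUMED hostname suffixes for Googlebot; confirm against Google's current docs.
# The leading dot stops look-alikes such as "evilgooglebot.com" from passing.
GOOGLEBOT_SUFFIXES = (".googlebot.com", ".google.com")


def verify_crawler_ip(ip, allowed_suffixes,
                      reverse=socket.gethostbyaddr,
                      forward=socket.gethostbyname_ex):
    """Reverse-resolve ip, check the hostname suffix, then forward-confirm it."""
    try:
        hostname, _aliases, _addrs = reverse(ip)
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return False
    hostname = hostname.rstrip(".").lower()
    if not hostname.endswith(tuple(allowed_suffixes)):
        return False
    try:
        _name, _aliases, forward_ips = forward(hostname)
    except OSError:
        return False
    # The forward lookup must point back at the original IP, or the PTR record is untrusted.
    return ip in forward_ips
```

The same function fits the Applebot entry's reverse DNS rule with a suffix such as ".applebot.apple.com".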
Google / ai use control token
Google-Extended
Robots.txt product token for certain Gemini model training and grounding controls.
Default: Set separately from Googlebot; do not expect a separate Google-Extended HTTP user-agent in logs.
Verify: Look for Googlebot or other Google crawler strings in logs; Google-Extended itself is a robots.txt control token.
Copy-ready note
User-agent: Google-Extended
Disallow: /
No separate HTTP request user-agent string; crawling uses existing Google user-agent strings.
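Because Google-Extended is its own robots.txt group rather than a separate crawler, its decision can live as a separate entry in a per-token policy. A minimal sketch that emits one group per token from a hypothetical policy mapping (True for Allow: /, False for Disallow: /); build_robots_txt is an illustrative name, and the same pattern covers the Applebot and Applebot-Extended pair.

```python
def build_robots_txt(policy):
    """Emit one robots.txt group per product token (True -> Allow: /, False -> Disallow: /)."""
    groups = []
    for token, allowed in policy.items():
        rule = "Allow: /" if allowed else "Disallow: /"
        groups.append(f"User-agent: {token}\n{rule}")
    return "\n\n".join(groups) + "\n"


if __name__ == "__main__":
    # Hypothetical policy: Search eligibility on, Google-Extended use off.
    print(build_robots_txt({"Googlebot": True, "Google-Extended": False}), end="")
```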
Apple / search discovery
Applebot
Apple web crawler for Spotlight, Siri, Safari, and related Apple ecosystem search experiences.
Default: Allow public pages if Apple ecosystem discovery matters.
Verify: Verify reverse DNS under applebot.apple.com or match the published Applebot CIDR JSON.
Copy-ready note
User-agent: Applebot
Allow: /
Applebot appears inside the user-agent string; Apple documents a general Safari/WebKit format.
Apple / ai use control token
Applebot-Extended
Secondary robots.txt control for Apple foundation-model training usage.
Default: Use when you want Applebot discovery but need a separate Apple training-use decision.
Verify: Do not expect crawl hits from Applebot-Extended; verify Applebot itself for crawler identity.
Copy-ready note
User-agent: Applebot-Extended
Disallow: /
Apple says Applebot-Extended does not crawl webpages.
Perplexity / search answer discovery
PerplexityBot
Perplexity search result crawler.
Default: Allow if Perplexity search and answer visibility matters; if a WAF blocks it, allowlist Perplexity's published IP ranges.
Verify: Match PerplexityBot and verify against Perplexity's published IP range JSON.
Copy-ready note
User-agent: PerplexityBot
Allow: /
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; PerplexityBot/1.0; +https://perplexity.ai/perplexitybot)
Perplexity / user triggered fetch
Perplexity-User
User-requested fetcher for Perplexity user actions.
Default: Monitor separately from PerplexityBot; verify IP ranges for WAF allow rules.
Verify: Treat as user-triggered and verify against Perplexity's published Perplexity-User IP JSON.
Copy-ready note
# Perplexity-User is user-triggered. Review WAF rules and logs separately.
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Perplexity-User/1.0; +https://perplexity.ai/perplexity-user)
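For the WAF advice on both Perplexity entries, a minimal sketch of the decision: a request claiming PerplexityBot or Perplexity-User is allowed only when its IP falls inside that agent's own published ranges. The ranges below are documentation prefixes standing in for Perplexity's JSON files, and waf_decision and its return labels are hypothetical.

```python
import ipaddress
import re

# Hypothetical per-agent ranges; real values come from Perplexity's published IP JSON files.
PUBLISHED_RANGES = {
    "PerplexityBot": [ipaddress.ip_network("192.0.2.0/24")],
    "Perplexity-User": [ipaddress.ip_network("198.51.100.0/24")],
}
CLAIM_RE = re.compile(r"\b(PerplexityBot|Perplexity-User)/")


def waf_decision(user_agent, ip, ranges=PUBLISHED_RANGES):
    """Return "allow", "reject", or "no-claim" for one request."""
    match = CLAIM_RE.search(user_agent)
    if match is None:
        return "no-claim"  # not claiming a Perplexity agent; normal WAF rules apply
    addr = ipaddress.ip_address(ip)
    # Check only the claimed agent's ranges, so the two fetchers stay separate.
    if any(addr in net for net in ranges.get(match.group(1), [])):
        return "allow"
    return "reject"  # claimed token from an unlisted IP: spoofed, or the ranges are stale
```

Keeping the ranges keyed by agent mirrors the advice to monitor Perplexity-User separately from PerplexityBot; refresh them from the published files rather than hard-coding them.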
Common Crawl / open web dataset crawler
CCBot
Crawler for Common Crawl's open web crawl dataset.
Default: Allow if open web dataset participation is acceptable; block if broad dataset reuse is outside policy.
Verify: Use reverse DNS as documented by Common Crawl, or check against the published ccbot.json IP ranges.
Copy-ready note
User-agent: CCBot
Disallow: /
CCBot/2.0 (https://commoncrawl.org/faq/)