Is Google-Extended a user-agent I should see in logs?

No. Google documents Google-Extended as a robots.txt product token that has no separate HTTP request user-agent string; crawling uses existing Google user-agent strings.

Are GPTBot and OAI-SearchBot the same?

No. OpenAI documents OAI-SearchBot for ChatGPT search surfaces and GPTBot for content that may be used in training generative AI foundation models.

Free AI crawler user-agent lookup

AI crawler user-agent lookup for robots.txt, logs, and WAF rules.

Search common AI and search crawler tokens, understand what each crawler is for, copy safe robots.txt starter rules, and cite official proof links before changing crawler policy.


Fast answer

A user-agent string is a clue, not identity proof.

Use this lookup to separate search crawlers from training-use controls and user-triggered fetchers. Then verify important bot traffic with published IP ranges, reverse DNS, or provider guidance before making traffic, security, or WAF claims.

OpenAI: review OAI-SearchBot, GPTBot, and ChatGPT-User separately.
Google: keep Googlebot separate from the Google-Extended robots.txt product token.
Apple: keep Applebot discovery separate from Applebot-Extended usage controls.


Agent handoff

AI crawler user-agent lookup ready.
Use /data/ai-crawler-user-agent-lookup-pack.json for machine-readable crawler records.
Use /.well-known/ai-crawler-user-agent-lookup-pack.json as the stable discovery path.

Crawler token table

Token	Operator	Category	robots.txt?	Recommended first decision	Proof
OAI-SearchBot	OpenAI	search discovery	Yes	Allow when ChatGPT search visibility matters; decide separately from GPTBot.	Official source
GPTBot	OpenAI	training use crawler	Yes	Decide from training-use policy; do not block OAI-SearchBot just because GPTBot is blocked.	Official source
ChatGPT-User	OpenAI	user triggered fetch	Usually no / limited	Monitor separately in logs; use OAI-SearchBot for Search opt-outs and automatic crawl policy.	Official source
OAI-AdsBot	OpenAI	ads landing page validation	Yes	Only relevant if submitting ads on ChatGPT; do not confuse with organic search crawling.	Official source
Googlebot	Google	search discovery	Yes	Allow for public pages that should be eligible for Google Search.	Official source
Google-Extended	Google	ai use control token	Yes	Set separately from Googlebot; do not expect a separate Google-Extended HTTP user-agent in logs.	Official source
Applebot	Apple	search discovery	Yes	Allow public pages if Apple ecosystem discovery matters.	Official source
Applebot-Extended	Apple	ai use control token	Yes	Use when you want Applebot discovery but need a separate Apple training-use decision.	Official source
PerplexityBot	Perplexity	search answer discovery	Yes	Allow if Perplexity search/answer visibility matters, and whitelist published IP ranges if a WAF blocks it.	Official source
Perplexity-User	Perplexity	user triggered fetch	Usually no / limited	Monitor separately from PerplexityBot; verify IP ranges for WAF allow rules.	Official source
CCBot	Common Crawl	open web dataset crawler	Yes	Allow if open web dataset participation is acceptable; block if broad dataset reuse is outside policy.	Official source
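The table maps onto a small log-labeling helper. This is a Python sketch, not an official parser: it covers only the tokens above that appear in HTTP request headers, deliberately leaves out the robots.txt-only tokens Google-Extended and Applebot-Extended, and treats any match as a label that still needs IP or DNS verification. The names `LOG_TOKENS`, `CrawlerMatch`, and `classify_user_agent` are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class CrawlerMatch(NamedTuple):
    token: str
    operator: str
    category: str
    version: Optional[str]


# Header-visible tokens from the crawler token table. Google-Extended and
# Applebot-Extended are robots.txt control tokens, so they are not listed.
LOG_TOKENS = {
    "OAI-SearchBot": ("OpenAI", "search discovery"),
    "GPTBot": ("OpenAI", "training use crawler"),
    "ChatGPT-User": ("OpenAI", "user triggered fetch"),
    "OAI-AdsBot": ("OpenAI", "ads landing page validation"),
    "Googlebot": ("Google", "search discovery"),
    "Applebot": ("Apple", "search discovery"),
    "PerplexityBot": ("Perplexity", "search answer discovery"),
    "Perplexity-User": ("Perplexity", "user triggered fetch"),
    "CCBot": ("Common Crawl", "open web dataset crawler"),
}


def classify_user_agent(ua: str) -> Optional[CrawlerMatch]:
    """Label a User-Agent string with the first known crawler token it contains.

    A match is only a clue: anyone can send these strings.
    """
    for token, (operator, category) in LOG_TOKENS.items():
        # Whole-token match (no letters, digits, or "-" on either side),
        # optionally followed by "/version".
        pattern = rf"(?<![\w-]){re.escape(token)}(?![\w-])(?:/([\w.]+))?"
        m = re.search(pattern, ua, re.IGNORECASE)
        if m:
            return CrawlerMatch(token, operator, category, m.group(1))
    return None


ua = ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
      "(KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36; compatible; "
      "OAI-SearchBot/1.3; +https://openai.com/searchbot")
print(classify_user_agent(ua))
# CrawlerMatch(token='OAI-SearchBot', operator='OpenAI', category='search discovery', version='1.3')
```

The lookarounds stop a short token such as `Applebot` from matching inside a longer one such as `Applebot-Extended`.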

Detailed crawler notes

OpenAI / search discovery

OAI-SearchBot

Automatic search crawler for ChatGPT search surfaces.

Default: Allow when ChatGPT search visibility matters; decide separately from GPTBot.

Verify: Match the OAI-SearchBot token, then verify against the published OpenAI searchbot IP JSON before using it as identity proof.

Copy-ready note

User-agent: OAI-SearchBot
Allow: /

Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36; compatible; OAI-SearchBot/1.3; +https://openai.com/searchbot

Read official source
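Checking a client address against a published range file needs only Python's standard `ipaddress` and `json` modules. This sketch assumes a Google-style layout (a `prefixes` list of objects with `ipv4Prefix` or `ipv6Prefix` keys); confirm the actual structure of OpenAI's searchbot file, and of the other provider files mentioned on this page, before relying on it. The function names are hypothetical, and the sample ranges are IETF documentation prefixes, not real crawler addresses.

```python
import ipaddress
import json
from typing import Iterable, List, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


def load_prefixes(doc: str) -> List[Network]:
    """Parse a published crawler IP-range JSON document.

    ASSUMPTION: Google-style layout, i.e.
    {"prefixes": [{"ipv4Prefix": "..."}, {"ipv6Prefix": "..."}]}.
    Adjust the keys if a provider's file is shaped differently.
    """
    data = json.loads(doc)
    networks: List[Network] = []
    for entry in data.get("prefixes", []):
        for key in ("ipv4Prefix", "ipv6Prefix"):
            if key in entry:
                networks.append(ipaddress.ip_network(entry[key], strict=False))
    return networks


def ip_in_ranges(ip: str, networks: Iterable[Network]) -> bool:
    """True if ip sits inside any published range."""
    addr = ipaddress.ip_address(ip)
    # An IPv4 address tested against an IPv6 network (or vice versa) is simply False.
    return any(addr in net for net in networks)


# Documentation-only ranges (RFC 5737 / RFC 3849), not real crawler IPs.
sample = '{"prefixes": [{"ipv4Prefix": "192.0.2.0/24"}, {"ipv6Prefix": "2001:db8::/32"}]}'
nets = load_prefixes(sample)
print(ip_in_ranges("192.0.2.10", nets))    # True
print(ip_in_ranges("198.51.100.7", nets))  # False
```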

OpenAI / training use crawler

GPTBot

Crawler for content that may be used to improve OpenAI generative AI foundation models.

Default: Decide from training-use policy; do not block OAI-SearchBot just because GPTBot is blocked.

Verify: Match GPTBot, then verify against OpenAI's published GPTBot IP JSON where identity matters.

Copy-ready note

User-agent: GPTBot
Disallow: /

Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; GPTBot/1.3; +https://openai.com/gptbot

Read official source
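The GPTBot and OAI-SearchBot split can be checked locally with Python's standard `urllib.robotparser`. This sketch combines the two OpenAI copy-ready notes into one file and asks whether each token may fetch a page; `example.com` and the helper name `can_fetch` are placeholders.

```python
from urllib.robotparser import RobotFileParser

# The GPTBot and OAI-SearchBot copy-ready notes combined into one robots.txt.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /
"""


def can_fetch(robots_txt: str, token: str, url: str) -> bool:
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # Pass the product token, not the full header: robotparser keeps only the
    # text before the first "/", so "Mozilla/5.0 ..." would become "Mozilla".
    return parser.can_fetch(token, url)


for token in ("GPTBot", "OAI-SearchBot", "ChatGPT-User"):
    print(token, can_fetch(ROBOTS_TXT, token, "https://example.com/docs/"))
# GPTBot False
# OAI-SearchBot True
# ChatGPT-User True
```

ChatGPT-User comes back allowed only because this file has no group for it and no `*` group; for user-triggered fetchers, robots.txt is not the primary control.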

OpenAI / user triggered fetch

ChatGPT-User

User-requested fetcher for certain ChatGPT and Custom GPT actions.

Default: Monitor separately in logs; use OAI-SearchBot for Search opt-outs and automatic crawl policy.

Verify: Treat as user-triggered evidence; verify IPs if using it for bot identity claims.

Copy-ready note

User-agent: OAI-SearchBot
Allow: /
# ChatGPT-User is user-triggered; monitor logs separately.

Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; ChatGPT-User/1.0; +https://openai.com/bot

Read official source
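Monitoring user-triggered fetchers separately can start with a simple tally over access logs. This sketch assumes the Apache/Nginx combined log format, where the user agent is the last double-quoted field; the token lists come from the table on this page, and the helper name `tally` is hypothetical. The counts are unverified labels until the client IPs are checked.

```python
import re
from collections import Counter
from typing import Iterable

# Tokens from the crawler token table, split by how the request is triggered.
USER_TRIGGERED = ("ChatGPT-User", "Perplexity-User")
AUTOMATIC = ("OAI-SearchBot", "GPTBot", "OAI-AdsBot", "PerplexityBot")

# In the combined log format the user agent is the last double-quoted field.
UA_FIELD = re.compile(r'"([^"]*)"\s*$')


def tally(lines: Iterable[str]) -> Counter:
    """Count log lines per (trigger kind, token); labels are not identity proof."""
    counts: Counter = Counter()
    for line in lines:
        match = UA_FIELD.search(line)
        if not match:
            continue
        ua = match.group(1).lower()
        for token in USER_TRIGGERED + AUTOMATIC:
            if token.lower() in ua:
                kind = "user-triggered" if token in USER_TRIGGERED else "automatic"
                counts[(kind, token)] += 1
                break  # one label per request
    return counts


sample = [
    '192.0.2.1 - - [10/Oct/2024:13:55:36 +0000] "GET /docs/ HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; ChatGPT-User/1.0; +https://openai.com/bot"',
    '192.0.2.2 - - [10/Oct/2024:13:56:02 +0000] "GET /docs/ HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; GPTBot/1.3; +https://openai.com/gptbot"',
]
for (kind, token), n in sorted(tally(sample).items()):
    print(kind, token, n)
# automatic GPTBot 1
# user-triggered ChatGPT-User 1
```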

OpenAI / ads landing page validation

OAI-AdsBot

OpenAI ads landing-page safety and relevance validation.

Default: Only relevant if submitting ads on ChatGPT; do not confuse with organic search crawling.

Verify: Verify against the published OAI-AdsBot IP JSON when ad review traffic matters.

Copy-ready note

User-agent: OAI-AdsBot
Allow: /

Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; OAI-AdsBot/1.0; +https://openai.com/adsbot

Read official source

Google / search discovery

Googlebot

Google Search crawler.

Default: Allow for public pages that should be eligible for Google Search.

Verify: Use reverse DNS or Google's documented verification flow before trusting a Googlebot user-agent string.

Copy-ready note

User-agent: Googlebot
Allow: /

Match the Googlebot token or Google's documented Googlebot UA patterns; do not pin rules to one Chrome version, because the version inside the string changes.

Read official source
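Reverse DNS verification is usually done as forward-confirmed reverse DNS: look up the PTR name for the IP, confirm it sits under an expected domain, then resolve that name and confirm it returns the same IP. This sketch uses only Python's `socket` and `ipaddress` modules. The domain lists are assumptions to check against each provider's current documentation (the Apple entry follows the applebot.apple.com note on this page), and all function names are hypothetical. Resolvers are injectable so the logic can be tested without network access.

```python
import ipaddress
import socket
from typing import Callable, Iterable, List

# ASSUMED domain lists; confirm against each provider's current verification docs.
GOOGLE_DOMAINS = ("googlebot.com", "google.com")
APPLE_DOMAINS = ("applebot.apple.com",)


def reverse_dns(ip: str) -> str:
    """PTR lookup; raises socket.herror (an OSError) when no record exists."""
    return socket.gethostbyaddr(ip)[0]


def forward_dns(host: str) -> List[str]:
    """A/AAAA lookup; element 4 of each getaddrinfo tuple is the sockaddr."""
    return [info[4][0] for info in socket.getaddrinfo(host, None)]


def hostname_in_domains(host: str, domains: Iterable[str]) -> bool:
    """True if host equals one of the domains or is a subdomain of it."""
    host = host.rstrip(".").lower()
    return any(host == d or host.endswith("." + d) for d in domains)


def verify_crawler_ip(
    ip: str,
    domains: Iterable[str],
    reverse: Callable[[str], str] = reverse_dns,
    forward: Callable[[str], List[str]] = forward_dns,
) -> bool:
    """Forward-confirmed reverse DNS check for one client IP."""
    try:
        host = reverse(ip)
    except OSError:
        return False  # no PTR record, so no DNS-based proof
    if not hostname_in_domains(host, domains):
        return False  # PTR name is outside the expected domains
    try:
        addrs = {ipaddress.ip_address(a) for a in forward(host)}
    except (OSError, ValueError):
        return False
    # The PTR name must resolve back to the same address.
    return ipaddress.ip_address(ip) in addrs
```

With network access, a call looks like `verify_crawler_ip(client_ip, GOOGLE_DOMAINS)`. The suffix check requires a leading dot, so a name like `googlebot.com.attacker.example` does not pass.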

Google / ai use control token

Google-Extended

Robots.txt product token for certain Gemini model training and grounding controls.

Default: Set separately from Googlebot; do not expect a separate Google-Extended HTTP user-agent in logs.

Verify: Look for Googlebot or other Google crawler strings in logs; Google-Extended itself is a robots.txt control token.

Copy-ready note

User-agent: Google-Extended
Disallow: /

No separate HTTP request user-agent string; crawling uses existing Google user-agent strings.

Read official source

Apple / search discovery

Applebot

Apple web crawler for Spotlight, Siri, Safari, and related Apple ecosystem search experiences.

Default: Allow public pages if Apple ecosystem discovery matters.

Verify: Verify reverse DNS under applebot.apple.com or match the published Applebot CIDR JSON.

Copy-ready note

User-agent: Applebot
Allow: /

Applebot appears inside the user-agent string; Apple documents a general Safari/WebKit format.

Read official source

Apple / ai use control token

Applebot-Extended

Secondary robots.txt control for Apple foundation-model training usage.

Default: Use when you want Applebot discovery but need a separate Apple training-use decision.

Verify: Do not expect crawl hits from Applebot-Extended; verify Applebot itself for crawler identity.

Copy-ready note

User-agent: Applebot-Extended
Disallow: /

Apple says Applebot-Extended does not crawl webpages.

Read official source
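For Applebot and Applebot-Extended, group selection is the detail that matters. RFC 9309 says a crawler matches its product token case-insensitively, combines groups that share that token, falls back to `*`, and uses the longest matching rule (Allow wins a tie). CPython's `urllib.robotparser` instead applies the first group whose user-agent value appears as a substring of the name passed in, so an `Applebot` group listed first would also be applied to `Applebot-Extended`. The sketch below follows the RFC approach but ignores `*` and `$` path wildcards; `parse_groups` and `is_allowed` are hypothetical names.

```python
from typing import Dict, List, Tuple

Rule = Tuple[bool, str]  # (is_allow, path_prefix)


def parse_groups(robots_txt: str) -> Dict[str, List[Rule]]:
    """Map each lowercased user-agent token to the combined rules of its groups."""
    groups: Dict[str, List[Rule]] = {}
    agents: List[str] = []
    seen_rule = False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue  # blank or malformed lines do not end a group here
        field, value = line.split(":", 1)
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            if seen_rule:  # a user-agent line after rules starts a new group
                agents, seen_rule = [], False
            token = value.lower()
            agents.append(token)
            groups.setdefault(token, [])
        elif field in ("allow", "disallow") and agents:
            seen_rule = True
            if value:  # an empty "Disallow:" blocks nothing
                for token in agents:
                    groups[token].append((field == "allow", value))
    return groups


def is_allowed(robots_txt: str, token: str, path: str) -> bool:
    groups = parse_groups(robots_txt)
    # Exact, case-insensitive token match; otherwise fall back to "*".
    rules = groups.get(token.lower(), groups.get("*", []))
    best = None  # (prefix length, is_allow); longest wins, Allow wins ties
    for allow, prefix in rules:
        if path.startswith(prefix):
            candidate = (len(prefix), allow)
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]


APPLE_ROBOTS_TXT = """\
User-agent: Applebot
Allow: /

User-agent: Applebot-Extended
Disallow: /
"""
print(is_allowed(APPLE_ROBOTS_TXT, "Applebot", "/guides/"))           # True
print(is_allowed(APPLE_ROBOTS_TXT, "Applebot-Extended", "/guides/"))  # False
```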

Perplexity / search answer discovery

PerplexityBot

Perplexity search result crawler.

Default: Allow if Perplexity search/answer visibility matters, and whitelist published IP ranges if a WAF blocks it.

Verify: Match PerplexityBot and verify against Perplexity's published IP range JSON.

Copy-ready note

User-agent: PerplexityBot
Allow: /

Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; PerplexityBot/1.0; +https://perplexity.ai/perplexitybot)

Read official source

Perplexity / user triggered fetch

Perplexity-User

User-requested fetcher for Perplexity user actions.

Default: Monitor separately from PerplexityBot; verify IP ranges for WAF allow rules.

Verify: Treat as user-triggered and verify against Perplexity's published Perplexity-User IP JSON.

Copy-ready note

# Perplexity-User is user-triggered. Review WAF rules and logs separately.

Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Perplexity-User/1.0; +https://perplexity.ai/perplexity-user)

Read official source

Common Crawl / open web dataset crawler

CCBot

Crawler for Common Crawl's open web crawl dataset.

Default: Allow if open web dataset participation is acceptable; block if broad dataset reuse is outside policy.

Verify: Verify with Common Crawl reverse DNS or the published ccbot.json IP ranges.

Copy-ready note

User-agent: CCBot
Disallow: /

CCBot/2.0 (https://commoncrawl.org/faq/)

Read official source

Pre-AI search DB handoff

Agents should query the compact pack or answer endpoints first, then cite at least one official source and one LLMs.txt Kit proof link.