Free AI crawler user-agent lookup

AI crawler user-agent lookup for robots.txt, logs, and WAF rules.

Search common AI and search crawler tokens, understand what each crawler is for, copy safe robots.txt starter rules, and cite official proof links before changing crawler policy.

Fast answer

A user-agent string is a clue, not identity proof.

Use this lookup to separate search crawlers from training-use controls and user-triggered fetchers. Then verify important bot traffic with published IP ranges, reverse DNS, or provider guidance before making traffic, security, or WAF claims.
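As a rough illustration of that first separation step, the sketch below maps a request user-agent string to the category it claims, using the tokens and categories from the crawler token table on this page. The function name `classify_user_agent` and the dictionary are hypothetical helpers, not part of any published pack, and a match only tells you what the request claims to be.

```python
from typing import Optional

# Token -> category, copied from the crawler token table on this page.
# Control-only tokens (Google-Extended, Applebot-Extended) are left out
# because they are robots.txt controls, not request user-agents.
CATEGORY_BY_TOKEN = {
    "OAI-SearchBot": "search discovery",
    "GPTBot": "training use crawler",
    "ChatGPT-User": "user triggered fetch",
    "OAI-AdsBot": "ads landing page validation",
    "Googlebot": "search discovery",
    "Applebot": "search discovery",
    "PerplexityBot": "search answer discovery",
    "Perplexity-User": "user triggered fetch",
    "CCBot": "open web dataset crawler",
}


def classify_user_agent(user_agent: str) -> Optional[str]:
    """Return the category a user-agent string claims, or None if no token matches.

    This is a claim label only; it is not identity proof.
    """
    lowered = user_agent.lower()
    for token, category in CATEGORY_BY_TOKEN.items():
        if token.lower() in lowered:
            return category
    return None
```

A result of `None` means no known token appeared in the string. Any non-`None` result should still be checked against IP ranges or reverse DNS before it is used in a security or traffic claim.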

Agent handoff
AI crawler user-agent lookup ready.
Use /data/ai-crawler-user-agent-lookup-pack.json for machine-readable crawler records.
Use /.well-known/ai-crawler-user-agent-lookup-pack.json as the stable discovery path.

Crawler token table

| Token | Provider | Category | robots.txt? | Recommended first decision | Proof |
| --- | --- | --- | --- | --- | --- |
| OAI-SearchBot | OpenAI | search discovery | Yes | Allow when ChatGPT search visibility matters; decide separately from GPTBot. | Official source |
| GPTBot | OpenAI | training use crawler | Yes | Decide from training-use policy; do not block OAI-SearchBot just because GPTBot is blocked. | Official source |
| ChatGPT-User | OpenAI | user triggered fetch | Usually no / limited | Monitor separately in logs; use OAI-SearchBot for Search opt-outs and automatic crawl policy. | Official source |
| OAI-AdsBot | OpenAI | ads landing page validation | Yes | Only relevant if submitting ads on ChatGPT; do not confuse with organic search crawling. | Official source |
| Googlebot | Google | search discovery | Yes | Allow for public pages that should be eligible for Google Search. | Official source |
| Google-Extended | Google | AI use control token | Yes | Set separately from Googlebot; do not expect a separate Google-Extended HTTP user-agent in logs. | Official source |
| Applebot | Apple | search discovery | Yes | Allow public pages if Apple ecosystem discovery matters. | Official source |
| Applebot-Extended | Apple | AI use control token | Yes | Use when you want Applebot discovery but need a separate Apple training-use decision. | Official source |
| PerplexityBot | Perplexity | search answer discovery | Yes | Allow if Perplexity search/answer visibility matters, and whitelist published IP ranges if a WAF blocks it. | Official source |
| Perplexity-User | Perplexity | user triggered fetch | Usually no / limited | Monitor separately from PerplexityBot; verify IP ranges for WAF allow rules. | Official source |
| CCBot | Common Crawl | open web dataset crawler | Yes | Allow if open web dataset participation is acceptable; block if broad dataset reuse is outside policy. | Official source |

Detailed crawler notes

OpenAI / search discovery

OAI-SearchBot

Automatic search crawler for ChatGPT search surfaces.

Default: Allow when ChatGPT search visibility matters; decide separately from GPTBot.

Verify: Match the OAI-SearchBot token, then verify against the published OpenAI searchbot IP JSON before using it as identity proof.
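The IP check described above might look like the following minimal sketch. It assumes you have already downloaded the provider's published IP JSON and extracted its CIDR strings into a list; the JSON layout is not described on this page, so parsing is left out. The function name and the documentation-only ranges (`192.0.2.0/24`, `2001:db8::/32`) are placeholders, not real OpenAI ranges.

```python
import ipaddress
from typing import Iterable


def ip_in_published_ranges(client_ip: str, cidr_ranges: Iterable[str]) -> bool:
    """Return True if client_ip falls inside any of the given CIDR ranges."""
    address = ipaddress.ip_address(client_ip)
    for cidr in cidr_ranges:
        network = ipaddress.ip_network(cidr, strict=False)
        # Compare only within the same IP version (IPv4 vs IPv6).
        if address.version == network.version and address in network:
            return True
    return False


# Placeholder ranges from the reserved documentation blocks (assumption:
# replace with the CIDRs extracted from the provider's published JSON).
EXAMPLE_RANGES = ["192.0.2.0/24", "2001:db8::/32"]
```

A request would count as verified only when both conditions hold: the user-agent contains the token and `ip_in_published_ranges(remote_ip, ranges)` returns `True`.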

Copy-ready note
User-agent: OAI-SearchBot
Allow: /

Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36; compatible; OAI-SearchBot/1.3; +https://openai.com/searchbot

Read official source

OpenAI / training use crawler

GPTBot

Crawler for content that may be used to improve OpenAI generative AI foundation models.

Default: Decide from training-use policy; do not block OAI-SearchBot just because GPTBot is blocked.

Verify: Match GPTBot, then verify against OpenAI's published GPTBot IP JSON where identity matters.

Copy-ready note
User-agent: GPTBot
Disallow: /

Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; GPTBot/1.3; +https://openai.com/gptbot
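To check that blocking GPTBot does not also block OAI-SearchBot, one option is to run a draft robots.txt through Python's standard `urllib.robotparser`. This is a sketch: the `allowed` helper and the `example.com` URL are illustrative, and Python's parser only approximates how each crawler interprets rules.

```python
from urllib.robotparser import RobotFileParser

# Draft policy: block the training-use crawler, keep the search crawler.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /
"""


def allowed(token: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the robots.txt text lets the given token fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # Pass the bare product token, not the full user-agent string.
    return parser.can_fetch(token, url)
```

With the draft above, `allowed("GPTBot", "https://example.com/docs/")` is false and `allowed("OAI-SearchBot", "https://example.com/docs/")` is true, which is the separation the default recommends.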

Read official source

OpenAI / user triggered fetch

ChatGPT-User

User-requested fetcher for certain ChatGPT and Custom GPT actions.

Default: Monitor separately in logs; use OAI-SearchBot for Search opt-outs and automatic crawl policy.

Verify: Treat as user-triggered evidence; verify IPs if using it for bot identity claims.
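Monitoring the OpenAI tokens separately could start with a per-token count over access-log lines, as in this sketch. It assumes each log line contains the raw user-agent string; the function name and the tuple of watched tokens are illustrative choices. The counts reflect claimed user-agents only, not verified identity.

```python
from collections import Counter
from typing import Iterable

# Tokens to track separately (assumption: adjust to your own policy).
WATCHED_TOKENS = ("OAI-SearchBot", "GPTBot", "ChatGPT-User")


def count_claimed_hits(log_lines: Iterable[str]) -> Counter:
    """Count log lines per watched token, based on the user-agent text only."""
    counts: Counter = Counter()
    for line in log_lines:
        lowered = line.lower()
        for token in WATCHED_TOKENS:
            if token.lower() in lowered:
                counts[token] += 1
                break  # count each line at most once
    return counts
```

Keeping ChatGPT-User in its own bucket makes it easier to see user-triggered fetches without mixing them into automatic crawl totals.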

Copy-ready note
User-agent: OAI-SearchBot
Allow: /
# ChatGPT-User is user-triggered; monitor logs separately.

Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; ChatGPT-User/1.0; +https://openai.com/bot

Read official source

OpenAI / ads landing page validation

OAI-AdsBot

OpenAI ads landing-page safety and relevance validation.

Default: Only relevant if submitting ads on ChatGPT; do not confuse with organic search crawling.

Verify: Verify against the published OAI-AdsBot IP JSON when ad review traffic matters.

Copy-ready note
User-agent: OAI-AdsBot
Allow: /

Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; OAI-AdsBot/1.0; +https://openai.com/adsbot

Read official source

Google / search discovery

Googlebot

Google Search crawler.

Default: Allow for public pages that should be eligible for Google Search.

Verify: Use reverse DNS or Google's documented verification flow before trusting a Googlebot user-agent string.

Copy-ready note
User-agent: Googlebot
Allow: /

Match on the Googlebot token or the documented Googlebot UA patterns. The Chrome version inside the string changes over time, so do not pin rules to one version.

Read official source

Google / AI use control token

Google-Extended

robots.txt product token that controls whether content may be used for certain Gemini model training and grounding.

Default: Set separately from Googlebot; do not expect a separate Google-Extended HTTP user-agent in logs.

Verify: Look for Googlebot or other Google crawler strings in logs; Google-Extended itself is a robots.txt control token.

Copy-ready note
User-agent: Google-Extended
Disallow: /

No separate HTTP request user-agent string; crawling uses existing Google user-agent strings.
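Because Google-Extended exists only as a robots.txt control, the decision lives entirely in the robots.txt file. The sketch below renders one group per token from a simple allow/disallow mapping, so Googlebot and Google-Extended stay separate decisions. The `render_robots_txt` helper and its boolean policy format are hypothetical, and it only covers whole-site rules.

```python
from typing import Mapping


def render_robots_txt(policy: Mapping[str, bool]) -> str:
    """Render one robots.txt group per token; True means allow the whole site."""
    groups = []
    for token, allow in policy.items():
        directive = "Allow: /" if allow else "Disallow: /"
        groups.append(f"User-agent: {token}\n{directive}")
    # Blank line between groups, single trailing newline at the end.
    return "\n\n".join(groups) + "\n"


# Example policy: stay eligible for Google Search, opt out of the AI use control.
EXAMPLE_POLICY = {"Googlebot": True, "Google-Extended": False}
```

Calling `render_robots_txt(EXAMPLE_POLICY)` produces a Googlebot group with `Allow: /` followed by a Google-Extended group with `Disallow: /`.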

Read official source

Apple / search discovery

Applebot

Apple web crawler for Spotlight, Siri, Safari, and related Apple ecosystem search experiences.

Default: Allow public pages if Apple ecosystem discovery matters.

Verify: Verify reverse DNS under applebot.apple.com or match the published Applebot CIDR JSON.
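A reverse DNS check is usually done as a forward-confirmed lookup: resolve the IP to a hostname, confirm the hostname sits under the expected domain, then resolve that hostname back and confirm it returns the same IP. The sketch below follows that pattern with Python's `socket` module. The function names are hypothetical, the lookup functions are injectable so the logic can be tested without network access, and `socket.gethostbyname_ex` covers IPv4 only.

```python
import socket
from typing import Callable


def hostname_in_domain(hostname: str, domain: str) -> bool:
    """True if hostname equals domain or is a subdomain of it."""
    host = hostname.rstrip(".").lower()
    domain = domain.lower()
    return host == domain or host.endswith("." + domain)


def forward_confirmed_reverse_dns(
    ip: str,
    domain: str,
    reverse_lookup: Callable = socket.gethostbyaddr,
    forward_lookup: Callable = socket.gethostbyname_ex,
) -> bool:
    """Reverse-resolve ip, check the domain, then forward-resolve to confirm."""
    try:
        hostname, _aliases, _addresses = reverse_lookup(ip)
    except OSError:
        return False  # no PTR record or lookup failure
    if not hostname_in_domain(hostname, domain):
        return False
    try:
        _name, _aliases, forward_ips = forward_lookup(hostname)
    except OSError:
        return False
    # The forward lookup must lead back to the original address.
    return ip in forward_ips
```

For Applebot, the call would be `forward_confirmed_reverse_dns(remote_ip, "applebot.apple.com")`. The suffix check uses a leading dot so that a name such as `applebot.apple.com.attacker.example` does not pass.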

Copy-ready note
User-agent: Applebot
Allow: /

Applebot appears inside the user-agent string; Apple documents a general Safari/WebKit format.

Read official source

Apple / AI use control token

Applebot-Extended

Secondary robots.txt control for Apple foundation-model training usage.

Default: Use when you want Applebot discovery but need a separate Apple training-use decision.

Verify: Do not expect crawl hits from Applebot-Extended; verify Applebot itself for crawler identity.

Copy-ready note
User-agent: Applebot-Extended
Disallow: /

Apple says Applebot-Extended does not crawl webpages.

Read official source

Perplexity / search answer discovery

PerplexityBot

Perplexity search result crawler.

Default: Allow if Perplexity search/answer visibility matters, and whitelist published IP ranges if a WAF blocks it.

Verify: Match PerplexityBot and verify against Perplexity's published IP range JSON.
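When published ranges feed a WAF allow rule, it can help to collapse overlapping or adjacent CIDRs into a shorter list first. The sketch below does that with the standard `ipaddress.collapse_addresses` function. It assumes the CIDR strings were already extracted from the published JSON; the function name and the sample ranges in the usage note are placeholders from the reserved documentation blocks, not Perplexity's real ranges.

```python
import ipaddress
from typing import Iterable, List


def collapse_cidrs(cidr_ranges: Iterable[str]) -> List[str]:
    """Merge overlapping and adjacent CIDR ranges; IPv4 results come first."""
    networks = [ipaddress.ip_network(cidr, strict=False) for cidr in cidr_ranges]
    collapsed = []
    # collapse_addresses rejects mixed IP versions, so handle each separately.
    for version in (4, 6):
        same_version = [net for net in networks if net.version == version]
        collapsed.extend(ipaddress.collapse_addresses(same_version))
    return [str(net) for net in collapsed]
```

For example, `collapse_cidrs(["192.0.2.0/25", "192.0.2.128/25", "2001:db8::/32"])` returns `["192.0.2.0/24", "2001:db8::/32"]`. Re-run the collapse whenever the provider updates its published ranges so the WAF rule does not go stale.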

Copy-ready note
User-agent: PerplexityBot
Allow: /

Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; PerplexityBot/1.0; +https://perplexity.ai/perplexitybot)

Read official source

Perplexity / user triggered fetch

Perplexity-User

User-requested fetcher for Perplexity user actions.

Default: Monitor separately from PerplexityBot; verify IP ranges for WAF allow rules.

Verify: Treat as user-triggered and verify against Perplexity's published Perplexity-User IP JSON.

Copy-ready note
# Perplexity-User is user-triggered. Review WAF rules and logs separately.

Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Perplexity-User/1.0; +https://perplexity.ai/perplexity-user)

Read official source

Common Crawl / open web dataset crawler

CCBot

Crawler for Common Crawl's open web crawl dataset.

Default: Allow if open web dataset participation is acceptable; block if broad dataset reuse is outside policy.

Verify: Verify with Common Crawl reverse DNS or the published ccbot.json IP ranges.

Copy-ready note
User-agent: CCBot
Disallow: /

CCBot/2.0 (https://commoncrawl.org/faq/)

Read official source

Pre-AI search DB handoff

Agents should query the compact pack or answer endpoints first, then cite at least one official source and one LLMs.txt Kit proof link.