AI crawler user agents list for robots.txt, logs, and AI search visibility

This page lists the AI and search crawler user-agent tokens most small teams ask about first. Use it to separate search visibility, training-use controls, user-requested fetches, and broad web dataset crawlers.

Verification note: user-agent strings can be spoofed. Treat a user-agent match as a first clue, then verify with official IP ranges, reverse DNS, or provider guidance where available.
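
The IP-range part of that check can be sketched in a few lines of Python. This is a minimal illustration, not any provider's official method: the `PUBLISHED_RANGES` entries are placeholder documentation blocks, `ExampleBot` is a hypothetical operator name, and `ip_in_published_ranges` is a helper invented for this example. Replace the ranges with the ones the operator actually publishes.

```python
import ipaddress

# ASSUMPTION: placeholder CIDR blocks reserved for documentation (RFC 5737).
# They are NOT real crawler ranges; load the operator's published list instead.
PUBLISHED_RANGES = {
    "ExampleBot": ["192.0.2.0/24", "198.51.100.0/25"],
}


def ip_in_published_ranges(ip, cidrs):
    """Return True if the request IP falls inside any of the given CIDR blocks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in cidrs)


# A request claiming to be ExampleBot is only trusted if its IP also matches.
claimed = "ExampleBot"
print(ip_in_published_ranges("192.0.2.10", PUBLISHED_RANGES[claimed]))
print(ip_in_published_ranges("203.0.113.5", PUBLISHED_RANGES[claimed]))
```

A user-agent match with a failed range check is a reasonable signal to treat the request as unverified rather than as the named crawler.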

Download the full dataset

This page is the human-readable summary of the benchmark dataset. Use the files below for audits, spreadsheets, and support replies.

Benchmark page: the complete crawler policy matrix and copy-ready robots.txt examples.

JSON: machine-readable crawler categories, strategy notes, verification methods, and source URLs.

CSV: spreadsheet-ready crawler token table for lightweight policy reviews.

User-agent token table

Operator	User-agent token	Category	Controlled by robots.txt?	Practical policy note
OpenAI	`OAI-SearchBot`	Search/discovery	Yes	Allow if ChatGPT search eligibility matters.
OpenAI	`GPTBot`	Training-use crawler	Yes	Decide separately from OAI-SearchBot; block only if model-training use is not allowed.
OpenAI	`ChatGPT-User`	User-requested fetch	No / limited	Monitor separately in logs; do not treat it like automatic crawling.
Google	`Googlebot`	Search/discovery	Yes	Allow for Google Search visibility unless a page should not be indexed.
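Google	`Google-Extended`	AI-use control	Yes	Use when allowing Google Search while opting out of certain AI uses. It is a robots.txt control token, not a separate crawler, so it does not appear as a user-agent string in logs.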
Apple	`Applebot`	Search/discovery	Yes	Allow for Apple ecosystem discovery if that visibility matters.
Apple	`Applebot-Extended`	Training-use control	Yes	Use to opt out of Apple foundation-model training while keeping Applebot discovery. It is a robots.txt control token and does not crawl pages itself.
Perplexity	`PerplexityBot`	Search/answer discovery	Yes	Allow if Perplexity answer visibility matters; verify with provider guidance for WAF rules.
Perplexity	`Perplexity-User`	User-requested fetch	No / limited	Monitor separately from PerplexityBot because the use case is user-triggered.
Common Crawl	`CCBot`	Open web dataset	Yes	Allow for open data participation; block if broad dataset reuse is outside policy.
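
For log reviews, the table can be turned into a small lookup. The sketch below assumes a case-insensitive substring match on the user-agent header is good enough for a first pass. `classify_user_agent` and `tally_categories` are hypothetical helper names, and the sample header strings are illustrative rather than authoritative. The two `-Extended` tokens are left out because the table lists them as controls rather than crawlers.

```python
from collections import Counter

# Tokens and categories copied from the table above (crawling tokens only).
TOKEN_CATEGORIES = {
    "OAI-SearchBot": "Search/discovery",
    "GPTBot": "Training-use crawler",
    "ChatGPT-User": "User-requested fetch",
    "Googlebot": "Search/discovery",
    "Applebot": "Search/discovery",
    "PerplexityBot": "Search/answer discovery",
    "Perplexity-User": "User-requested fetch",
    "CCBot": "Open web dataset",
}


def classify_user_agent(user_agent):
    """Return (token, category) for the first table token found, else None."""
    ua = user_agent.lower()
    for token, category in TOKEN_CATEGORIES.items():
        if token.lower() in ua:
            return token, category
    return None


def tally_categories(user_agents):
    """Count requests per category; unmatched agents go to 'Unclassified'."""
    counts = Counter()
    for ua in user_agents:
        match = classify_user_agent(ua)
        counts[match[1] if match else "Unclassified"] += 1
    return counts


# Illustrative header values, not verbatim strings from any provider.
sample = [
    "Mozilla/5.0 (compatible; GPTBot/1.0)",
    "Mozilla/5.0 (compatible; PerplexityBot/1.0)",
    "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0",
]
print(tally_categories(sample))
```

Because user-agent strings can be spoofed, a tally like this describes what requests claim to be, not what they are verified to be.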

Copy-ready robots.txt starter

This starter keeps search/discovery open while blocking broad training or dataset reuse. Adjust for your own content policy.

User-agent: *
Allow: /

User-agent: Googlebot
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: Applebot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Applebot-Extended
Disallow: /

User-agent: CCBot
Disallow: /

Sitemap: https://example.com/sitemap.xml
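
Before deploying, it can help to confirm which tokens the starter actually blocks. The sketch below is a deliberately small checker written for this page, not a full robots.txt implementation. It assumes exact, case-insensitive token matching with a fallback to `*`, treats the longest matching path prefix as the winner (ties go to Allow), and ignores wildcards. `parse_groups` and `is_allowed` are hypothetical helper names.

```python
STARTER = """\
User-agent: *
Allow: /

User-agent: Googlebot
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: Applebot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Applebot-Extended
Disallow: /

User-agent: CCBot
Disallow: /

Sitemap: https://example.com/sitemap.xml
"""


def parse_groups(text):
    """Map each lowercased user-agent token to its (directive, path) rules."""
    groups = {}
    current_agents = []
    seen_rule = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if seen_rule:  # a rule line ended the previous group
                current_agents = []
                seen_rule = False
            current_agents.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif field in ("allow", "disallow"):
            seen_rule = True
            for agent in current_agents:
                groups[agent].append((field, value))
    return groups


def is_allowed(groups, token, path="/"):
    """Apply the token's own group if present, otherwise the '*' group."""
    rules = groups.get(token.lower())
    if rules is None:
        rules = groups.get("*", [])
    best_len, allowed = -1, True  # no matching rule means allowed
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue
        is_allow = directive == "allow"
        if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
            best_len, allowed = len(prefix), is_allow
    return allowed


groups = parse_groups(STARTER)
for token in ("Googlebot", "OAI-SearchBot", "GPTBot", "Applebot-Extended", "CCBot"):
    print(token, "allowed" if is_allowed(groups, token) else "blocked")
```

Exact token matching matters here: a parser that matches by substring could apply the `Applebot` group to `Applebot-Extended` and report it as allowed.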

Official sources