AI crawler user agents list for robots.txt, logs, and AI search visibility

This page lists the AI and search crawler user-agent tokens most small teams ask about first. Use it to separate search visibility, training-use controls, user-requested fetches, and broad web dataset crawlers.

Verification note: user-agent strings can be spoofed. Treat a user-agent match as a first clue, then verify with official IP ranges, reverse DNS, or provider guidance where available.
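
The reverse DNS step can be sketched in Python as a forward-confirmed lookup: resolve the IP to a hostname, check that hostname against the suffixes the provider publishes, then resolve the hostname back and confirm it returns the same IP. This is a minimal sketch under stated assumptions, not a provider-endorsed implementation. The name `verify_crawler_ip` and the injectable lookup parameters are illustrative, and the suffix list must come from each provider's own documentation.

```python
import socket


def verify_crawler_ip(ip, allowed_suffixes, reverse_lookup=None, forward_lookup=None):
    """Forward-confirmed reverse DNS check for a single client IP.

    allowed_suffixes: hostname suffixes published by the provider, each
    starting with a dot (an assumption of this sketch) so that a lookalike
    such as "evilgooglebot.com" does not pass.
    reverse_lookup / forward_lookup: optional callables, mainly for testing.
    """
    # Default lookups use the system resolver. gethostbyname_ex is IPv4 only.
    reverse_lookup = reverse_lookup or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward_lookup = forward_lookup or (lambda host: socket.gethostbyname_ex(host)[2])

    try:
        hostname = reverse_lookup(ip).rstrip(".").lower()
    except OSError:
        # No PTR record or resolver failure: treat as unverified.
        return False

    if not any(hostname.endswith(suffix) for suffix in allowed_suffixes):
        return False

    try:
        # The hostname must resolve back to the IP that made the request.
        return ip in forward_lookup(hostname)
    except OSError:
        return False
```

A call such as `verify_crawler_ip(ip_from_log, (".googlebot.com", ".google.com"))` would then return `True` only when both lookups agree. Where a provider publishes IP ranges instead of hostnames, compare against those ranges rather than relying on DNS.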

Download the full dataset

This list is a human-readable landing page for the benchmark dataset. Use the files below for audits, spreadsheets, and support replies.

Benchmark page: the complete crawler policy matrix and copy-ready robots.txt examples.

JSON: machine-readable crawler categories, strategy notes, verification methods, and source URLs.

CSV: spreadsheet-ready crawler token table for lightweight policy reviews.

User-agent token table

| Operator | User-agent token | Category | Robots.txt? | Practical policy note |
| --- | --- | --- | --- | --- |
| OpenAI | OAI-SearchBot | Search/discovery | Yes | Allow if ChatGPT search eligibility matters. |
| OpenAI | GPTBot | Training-use crawler | Yes | Decide separately from OAI-SearchBot; block only if model-training use is not allowed. |
| OpenAI | ChatGPT-User | User-requested fetch | No / limited | Monitor separately in logs; do not treat it like automatic crawling. |
| Google | Googlebot | Search/discovery | Yes | Allow for Google Search visibility unless a page should not be indexed. |
| Google | Google-Extended | AI-use control | Yes | Use when allowing Google Search while opting out of certain AI uses. |
| Apple | Applebot | Search/discovery | Yes | Allow for Apple ecosystem discovery if that visibility matters. |
| Apple | Applebot-Extended | Training-use control | Yes | Use to opt out of Apple foundation-model training while keeping Applebot discovery. |
| Perplexity | PerplexityBot | Search/answer discovery | Yes | Allow if Perplexity answer visibility matters; verify with provider guidance for WAF rules. |
| Perplexity | Perplexity-User | User-requested fetch | No / limited | Monitor separately from PerplexityBot because the use case is user-triggered. |
| Common Crawl | CCBot | Open web dataset | Yes | Allow for open data participation; block if broad dataset reuse is outside policy. |
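
For log reviews, the table can be turned into a small lookup that tags each logged user-agent string with its operator and category. The Python sketch below is illustrative only: `classify_user_agent` and `count_by_category` are hypothetical helper names, and matching is a simple case-insensitive substring test. It leaves out Google-Extended and Applebot-Extended on the assumption, based on how the providers describe them, that they are robots.txt control tokens and do not send requests under their own user-agent string.

```python
from collections import Counter

# Tokens and categories copied from the table above.
# Control-only tokens (Google-Extended, Applebot-Extended) are omitted
# because this sketch assumes they never appear in request logs.
CRAWLER_TOKENS = {
    "OAI-SearchBot": ("OpenAI", "Search/discovery"),
    "GPTBot": ("OpenAI", "Training-use crawler"),
    "ChatGPT-User": ("OpenAI", "User-requested fetch"),
    "Googlebot": ("Google", "Search/discovery"),
    "Applebot": ("Apple", "Search/discovery"),
    "PerplexityBot": ("Perplexity", "Search/answer discovery"),
    "Perplexity-User": ("Perplexity", "User-requested fetch"),
    "CCBot": ("Common Crawl", "Open web dataset"),
}


def classify_user_agent(user_agent):
    """Return (token, operator, category) for the first matching token, or None."""
    lowered = user_agent.lower()
    for token, (operator, category) in CRAWLER_TOKENS.items():
        if token.lower() in lowered:
            return token, operator, category
    return None


def count_by_category(user_agents):
    """Tally an iterable of user-agent strings by table category."""
    counts = Counter()
    for user_agent in user_agents:
        match = classify_user_agent(user_agent)
        counts[match[2] if match else "Unmatched"] += 1
    return counts
```

A match here is only the first clue described in the verification note: the string can be spoofed, so confirm the source IP before acting on the counts.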

Copy-ready robots.txt starter

This starter keeps search/discovery open while blocking broad training or dataset reuse. Adjust for your own content policy.

User-agent: *
Allow: /

User-agent: Googlebot
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: Applebot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Applebot-Extended
Disallow: /

User-agent: CCBot
Disallow: /

Sitemap: https://example.com/sitemap.xml
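
Before deploying, the starter can be sanity-checked with Python's standard `urllib.robotparser` module. The sketch below parses a copy of the file and reports, per token, whether the parser would allow a fetch of the site root. The helper name `check_policy` is hypothetical, and the result reflects how Python's parser reads the file, which is not guaranteed to match how each crawler interprets it.

```python
from urllib.robotparser import RobotFileParser

# Copy of the starter above.
ROBOTS_TXT = """\
User-agent: *
Allow: /

User-agent: Googlebot
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: Applebot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Applebot-Extended
Disallow: /

User-agent: CCBot
Disallow: /

Sitemap: https://example.com/sitemap.xml
"""


def check_policy(robots_txt, tokens, url="https://example.com/"):
    """Map each user-agent token to True (allowed) or False (blocked) for url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {token: parser.can_fetch(token, url) for token in tokens}


if __name__ == "__main__":
    # Applebot-Extended is left out on purpose: the standard-library parser
    # matches agent names by substring, so the earlier Applebot group may
    # shadow it and give a misleading answer for that token.
    tokens = ["Googlebot", "OAI-SearchBot", "PerplexityBot",
              "GPTBot", "Google-Extended", "CCBot"]
    for token, allowed in check_policy(ROBOTS_TXT, tokens).items():
        print(f"{token}: {'allowed' if allowed else 'blocked'}")
```

For Applebot-Extended, rely on Apple's documentation rather than this script to confirm that the Disallow group takes effect.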

Official sources