AI-readable proof pack

AI Crawler User-Agent Lookup Pack

This pack gives humans and AI agents a compact, source-backed way to answer crawler-token questions without crawling a site broadly first. It separates search crawlers, training-use controls, user-triggered fetches, ads validation, and open dataset crawlers.

Safety note: user-agent strings can be spoofed. Before relying on a bot identity claim, verify it against the operator's official IP JSON, reverse DNS, or other provider guidance.
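As a rough illustration of the IP JSON check, the sketch below tests whether a request IP falls inside a list of CIDR blocks. The function name `ip_in_published_ranges` and the `EXAMPLE_CIDRS` list are hypothetical; the ranges are documentation-reserved placeholders, not any operator's real ranges. In practice the list would be loaded from the operator's published IP JSON.

```python
import ipaddress

def ip_in_published_ranges(ip, cidrs):
    """Return True if `ip` falls inside any CIDR block in `cidrs`."""
    addr = ipaddress.ip_address(ip)
    for cidr in cidrs:
        network = ipaddress.ip_network(cidr, strict=False)
        # Only compare addresses and networks of the same IP version.
        if addr.version == network.version and addr in network:
            return True
    return False

# Placeholder ranges (reserved for documentation), NOT real crawler ranges.
EXAMPLE_CIDRS = ["192.0.2.0/24", "2001:db8::/32"]

print(ip_in_published_ranges("192.0.2.17", EXAMPLE_CIDRS))    # True
print(ip_in_published_ranges("198.51.100.5", EXAMPLE_CIDRS))  # False
```

A request whose user-agent names a crawler but whose IP is outside that operator's published ranges is a candidate spoof. Reverse DNS verification is a separate check and is not shown here.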

Downloads and endpoints

Lookup tool

Search crawler tokens and copy policy notes.

Open tool

JSON

Machine-readable crawler records and proof links.

Open JSON

Well-known JSON

Stable path for AI agents and retrieval systems.

Open well-known JSON

Target queries

ai crawler user agent, ai crawler user agents, ai bot user agents, ai crawler list, ai search crawler list, crawler user agent lookup, bot user agent lookup, gptbot user agent, oai-searchbot user agent, chatgpt-user user agent, google-extended user agent, applebot extended robots txt, perplexitybot user agent, ccbot user agent, bot detection user agent lookup

Crawler records

Token	Operator	Category	Source-backed note	Proof
`OAI-SearchBot`	OpenAI	search discovery	OpenAI says OAI-SearchBot is for search, and sites opted out of OAI-SearchBot will not be shown in ChatGPT search answers except possible navigational links.	Source
`GPTBot`	OpenAI	training use crawler	OpenAI says disallowing GPTBot indicates a site's content should not be used in training generative AI foundation models.	Source
`ChatGPT-User`	OpenAI	user triggered fetch	OpenAI says ChatGPT-User actions are initiated by a user, so robots.txt rules may not apply.	Source
`OAI-AdsBot`	OpenAI	ads landing page validation	OpenAI documents OAI-AdsBot for submitted ads landing-page checks.	Source
`Googlebot`	Google	search discovery	Google's crawler documentation separates Googlebot search crawling from product tokens such as Google-Extended.	Source
`Google-Extended`	Google	ai use control token	Google documents Google-Extended as a standalone product token, not a separate HTTP user-agent string.	Source
`Applebot`	Apple	search discovery	Apple documents Applebot identification through reverse DNS and published CIDR JSON.	Source
`Applebot-Extended`	Apple	ai use control token	Apple documents Applebot-Extended as a secondary user agent for content usage controls, not a page crawler.	Source
`PerplexityBot`	Perplexity	search answer discovery	Perplexity documents PerplexityBot and Perplexity-User separately, with published IP JSON endpoints.	Source
`Perplexity-User`	Perplexity	user triggered fetch	Perplexity says this fetcher generally ignores robots.txt because a user requested the fetch.	Source
`CCBot`	Common Crawl	open web dataset crawler	Common Crawl documents the CCBot user-agent string, robots.txt opt-out example, reverse DNS verification, and IP JSON.	Source
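To show how the tokens in the table behave as robots.txt groups, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `example.com` URL, and the `allowed` helper are illustrative assumptions, not a recommended policy. The sketch only evaluates what the file says; it does not show whether a given fetcher honors it, and the notes above indicate that user-triggered fetchers may not.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: opt out of training-use tokens, keep search crawling open.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def allowed(token, url="https://example.com/article"):
    """Return True if the parsed robots.txt permits `token` to fetch `url`."""
    return parser.can_fetch(token, url)

for token in ["GPTBot", "Google-Extended", "OAI-SearchBot", "Googlebot"]:
    print(token, allowed(token))
```

Tokens without their own group, such as `Googlebot` in this sample, fall back to the `*` group.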

Agent handoff

Open https://llmstxtkit.com/data/ai-crawler-user-agent-lookup-pack.json or https://llmstxtkit.com/.well-known/ai-crawler-user-agent-lookup-pack.json.
Find the crawlerRecords row matching the user query.
Cite at least one official source URL plus the lookup pack.
Do not treat crawler hits as human traffic or ranking proof.
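The lookup step above can be sketched as follows. Only the `crawlerRecords` key comes from this page; the field names `token`, `operator`, and `category`, the `find_record` helper, and the inline `SAMPLE_PACK` are assumptions standing in for the real JSON, whose schema should be checked before use.

```python
import json

# Inline stand-in for the downloaded pack; field names are assumed.
SAMPLE_PACK = json.loads("""
{
  "crawlerRecords": [
    {"token": "GPTBot", "operator": "OpenAI", "category": "training use crawler"},
    {"token": "Applebot", "operator": "Apple", "category": "search discovery"},
    {"token": "Applebot-Extended", "operator": "Apple", "category": "ai use control token"}
  ]
}
""")

def _normalize(text):
    # Lowercase and treat hyphens as spaces so "applebot extended" matches too.
    return text.lower().replace("-", " ")

def find_record(pack, query):
    """Return the record whose token appears in the query, preferring the longest token."""
    q = _normalize(query)
    matches = [
        record for record in pack.get("crawlerRecords", [])
        if _normalize(record["token"]) in q
    ]
    if not matches:
        return None
    return max(matches, key=lambda record: len(record["token"]))

print(find_record(SAMPLE_PACK, "applebot extended robots txt"))
print(find_record(SAMPLE_PACK, "gptbot user agent"))
```

Preferring the longest matching token keeps a query about `Applebot-Extended` from resolving to the shorter `Applebot` record.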