Search and AI crawler visibility should be proven with logs, not vibes. A small site can start with a simple access-log review: which bots arrived, which URLs they requested, and whether those requests got successful HTTP responses.
If you have a small nginx, Apache, Cloudflare, or JSONL access-log sample, paste it into the free analyzer and copy a crawler proof report.
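To do the same review locally, here is a minimal bash and awk sketch that prints the crawler token, requested path, and status code for each crawler request. It assumes the nginx/Apache "combined" log format (request in the first quoted field, user agent in the third) and matches tokens case-sensitively; the function name `crawler_hits` is hypothetical, and Cloudflare or JSONL logs would need a different parser.

```shell
# crawler_hits [LOGFILE...]: print "<token> <path> <status>" for each request
# whose user agent contains a known crawler token (case-sensitive match).
# Assumes the nginx/Apache "combined" log format; reads stdin if no file given.
crawler_hits() {
  awk -F'"' '
    BEGIN { n = split("Googlebot OAI-SearchBot GPTBot Applebot PerplexityBot CCBot", bots, " ") }
    {
      bot = ""
      # Splitting a combined log line on double quotes puts the user agent in $6.
      for (i = 1; i <= n; i++) if (index($6, bots[i])) { bot = bots[i]; break }
      if (bot == "") next
      split($2, req, " ")     # request line: method, path, protocol
      split($3, rest, " ")    # status code, response size
      print bot, req[2], rest[1]
    }' "$@"
}
```

For example, `crawler_hits access.log | sort | uniq -c | sort -nr` groups identical bot, path, and status combinations, which answers all three review questions in one listing.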
| Level | Evidence | How much to trust it | What to do next |
|---|---|---|---|
| Weak | User-agent string appears in a log line. | Useful clue, but spoofable. | Check status codes, paths, IP ranges, and repeat behavior. |
| Medium | Recognized crawler gets HTTP 200 on /robots.txt, /sitemap.xml, /llms.txt, and important pages. | Good crawlability proof for a small site. | Compare before and after crawler-policy changes. |
| Strong | Logs plus official IP range or reverse-DNS verification, Search Console data, referral evidence, and tool activation events. | Best practical proof that a traffic funnel is working. | Summarize it in a dated launch report and keep monitoring weekly. |
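A rough check for the Medium row can be scripted: did requests claiming a given crawler token get HTTP 200 on every discovery file? This is a sketch assuming the combined log format; `medium_evidence` is a hypothetical helper name, it trusts only the user-agent string (so on its own it never gets past Weak-grade trust), and important pages would have to be added to its path list.

```shell
# medium_evidence TOKEN [LOGFILE...]: print "medium" when requests whose user
# agent contains TOKEN got HTTP 200 on every discovery file; otherwise list
# the files still missing a 200. User-agent only, so spoofable on its own.
medium_evidence() {
  local token=$1; shift
  awk -F'"' -v token="$token" '
    index($6, token) {
      split($2, req, " "); split($3, rest, " ")
      if (rest[1] == "200") ok[req[2]] = 1   # remember paths served with 200
    }
    END {
      n = split("/robots.txt /sitemap.xml /llms.txt", files, " ")
      missing = ""
      for (i = 1; i <= n; i++) if (!(files[i] in ok)) missing = missing " " files[i]
      if (missing == "") print "medium"
      else print "missing 200 for:" missing
    }' "$@"
}
```

Running it once per token (for example `medium_evidence GPTBot access.log`) before and after a crawler-policy change gives the comparison the table suggests.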
Signals to track, starting with /robots.txt, /sitemap.xml, and /llms.txt:

| Metric | Why it matters |
|---|---|
| /robots.txt hits | Crawlers checked access policy |
| /sitemap.xml hits | Crawlers discovered canonical URLs |
| /llms.txt hits | AI-aware agents or humans checked context |
| 200 status on core pages | Public pages are reachable |
| 403/404 on core pages | Crawl or deployment problem |
| UTM campaign visits | Real distribution proof |
| Generator/copy events | Activation proof |
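Every row except generator/copy events (which come from application events, not access logs) can be counted from a combined-format log. The sketch below is one way to do it; `proof_metrics` is a hypothetical helper, the core-page list is a space-separated set of paths you supply, and it counts every client, so pipe in only crawler lines if you want crawler-specific numbers.

```shell
# proof_metrics "CORE_PATHS" [LOGFILE...]: count the log-derived metrics.
# CORE_PATHS is a space-separated list of your important pages.
proof_metrics() {
  local core=$1; shift
  awk -F'"' -v core="$core" '
    BEGIN {
      n = split(core, c, " ")
      for (i = 1; i <= n; i++) is_core[c[i]] = 1
      hits["/robots.txt"] = hits["/sitemap.xml"] = hits["/llms.txt"] = 0
      ok = bad = utm = 0
    }
    {
      split($2, req, " "); split($3, rest, " ")
      path = req[2]; status = rest[1]
      base = path; sub(/[?].*/, "", base)      # ignore the query string
      if (base in hits) hits[base]++
      if (base in is_core) {
        if (status == "200") ok++
        else if (status == "403" || status == "404") bad++
      }
      if (index(path, "utm_campaign=")) utm++
    }
    END {
      print "robots.txt hits:", hits["/robots.txt"]
      print "sitemap.xml hits:", hits["/sitemap.xml"]
      print "llms.txt hits:", hits["/llms.txt"]
      print "core pages 200:", ok
      print "core pages 403/404:", bad
      print "utm_campaign visits:", utm
    }' "$@"
}
```

For crawler-only counts, filter first, for example `rg -i "GPTBot" access.log | proof_metrics "/ /docs"` (the core paths here are placeholders).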
Use these locally on your own access logs. Redact sensitive values before sharing outputs.
```shell
# Find crawler-looking requests in an nginx or Apache combined log.
rg -i "Googlebot|OAI-SearchBot|GPTBot|Applebot|PerplexityBot|CCBot" access.log
# Count hits to each discovery file (dots escaped so they match literally).
rg -o 'GET /(robots\.txt|sitemap\.xml|llms\.txt)' access.log | sort | uniq -c
# Look for crawler errors (4xx/5xx) on important URLs.
# In the combined format the status code follows the closing quote of the request line.
rg -i "Googlebot|OAI-SearchBot|GPTBot|Applebot|PerplexityBot|CCBot" access.log | rg '" [45][0-9]{2} '
# Summarize crawler hits by token when logs are small enough for shell review.
rg -io "Googlebot|OAI-SearchBot|GPTBot|Applebot|PerplexityBot|CCBot" access.log | sort | uniq -c | sort -nr
```
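For the redaction step, a small `sed` filter is one starting point. This sketch (hypothetical helper `redact_log`) masks the last octet of IPv4 addresses and replaces query strings in request lines with `REDACTED`. It does not handle IPv6 addresses, emails, or tokens in other fields, and masking IPs removes what IP-range checks need, so run any verification before redacting.

```shell
# redact_log [LOGFILE...]: mask the last IPv4 octet and replace query strings
# in request lines with REDACTED. IPv6, emails and other fields are untouched,
# and four-part dotted version strings would also be masked.
redact_log() {
  sed -E \
    -e 's/([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})\.[0-9]{1,3}/\1.x/g' \
    -e 's/(GET|HEAD|POST|PUT|PATCH|DELETE|OPTIONS) ([^ ?"]*)[?][^ "]*/\1 \2?REDACTED/g' \
    "$@"
}
```

For example, `rg -i "GPTBot" access.log | redact_log` produces output that is safer to paste into a shared report.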
Google recommends verifying a request that claims to be from Google either with a reverse DNS lookup on the source IP followed by a forward DNS lookup on the returned hostname (the hostname must belong to a Google domain and resolve back to the same IP), or by matching the IP against Google's published crawler IP ranges (source: Google verification guide).

OpenAI publishes separate IP range files for OAI-SearchBot, GPTBot, and ChatGPT-User. Treat OAI-SearchBot and GPTBot as different proof categories: OAI-SearchBot fetches pages for ChatGPT search, while GPTBot collects content that may be used for model training (source: OpenAI crawler docs).

Perplexity recommends combining user-agent and IP range checks when configuring WAF allow rules, then monitoring logs after changes (source: Perplexity crawler docs).

Google says robots.txt manages crawler access, but it is not a way to keep a page private or fully out of Google if other pages link to it; use noindex or password protection for that (source: Google robots.txt guide).

LLMs.txt Kit writes server-side access events and funnel events on the VPS preview. The public proof file summarizes traffic sources, crawler classifications, campaign visits, and activation events. That makes the funnel auditable before the final domain is live.
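The verification approaches described above can be sketched in bash. `ip_in_cidr`, `in_published_ranges`, and `classify_request` check a claimed crawler token against a list of IPv4 prefixes (the user-agent-plus-IP-range approach OpenAI and Perplexity describe), and `google_fcrdns` performs Google's reverse-then-forward DNS check with the BIND `host` utility. All function names are hypothetical. The ranges file is assumed to be plain text with one CIDR per line that you extract yourself from the crawler's published range file, IPv6 is skipped, and only googlebot.com and google.com are accepted here; check Google's guide for the full domain list.

```shell
# Sketch: classify crawler requests by user-agent token plus IP evidence.
# Assumptions: IPv4 only; RANGES_FILE holds one CIDR per line (hypothetical
# format extracted by you from the crawler's published IP range file).

# ip_to_int IP: convert a dotted IPv4 address to a 32-bit integer.
ip_to_int() {
  local IFS=. a b c d
  read -r a b c d <<< "$1"
  echo $(( (10#$a << 24) | (10#$b << 16) | (10#$c << 8) | 10#$d ))
}

# ip_in_cidr IP CIDR: succeed if IP falls inside the IPv4 prefix CIDR.
ip_in_cidr() {
  local ip=$1 net=${2%/*} bits=${2#*/}
  local mask=$(( bits == 0 ? 0 : (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  (( ($(ip_to_int "$ip") & mask) == ($(ip_to_int "$net") & mask) ))
}

# in_published_ranges IP RANGES_FILE: succeed if IP matches any IPv4 prefix.
in_published_ranges() {
  local ip=$1 cidr
  while read -r cidr || [ -n "$cidr" ]; do
    case $cidr in
      ''|*:*) continue ;;   # skip blank lines and IPv6 prefixes
    esac
    ip_in_cidr "$ip" "$cidr" && return 0
  done < "$2"
  return 1
}

# classify_request UA IP RANGES_FILE TOKEN: print other, unverified or verified.
classify_request() {
  local ua=$1 ip=$2 ranges=$3 token=$4
  case $ua in
    *"$token"*) ;;
    *) echo "other"; return 0 ;;
  esac
  if in_published_ranges "$ip" "$ranges"; then echo "verified"; else echo "unverified"; fi
}

# google_fcrdns IP: reverse DNS, domain check, then forward DNS back to IP.
# Needs network access and the BIND `host` utility.
google_fcrdns() {
  local ip=$1 name
  name=$(host "$ip" | awk '/domain name pointer/ { print $NF; exit }')
  name=${name%.}                      # drop the trailing dot from the PTR name
  case $name in
    *.googlebot.com|*.google.com) ;;
    *) return 1 ;;
  esac
  host "$name" | awk '/has address/ { print $NF }' | grep -qxF "$ip"
}
```

A request that comes back `unverified` stays at the Weak level of the evidence table; `verified` (or a passing `google_fcrdns`) is the IP or reverse-DNS piece the Strong row asks for, alongside the other evidence it lists.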
Current proof files:
/robots.txt, /sitemap.xml, /llms.txt, and core pages return HTTP 200.