Google robots.txt safety evidence matrix

Keep Googlebot, Google-Extended, path rules, proof links, and traffic measurement separate.

Use this matrix before publishing robots.txt changes. It answers the observed zero-click query "robots txt google generator" with a proof-linked workflow rather than unsafe crawler advice.

Fast answer

For Google Search, keep Googlebot crawlable; handle Google-Extended separately.

robots.txt is not access control. Treat it as a public crawl preference file, not a privacy layer. Use noindex, password protection, or removals for content that must stay out of Search.

Evidence rows

robots_txt_not_access_control

robots.txt is not access control

Question: Can I hide private pages from Google with robots.txt?

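No. robots.txt is a public crawler preference file. It can manage crawling, but it cannot restrict access: private or sensitive pages need authentication or password protection, and pages that only need to stay out of Search need noindex or removal controls. Googlebot must be able to crawl a page to see its noindex rule, so do not Disallow a page you are relying on noindex to remove.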

Next action: Use robots.txt only for crawl preferences and safe-to-public path patterns; keep private content behind real access control.

Not proof: A Disallow line is not proof that a page is private, deindexed, or inaccessible.

keep_googlebot_search_open

Keep Googlebot open for Search

Question: What is the main risk in a Google robots.txt generator?

The main risk is blocking Googlebot or User-agent: * by accident while still expecting Google Search traffic. Make the intended Googlebot outcome explicit before publishing.

Next action: Use the Googlebot-safe preset, test priority public paths, then run the live checker after upload.

Not proof: Allowing Googlebot is not a ranking guarantee or click guarantee.
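
A pre-publish check along these lines can catch an accidental blanket block. This is a minimal sketch, not a definitive validator: `blocked_paths` and the sample drafts are hypothetical, and it relies on Python's standard `urllib.robotparser`, which uses simple prefix matching and does not reproduce Google's wildcard or precedence handling. Treat a clean result as a first filter, not as proof.

```python
from urllib.robotparser import RobotFileParser


def blocked_paths(robots_txt: str, paths: list[str], agent: str = "Googlebot") -> list[str]:
    """Return the priority paths that the draft would block for `agent`.

    Assumption: prefix matching as implemented by urllib.robotparser,
    not Google's own parser.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [path for path in paths if not parser.can_fetch(agent, path)]


# Hypothetical drafts: the first blocks every crawler by accident.
risky_draft = "User-agent: *\nDisallow: /\n"
safer_draft = "User-agent: *\nDisallow: /admin/\n"

priority_paths = ["/", "/guides/", "/admin/login"]
print(blocked_paths(risky_draft, priority_paths))
print(blocked_paths(safer_draft, priority_paths))
```

If the first list contains public paths such as the homepage, the draft should not be published while Google Search traffic is still expected.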

google_extended_separate_policy

Keep Google-Extended separate

Question: Can I block Google-Extended without blocking Googlebot?

Yes. Treat Google-Extended as a separate product token and policy decision from Googlebot. Do not present Google-Extended blocking as a Google Search ranking improvement.

Next action: Write Googlebot and Google-Extended rules separately, then test both paths.

Not proof: A Google-Extended rule is not a Search ranking signal or proof of Googlebot crawling.
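
Writing the two policies as separate groups might look like the sketch below. The function name `build_robots_txt` and its parameters are hypothetical, and the self-check again uses Python's `urllib.robotparser` as a rough stand-in for Google's parser, so it only confirms that the two tokens resolve to different groups.

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(block_google_extended: bool, disallow_for_search: list[str]) -> str:
    """Emit one group per product token so the two policies stay independent."""
    lines = ["User-agent: Googlebot"]
    # An empty Disallow value means "nothing is disallowed" for this group.
    lines += [f"Disallow: {path}" for path in disallow_for_search] or ["Disallow:"]
    lines += ["", "User-agent: Google-Extended"]
    lines.append("Disallow: /" if block_google_extended else "Disallow:")
    return "\n".join(lines) + "\n"


draft = build_robots_txt(block_google_extended=True, disallow_for_search=["/admin/"])
print(draft)

# Test both tokens against the same public path.
parser = RobotFileParser()
parser.parse(draft.splitlines())
print("Googlebot may fetch /guides/:", parser.can_fetch("Googlebot", "/guides/"))
print("Google-Extended may fetch /guides/:", parser.can_fetch("Google-Extended", "/guides/"))
```

Keeping the groups separate means a later change to the Google-Extended decision never touches the Googlebot rules.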

googlebot_mobile_desktop_same_token

Googlebot Smartphone and Desktop use the same token

Question: Can robots.txt target Googlebot Smartphone and Desktop differently?

Google says Googlebot Smartphone and Googlebot Desktop both obey the same Googlebot product token in robots.txt, so a robots.txt generator should avoid pretending to split them with separate product-token rules.

Next action: Use one Googlebot rule set for Search crawling and validate mobile/desktop behavior in Search Console or logs separately.

Not proof: A robots.txt draft cannot prove mobile-first indexing outcomes.

path_matching_and_precedence

Path matching and precedence

Question: How should I test Allow and Disallow conflicts?

Use Google-specific path matching rules: Google supports * and $ in path values, uses the most specific matching rule, and in equivalent conflicts applies the least restrictive rule.

Next action: Test homepage, public guides, admin, cart, checkout, parameter, and file-extension paths before publishing.

Not proof: A visually plausible robots.txt file is not proof that every important URL is allowed or blocked as intended.
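
The precedence rule described above can be approximated in a few lines. This is a simplified sketch under stated assumptions: rules are already filtered to the group that applies to Googlebot, specificity is measured as the length of the rule path, and `is_allowed` is a hypothetical helper rather than Google's implementation. It does not handle percent-encoding or other normalization.

```python
import re


def _compile(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern: * matches any run, a trailing $ anchors the end."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def is_allowed(rules: list[tuple[str, str]], path: str) -> bool:
    """rules: ("allow" | "disallow", pattern) pairs for one user-agent group."""
    best_len, best_allow = -1, True  # no matching rule means the path is allowed
    for kind, pattern in rules:
        if not pattern:
            continue  # an empty rule path matches nothing
        if _compile(pattern).match(path):
            allow = kind == "allow"
            # Longest rule path wins; on a tie the least restrictive (allow) wins.
            if len(pattern) > best_len or (len(pattern) == best_len and allow):
                best_len, best_allow = len(pattern), allow
    return best_allow


rules = [("disallow", "/cart"), ("allow", "/cart/help"), ("disallow", "/*.pdf$")]
for path in ["/", "/cart", "/cart/help", "/guide.pdf", "/guide.pdf?x=1"]:
    print(path, is_allowed(rules, path))
```

Running a list of real priority URLs through a check like this before upload makes Allow and Disallow conflicts visible instead of relying on how the file looks.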

sitemap_discovery_not_allow_rule

Sitemap discovery is not an Allow rule

Question: Does a Sitemap line override Disallow rules?

No. A Sitemap line gives crawlers a discovery hint, but Google ignores it when processing allow/disallow groups for robots.txt matching.

Next action: Add a canonical Sitemap line, but still test whether Googlebot can fetch important public paths.

Not proof: A Sitemap line is not proof that a URL is crawlable or indexed.
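
One way to see that a Sitemap line grants nothing is to cross-check sitemap URLs against the same draft. The sketch below is illustrative only: `sitemap_urls_blocked`, the `example.com` URLs, and the sample draft are hypothetical, and `urllib.robotparser` stands in for Google's parser with plain prefix matching.

```python
from urllib.robotparser import RobotFileParser


def sitemap_urls_blocked(robots_txt: str, sitemap_urls: list[str], agent: str = "Googlebot") -> list[str]:
    """Return URLs that appear in the sitemap but are still disallowed for `agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in sitemap_urls if not parser.can_fetch(agent, url)]


# Hypothetical draft: the Sitemap line sits next to a Disallow rule.
draft = (
    "User-agent: *\n"
    "Disallow: /private/\n"
    "Sitemap: https://example.com/sitemap.xml\n"
)

# Hypothetical URLs that the sitemap is assumed to list.
listed = [
    "https://example.com/guides/robots",
    "https://example.com/private/report.html",
]
print(sitemap_urls_blocked(draft, listed))
```

Any URL this returns is listed for discovery but still blocked from crawling, which usually signals a conflict between the sitemap and the rules.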

verify_googlebot_identity

Verify Googlebot identity

Question: Can I trust a log row that says Googlebot?

Not by user-agent alone. User-agent strings can be spoofed, so use source IP, reverse DNS, and published IP ranges when log proof matters.

Next action: Classify Googlebot log rows separately from human sessions and keep identity proof with the audit record.

Not proof: A Googlebot-looking user-agent is not proof of a real Google crawler or human traffic.
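
A reverse-then-forward DNS check for a log row's source IP could be sketched as follows. The helper `verify_google_crawler_ip` and its injectable lookup parameters are hypothetical; the hostname suffixes reflect the domains Google's verification guidance names, and the default forward lookup (`socket.gethostbyname_ex`) covers IPv4 only. The usage lines stub out DNS with made-up answers so nothing here touches the network.

```python
import socket

# Domains Google's crawler verification guidance lists for reverse DNS hostnames.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com", ".googleusercontent.com")


def verify_google_crawler_ip(ip, reverse_lookup=None, forward_lookup=None) -> bool:
    """Reverse-resolve the IP, check the hostname's domain, then forward-confirm it."""
    reverse_lookup = reverse_lookup or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward_lookup = forward_lookup or (lambda host: socket.gethostbyname_ex(host)[2])
    try:
        host = reverse_lookup(ip)
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return False
    if not host.lower().rstrip(".").endswith(GOOGLE_SUFFIXES):
        return False
    try:
        # The hostname must resolve back to the same IP, or the PTR record proves nothing.
        return ip in forward_lookup(host)
    except OSError:
        return False


# Stubbed DNS answers (illustrative values only, not live lookups).
fake_ptr = {"66.249.66.1": "crawl-66-249-66-1.googlebot.com", "203.0.113.9": "bot.example.net"}
fake_a = {"crawl-66-249-66-1.googlebot.com": ["66.249.66.1"]}
for ip in fake_ptr:
    ok = verify_google_crawler_ip(ip, fake_ptr.__getitem__, lambda h: fake_a.get(h, []))
    print(ip, "verified" if ok else "not verified")
```

For audits, store the resolved hostname and the check result alongside the log row, and pair this with the published IP ranges where possible.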

live_test_and_measurement

Live test and real measurement

Question: What should count as success after changing robots.txt?

Use live fetch checks and real Search Console clicks or tool activations. Do not use fake searches, self-clicks, or crawler hits as traffic proof.

Next action: Run the live Googlebot robots checker, then review query-level clicks and CTR after Search Console refreshes.

Not proof: Fake searches, self-clicks, and crawler hits are not proof of human traffic.

Decision table

Evidence | Meaning | Next action | Not proof
robots.txt is not access control | No. robots.txt is a public crawler preference file. It can manage crawling, but private or sensitive pages need authentication, password protection, noindex, or removal controls. | Use robots.txt only for crawl preferences and safe-to-public path patterns; keep private content behind real access control. | A Disallow line is not proof that a page is private, deindexed, or inaccessible.
Keep Googlebot open for Search | The main risk is blocking Googlebot or User-agent: * by accident while still expecting Google Search traffic. Make the intended Googlebot outcome explicit before publishing. | Use the Googlebot-safe preset, test priority public paths, then run the live checker after upload. | Allowing Googlebot is not a ranking guarantee or click guarantee.
Keep Google-Extended separate | Yes. Treat Google-Extended as a separate product token and policy decision from Googlebot. Do not present Google-Extended blocking as a Google Search ranking improvement. | Write Googlebot and Google-Extended rules separately, then test both paths. | A Google-Extended rule is not a Search ranking signal or proof of Googlebot crawling.
Googlebot Smartphone and Desktop use the same token | Google says Googlebot Smartphone and Googlebot Desktop both obey the same Googlebot product token in robots.txt, so a robots.txt generator should avoid pretending to split them with separate product-token rules. | Use one Googlebot rule set for Search crawling and validate mobile/desktop behavior in Search Console or logs separately. | A robots.txt draft cannot prove mobile-first indexing outcomes.
Path matching and precedence | Use Google-specific path matching rules: Google supports * and $ in path values, uses the most specific matching rule, and in equivalent conflicts applies the least restrictive rule. | Test homepage, public guides, admin, cart, checkout, parameter, and file-extension paths before publishing. | A visually plausible robots.txt file is not proof that every important URL is allowed or blocked as intended.
Sitemap discovery is not an Allow rule | No. A Sitemap line gives crawlers a discovery hint, but Google ignores it when processing allow/disallow groups for robots.txt matching. | Add a canonical Sitemap line, but still test whether Googlebot can fetch important public paths. | A Sitemap line is not proof that a URL is crawlable or indexed.
Verify Googlebot identity | Not by user-agent alone. User-agent strings can be spoofed, so use source IP, reverse DNS, and published IP ranges when log proof matters. | Classify Googlebot log rows separately from human sessions and keep identity proof with the audit record. | A Googlebot-looking user-agent is not proof of a real Google crawler or human traffic.
Live test and real measurement | Use live fetch checks and real Search Console clicks or tool activations. Do not use fake searches, self-clicks, or crawler hits as traffic proof. | Run the live Googlebot robots checker, then review query-level clicks and CTR after Search Console refreshes. | No fake searches, no self-clicks, and no crawler hits counted as human traffic.

Proof links for AI agents

Official Google references and caveats

Reference | Use for | Caveat
Google robots.txt introduction | Use for the core caveat that robots.txt manages crawler access and crawl load, but is not a mechanism for keeping a web page out of Google. | Use noindex, password protection, or removal flows when the goal is keeping content out of search or private.
Google robots.txt specification interpretation | Use for Google rule parsing, allow/disallow matching, wildcard and end-anchor support, most-specific-rule precedence, and Sitemap parser caveats. | This helps test rule outcomes; it is not a ranking guarantee.
Googlebot documentation | Use for the Googlebot product token, Smartphone/Desktop crawler caveat, blocking impact, noindex/password guidance, and Googlebot verification. | User-agent strings can be spoofed, so identity checks need DNS or IP-range proof.
Google common crawlers and Google-Extended | Use for the distinction between Googlebot and Google-Extended, including the Search/ranking caveat for Google-Extended controls. | Google-Extended is a standalone product token and does not replace Googlebot search crawling policy.
Google crawler overview | Use for crawler categories and identity verification using user-agent header, source IP, and reverse DNS hostname. | Identity verification is crawler proof, not human traffic proof.
Googlebot IP ranges | Use when matching server logs against published Googlebot IP ranges. | IP-range validation should be paired with reverse DNS checks for operational audits.

FAQ

Does a Google-safe robots.txt guarantee Search traffic?

No. It reduces avoidable crawler-policy mistakes. It does not guarantee ranking, indexing, clicks, citations, or traffic.

What is the safest next action?

Use the generator, test Googlebot paths, keep Google-Extended separate, run the live checker, and review Search Console after data refreshes.