Is robots.txt access control?

No. robots.txt is a public crawler preference file, not private access control. Use authentication, password protection, noindex, or removals when content must stay private or out of search.

Should Googlebot and Google-Extended be handled separately?

Yes. Googlebot is the Search crawler token to protect when Search traffic matters. Google-Extended is a separate policy token and should not be described as a Search ranking signal.

What counts as proof after changing robots.txt?

Use live robots.txt checks, verified crawler identity, Search Console clicks, and tool activations. Do not count fake searches, self-clicks, or crawler hits as human traffic.

Google Robots.txt Safety Evidence Matrix: Googlebot, Google-Extended, Proof

Evidence rows

robots_txt_not_access_control

robots.txt is not access control

Question: Can I hide private pages from Google with robots.txt?

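No. robots.txt is a public crawler preference file. It can manage crawling, but private or sensitive pages need authentication, password protection, noindex, or removal controls. A noindex rule only works if Google can crawl the page and see it, so do not also block that page in robots.txt.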

Next action: Use robots.txt only for crawl preferences and safe-to-public path patterns; keep private content behind real access control.

Not proof: A Disallow line is not proof that a page is private, deindexed, or inaccessible.

Google robots.txt introduction (official reference)
Google robots.txt generator (tool)
Live Googlebot robots checker API

keep_googlebot_search_open

Keep Googlebot open for Search

Question: What is the main risk in a Google robots.txt generator?

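The main risk is accidentally blocking Googlebot while still expecting Google Search traffic, either in a Googlebot group or in the User-agent: * group that Googlebot follows when no Googlebot-specific group exists. Make the intended Googlebot outcome explicit before publishing.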

Next action: Use the Googlebot-safe preset, test priority public paths, then run the live checker after upload.

Not proof: Allowing Googlebot is not a ranking guarantee or click guarantee.

Googlebot documentation (official reference)
Google robots.txt safety pack (dataset)
Googlebot checker (tool)
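One way to make the intended Googlebot outcome explicit is to work out which group Googlebot would follow before publishing. The Python sketch below is a simplified illustration, not Google's parser: it assumes exact, case-insensitive user-agent token matching, merges repeated groups for the same token, falls back to `*` only when no matching group exists, and flags only a blanket `Disallow: /`. The names `parse_groups`, `rules_for`, and `blanket_disallow` are hypothetical.

```python
def parse_groups(robots_txt):
    """Split robots.txt into (user-agent set, rule list) groups.

    Simplified sketch: consecutive User-agent lines share one group, and
    fields other than User-agent/Allow/Disallow (e.g. Sitemap) are skipped.
    """
    groups = []
    agents, rules = None, None
    previous_was_agent = False
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        field, sep, value = line.partition(":")
        if not sep:
            continue
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            if not previous_was_agent:
                agents, rules = set(), []
                groups.append((agents, rules))
            agents.add(value.lower())
            previous_was_agent = True
        elif field in ("allow", "disallow"):
            if rules is not None:  # rules before any User-agent are ignored
                rules.append((field, value))
            previous_was_agent = False
    return groups


def rules_for(token, groups):
    """Rules a crawler with this token follows: its own groups, else '*'."""
    token = token.lower()
    matched = [rules for agents, rules in groups if token in agents]
    if not matched:
        matched = [rules for agents, rules in groups if "*" in agents]
    return [rule for rules in matched for rule in rules]


def blanket_disallow(token, robots_txt):
    """True if the group this token follows contains 'Disallow: /'."""
    return ("disallow", "/") in rules_for(token, parse_groups(robots_txt))
```

For example, `blanket_disallow("Googlebot", "User-agent: *\nDisallow: /\n")` is true, but adding a `User-agent: Googlebot` group with `Allow: /` makes it false, because Googlebot then ignores the `*` group. Path-level testing is still needed after this coarse check.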

google_extended_separate_policy

Keep Google-Extended separate

Question: Can I block Google-Extended without blocking Googlebot?

Yes. Treat Google-Extended as a separate product token and policy decision from Googlebot. Do not present Google-Extended blocking as a Google Search ranking improvement.

Next action: Write Googlebot and Google-Extended rules as separate groups, then test each token against the same paths.

Not proof: A Google-Extended rule is not a Search ranking signal or proof of Googlebot crawling.

Google common crawlers and Google-Extended (official reference)
Google-Extended vs Googlebot guide
Google-Extended checker (tool)
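A generator can keep the two decisions independent by emitting one group per token. The Python sketch below uses a hypothetical `build_robots_txt` helper that takes separate Disallow lists for Googlebot and Google-Extended; it illustrates the separation only and is not a recommended policy. Crawlers with no matching group (and no `*` group) are not restricted by this output.

```python
def _group(token, disallow_paths):
    """Build one robots.txt group; an empty list becomes 'Allow: /'."""
    lines = [f"User-agent: {token}"]
    if disallow_paths:
        lines.extend(f"Disallow: {path}" for path in disallow_paths)
    else:
        lines.append("Allow: /")
    return lines


def build_robots_txt(googlebot_disallow, google_extended_disallow, sitemap_url=None):
    """Emit separate Googlebot and Google-Extended groups.

    Each token gets its own group, so changing the Google-Extended policy
    never edits the rules Googlebot follows for Search crawling.
    """
    lines = _group("Googlebot", googlebot_disallow)
    lines.append("")  # blank line between groups for readability
    lines.extend(_group("Google-Extended", google_extended_disallow))
    if sitemap_url:
        lines.extend(["", f"Sitemap: {sitemap_url}"])
    return "\n".join(lines) + "\n"
```

Calling `build_robots_txt(["/admin/"], ["/"], "https://example.com/sitemap.xml")` keeps Googlebot's access to everything except `/admin/` while disallowing Google-Extended site-wide; the two groups can then be tested token by token.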

googlebot_mobile_desktop_same_token

Googlebot Smartphone and Desktop use the same token

Question: Can robots.txt target Googlebot Smartphone and Desktop differently?

Google says Googlebot Smartphone and Googlebot Desktop both obey the same Googlebot product token in robots.txt, so a robots.txt generator should avoid pretending to split them with separate product-token rules.

Next action: Use one Googlebot rule set for Search crawling and validate mobile/desktop behavior in Search Console or logs separately.

Not proof: A robots.txt draft cannot prove mobile-first indexing outcomes.

Googlebot documentation (official reference)
Googlebot robots.txt checker (tool)
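Because robots.txt cannot split the two crawlers, any mobile/desktop comparison has to happen in logs or Search Console. A rough log-side sketch in Python, assuming that Googlebot Smartphone user-agent strings contain `Mobile` and desktop strings do not (check Google's current user-agent list before relying on this). `classify_googlebot_ua` and `count_variants` are hypothetical helpers, and a matching user-agent string is not proof of identity.

```python
from collections import Counter


def classify_googlebot_ua(user_agent):
    """Return 'smartphone', 'desktop', or None for a user-agent string.

    Heuristic assumption: 'Googlebot/' marks the Search crawler (so
    'Googlebot-Image/' is excluded) and 'Mobile' marks the smartphone variant.
    """
    if "Googlebot/" not in user_agent:
        return None
    return "smartphone" if "Mobile" in user_agent else "desktop"


def count_variants(user_agents):
    """Tally smartphone vs desktop Googlebot rows, skipping other agents."""
    return Counter(
        kind for kind in map(classify_googlebot_ua, user_agents) if kind is not None
    )
```

Pair these counts with identity verification before treating them as Googlebot activity.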

path_matching_and_precedence

Path matching and precedence

Question: How should I test Allow and Disallow conflicts?

Use Google-specific path matching rules: Google supports * and $ in path values, applies the most specific matching rule (the one with the longest path), and in equally specific conflicts applies the least restrictive rule.

Next action: Test homepage, public guides, admin, cart, checkout, parameter, and file-extension paths before publishing.

Not proof: A visually plausible robots.txt file is not proof that every important URL is allowed or blocked as intended.

Google robots.txt specification interpretation (official reference)
Google robots.txt generator path tester (tool)
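The matching and precedence rules can be sketched as a small Python evaluator. It is a simplified illustration, not Google's open-source parser. It assumes the rules come from the single group Googlebot follows, that `path` includes any query string, that `*` matches any character sequence, that a trailing `$` anchors the end, and that specificity is the character length of the rule path, with Allow winning ties. `is_allowed` is a hypothetical helper name.

```python
import re


def _rule_regex(rule_path):
    """Translate a robots.txt rule path into a start-anchored regex.

    '*' matches any sequence of characters; a trailing '$' pins the end of
    the URL path. Every other character is matched literally.
    """
    anchored = rule_path.endswith("$")
    body = rule_path[:-1] if anchored else rule_path
    pattern = "".join(".*" if ch == "*" else re.escape(ch) for ch in body)
    return re.compile(pattern + (r"\Z" if anchored else ""))


def is_allowed(path, rules):
    """Return True if `path` may be crawled under `rules`.

    `rules` is a list of ("allow" | "disallow", rule_path) pairs. The
    longest matching rule path wins; on equal length, Allow (the least
    restrictive rule) wins. No matching rule means the path is allowed.
    """
    best = None  # (rule path length, is_allow)
    for directive, rule_path in rules:
        if not rule_path:
            continue  # an empty rule path matches nothing
        if _rule_regex(rule_path).match(path):
            candidate = (len(rule_path), directive == "allow")
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]
```

With `Allow: /page` and `Disallow: /*.htm`, for example, `/page.htm` is blocked because the six-character Disallow path is longer than the five-character Allow path; with `Allow: /$` and `Disallow: /`, only the homepage is allowed.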

sitemap_discovery_not_allow_rule

Sitemap discovery is not an Allow rule

Question: Does a Sitemap line override Disallow rules?

No. A Sitemap line gives crawlers a discovery hint, but Google ignores it when processing allow/disallow groups for robots.txt matching.

Next action: Add a Sitemap line with the fully qualified URL of the canonical sitemap, but still test whether Googlebot can fetch important public paths.

Not proof: A Sitemap line is not proof that a URL is crawlable or indexed.

Google robots.txt specification interpretation (official reference)
Live sitemap
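A minimal Python sketch of keeping Sitemap handling apart from rule matching: it collects Sitemap values without looking at User-agent groups and checks that each value is an absolute http(s) URL. `sitemap_urls` and `is_fully_qualified` are hypothetical helper names.

```python
from urllib.parse import urlparse


def sitemap_urls(robots_txt):
    """Collect Sitemap values from robots.txt.

    Sitemap lines are read independently of User-agent groups: they are
    discovery hints and are never treated as Allow rules.
    """
    urls = []
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "sitemap":
            urls.append(value.strip())
    return urls


def is_fully_qualified(url):
    """True for absolute http(s) URLs such as https://example.com/sitemap.xml."""
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)
```

In `User-agent: *` / `Disallow: /` / `Sitemap: https://example.com/sitemap.xml`, the sitemap URL is collected, but the Disallow rule still blocks every path for crawlers that follow the `*` group.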

verify_googlebot_identity

Verify Googlebot identity

Question: Can I trust a log row that says Googlebot?

Not by user-agent alone. User-agent strings can be spoofed, so use source IP, reverse DNS, and published IP ranges when log proof matters.

Next action: Classify Googlebot log rows separately from human sessions and keep identity proof with the audit record.

Not proof: A Googlebot-looking user-agent is not proof of a real Google crawler or human traffic.

Google crawler overview (official reference)
Googlebot IP ranges (official reference)
Bot detection log analyzer (tool)
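A sketch of both identity checks in Python, with the DNS resolvers injectable so the logic can be tested offline. Assumptions: the accepted hostname suffixes (`googlebot.com`, `google.com`) should be confirmed against Google's current verification guidance; the default forward lookup uses `socket.gethostbyname_ex` and therefore covers IPv4 only; and the ranges JSON is assumed to follow the `prefixes` / `ipv4Prefix` / `ipv6Prefix` shape of Google's published files. `verify_by_dns` and `ip_in_published_ranges` are hypothetical names.

```python
import ipaddress
import json
import socket

# Assumption: hostname suffixes accepted for Googlebot. Confirm against
# Google's current crawler-verification documentation before relying on it.
GOOGLEBOT_DNS_SUFFIXES = (".googlebot.com", ".google.com")


def verify_by_dns(ip, reverse=socket.gethostbyaddr, forward=socket.gethostbyname_ex):
    """Reverse-resolve `ip`, check the domain, then forward-resolve back.

    Both resolvers are injectable for offline testing. The default forward
    lookup (gethostbyname_ex) covers IPv4 only.
    """
    try:
        hostname = reverse(ip)[0]
    except OSError:  # socket.herror and socket.gaierror subclass OSError
        return False
    if not hostname.lower().rstrip(".").endswith(GOOGLEBOT_DNS_SUFFIXES):
        return False
    try:
        forward_ips = forward(hostname)[2]
    except OSError:
        return False
    return ip in forward_ips  # the name must resolve back to the same IP


def ip_in_published_ranges(ip, ranges_json):
    """Check `ip` against a JSON document shaped like
    {"prefixes": [{"ipv4Prefix": "..."}, {"ipv6Prefix": "..."}]}."""
    address = ipaddress.ip_address(ip)
    for entry in json.loads(ranges_json).get("prefixes", []):
        prefix = entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
        if prefix and address in ipaddress.ip_network(prefix):
            return True
    return False
```

In an audit, load Google's published ranges file yourself, pass its text as `ranges_json`, and store both results with the log record. A passing check is crawler proof only, never human-traffic proof.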

live_test_and_measurement

Live test and real measurement

Question: What should count as success after changing robots.txt?

Use live fetch checks and real Search Console clicks or tool activations. Do not use fake searches, self-clicks, or crawler hits as traffic proof.

Next action: Run the live Googlebot robots checker, then review query-level clicks and CTR after Search Console data refreshes.

Not proof: Fake searches, self-clicks, and crawler hits are not human traffic and must not be counted as such.
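To keep crawler hits out of traffic numbers, log rows can be partitioned before anything is counted. The Python sketch below uses a hypothetical `LogRow` record and `partition_rows` helper, and assumes crawler verification (source IP, reverse DNS, or published IP ranges) has already produced a set of verified crawler IPs.

```python
from dataclasses import dataclass


@dataclass
class LogRow:
    """Hypothetical minimal log record: source IP and user-agent string."""
    ip: str
    user_agent: str


def partition_rows(rows, verified_crawler_ips):
    """Split log rows so crawler hits never reach the traffic count.

    `verified_crawler_ips` holds IPs already confirmed as crawlers. Rows in
    "other" are only candidates for human traffic, not proof of it; Search
    Console clicks remain the measure for Search.
    """
    buckets = {"verified_crawler": [], "unverified_crawler_claim": [], "other": []}
    for row in rows:
        if row.ip in verified_crawler_ips:
            buckets["verified_crawler"].append(row)
        elif "googlebot" in row.user_agent.lower():
            # Claims to be Googlebot but is unverified: possibly spoofed.
            buckets["unverified_crawler_claim"].append(row)
        else:
            buckets["other"].append(row)
    return buckets
```

Neither crawler bucket should ever feed a traffic metric, whether or not the row was verified.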

Decision table

Evidence	Meaning	Next action	Not proof
robots.txt is not access control	No. robots.txt is a public crawler preference file. It can manage crawling, but private or sensitive pages need authentication, password protection, noindex, or removal controls. A noindex rule only works if Google can crawl the page and see it, so do not also block that page in robots.txt.	Use robots.txt only for crawl preferences and safe-to-public path patterns; keep private content behind real access control.	A Disallow line is not proof that a page is private, deindexed, or inaccessible.
Keep Googlebot open for Search	The main risk is accidentally blocking Googlebot while still expecting Google Search traffic, either in a Googlebot group or in the User-agent: * group that Googlebot follows when no Googlebot-specific group exists. Make the intended Googlebot outcome explicit before publishing.	Use the Googlebot-safe preset, test priority public paths, then run the live checker after upload.	Allowing Googlebot is not a ranking guarantee or click guarantee.
Keep Google-Extended separate	Yes. Treat Google-Extended as a separate product token and policy decision from Googlebot. Do not present Google-Extended blocking as a Google Search ranking improvement.	Write Googlebot and Google-Extended rules as separate groups, then test each token against the same paths.	A Google-Extended rule is not a Search ranking signal or proof of Googlebot crawling.
Googlebot Smartphone and Desktop use the same token	Google says Googlebot Smartphone and Googlebot Desktop both obey the same Googlebot product token in robots.txt, so a robots.txt generator should avoid pretending to split them with separate product-token rules.	Use one Googlebot rule set for Search crawling and validate mobile/desktop behavior in Search Console or logs separately.	A robots.txt draft cannot prove mobile-first indexing outcomes.
Path matching and precedence	Use Google-specific path matching rules: Google supports * and $ in path values, applies the most specific matching rule (the one with the longest path), and in equally specific conflicts applies the least restrictive rule.	Test homepage, public guides, admin, cart, checkout, parameter, and file-extension paths before publishing.	A visually plausible robots.txt file is not proof that every important URL is allowed or blocked as intended.
Sitemap discovery is not an Allow rule	No. A Sitemap line gives crawlers a discovery hint, but Google ignores it when processing allow/disallow groups for robots.txt matching.	Add a Sitemap line with the fully qualified URL of the canonical sitemap, but still test whether Googlebot can fetch important public paths.	A Sitemap line is not proof that a URL is crawlable or indexed.
Verify Googlebot identity	Not by user-agent alone. User-agent strings can be spoofed, so use source IP, reverse DNS, and published IP ranges when log proof matters.	Classify Googlebot log rows separately from human sessions and keep identity proof with the audit record.	A Googlebot-looking user-agent is not proof of a real Google crawler or human traffic.
Live test and real measurement	Use live fetch checks and real Search Console clicks or tool activations. Do not use fake searches, self-clicks, or crawler hits as traffic proof.	Run the live Googlebot robots checker, then review query-level clicks and CTR after Search Console data refreshes.	Fake searches, self-clicks, and crawler hits are not human traffic and must not be counted as such.

Proof links for AI agents

Official Google references and caveats

Reference	Use for	Caveat
Google robots.txt introduction	Use for the core caveat that robots.txt manages crawler access and crawl load, but is not a mechanism for keeping a web page out of Google.	Use noindex, password protection, or removal flows when the goal is keeping content out of search or private.
Google robots.txt specification interpretation	Use for Google rule parsing, allow/disallow matching, wildcard and end-anchor support, most-specific-rule precedence, and Sitemap parser caveats.	This helps test rule outcomes; it is not a ranking guarantee.
Googlebot documentation	Use for the Googlebot product token, Smartphone/Desktop crawler caveat, blocking impact, noindex/password guidance, and Googlebot verification.	User-agent strings can be spoofed, so identity checks need DNS or IP-range proof.
Google common crawlers and Google-Extended	Use for the distinction between Googlebot and Google-Extended, including the Search/ranking caveat for Google-Extended controls.	Google-Extended is a standalone product token and does not replace Googlebot search crawling policy.
Google crawler overview	Use for crawler categories and identity verification using user-agent header, source IP, and reverse DNS hostname.	Identity verification is crawler proof, not human traffic proof.
Googlebot IP ranges	Use when matching server logs against published Googlebot IP ranges.	IP-range validation should be paired with reverse DNS checks for operational audits.

FAQ

Does a Google-safe robots.txt guarantee Search traffic?

No. It reduces avoidable crawler-policy mistakes. It does not guarantee ranking, indexing, clicks, citations, or traffic.

What is the safest next action?

Use the generator, test Googlebot paths, keep Google-Extended separate, run the live checker, and review Search Console after data refreshes.

Keep Googlebot, Google-Extended, path rules, proof links, and traffic measurement separate.

For Google Search, keep Googlebot crawlable; handle Google-Extended separately.