| robots.txt is not access control |
No. robots.txt is a public file of crawler preferences. It can manage crawling, but it cannot hide content: private or sensitive pages need authentication or password protection, and pages that must stay out of search results need noindex or removal controls.
Use robots.txt only for crawl preferences and safe-to-public path patterns; keep private content behind real access control. |
A Disallow line is not proof that a page is private, deindexed, or inaccessible. |
| Keep Googlebot open for Search |
The main risk is blocking Googlebot or User-agent: * by accident while still expecting Google Search traffic. Make the intended Googlebot outcome explicit before publishing. |
Use the Googlebot-safe preset, test priority public paths, then run the live checker after upload. |
Allowing Googlebot is not a ranking guarantee or click guarantee. |
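
One way to make the intended Googlebot outcome explicit before publishing is a small lint over the draft. The sketch below is hypothetical: the function names and the sample draft are assumptions, not part of the preset or checker mentioned above. It only flags a bare `Disallow: /` in the group that would apply to Googlebot, so treat it as a warning signal and not as a full evaluation of the file.

```python
def full_block_by_agent(robots_txt: str) -> dict[str, bool]:
    """Map each user-agent token (lowercased) to whether its group
    contains a bare 'Disallow: /'."""
    result: dict[str, bool] = {}
    agents: list[str] = []
    in_rules = False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a rule line closed the previous group
                agents, in_rules = [], False
            agents.append(value.lower())
            result.setdefault(value.lower(), False)
        elif field in ("allow", "disallow"):
            in_rules = True
            if field == "disallow" and value == "/":
                for agent in agents:
                    result[agent] = True
    return result


def googlebot_fully_blocked(robots_txt: str) -> bool:
    """True if the group that applies to Googlebot has a bare 'Disallow: /'."""
    blocks = full_block_by_agent(robots_txt)
    # A dedicated Googlebot group, if present, is used instead of the * group.
    return blocks.get("googlebot", blocks.get("*", False))


# Hypothetical draft: the catch-all group blocks everything by accident.
draft = "User-agent: *\nDisallow: /\n"
print(googlebot_fully_blocked(draft))
```

A `True` result here means the draft needs a second look before upload; a `False` result does not prove that every priority path is crawlable.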
| Keep Google-Extended separate |
Yes. Treat Google-Extended as a separate product token and policy decision from Googlebot. Do not present Google-Extended blocking as a Google Search ranking improvement. |
Write Googlebot and Google-Extended rules separately, then test both paths. |
A Google-Extended rule is not a Search ranking signal or proof of Googlebot crawling. |
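
To illustrate keeping the two tokens in separate groups and testing both, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The draft and the `example.com` URL are placeholders. The standard-library parser does not implement Google's wildcard or longest-match behavior, so this only demonstrates group selection on simple prefix rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical draft: Search crawling stays open, Google-Extended is blocked.
DRAFT = """\
User-agent: Googlebot
Allow: /

User-agent: Google-Extended
Disallow: /
"""


def can_fetch(token: str, url: str, robots_txt: str = DRAFT) -> bool:
    """Evaluate one product token against a robots.txt draft."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(token, url)


for token in ("Googlebot", "Google-Extended"):
    print(token, can_fetch(token, "https://example.com/guides/"))
```

With this draft the Googlebot check passes and the Google-Extended check fails, which is the intended split; neither result says anything about ranking.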
| Googlebot Smartphone and Desktop use the same token |
Google says Googlebot Smartphone and Googlebot Desktop both obey the same Googlebot product token in robots.txt, so a robots.txt generator should avoid pretending to split them with separate product-token rules. |
Use one Googlebot rule set for Search crawling and validate mobile/desktop behavior in Search Console or logs separately. |
A robots.txt draft cannot prove mobile-first indexing outcomes. |
| Path matching and precedence |
Follow Google's path-matching rules: Google supports * (any sequence of characters) and $ (end of URL) in path values, applies the most specific matching rule (the longest matching path), and when conflicting rules are equally specific applies the least restrictive one.
Test homepage, public guides, admin, cart, checkout, parameter, and file-extension paths before publishing. |
A visually plausible robots.txt file is not proof that every important URL is allowed or blocked as intended. |
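
The matching and precedence rules above can be approximated in a few lines. This is a sketch under stated assumptions, not Google's implementation: it treats rule specificity as pattern length and lets Allow win ties, and the rule list and test paths are hypothetical. Paths are passed with their query string, since parameter rules match against it.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern: * is a wildcard, trailing $ anchors the end."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(regex + (r"\Z" if anchored else ""))


def is_allowed(rules: list[tuple[str, str]], path: str) -> bool:
    """rules: (directive, pattern) pairs from the one group that applies to the crawler."""
    best = None  # (pattern length, is_allow); longer wins, Allow wins ties
    for directive, pattern in rules:
        if not pattern:
            continue  # an empty pattern matches nothing
        if pattern_to_regex(pattern).match(path):
            candidate = (len(pattern), directive == "allow")
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]  # no matching rule means allowed


# Hypothetical rule set covering the path types worth testing.
RULES = [
    ("allow", "/"),
    ("disallow", "/admin/"),
    ("allow", "/admin/help/"),
    ("disallow", "/cart"),
    ("disallow", "/*?sessionid="),
    ("disallow", "/*.pdf$"),
]

for path in ("/", "/guides/intro", "/admin/users", "/admin/help/faq",
             "/cart", "/shop?sessionid=1", "/guides/a.pdf"):
    print(path, is_allowed(RULES, path))
```

Running the expected outcome for each priority path through a check like this before upload catches the cases where a file looks right but a longer rule quietly overrides a shorter one.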
| Sitemap discovery is not an Allow rule |
No. A Sitemap line gives crawlers a discovery hint, but it is not tied to any user-agent group, and Google does not use it when matching Allow and Disallow rules.
Add a canonical Sitemap line, but still test whether Googlebot can fetch important public paths. |
A Sitemap line is not proof that a URL is crawlable or indexed. |
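
The point can be seen with Python's standard-library parser, which reports Sitemap lines separately from the rule groups. This is a minimal sketch with a hypothetical draft and placeholder URLs; `site_maps()` requires Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical draft: the sitemap is listed, but /guides/ is still disallowed.
DRAFT = """\
User-agent: *
Disallow: /guides/

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(DRAFT.splitlines())

sitemaps = parser.site_maps()  # list of Sitemap URLs, or None if absent
guide_ok = parser.can_fetch("Googlebot", "https://example.com/guides/intro")

print(sitemaps)   # the Sitemap line was parsed
print(guide_ok)   # the Disallow rule still applies
```

The sitemap is discovered and the guide URL is still blocked, so a URL listed in the sitemap needs its own crawlability check.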
| Verify Googlebot identity |
Not by user-agent alone. User-agent strings can be spoofed, so when log proof matters, verify the source IP with a reverse DNS lookup confirmed by a forward lookup, or match it against Google's published IP ranges.
Classify Googlebot log rows separately from human sessions and keep identity proof with the audit record. |
A Googlebot-looking user-agent is not proof of a real Google crawler or human traffic. |
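
A reverse-then-forward DNS check can be sketched as follows. The function name is hypothetical, and the hostname suffix list is an assumption to confirm against Google's current crawler-verification documentation. The lookups are injectable so the logic can be tested without network access; `socket.gethostbyname_ex` resolves IPv4 only, so IPv6 log rows would need a different forward lookup.

```python
import socket

# Assumed suffixes for Google crawler hostnames; confirm before relying on them.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com", ".googleusercontent.com")


def is_verified_googlebot(ip: str,
                          reverse=socket.gethostbyaddr,
                          forward=socket.gethostbyname_ex) -> bool:
    """Reverse-resolve the IP, check the hostname suffix, then confirm
    that the hostname resolves back to the same IP."""
    try:
        hostname = reverse(ip)[0]
        if not hostname.endswith(GOOGLE_SUFFIXES):
            return False
        return ip in forward(hostname)[2]
    except OSError:  # covers socket.herror and socket.gaierror
        return False
```

Storing the resolved hostname and the lookup time next to each classified log row keeps the identity proof with the audit record.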
| Live test and real measurement |
Use live fetch checks and real Search Console clicks or tool activations. Do not use fake searches, self-clicks, or crawler hits as traffic proof. |
Run the live Googlebot robots checker, then review query-level clicks and CTR after Search Console refreshes. |
No fake searches, no self-clicks, and no crawler hits counted as human traffic. |