Googlebot and Google-Extended sound similar, but they do different jobs. Googlebot is the crawler token that affects Google Search and related search surfaces. Google-Extended is a separate robots.txt control for certain Gemini and Vertex AI use cases, not a Google Search ranking signal.
For Search, the token that matters is Googlebot. If your policy allows Search but limits certain Gemini/Vertex AI uses, use a separate Google-Extended rule.
| Token | Documented role | What blocking can affect | Default for public SEO |
|---|---|---|---|
| Googlebot | Google's main crawler token for Google Search and related products. | Google Search, Discover, Google Images, Google Video, Google News, and other Search features. | Allow, unless a URL should not be crawled or indexed. |
| Google-Extended | Standalone product token for managing whether content crawled by Google may be used for Gemini model training and grounding in Gemini Apps / Vertex AI. | Specified Gemini and Vertex AI uses. Google says it does not impact Search inclusion and is not a Search ranking signal. | Decide separately from Googlebot based on AI-use policy. |
| Google-InspectionTool | Search testing tools such as Rich Results Test and URL Inspection. | Search testing tools, not Google Search itself. | Usually allow for diagnostics. |
| GoogleOther | Generic crawler used by various Google product teams for public content fetches. | No specific product effect documented in the same way as Googlebot. | Review separately if you manage a strict crawler policy. |
Keep Googlebot able to crawl your public pages. Google's guidance for AI features says normal SEO fundamentals still apply to AI Overviews and AI Mode, and eligible supporting links need to be indexed and eligible for snippets.

    User-agent: Googlebot
    Allow: /

    Sitemap: https://example.com/sitemap.xml

This is the safest default for public marketing sites, SaaS docs, local business pages, ecommerce category pages, and any content that depends on Google Search discovery.
You can express a separate rule for Google-Extended while keeping Googlebot open. Confirm current Google documentation before publishing, especially if your business relies on search traffic.

    User-agent: Googlebot
    Allow: /

    User-agent: Google-Extended
    Disallow: /

    Sitemap: https://example.com/sitemap.xml

This is the common "allow Search, opt out of AI use" pattern. It keeps Googlebot open for Search while using the standalone Google-Extended token for the Gemini/Vertex AI policy choice.

    User-agent: Googlebot
    Disallow: /

    User-agent: Google-Extended
    Disallow: /

    Sitemap: https://example.com/sitemap.xml

Use this only when you intentionally do not want normal Google Search crawling for the affected URLs. For most public traffic funnels, this is not the right default.
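Before publishing any of these patterns, it can help to confirm that the rules do what you intend. The following is a minimal sketch using Python's standard `urllib.robotparser`; the helper name `allowed` and the `/pricing` test URL are assumptions made for illustration. Python's parser is only an approximation of how Google evaluates robots.txt (for example, it does not apply Google's precedence rules for overlapping paths), so treat the result as a sanity check rather than a guarantee.

```python
from urllib.robotparser import RobotFileParser

# The split rule from above: Search crawling open, Google-Extended disallowed.
SPLIT_RULES = """\
User-agent: Googlebot
Allow: /

User-agent: Google-Extended
Disallow: /

Sitemap: https://example.com/sitemap.xml
"""


def allowed(robots_txt: str, token: str,
            url: str = "https://example.com/pricing") -> bool:
    """Return True if the rules let the given user-agent token fetch the URL.

    `url` defaults to a hypothetical public page; swap in a real one.
    """
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(token, url)


if __name__ == "__main__":
    for token in ("Googlebot", "Google-Extended"):
        print(token, allowed(SPLIT_RULES, token))
```

Running the same helper against a file that contains only a blanket `User-agent: *` disallow shows Googlebot being blocked as well, which is the failure mode to watch for.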
| Goal | Googlebot | Google-Extended | Best next step |
|---|---|---|---|
| Maximize Google Search and AI-search visibility | Allow | Allow | Focus on crawlable HTML, internal links, sitemap, structured data that matches visible content, and useful pages. |
| Keep Google Search, limit certain Gemini/Vertex AI uses | Allow | Disallow | Publish the split rule, record the policy date, and monitor Googlebot crawling separately from the Google-Extended setting, which is a robots.txt control rather than a crawler with its own user-agent string. |
| Hide private or restricted content | Do not rely on robots.txt | Do not rely on robots.txt | Use authentication, password protection, access control, or noindex where appropriate. |
| Reduce crawl load on low-value URLs | Selective rules only | Separate AI-use decision | Block faceted, duplicate, or low-value paths carefully without blocking important public pages. |
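The first two rows of the table reduce to two yes/no decisions, which makes them easy to encode. The sketch below is one hypothetical way to do that; `render_robots_txt` and its parameters are names invented for this example, not part of any tool. It deliberately does not cover the "hide private content" row, since that goal needs authentication or noindex rather than robots.txt, and it does not generate selective path rules.

```python
def render_robots_txt(allow_search: bool, allow_ai_use: bool,
                      sitemap_url: str) -> str:
    """Build a robots.txt body from two policy decisions.

    allow_search  -> rule for the Googlebot token
    allow_ai_use  -> rule for the Google-Extended token
    """
    def group(token: str, allow: bool) -> str:
        rule = "Allow: /" if allow else "Disallow: /"
        return f"User-agent: {token}\n{rule}\n"

    parts = [
        group("Googlebot", allow_search),
        group("Google-Extended", allow_ai_use),
        f"Sitemap: {sitemap_url}\n",
    ]
    # A blank line between groups keeps the file readable.
    return "\n".join(parts)


if __name__ == "__main__":
    # "Keep Google Search, limit certain Gemini/Vertex AI uses"
    print(render_robots_txt(True, False, "https://example.com/sitemap.xml"))
```

Keeping the two decisions as separate parameters mirrors the point of the table: the Google-Extended choice is made independently of the Googlebot choice.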
Blocking User-agent: * too broadly. A blanket disallow can remove normal Googlebot crawling along with bots you meant to restrict.
Google says Google-Extended does not impact Search inclusion and is not a Search ranking signal.
Google says robots.txt is not a mechanism for keeping a page out of Google. Use authentication or noindex where appropriate.
User-agent strings can be spoofed. Verify serious Googlebot claims with reverse DNS or Google's published IP ranges.
Add Sitemap: https://your-domain.com/sitemap.xml to robots.txt.
Check logs for Googlebot, Google-InspectionTool, and suspicious spoofed user agents.
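The reverse DNS check mentioned above can be sketched as a two-step lookup: resolve the requesting IP to a hostname, confirm the hostname sits under a Google crawler domain, then resolve that hostname forward and confirm it maps back to the same IP. The function name, the injectable `reverse`/`forward` parameters, and the suffix list are assumptions for this example; check Google's current documentation for the authoritative list of domains. The forward lookup used here (`socket.gethostbyname_ex`) handles IPv4 only.

```python
import socket

# Assumed suffixes for Google crawler hostnames; confirm against Google's docs.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_googlebot(ip: str,
                          reverse=socket.gethostbyaddr,
                          forward=socket.gethostbyname_ex) -> bool:
    """Return True if `ip` passes a reverse-then-forward DNS check.

    `reverse` and `forward` are injectable so the logic can be tested
    without network access.
    """
    try:
        hostname = reverse(ip)[0]          # (hostname, aliases, addresses)
    except OSError:                        # covers socket.herror / gaierror
        return False

    # The leading dot in each suffix rejects look-alikes such as
    # "fakegooglebot.com".
    if not hostname.lower().rstrip(".").endswith(GOOGLE_SUFFIXES):
        return False

    try:
        addresses = forward(hostname)[2]   # list of IPv4 addresses
    except OSError:
        return False

    # The forward lookup must map back to the original IP.
    return ip in addresses
```

A log-review script could call this only for requests whose user-agent string claims to be Googlebot, so that DNS lookups are limited to the traffic worth verifying.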