Is noindex the same as an LLM training opt-out?

No. noindex is primarily an indexing and serving directive for search engines that support it. For AI crawler training or search-use policy, use the crawler's documented robots.txt token when one exists.

Can robots meta tags work if robots.txt blocks the page?

Usually no. A crawler must fetch the page before it can see robots meta tags or X-Robots-Tag headers. If robots.txt blocks the URL, those page-level rules may never be seen.


Do LLM crawlers respect robots meta tags, noindex, or X-Robots-Tag?

Published 2026-06-29. Built for CMS builders, SaaS developers, publishers, and webmasters deciding between robots.txt, robots meta tags, and X-Robots-Tag.

Short answer: do not assume robots meta noindex is a universal LLM crawler opt-out. Use robots.txt for documented crawler tokens and authentication for private content.

Fast answer

Do not count on noindex or robots meta tags to opt pages out of LLM crawler use. CMS users often need per-account or per-page controls, but most AI crawler documentation names robots.txt user-agent tokens rather than a universal nolearn or noteach meta directive. The useful deliverable is a decision table covering robots.txt, robots meta tags, X-Robots-Tag, authentication, and crawler-specific AI policy rules.

This page is intentionally conservative. It treats crawler files, URL inspection, feeds, and server logs as discovery and measurement aids, not as guaranteed ranking levers.

When to use this playbook

Use it when CMS builders, SaaS developers, publishers, or webmasters are deciding between robots.txt, robots meta tags, and X-Robots-Tag and need a concrete next step, along with a page that can be linked from a hub, a community answer, a README, or a launch checklist. The page should help someone make a decision even if they never buy anything or contact the site owner.

The strongest pages in this topic cluster have three traits: they answer one narrow question, they include a copyable artifact, and they link to the relevant tool or proof page so the reader can act immediately.

Recommended workflow

Use robots.txt for crawler access preferences that happen before a page is fetched.
Use robots meta tags or X-Robots-Tag for search indexing and snippet controls after a crawler can fetch the URL.
Do not assume noindex is a training opt-out for any LLM crawler unless that crawler documents support for it.
Use authentication or permissions for private content, because crawler directives are requests rather than access control.
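
As a rough illustration of the first workflow step, the sketch below parses a sample robots.txt with Python's standard urllib.robotparser and asks what each crawler token may fetch. The file contents, the example.com URLs, and the policy choices (blocking GPTBot, allowing OAI-SearchBot) are assumptions made up for this example, not recommendations, and a real crawler's own parser may resolve edge cases differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the paths and the allow/block choices are
# illustrative only. Edit them to match the real site policy.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: *
Disallow: /account/
"""


def can_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the parsed rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("GPTBot", "OAI-SearchBot", "Googlebot"):
        allowed = can_fetch(SAMPLE_ROBOTS_TXT, agent, "https://example.com/post")
        print(agent, "may fetch /post:", allowed)
```

This only models the crawler preference layer. Anything under /account/ in the sample still needs authentication, because a crawler that ignores robots.txt is not stopped by it.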

Decision table

Control	When it applies	Best use	Limitation
robots.txt	Before a crawler fetches a URL.	Documented crawler user-agent access preferences, such as GPTBot, OAI-SearchBot, Googlebot, or Google-Extended.	It is a crawler directive, not account-level permission or true access control.
robots meta tag	After a crawler can fetch and parse the HTML page.	Search indexing and snippet behavior for crawlers that support the directive.	If robots.txt blocks the URL, the crawler may never see the meta tag.
X-Robots-Tag	After a crawler can fetch the HTTP response.	Indexing controls for non-HTML files or server-level response policies.	Still not a universal AI training opt-out unless a crawler documents support.
noindex	When a supporting search crawler fetches and processes the page or response.	Keeping a page out of search indexing or serving surfaces.	Do not treat it as a universal nolearn, noteach, or model-training opt-out.
Authentication	Before any crawler or user can see private content.	Tenant, account, customer, billing, admin, or private CMS pages.	Requires product or CMS permission design, not just SEO metadata.
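
The layering in the table can be sketched as a small response policy for a CMS. This is a minimal sketch under stated assumptions: the Page fields, the response_policy function, and the choice of a 401 status are hypothetical names and decisions for illustration, not part of any particular framework.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """Hypothetical CMS page record used only for this illustration."""
    is_private: bool
    noindex: bool


def response_policy(page: Page, is_authenticated: bool) -> tuple[int, dict[str, str]]:
    """Return (status code, extra headers) for a request to the page."""
    # Access control comes first: private content is never served to
    # anonymous clients, whatever crawler directives are configured.
    if page.is_private and not is_authenticated:
        return 401, {}
    headers: dict[str, str] = {}
    # Indexing preference comes second and only matters once the
    # response is actually served.
    if page.noindex:
        headers["X-Robots-Tag"] = "noindex"
    return 200, headers
```

The ordering is the point: authentication decides whether content is visible at all, and X-Robots-Tag only expresses an indexing preference to crawlers that support it.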

Pre-publish checklist

robots.txt has separate rules for OAI-SearchBot and GPTBot.
Googlebot and Google-Extended are not confused.
noindex pages are still crawlable if the noindex rule must be seen.
Private account pages require login.
CMS-level per-page controls are documented for future maintainers.
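
Two of the checklist items can be checked offline against a robots.txt string. The sketch below is one possible approach, assuming a simplified group parser written for this example (user_agent_groups, has_separate_rules, and hidden_noindex_urls are hypothetical helper names) and using Googlebot as the default search crawler token. The sample file and URLs are made up.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt used only for this example.
EXAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: *
Disallow: /drafts/
"""


def user_agent_groups(robots_txt: str) -> list[set[str]]:
    """Collect the user-agent tokens of each rule group (simplified parser)."""
    groups: list[set[str]] = []
    current: set[str] = set()
    in_agents = False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line or ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        if field.lower() == "user-agent":
            if not in_agents:  # a new group starts here
                current = set()
                groups.append(current)
                in_agents = True
            current.add(value.lower())
        else:
            in_agents = False  # rule lines close the user-agent list
    return groups


def has_separate_rules(robots_txt: str, token_a: str, token_b: str) -> bool:
    """True if both tokens are named and never share one rule group."""
    groups = user_agent_groups(robots_txt)
    a, b = token_a.lower(), token_b.lower()
    named_a = any(a in group for group in groups)
    named_b = any(b in group for group in groups)
    shared = any(a in group and b in group for group in groups)
    return named_a and named_b and not shared


def hidden_noindex_urls(robots_txt: str, noindex_urls: list[str],
                        user_agent: str = "Googlebot") -> list[str]:
    """Return noindex URLs that robots.txt blocks, so the rule may go unseen."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in noindex_urls if not parser.can_fetch(user_agent, url)]
```

A non-empty result from hidden_noindex_urls means the noindex rule on those pages may never be fetched, which is the conflict the third checklist item warns about.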

Copyable working note

Use this as a starting point in a ticket, README, client note, or launch log. Edit it to match the real site before publishing.

Question: Do LLM crawlers respect robots meta tags?
Short answer: do not assume universal support.
Use robots.txt for documented AI crawler tokens.
Use noindex/X-Robots-Tag for search indexing controls.
Use authentication for anything private.
Test: fetch robots.txt, inspect headers/meta, then check logs.
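
For the "inspect headers/meta" part of the test step, a minimal sketch using only the Python standard library might look like this. It assumes the response headers and HTML body have already been fetched by some other means, and page_directives is a hypothetical helper name. It does not handle user-agent-scoped values such as "googlebot: noindex" or crawler-specific meta names.

```python
from html.parser import HTMLParser


def _split_directives(value: str) -> set[str]:
    """Split a comma-separated directive string into lowercase tokens."""
    return {part.strip().lower() for part in value.split(",") if part.strip()}


class _RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            self.directives |= _split_directives(attr_map.get("content", ""))


def page_directives(headers: dict[str, str], html: str) -> set[str]:
    """Merge directives from the X-Robots-Tag header and robots meta tags."""
    found: set[str] = set()
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":  # header names are case-insensitive
            found |= _split_directives(value)
    parser = _RobotsMetaParser()
    parser.feed(html)
    return found | parser.directives
```

If "noindex" appears in the result but robots.txt blocks the same URL, the directive is present on the page yet may never be read by the crawler.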

Community answer draft

This no-link draft is written to answer the technical question first. If you post it in a community, review the current thread, platform rules, and disclosure requirements before adding any owned link.

Short answer: do not treat robots meta tags or noindex as a universal LLM crawler opt-out.

The timing matters:
1. robots.txt is checked before fetching a URL, so it is the normal place to express crawler access preferences for crawlers that document and honor user-agent rules.
2. A robots meta tag or X-Robots-Tag header is only visible after the crawler fetches the page or response.
3. noindex is primarily an indexing or serving directive for search engines that support it, not a universal AI training or model-use opt-out.
4. If robots.txt blocks a URL, the crawler may never fetch the page and may never see a page-level meta tag.
5. Private account, customer, or tenant content should use authentication and permissions, because crawler directives are not access control.

For a CMS, I would separate the layers: site owner sets broad robots.txt policy, account or tenant owner controls whether a page is public, and public pages can still use noindex or X-Robots-Tag for search indexing behavior.

Proof and measurement plan

Search Console query filter for robots meta tag, noindex, GPTBot, OAI-SearchBot, and LLM crawler variants.
Referral tracking for Stack Overflow or webmaster answers that cite the page.
Server-log checks for crawler visits to robots.txt, sitemap.xml, llms.txt, and the article URL.
Manual review before posting in Q&A communities to avoid promotional or duplicate answers.
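
The server-log check can be sketched as a small counter over access-log lines. This assumes logs in the common "combined" format; the sample lines, IP addresses, and user-agent strings below are made up for illustration, and the token and path lists are assumptions to adjust (for example, by adding the article URL). Matching a token in the user-agent string does not verify that a request really came from that crawler, since the header can be spoofed.

```python
import re
from collections import Counter

# Assumed lists for this sketch; extend them to match the real site.
CRAWLER_TOKENS = ("GPTBot", "OAI-SearchBot", "Googlebot")
WATCHED_PATHS = ("/robots.txt", "/sitemap.xml", "/llms.txt")

# Matches the request, status, size, referer, and user-agent fields
# of a combined-format access-log line.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# Made-up log lines using documentation-only IP addresses.
SAMPLE_LOG = [
    '192.0.2.10 - - [29/Jun/2026:10:00:00 +0000] "GET /robots.txt HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '192.0.2.11 - - [29/Jun/2026:10:01:00 +0000] "GET /llms.txt HTTP/1.1" 200 900 "-" "Mozilla/5.0 (compatible; OAI-SearchBot/1.0)"',
    '192.0.2.12 - - [29/Jun/2026:10:02:00 +0000] "GET /pricing HTTP/1.1" 200 4096 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
]


def count_crawler_hits(lines, tokens=CRAWLER_TOKENS, paths=WATCHED_PATHS) -> Counter:
    """Count hits per (crawler token, path) for the watched paths."""
    hits: Counter = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match:
            continue
        path = match.group("path").split("?", 1)[0]  # ignore query strings
        if path not in paths:
            continue
        agent = match.group("agent").lower()
        for token in tokens:
            if token.lower() in agent:
                hits[(token, path)] += 1
    return hits


if __name__ == "__main__":
    for (token, path), count in sorted(count_crawler_hits(SAMPLE_LOG).items()):
        print(token, path, count)
```

Counts like these show that a crawler requested a file. They say nothing about rankings, citations, or how the fetched content is used.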

What not to count as proof

Do not count this setup as traffic by itself. A submitted sitemap, an IndexNow receipt, a crawler log hit, or an indexing request can show discovery work, but none of them proves rankings, impressions, clicks, conversions, or AI citations. Organic proof should come from Search Console, analytics, qualified referral evidence, or server logs interpreted for the right purpose.

The main pitfall for this topic is relying on a made-up nolearn or noteach meta tag and assuming every LLM crawler will treat it as an opt-out.

Related resources

Continue the workflow with these related LLMs.txt Kit resources.

Primary related guide or tool: /tools/ai-crawler-robots-txt-checker.html
Next supporting resource: /guides/gptbot-vs-oai-searchbot.html
All free tools: /tools/
Proof dashboard: /proof.html
