GitHub README snippets for llms.txt and AI crawler policy
These snippets are designed for honest repository documentation, not link spam. Use them only when a project actually needs to document its AI crawler policy, llms.txt, robots.txt, sitemap, or crawler log evidence.
Use this legitimately
Do not mass-post these snippets into unrelated repos. The safe version is: improve the README or issue with a concrete crawler-policy checklist, then include a direct tool link only where it helps the reader finish the job.
README utility badge
Best for docs sites, SaaS starter kits, SEO tools, CMS plugins, and agency templates.
### AI crawler visibility
This project documents public crawler policy with `/robots.txt`, `/sitemap.xml`, and optional `/llms.txt`.
Useful checks:
- Generate an llms.txt file: https://llmstxtkit.com/?utm_source=github&utm_medium=readme&utm_campaign=first-distribution&utm_content=readme-badge
- Check AI crawler robots.txt rules: https://llmstxtkit.com/tools/ai-crawler-robots-txt-checker.html?utm_source=github&utm_medium=readme&utm_campaign=first-distribution&utm_content=readme-badge
- Analyze crawler log samples: https://llmstxtkit.com/tools/ai-crawler-log-analyzer.html?utm_source=github&utm_medium=readme&utm_campaign=first-distribution&utm_content=readme-badge
Issue checklist
Best for issues asking why ChatGPT search, Google Search, or AI crawlers are not finding a site.
## AI crawler visibility checklist
- [ ] `/robots.txt` returns HTTP 200.
- [ ] `/sitemap.xml` returns HTTP 200 and lists canonical public URLs.
- [ ] `/llms.txt` exists only if it links to public, useful, canonical pages.
- [ ] Googlebot is not blocked for pages that should appear in Google Search.
- [ ] OAI-SearchBot policy matches the site's ChatGPT search visibility goals.
- [ ] GPTBot / Google-Extended / Applebot-Extended policies are documented separately from search crawling.
- [ ] Access logs show crawler hits to `/robots.txt`, `/sitemap.xml`, `/llms.txt`, or important guide/tool pages.
Free tools if helpful:
https://llmstxtkit.com/tools/ai-crawler-robots-txt-checker.html?utm_source=github&utm_medium=issue&utm_campaign=first-distribution&utm_content=issue-checklist
https://llmstxtkit.com/tools/ai-crawler-log-analyzer.html?utm_source=github&utm_medium=issue&utm_campaign=first-distribution&utm_content=issue-checklist
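The crawler-policy items in the checklist above can be spot-checked locally before opening or answering an issue. The following is a minimal sketch, assuming Python's standard `urllib.robotparser` is an acceptable first-pass stand-in for how real crawlers read the file. Its rule matching is simpler than Google's documented longest-match behavior, so treat the result as a hint, not a verdict. `SAMPLE_ROBOTS_TXT`, `crawler_access`, and the `example.com` URLs are hypothetical names chosen for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy for illustration; paste the real /robots.txt body instead.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Disallow: /admin/
"""

# User-agent tokens named in the checklist above.
SEARCH_AGENTS = ["Googlebot", "OAI-SearchBot"]
TRAINING_AGENTS = ["GPTBot", "Google-Extended", "Applebot-Extended"]


def crawler_access(robots_txt: str, url: str, agents: list[str]) -> dict[str, bool]:
    """Return {user_agent: allowed} for one URL under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    page = "https://example.com/guides/intro.html"
    report = crawler_access(SAMPLE_ROBOTS_TXT, page, SEARCH_AGENTS + TRAINING_AGENTS)
    for agent, allowed in report.items():
        print(f"{agent:20} {'allowed' if allowed else 'blocked'}")
```

With the sample policy, the search agents are allowed on the guide page while GPTBot and Google-Extended are blocked. Applebot-Extended falls through to the `*` group, which is the kind of undocumented default the checklist is meant to surface.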
PR note
Best for a pull request that adds or edits robots.txt, sitemap.xml, or llms.txt.
## Crawler policy note
This PR changes public crawler-discovery files. Please verify:
1. Search crawlers needed for discovery are not accidentally blocked.
2. AI training-use tokens (for example GPTBot, Google-Extended, Applebot-Extended) are allowed or blocked intentionally.
3. `/sitemap.xml` still points to canonical public URLs.
4. `/llms.txt` does not include private URLs, customer data, unpublished docs, or secrets.
5. Server logs after deploy show HTTP 200 for the changed files.
Reference tools:
- robots.txt checker: https://llmstxtkit.com/tools/ai-crawler-robots-txt-checker.html?utm_source=github&utm_medium=pr&utm_campaign=first-distribution&utm_content=pr-note
- llms.txt validator: https://llmstxtkit.com/tools/llms-txt-validator.html?utm_source=github&utm_medium=pr&utm_campaign=first-distribution&utm_content=pr-note
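Item 5 of the PR note (and the access-log item in the issue checklist) can be checked with a short log scan after deploy. This is a rough sketch, assuming the server writes the common "combined" access-log format. The regular expression, `CRAWLER_TOKENS`, `WATCHED_PATHS`, `crawler_hits`, and the sample log lines are all hypothetical and should be adapted to the real log layout.

```python
import re
from collections import Counter

# Assumes the "combined" access-log format; adjust the pattern for other formats.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# Google-Extended and Applebot-Extended are robots.txt control tokens; to my
# knowledge they do not appear as separate user-agent strings in logs, so they
# are not listed here.
CRAWLER_TOKENS = ["Googlebot", "OAI-SearchBot", "GPTBot", "Applebot"]
WATCHED_PATHS = {"/robots.txt", "/sitemap.xml", "/llms.txt"}

# Made-up sample lines for illustration only.
SAMPLE_LOG = [
    '203.0.113.5 - - [01/Jan/2025:00:00:01 +0000] "GET /robots.txt HTTP/1.1" 200 312 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.6 - - [01/Jan/2025:00:00:02 +0000] "GET /llms.txt HTTP/1.1" 404 0 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '203.0.113.7 - - [01/Jan/2025:00:00:03 +0000] "GET /pricing HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
]


def crawler_hits(lines):
    """Count (crawler token, path, status) for requests to the watched files."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match or match["path"] not in WATCHED_PATHS:
            continue
        agent = match["agent"].lower()
        for token in CRAWLER_TOKENS:
            if token.lower() in agent:
                hits[(token, match["path"], int(match["status"]))] += 1
                break
    return hits


if __name__ == "__main__":
    for (token, path, status), count in sorted(crawler_hits(SAMPLE_LOG).items()):
        flag = "" if status == 200 else "  <- not HTTP 200"
        print(f"{token:15} {path:15} {status} x{count}{flag}")
```

User-agent strings can be spoofed, so a hit counted here shows that a request claimed to be a crawler, not that it was one.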
Docs paragraph
Best for public documentation where readers need a short explanation.
### AI crawler and llms.txt policy
This site keeps crawler-discovery files public and reviewable:
- `/robots.txt` describes crawler access.
- `/sitemap.xml` lists canonical public URLs.
- `/llms.txt` provides an optional concise map for AI-aware readers and agents.
`llms.txt` is not a guaranteed ranking factor. Treat it as a public context file, and keep standard SEO fundamentals such as crawlable HTML, useful content, internal links, and Search Console measurement in place.
Free implementation helpers:
https://llmstxtkit.com/?utm_source=github&utm_medium=docs&utm_campaign=first-distribution&utm_content=docs-paragraph
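For projects that do add `/llms.txt`, the file can be generated from a small list of public pages. The sketch below assumes the layout described in the llms.txt proposal as I understand it (an H1 title, a blockquote summary, then H2 sections of Markdown links). `render_llms_txt`, the page list, and the `https://` check are illustrative assumptions, not part of any official tool.

```python
def render_llms_txt(title, summary, sections):
    """Render a minimal llms.txt body from {section: [(name, url, note), ...]}."""
    lines = [f"# {title}", "", f"> {summary}"]
    for heading, links in sections.items():
        lines += ["", f"## {heading}", ""]
        for name, url, note in links:
            # Crude guard so internal or non-public URLs are not published by accident.
            if not url.startswith("https://"):
                raise ValueError(f"expected a public https URL, got {url!r}")
            lines.append(f"- [{name}]({url}): {note}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical site content for illustration.
    print(
        render_llms_txt(
            "Example Project",
            "Short description of the site.",
            {"Docs": [("Getting started", "https://example.com/docs/start", "Install and first run")]},
        ),
        end="",
    )
```

The `https://` check is only a first filter. It does not replace a manual review for private URLs, customer data, or unpublished docs.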
Why this helps the funnel
GitHub and docs snippets can create qualified referral traffic because the reader is already editing crawler, sitemap, docs, or SEO files. This is cleaner than asking people to search for a domain or dropping links into unrelated threads.