April 17, 2026
llms.txt: the complete 2026 guide (with examples)
An llms.txt file is a Markdown file, served as plain text at the root of a website, that tells large language models which pages contain authoritative, citable content — the robots.txt of the AI era.
If your site doesn't have one, you're leaving it to a crawler's best guess. In 2026, with ChatGPT, Perplexity, and Google AI Overviews pulling citations from indexed content in real time, "best guess" is not a strategy.
What is llms.txt? (Origin and definition)
The spec was proposed by Jeremy Howard of fast.ai in September 2024 and is maintained at llmstxt.org. The core premise: LLMs don't navigate the web like humans. They process text in bulk, often with context window limits, and need a concise map of what's worth reading.
An llms.txt file gives them that map.
The format is minimal:
- An H1 with your site or brand name
- An optional blockquote describing the site's purpose
- One or more H2 sections grouping related URLs
- Markdown links with optional short descriptions
That's it. No JSON, no schema, no special syntax to learn.
What does a valid llms.txt look like?
Here is the minimum viable file:
# Acme Corp
> B2B SaaS for supply chain automation. Founded 2019. SOC 2 Type II certified.
## Docs
- [API Reference](https://acme.com/docs/api): Full REST API with request/response examples.
- [Getting Started](https://acme.com/docs/quickstart): 5-minute setup guide.
## Blog
- [How We Cut Deployment Time by 40%](https://acme.com/blog/deployment): Case study with benchmark data.
The H1 is your brand anchor. The blockquote is your 1-2 sentence "why cite me" pitch. The H2 sections act as topic clusters. The links are what LLMs will actually follow.
Anthropic publishes its own llms.txt at https://docs.anthropic.com/llms.txt. As of early 2026, it lists product documentation, research papers, and usage policy pages — the exact content Anthropic wants Claude (and other models) to reference accurately.
When should you add llms.txt?
Add it if any of the following are true:
- Your site has more than 20 pages and some are significantly higher-quality than others
- You publish technical documentation, research, or proprietary data
- Your target audience uses AI assistants to answer questions your content addresses
- You've noticed your site being cited incorrectly or incompletely in AI outputs
Do not add it expecting overnight citation gains. llms.txt signals intent — crawlers must still index your content and the model must decide it's authoritative. It's a necessary condition, not a sufficient one.
Real-world llms.txt examples
Anthropic (docs.anthropic.com/llms.txt): Structured by product area. Includes links to the Acceptable Use Policy and Claude's model card — content Anthropic wants cited accurately. Notable: they also serve llms-full.txt, a second file that includes the full text of key pages for models that can handle larger context.
llmstxt.org itself publishes a reference implementation. The file is under 30 lines and demonstrates every valid element.
At the time of writing, Stripe does not publish a public llms.txt, though their developer docs are extensive enough that crawlers index them aggressively anyway. A formal file would give Stripe more control over which documentation versions get cited.
Several U.S. federal agency subdomains (CDC, FDA) have begun experimenting with llms.txt files on developer portals, primarily to surface authoritative guidance documents over derivative content.
What are the most common llms.txt mistakes?
1. Empty or skeleton files. A file that exists but contains only an H1 and no links does nothing. Crawlers may index the file and find no useful signal.
2. Wrong heading format. The spec requires Markdown H1 for the site name and H2 for section headers. Using H3 for sections, or plain bold text, breaks parsers that follow the llmstxt.org spec strictly.
3. Listing every page equally. llms.txt is meant to highlight your best content — if you include 200 links with no descriptions, you've created noise. Aim for 10-40 links with brief, descriptive annotations.
4. No descriptions on links. The inline description after the colon in [Page Title](URL): Description is where you front-load the citable value. "Our pricing page" is useless. "Pricing: $29 per audit, $99/mo for 20 audits, $299/mo for unlimited" gives the model something to quote.
5. Serving the file with the wrong Content-Type. It should be text/plain or text/markdown. Some CMS platforms default to text/html for unfamiliar extensions.
6. Not maintaining it. An llms.txt that links to 404s is worse than no file — it signals that the site isn't maintained.
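The first four mistakes above are mechanical enough to catch with a script. Here is a minimal linter sketch in Python; it is illustrative only (the spec has no official validator), and the thresholds simply mirror the guidance in this article:

```python
import re

def lint_llms_txt(text: str) -> list[str]:
    """Flag common llms.txt mistakes. A rough sketch, not an official validator."""
    problems = []
    lines = [line.rstrip() for line in text.splitlines()]
    # Mistake 2: the file must open with a Markdown H1 site name.
    headings = [line for line in lines if line.startswith("#")]
    if not headings or not re.match(r"^# [^#]", headings[0]):
        problems.append("missing or malformed H1 site name")
    # Section headers should be H2, not H3 or deeper.
    if any(line.startswith("###") for line in lines):
        problems.append("H3 used for a section header; the spec expects H2")
    # Mistakes 1, 3, 4: count links and check for inline descriptions.
    links = re.findall(r"\[([^\]]+)\]\(([^)]+)\)(:?)", text)
    if not links:
        problems.append("no links: the file is a skeleton")
    elif len(links) > 40:
        problems.append(f"{len(links)} links: consider curating down to 10-40")
    undescribed = [title for title, _url, colon in links if not colon]
    if undescribed:
        problems.append(f"{len(undescribed)} link(s) lack an inline description")
    return problems
```

Running it against the Acme example earlier in this guide returns an empty list; running it against a bare H1 flags the skeleton-file mistake.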
How do you validate your llms.txt?
- Fetch it directly: curl https://yourdomain.com/llms.txt and confirm it returns 200 with the right Content-Type
- Check that the Markdown renders correctly by pasting it into any Markdown previewer
- Confirm all linked URLs return 200 (a simple link checker such as linkchecker handles this)
- Run a free CiteReady audit, which checks for llms.txt presence and flags common formatting errors as part of the 7-dimension score
The spec doesn't yet have an official validator, but llmstxt.org maintains a community list of correctly-implemented examples you can benchmark against.
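The first three checks can be folded into one standard-library Python sketch. The function names here (check_llms_txt, extract_links) are hypothetical helpers, not part of any official tool, and the HEAD-request link check is a cheap approximation (some servers reject HEAD even when GET would succeed):

```python
import re
import urllib.request

ALLOWED_TYPES = {"text/plain", "text/markdown"}

def extract_links(body: str) -> list[str]:
    """Pull every absolute http(s) target out of the Markdown links."""
    return re.findall(r"\]\((https?://[^)\s]+)\)", body)

def check_llms_txt(base_url: str) -> dict:
    """Fetch /llms.txt and report status, Content-Type, and dead links."""
    url = base_url.rstrip("/") + "/llms.txt"
    with urllib.request.urlopen(url, timeout=10) as resp:
        status = resp.status
        ctype = resp.headers.get_content_type()  # e.g. "text/plain"
        body = resp.read().decode("utf-8")
    dead = []
    for link in extract_links(body):
        try:
            # HEAD keeps each check cheap; any network error or 4xx/5xx
            # raises, and we record the link as dead.
            req = urllib.request.Request(link, method="HEAD")
            urllib.request.urlopen(req, timeout=10)
        except Exception:
            dead.append(link)
    return {"status": status,
            "content_type_ok": ctype in ALLOWED_TYPES,
            "dead_links": dead}
```

A result of status 200, content_type_ok True, and an empty dead_links list means the file passes the mechanical checks; it says nothing about how well the content is curated.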
Does llms.txt actually affect whether AI cites you?
Directionally, yes. Structurally, it's one signal among many. A well-formed llms.txt helps LLMs find your highest-quality content; it does nothing to improve that content's citability on its own. You still need clean schema, quotable passage-level writing, and fresh timestamps.
Think of llms.txt as the front door. You still have to furnish the house.
Action checklist
- Create /llms.txt at your domain root with an H1, a blockquote description, and 10-40 curated links grouped by topic with inline descriptions.
- Confirm it returns HTTP 200 with Content-Type: text/plain and that every linked URL is live.
- Run a free CiteReady audit to verify detection and flag any formatting issues.
FAQ
Is llms.txt an official standard? No. It's a community-proposed convention originated by Jeremy Howard and documented at llmstxt.org. OpenAI, Anthropic, and Google have not formally endorsed the spec, but several major sites — including Anthropic itself — have adopted it.
Does llms.txt replace robots.txt for AI crawlers? No. robots.txt controls crawler access (whether bots can fetch pages at all). llms.txt is advisory — it guides models on what to prioritize, but doesn't enforce access restrictions. You need both.
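The division of labor looks like this in practice. The robots.txt below is a hypothetical sketch: GPTBot is OpenAI's published crawler token, but other vendors use their own user-agent names, so check each crawler's current documentation before copying it.

```
# robots.txt — enforces access: may this bot fetch pages at all?
User-agent: GPTBot
Allow: /

User-agent: *
Disallow: /internal/
```

Your llms.txt (like the Acme example earlier in this guide) then sits alongside it at the root, advising any bot that robots.txt has already admitted which pages deserve priority.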
How often should I update my llms.txt? Whenever you publish significant new content or deprecate old pages. A stale llms.txt pointing to outdated articles can cause AI tools to surface old information as authoritative. Monthly review is a reasonable cadence for active content sites.
Run a free CiteReady audit on your site
Get your score across all 7 dimensions — with a Claude-Opus executive summary and prioritized fix plan. First full audit is free.