llms.txt: The Complete Guide (What It Is, How to Create One, and Examples for 2026)

llms.txt is the plain-markdown file that tells ChatGPT, Gemini, Claude, and Perplexity how to read your site. A field-tested guide to what llms.txt is, why it matters for AI search, how to write one, real examples, and how to test that models actually use it.

Atomik Digital Research · Jul 5, 2026

llms.txt is a plain-markdown file placed at the root of your domain (yourdomain.com/llms.txt) that gives large language models such as ChatGPT, Gemini, Claude, and Perplexity a clean, structured summary of what your site is, what it offers, and which URLs matter most. Think of it as robots.txt for meaning: robots.txt tells crawlers what they can fetch, while llms.txt tells language models what your site is actually about. This guide covers what llms.txt is, why it matters in 2026, exactly how to create one, real examples, and how to verify the models actually use it.

What is llms.txt? The short definition

llms.txt is an open standard proposed by Jeremy Howard (Answer.AI) in September 2024 for a single markdown file served at /llms.txt that summarizes a website in a form optimized for LLM ingestion. It is not HTML, JSON, or schema.org markup. It is human-readable markdown with a required H1 (the site name), an optional blockquote description, free-form context sections, and zero or more H2 sections containing markdown link lists to the most important URLs on the site. Models and AI search engines that support it can fetch the file, parse it in a single pass, and use it to understand and cite your brand.

Why llms.txt matters for AI search in 2026

LLM context windows are large but not infinite, and generative engines have milliseconds to decide which sources to retrieve. When a model lands on a modern web page it has to strip navigation, ads, cookie banners, JavaScript, and boilerplate before it can reason about the content — expensive and lossy. llms.txt short-circuits that: one fetch returns a clean, canonical summary plus the exact URLs the model should follow next. Perplexity in particular has been observed consuming llms.txt heavily; teams that ship a good one often see referral traffic from perplexity.ai move within days. It is the cheapest and fastest AI-visibility win in the stack.

Is llms.txt an official standard?

It is a community standard, not a W3C or IETF spec. There is no regulator forcing compliance and no engine has publicly committed to using it exclusively. That said, the spec has been adopted by thousands of sites — Anthropic, Cloudflare, Perplexity, Vercel, Zapier, Stripe docs, Supabase, and much of the modern developer ecosystem all ship one. In practice llms.txt is a de facto standard because it is trivial to publish and models increasingly look for it.

llms.txt vs robots.txt vs sitemap.xml

The three files solve different problems and complement each other. robots.txt tells crawlers which URLs they are allowed to fetch — access control. sitemap.xml lists every indexable URL with metadata for discovery — completeness. llms.txt gives language models a curated, human-readable summary of what the site is and which URLs matter most — meaning and priority. You should ship all three. llms.txt does not replace either of the others.

Will llms.txt help your SEO?

llms.txt does not directly influence Google's classic ranking algorithm: there is no evidence that Google uses it as a ranking signal for blue links. Where it moves the needle is AI search, measured as appearance rate and citation share in ChatGPT, Perplexity, Gemini, and Claude. Because AI answers now intercept a growing share of the queries that used to become Google clicks, ranking in AI answers has become part of the SEO job, and llms.txt is one of the highest-leverage moves for that surface. Ship it as part of a broader Generative Engine Optimization (GEO) program, not as a Google ranking hack.

The llms.txt format, section by section

The spec defines a strict but simple structure:

(1) An H1 with the name of the site or project. Required, exactly one.
(2) An optional blockquote (>) immediately after the H1 with a short description.
(3) Zero or more paragraphs of free-form markdown giving additional context.
(4) Zero or more H2 sections, each containing a markdown bulleted list of links in the form '- [Title](https://url): optional short description.'

A special H2 called 'Optional' signals links that can be skipped if the model is short on context. That is the whole spec: no custom syntax, no XML, no frontmatter.
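The structure described above is simple enough to check mechanically. The following is a minimal sketch of such a check, not an official validator: the function name `validate_llms_txt` and the `LINK_ITEM` pattern are hypothetical, and the check assumes the file contains no fenced code blocks (a `# comment` line inside one would be miscounted as an H1).

```python
import re

# One list entry: "- [Title](https://url)" with an optional ": description".
# This pattern is an assumption based on the format described in this guide.
LINK_ITEM = re.compile(r"^- \[[^\]]+\]\(https?://[^\s)]+\)(: .+)?$")


def validate_llms_txt(text: str) -> list[str]:
    """Return a list of problems; an empty list means these checks passed."""
    problems: list[str] = []
    lines = [line.rstrip() for line in text.splitlines()]
    non_blank = [line for line in lines if line]

    # The file must open with the H1 that names the site or project.
    if not non_blank or not non_blank[0].startswith("# "):
        problems.append("first non-blank line must be an H1 ('# Site name')")

    # Exactly one H1 is allowed ("## ..." lines do not start with "# ").
    h1_count = sum(1 for line in lines if line.startswith("# "))
    if h1_count != 1:
        problems.append(f"expected exactly one H1, found {h1_count}")

    # Inside H2 sections, every bullet should be a markdown link entry.
    in_h2 = False
    for number, line in enumerate(lines, start=1):
        if line.startswith("## "):
            in_h2 = True
        elif in_h2 and line.startswith("- ") and not LINK_ITEM.match(line):
            problems.append(
                f"line {number}: list item is not '- [Title](url): description'"
            )
    return problems
```

Running it in CI against the deployed file is one way to catch a second H1 or a malformed link entry before a model ever sees it.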

A minimal llms.txt example

The smallest valid file is under ten lines. Example for a SaaS product:

    # Atomik Digital

    > Atomik Digital is a Generative Engine Optimization platform that helps brands get cited by ChatGPT, Gemini, Claude, and Perplexity.

    ## Core pages

    - [Home](https://atomikdigital.com): platform overview and pricing.
    - [GEO Guide](https://atomikdigital.com/geo-guide): step-by-step methodology for ranking in AI answers.
    - [Blog](https://atomikdigital.com/blog): research and playbooks on AI visibility.

That is a complete, spec-compliant llms.txt. Ship it and iterate.

A production llms.txt example (SaaS)

A production file for a mid-sized SaaS typically runs 40–120 lines and looks like this:

    # Acme Analytics

    > Acme Analytics is a product analytics platform for B2B SaaS teams tracking activation, retention, and revenue.

    Acme was founded in 2021 and is used by more than 3,000 teams. The platform includes event tracking, funnels, cohort retention, and revenue attribution.

    ## Product

    - [Product overview](https://acme.com/product): full feature tour.
    - [Pricing](https://acme.com/pricing): plans and per-seat pricing.
    - [Integrations](https://acme.com/integrations): 60+ native integrations.

    ## Documentation

    - [Quickstart](https://docs.acme.com/quickstart): install the SDK and send the first event.
    - [API reference](https://docs.acme.com/api): REST and Node/Python SDK reference.

    ## Comparisons

    - [Acme vs Mixpanel](https://acme.com/compare/mixpanel): feature and pricing comparison.
    - [Acme vs Amplitude](https://acme.com/compare/amplitude): feature and pricing comparison.

    ## Optional

    - [Changelog](https://acme.com/changelog): release notes.
    - [Blog](https://acme.com/blog): research and case studies.
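To make the role of the 'Optional' section concrete, here is a rough sketch of how a consumer might read a file like the one above and decide which URLs to follow. It is an illustration only, not how any particular model or engine works; `parse_sections` and `urls_to_fetch` are hypothetical names.

```python
import re

# "- [Title](url)" with an optional ": note" suffix.
LINK = re.compile(
    r"^- \[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<note>.*))?$"
)


def parse_sections(text: str) -> dict[str, list[tuple[str, str, str]]]:
    """Map each H2 heading to its (title, url, note) link entries."""
    sections: dict[str, list[tuple[str, str, str]]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif current is not None:
            match = LINK.match(line)
            if match:
                sections[current].append(
                    (match["title"], match["url"], match["note"] or "")
                )
    return sections


def urls_to_fetch(text: str, include_optional: bool = False) -> list[str]:
    """List link URLs in file order, skipping 'Optional' unless requested."""
    urls: list[str] = []
    for heading, links in parse_sections(text).items():
        if heading.lower() == "optional" and not include_optional:
            continue
        urls.extend(url for _title, url, _note in links)
    return urls
```

A consumer with a tight context budget would call `urls_to_fetch(text)`, while one with room to spare would pass `include_optional=True`.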

llms-full.txt: the expanded companion

A common companion convention, which is not part of the core llms.txt proposal, is /llms-full.txt: a single file that inlines the full markdown content of the pages listed in llms.txt. This is what large developer docs (Anthropic, Cloudflare, Stripe) ship so an LLM can ingest the entire documentation in one fetch. Most marketing sites do not need llms-full.txt; ship it when your primary audience is developers loading your docs into a coding assistant.

How to create a llms.txt file (step by step)

(1) List the 8–30 URLs on your site that a model should know exist: home, pricing, product/service pages, top comparison pages, key docs, top blog posts.
(2) Write a one-line description for each URL that a human would find useful, with no keyword stuffing.
(3) Write a two-sentence description of the whole site for the blockquote.
(4) Group the URLs into 3–6 H2 sections that match how you'd explain your site to a new hire.
(5) Move anything nice-to-have into an '## Optional' section.
(6) Save the file as llms.txt with UTF-8 encoding and no BOM.
(7) Deploy it at yourdomain.com/llms.txt and confirm it returns HTTP 200 with content-type text/plain or text/markdown.

That is the whole process.
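Steps (1) through (6) above can be sketched as a small script that keeps the curated inventory as data and renders the file from it. This is one possible approach under stated assumptions, not a reference generator; `render_llms_txt` and `write_llms_txt` are hypothetical helpers, and the sample data reuses the Acme example from this guide.

```python
from pathlib import Path


def render_llms_txt(
    name: str,
    summary: str,
    sections: dict[str, list[tuple[str, str, str]]],
) -> str:
    """Render an H1, a blockquote summary, and H2 link lists as markdown."""
    lines = [f"# {name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for title, url, description in links:
            lines.append(f"- [{title}]({url}): {description}")
        lines.append("")
    # Normalize to exactly one trailing newline.
    return "\n".join(lines).rstrip("\n") + "\n"


def write_llms_txt(path: str, content: str) -> None:
    """Write UTF-8 bytes directly; the 'utf-8' codec never emits a BOM."""
    Path(path).write_bytes(content.encode("utf-8"))


if __name__ == "__main__":
    content = render_llms_txt(
        "Acme Analytics",
        "Acme Analytics is a product analytics platform for B2B SaaS teams.",
        {
            "Product": [
                ("Pricing", "https://acme.com/pricing", "plans and per-seat pricing."),
            ],
            "Optional": [
                ("Changelog", "https://acme.com/changelog", "release notes."),
            ],
        },
    )
    print(content, end="")
```

Keeping the inventory as data makes the later maintenance step cheap: launching or retiring a page becomes a one-line change followed by a re-render.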

llms.txt generators vs writing by hand

Several llms.txt generators now exist (Firecrawl, Mintlify, Answer.AI's reference generator, various open-source scripts). They crawl your sitemap and emit a first draft. The output is a great starting point but always needs manual editing — auto-generated descriptions read like meta descriptions, and generators cannot judge which pages actually matter. Use a generator for the scaffolding, then rewrite descriptions and prune ruthlessly. Ten curated URLs with sharp descriptions outperform a hundred autogenerated ones.

Where to host llms.txt

Serve the file at the root of every domain and subdomain a model might land on: yourdomain.com/llms.txt, docs.yourdomain.com/llms.txt, app.yourdomain.com/llms.txt if it is public. Return HTTP 200, a text/plain or text/markdown content type, and reasonable cache headers (24 hours is fine). Do not require authentication, do not gate behind a paywall, and do not redirect through JavaScript — models will not execute JS to fetch it. On TanStack, Next, or Astro, either drop the file into /public or ship a server route that returns the markdown with the right headers.
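For sites that serve the file from a server route rather than a static directory, the headers are the part that tends to go wrong. The sketch below uses a plain Python WSGI app to illustrate the response shape described above (HTTP 200, a markdown content type, 24-hour caching). It is an illustration, not a recommendation to replace your framework's static file handling, and `make_app` is a hypothetical name.

```python
def make_app(body: bytes):
    """Build a WSGI app that serves `body` at /llms.txt and 404s elsewhere."""

    def app(environ, start_response):
        if environ.get("PATH_INFO") == "/llms.txt":
            start_response(
                "200 OK",
                [
                    ("Content-Type", "text/markdown; charset=utf-8"),
                    # 86400 seconds = 24 hours.
                    ("Cache-Control", "public, max-age=86400"),
                    ("Content-Length", str(len(body))),
                ],
            )
            return [body]
        start_response(
            "404 Not Found", [("Content-Type", "text/plain; charset=utf-8")]
        )
        return [b"Not found\n"]

    return app


# To try it locally (assumes an llms.txt file in the working directory):
#
#   from pathlib import Path
#   from wsgiref.simple_server import make_server
#   make_server("", 8000, make_app(Path("llms.txt").read_bytes())).serve_forever()
```

The same three headers apply whatever the framework: the handler should return the raw markdown bytes with no HTML wrapper and no client-side redirect.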

How to see the llms.txt file of a website

Any llms.txt file, if present, lives at yourdomain.com/llms.txt. Open the URL directly in a browser or run 'curl -sSL https://yourdomain.com/llms.txt' from a terminal. If the response is HTML (usually a 404 page rendered by the site's frontend router) the site has not shipped one. Directories like llmstxt.site and llmstxt.directory catalog known files across the web and are useful for benchmarking against competitors and reference sites.
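The same check can be scripted when you want to survey many domains. The sketch below is a minimal version using only the Python standard library; `looks_like_llms_txt` and `fetch_llms_txt` are hypothetical names, and the heuristic (reject HTML, require a leading H1) is an assumption rather than part of the spec. Some sites reject the default urllib user agent, so a failed fetch is not proof that no file exists.

```python
import urllib.request
from typing import Optional


def looks_like_llms_txt(content_type: str, body: str) -> bool:
    """Heuristic: not an HTML page, and the body opens with a markdown H1."""
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == "text/html":
        return False
    stripped = body.lstrip("\ufeff").lstrip()
    if stripped.lower().startswith(("<!doctype", "<html")):
        return False
    return stripped.startswith("# ")


def fetch_llms_txt(domain: str, timeout: float = 10.0) -> Optional[str]:
    """Return the file body, or None if it is missing or is really HTML."""
    url = f"https://{domain}/llms.txt"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            content_type = response.headers.get("Content-Type", "")
            body = response.read().decode("utf-8", errors="replace")
    except OSError:
        # URLError and HTTPError are OSError subclasses, so this covers
        # 404s, DNS failures, and timeouts.
        return None
    return body if looks_like_llms_txt(content_type, body) else None
```

Calling `fetch_llms_txt("yourdomain.com")` returns the markdown when a file is published and `None` when the site answers with an HTML fallback page or an error.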

How to test that models actually use your llms.txt

Publish the file, then ask each major model direct questions your llms.txt should let it answer: 'What does [brand] do?', 'What is the pricing of [brand]?', 'Compare [brand] and [competitor].' Log which URLs are cited. Perplexity typically reflects llms.txt changes within days; ChatGPT and Gemini follow on a weekly to monthly cadence tied to their retrieval refreshes. Track appearance rate and citation share weekly against a fixed prompt set — if the numbers move after publishing llms.txt and nothing else changed, the file is doing its job.
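The weekly tracking described above comes down to two ratios over a log of answers. The sketch below assumes one working definition of each, since this guide does not fix a formula: appearance rate is the share of logged answers that mention the brand, and citation share is the share of all cited URLs that point at your domain or its subdomains. The `Observation` record and both function names are hypothetical.

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class Observation:
    """One logged answer from one model for one prompt in the fixed set."""
    model: str
    prompt: str
    answer: str
    cited_urls: list[str] = field(default_factory=list)


def appearance_rate(observations: list[Observation], brand: str) -> float:
    """Fraction of answers whose text mentions the brand (case-insensitive)."""
    if not observations:
        return 0.0
    hits = sum(1 for obs in observations if brand.lower() in obs.answer.lower())
    return hits / len(observations)


def citation_share(observations: list[Observation], domain: str) -> float:
    """Fraction of all cited URLs hosted on `domain` or one of its subdomains."""
    domain = domain.lower()
    total = 0
    ours = 0
    for obs in observations:
        for url in obs.cited_urls:
            host = (urlparse(url).hostname or "").lower()
            total += 1
            if host == domain or host.endswith("." + domain):
                ours += 1
    return ours / total if total else 0.0
```

Compute both numbers per model and per week against the same prompt set, so that a change after publishing llms.txt is not confounded by a change in the questions asked.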

Common llms.txt mistakes to avoid

(1) Multiple H1s: the spec allows exactly one.
(2) Dumping every URL from your sitemap: curation is the whole point.
(3) Marketing copy in descriptions: write for another human, not a landing page.
(4) Serving the file only from a subpath like /docs/llms.txt: the proposal permits a subpath as an option, but the root /llms.txt is the conventional location, so publish there first.
(5) Returning HTML instead of plain text.
(6) Blocking AI crawlers in robots.txt while shipping llms.txt: the file is useless if GPTBot, PerplexityBot, and ClaudeBot cannot fetch it, or if a Google-Extended rule (a robots.txt control token, not a separate crawler) disallows your content.
(7) Publishing once and forgetting: treat llms.txt as living inventory and update it every time you launch or retire a significant page.
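The robots.txt conflict in item (6) is easy to test for with Python's standard `urllib.robotparser`. The sketch below parses a robots.txt body and reports which of the user-agent tokens named in this guide are disallowed from a given URL. The `blocked_agents` helper is a hypothetical name, and the token list is taken from the text above rather than from any vendor's current documentation.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens named in this guide; check vendor docs for current names.
AI_USER_AGENTS = ["GPTBot", "PerplexityBot", "Google-Extended", "ClaudeBot"]


def blocked_agents(
    robots_txt: str,
    url: str,
    agents: list[str] = AI_USER_AGENTS,
) -> list[str]:
    """Return the agents that the given robots.txt body disallows from `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


if __name__ == "__main__":
    robots = "User-agent: GPTBot\nDisallow: /\n\nUser-agent: *\nAllow: /\n"
    print(blocked_agents(robots, "https://acme.com/llms.txt"))
```

An empty list means none of the listed tokens is blocked from the URL; any name in the result is worth reviewing before you expect that engine to read the file.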

How llms.txt fits into a full GEO stack

llms.txt is one of five pillars of Generative Engine Optimization: crawler access (robots.txt allowlisting plus llms.txt), entity graph (Wikidata, Wikipedia, Google Business Profile), LLM-native schema (Organization, WebSite, FAQPage, Article), citation authority on third-party sources (Reddit, G2, industry trades), and definitive content engineered for retrieval. llms.txt alone will not saturate appearance rate — but skipping it caps every other pillar, because models cannot cite what they cannot cleanly understand.

The bottom line

llms.txt is the cheapest, fastest, highest-leverage move you can make for AI visibility in 2026. It is a plain-markdown file, spec-compliant in a few dozen lines, that gives ChatGPT, Gemini, Claude, and Perplexity a clean map of your site. Ship it at yourdomain.com/llms.txt, verify it returns as plain text, allow the AI crawlers in robots.txt, then track appearance rate weekly to prove it is doing work. It is not a Google ranking hack — it is a foundational component of any serious Generative Engine Optimization program.

Run a free AI Visibility Audit

Atomik Digital's platform runs your brand against ChatGPT, Gemini, Claude, and Perplexity, tells you whether the models can find and parse your llms.txt, and returns a prioritized backlog of the fixes that will move your appearance rate fastest. The audit takes about 60 seconds and is the fastest way to see where your brand stands in the AI answer layer today.
