An `llms.txt` file is a plain-text document at the root of your domain that tells large language models exactly what your business does, who you serve, and which of your pages they should cite. It is to AI search engines what `robots.txt` is to Googlebot and what `sitemap.xml` is to traditional indexers - a small, machine-readable file that quietly determines how a model retrieves and represents your business. We have generated more than 25 of these for clients across real estate, professional services, and B2B software in the past four months. The pattern is consistent: sites with a well-structured `llms.txt` start showing up in ChatGPT, Perplexity, and Gemini answer surfaces within weeks, while otherwise comparable competitors without one remain invisible.
This guide is the technical playbook: what `llms.txt` actually is, where the convention came from, what belongs inside it, the exact format we use in production, and the failure modes we see most often.
## The Origin of `llms.txt`
`llms.txt` is a proposed convention introduced by Jeremy Howard (Answer.AI) in September 2024 and published at [llmstxt.org](https://llmstxt.org). The proposal is deliberately simple. A markdown-formatted, plain-text file lives at `https://yourdomain.com/llms.txt`. Inside it sits a structured summary of the site, a curated list of links to the canonical content the site owner wants surfaced, and optional metadata describing the business.
The proposal solves a problem nobody had cleanly solved before. Large language models can crawl your HTML, but they have to infer what matters from a noisy mix of nav menus, cookie banners, footer boilerplate, plugin output, ad code, and primary content. Even on a clean site, the cost of correctly understanding what the business actually is - its name, location, services, expertise, and trustworthy facts - is high. `llms.txt` strips that down to a curated, authoritative single page that a model can ingest in milliseconds with near-zero ambiguity.
Adoption has accelerated through 2025 and into 2026. Anthropic referenced the convention in their guidance for site owners, OpenAI engineers have publicly commented on parsing it during retrieval, Mintlify and Vercel both shipped automatic `llms.txt` generation for their hosted documentation products, and a growing share of the SEO and AEO tooling stack now ships with `llms.txt` validators. It is not yet a formal standard. It is converging into a de facto one.
## What `llms.txt` Is, and What It Is Not
The clearest way to understand `llms.txt` is by analogy.
| File | Audience | Purpose |
| --- | --- | --- |
| `robots.txt` | Search engine crawlers | Tells crawlers which paths they can fetch |
| `sitemap.xml` | Search engines | Lists indexable URLs with metadata |
| `humans.txt` | Curious humans | Credits the team behind the site |
| `llms.txt` | Large language models | Summarizes the business and points to canonical content |
`llms.txt` is not a permission file. It does not instruct AI crawlers whether they may use your content for training (that is what `robots.txt` directives like `User-agent: GPTBot` are for). It does not noindex pages. It does not replace schema markup, structured data, or Open Graph tags. It is a content artifact - a curated brief that a retrieval-augmented model can quote, link to, and cite.
The other detail that trips people up: `llms.txt` is not the same as `llms-full.txt`. The proposal includes an optional second file, `llms-full.txt`, that contains the entire textual content of your most important pages concatenated into one document. `llms-full.txt` is heavier (often 100KB+) and exists so a model can ingest your full canonical content in a single fetch. Most businesses we work with start with `llms.txt`. Documentation-heavy sites and SaaS companies benefit from publishing both.
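Generating the heavier file can be a one-step build task. A minimal sketch, assuming your cornerstone pages already exist as local markdown files (all paths here are hypothetical):

```python
from pathlib import Path

# Cornerstone pages in priority order -- hypothetical paths; substitute
# whatever your content pipeline actually produces.
PAGES = [
    Path("content/index.md"),
    Path("content/services.md"),
    Path("content/market-reports.md"),
]

sections = []
for page in PAGES:
    text = page.read_text(encoding="utf-8").strip()
    # Label each source so a model can attribute passages to the right page.
    sections.append(f"<!-- source: {page.name} -->\n\n{text}")

# One document, one fetch: the whole point of llms-full.txt.
out = Path("public/llms-full.txt")
out.write_text("\n\n---\n\n".join(sections) + "\n", encoding="utf-8")
print(f"wrote {out} ({out.stat().st_size / 1024:.1f}KB)")
```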
## The Format We Use in Production
Here is the exact format we deploy for clients. The structure follows the llmstxt.org spec but is opinionated in three places: we always include a measurable business description in the H1 blockquote, we organize key pages in priority order rather than alphabetically, and we list topics as bulleted statements rather than tags.
Here is an abridged version of a real example, taken from a recent real estate client we shipped to production. The title line is verbatim; URLs and descriptions are reduced to placeholders, and elided content is marked:
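```markdown
# Curated Luxury Homes | Maria Wilkes | Berkshire Hathaway HomeServices Florida Network Realty - Atlantic Beach, Florida

> Curated Luxury Homes is the real estate practice of Maria Wilkes at
> Berkshire Hathaway HomeServices Florida Network Realty, serving Atlantic
> Beach, Florida. [...]

## Key Pages

- [Home](https://example.com/): Overview of the practice and featured listings
- [Listings](https://example.com/listings/): Active luxury inventory in Atlantic Beach
- [About Maria](https://example.com/about/): Credentials, affiliations, and track record
- [Contact](https://example.com/contact/): Phone, email, and office address

## Service Areas

- Atlantic Beach, Florida
- [...]

## Topics This Site Covers

- Buying and selling luxury homes in Atlantic Beach
- [...]

## About

Maria Wilkes, Berkshire Hathaway HomeServices Florida Network Realty
[structured NAP data: address, phone, email]
```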
The full production file is roughly 1.6KB. It loads instantly. A retrieval-augmented model that fetches it gets a complete, authoritative brief on Maria's business in a single round trip.
## The Anatomy of a Strong `llms.txt`
Every section earns its place. Here is what each block does and how to write it.
### 1. The Title - `# Business Name | Identifier | Location`
The title is an H1. We compose it as `Business Name | Person or Differentiator | Location` because models extract this as the canonical entity label. The Curated Luxury example reads `Curated Luxury Homes | Maria Wilkes | Berkshire Hathaway HomeServices Florida Network Realty - Atlantic Beach, Florida`. That single line tells a model the brand, the principal, the parent organization, and the geography in one shot. Avoid SEO-style stuffing. The title should read like a business card.
### 2. The Description - A Blockquote With Specifics
The blockquote (`>`) immediately under the H1 is the most important block in the file. It must be 100 to 250 words, written in third person, and dense with specifics: credentials, service areas, differentiators, ranking or volume claims, address, and any institutional affiliations. Generic marketing language fails here. Models will quote this paragraph verbatim when summarizing your business in an AI answer surface, so every clause must be both true and verifiable.
We write descriptions to answer five questions in order:
- Who is the business or principal?
- What do they do specifically?
- Where do they operate?
- What credentials or affiliations make them credible?
- How do they differentiate?
### 3. Key Pages - Curated, Prioritized
The `## Key Pages` section is a markdown bullet list of the canonical URLs you want models to retrieve when answering questions about your business. Each line follows the format `- [Anchor Text](URL): Short description`. Keep the list to 10 to 25 entries. The anchor text becomes the citation label in AI search surfaces. The description after the colon is what the model uses to decide which URL to retrieve for a given query.
Order the list by priority, not alphabetically. The home page comes first. Service or product pages come next. Authority content (blog, market reports, guides) comes after. Contact comes last. A model that processes the file top-to-bottom will weight earlier entries more heavily.
### 4. Service Areas, Topics, and About - Optional but High-Impact
The remaining sections are optional but compounding. `## Service Areas` is critical for any geo-targeted business - a real estate agent, a local service provider, a regional B2B operator. List actual cities and neighborhoods, not states. `## Topics This Site Covers` is a bulleted list of 10 to 20 short statements describing what your content authoritatively covers. Use natural-language phrases, not keyword lists. `## About` repeats the principal's credentials, address, phone, and email in structured form so a model can extract NAP data without parsing the description.
## Where the File Lives and How Models Find It
The file must be served at `https://yourdomain.com/llms.txt` with a `Content-Type` of `text/plain` or `text/markdown`. It is fetched by the same process that fetches `robots.txt` - a single HTTP GET to the well-known path. There is no submission process and no registration. You publish the file, you make it accessible to public HTTP clients, and crawlers find it on their next pass.
A few production details we have learned the hard way (a quick check script follows the list):
- No redirects. Serve the file directly at the canonical URL. A 301 to a CDN-hashed path will silently break some retrieval clients.
- No authentication. The file must be reachable without cookies, headers, or query parameters. If you have a bot-protection layer, allowlist `/llms.txt` and `/llms-full.txt`.
- Cache headers. A `Cache-Control: public, max-age=86400` header is healthy. Aggressive caching hides updates; no caching wastes bandwidth.
- Encoding. UTF-8, no BOM. Some CMSs default to UTF-8 with BOM and break parsers that assume strict UTF-8.
- Linked from `robots.txt`. Add `Sitemap: https://yourdomain.com/llms.txt` to your `robots.txt` if your stack permits. It is not part of the spec but several crawlers honor it.
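A minimal pre-flight check covering these points - a sketch using only the Python standard library, with a placeholder host:

```python
import http.client

HOST = "yourdomain.com"  # placeholder -- substitute your own domain
PATH = "/llms.txt"

conn = http.client.HTTPSConnection(HOST, timeout=10)
# A bare GET with no browser headers, mirroring how a retrieval client
# fetches the file. http.client does not follow redirects, which is
# exactly what we want to test.
conn.request("GET", PATH)
resp = conn.getresponse()
body = resp.read()

# Served directly: any 3xx status here is a redirect that can silently
# break retrieval clients.
assert resp.status == 200, f"expected 200, got {resp.status}"

# Correct media type.
ctype = resp.getheader("Content-Type", "")
assert ctype.startswith(("text/plain", "text/markdown")), f"bad Content-Type: {ctype}"

# UTF-8 with no byte-order mark.
assert not body.startswith(b"\xef\xbb\xbf"), "UTF-8 BOM present - strip it"
body.decode("utf-8")  # raises UnicodeDecodeError if not valid UTF-8

print(f"OK: {len(body)} bytes, {ctype}, Cache-Control: {resp.getheader('Cache-Control')}")
```

Any assertion failure is a deployment problem rather than a content problem, so a script like this slots naturally into CI or a scheduled job.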
For WordPress sites, the simplest deployment is to drop a physical `llms.txt` file in the document root via SFTP or your host's file manager. Plugin-generated `llms.txt` files exist (Yoast SEO, Rank Math, and a handful of standalone plugins ship beta support in 2026), but plugin output frequently lags manual edits and we still recommend hand-curating.
For Next.js, drop the file in `/public/llms.txt`. For Hugo or Astro, drop it in `static/llms.txt`. For a static-HTML deployment, drop it in the same folder as `index.html`. The file is plain text - no build step or rendering required.
## Why Every Business Needs One
The blunt commercial argument: AI search surfaces are a meaningful share of branded and unbranded query traffic in 2026, and that share is growing. ChatGPT processes more than a billion queries per week. Perplexity is approaching 100 million weekly users. Gemini sits inside Google's largest product. When a buyer asks any of these systems a question that overlaps with your business, you are either cited by name with a clickable source link, or you are not.
In our audits, the pattern is mechanical. Sites with a well-formed `llms.txt` are cited at materially higher rates than otherwise comparable sites without one - even when content quality, domain authority, and schema markup are held constant. The reason is structural rather than content-based. Models prioritize sources where attribution is unambiguous. A page with `llms.txt` plus correct Article and Organization schema is a high-confidence source. A page without it is a guess.
The cost of compliance is small. A typical `llms.txt` takes 60 to 90 minutes to draft properly, requires no engineering deployment, and incurs essentially zero hosting cost. The cost of non-compliance is invisibility on a fast-growing distribution channel.
We currently maintain `llms.txt` files for clients across luxury real estate (Curated Luxury Homes, Caryl Berenato, MT Lux, OwnRVA, Rick Janson, Antola Properties, JLR Earnest Companies, Kameesh Roper Realty, Kink Team, Ke Team Hawaii, Workman Success, Ownapieceofbrooklyn), professional services (Real Profit Advisors, Innovasaleslab, Bravado Digital), AI and SaaS (Attractify, Reninc, Agent Attraction, VAAI), and our own marketing sites (10xSearch, AI Search Insider, LuxExclusives). The format scales. The lift per site is small. The downside of skipping it is large and growing.
## Common Mistakes We See
Five failure modes account for the majority of bad `llms.txt` files we encounter when auditing competitor sites:
Treating it as a sitemap dump. Some sites publish an `llms.txt` that is just a list of every URL on the site. Models do not use it that way. They want a curated, prioritized brief - a sitemap of meaning, not URLs. List the 10 to 25 pages that matter, not the 600 you have.
Generic, non-specific descriptions. "Leading provider of innovative solutions" is the kind of phrase models will ignore or replace with something more specific they extract from your HTML. The blockquote must contain proper nouns: real cities, real names, real numbers, real designations.
Outdated content. Phone numbers, addresses, and team rosters change. An `llms.txt` with stale facts gets cited with stale facts. Audit twice a year.
Missing the file entirely on subdomains. If you operate `app.yourdomain.com` or `docs.yourdomain.com` separately from your primary domain, each subdomain needs its own `llms.txt`. The file is per-host.
Treating it as a substitute for schema and on-page SEO. `llms.txt` complements structured data; it does not replace it. The strongest AI visibility profiles combine all three: clean Article and Organization schema in the page HTML, an accurate `llms.txt` summarizing the site, and an `llms-full.txt` with the cornerstone content concatenated. For a deeper look at the schema and crawler-blocking layer, see our [WordPress AI visibility playbook](https://10xsearch.com/blog/why-your-wordpress-site-is-invisible-to-chatgpt/).
## How to Generate One in 60 Minutes
Our standard process for spinning up a new client `llms.txt`:
1. Audit the existing site. Identify the home page, the 10 to 25 canonical pages worth surfacing, and the principal's bio page. Note the exact business name, principal name, address, phone, email, credentials, and affiliations.
2. Draft the title. `Brand | Principal | Location`. One line. Read it back. If it sounds like a business card, it is right.
3. Write the blockquote. 100 to 250 words, third person, dense with specifics. Include credentials, service area, differentiators, and one or two verifiable claims (years in business, sales volume, network membership).
4. List the key pages. 10 to 25 entries, priority order, each with a concise description after the colon. Test every URL.
5. Add Service Areas, Topics, and About. Pull this directly from existing About and Service pages.
6. Validate. Check the file at `yourdomain.com/llms.txt` from an incognito browser, a `curl` request without browser headers, and the [llmstxt.org validator](https://llmstxt.org/) if you want a sanity check on structure.
Then leave it alone for 60 days, watch your AI citation surface (Perplexity Pro and ChatGPT Search both attribute sources clearly), and edit only if something material changes - a new service line, a new principal, a new geography, a new credential.
## Frequently Asked Questions
Is `llms.txt` an official standard? Not yet. It is a proposed convention from llmstxt.org that has accumulated meaningful real-world adoption. Anthropic, OpenAI, Perplexity, and Google's Gemini team have all referenced or implemented support in some form. We expect a formal IETF or W3C track within 12 to 24 months, but the de facto standard is already operational.
Does `llms.txt` affect Google rankings? Indirectly. Google's traditional Search ranking algorithm does not use `llms.txt` as a signal. Google's Gemini and AI Overviews layer almost certainly does, based on observed citation behavior. Treating `llms.txt` as a dual-purpose investment - Gemini visibility plus citations across non-Google AI surfaces - is the right framing.
Should we publish `llms-full.txt` too? For documentation-heavy sites, B2B SaaS, and content-rich knowledge bases - yes. For a small-business marketing site, the marginal benefit of `llms-full.txt` is smaller and it can wait. Start with `llms.txt`, ship it, then add `llms-full.txt` on the next pass.
Will `llms.txt` get scraped and used for training? The file is plain HTTP-accessible and yes, it can be ingested by anyone who fetches it, including model training pipelines. The point is for that ingestion to happen with accurate, curated information you authored - not with whatever the crawler infers from your HTML noise. The file is opt-in attribution, not opt-out training control. Training control is what `User-agent: GPTBot` directives in `robots.txt` are for.
How often should we update it? Every 90 to 180 days for an established business. Immediately after any change to NAP data, principal team, service offering, or major page architecture.
Can a model lie about us if we publish a false `llms.txt`? Models will reproduce what they retrieve. If your file contains inaccurate claims, those claims will end up in AI answer surfaces, attributed to your brand, with a citation back to your domain. Accuracy is non-negotiable.
## What to Do Next
If you do not have an `llms.txt` file deployed, the highest-impact 90-minute investment in your AI search visibility is to build one. The format is simple. The deployment is trivial. The downside of skipping it is sitting outside the citation graph as a meaningful share of search traffic shifts onto AI-mediated surfaces.
If you have one already, audit it: confirm every URL still resolves, confirm every fact in the blockquote is current, confirm the file is reachable without authentication or redirects. Then add `llms-full.txt` if your content footprint warrants it.
10xSearch ships `llms.txt` as part of every onboarding engagement, and we maintain the files for clients on retainer. If you want yours built, audited, or refreshed, [start a conversation here](https://10xsearch.com/contact/).
The 10xSearch editorial team builds search-visibility infrastructure for high-stakes businesses. We publish playbooks based on the audits and engineered assets we ship every week.