NeuralAdX Ltd technical GEO guide
What Is an llms.txt File & Should I Create an llms.txt File for My Website?
Yes, most serious websites should create an llms.txt file in 2026, but with the right expectations. It is not a magic ranking switch. It is a lightweight, machine-readable content map that can help AI assistants, coding agents, retrieval systems and future answer engines understand which pages on your website matter most.
The strongest strategy is not “llms.txt instead of SEO”. The strongest strategy is robots.txt for crawler access, XML sitemaps for discovery, schema markup for structured meaning, internal links for entity relationships, and llms.txt for concise AI-readable navigation.
Best answer
Create one if your site has valuable public content, guides, documentation, services, research, pricing, proof, case studies, benchmarks or author expertise.
Do not expect
Do not expect instant ChatGPT, Gemini, Claude, Perplexity or Google AI Mode rankings just because the file exists.
Main benefit
It gives AI systems a plain-text shortlist of your most important content, reducing ambiguity when machines inspect your website.
Risk level
Low, provided you only list canonical, public, high-quality pages and keep it aligned with your real website content.
The Direct Answer for Business Owners
An llms.txt file is a plain-text Markdown file usually placed at https://example.com/llms.txt. Its job is to tell AI systems which pages explain your organisation, products, services, evidence, documentation and expertise most clearly.
Important Google clarification: Google does not require an llms.txt file, any new machine-readable file, AI text file, special markup or Markdown for a website to appear in Google’s generative AI search features, including AI Overviews or AI Mode. Google states that pages must be indexed and eligible to be shown in Google Search with a snippet, and that there are no additional technical requirements for inclusion in these AI features. Source: Google Search Central, AI features and your website.
The original llms.txt proposal describes it as “a proposal to standardise” a file that helps large language models use website information at inference time, and Answer.AI describes it as a file that outlines information a model may want when assembling context for prompts relevant to a website. Source: llms.txt proposal by Jeremy Howard and Answer.AI explanation.
That matters because AI search is shifting user behaviour. Pew Research Center found that Google users clicked a traditional search result in 8% of visits when an AI summary appeared, compared with 15% when no AI summary appeared. In that environment, your website needs to be easy for humans, search crawlers and AI retrieval systems to understand. Source: Pew Research Center, July 2025.
Bar Chart: AI Summaries Reduce Traditional Clicks
AI-readable chart purpose: shows why website content must be clear enough for AI retrieval, not just traditional search rankings.
Source: Pew Research Center.
Bar Chart: AI Crawling Does Not Equal Referral Traffic
AI-readable chart purpose: separates AI crawling exposure from actual traffic returned to publishers.
Source: Cloudflare, July 2025.
Statistics That Explain Why llms.txt Is Worth Discussing
The evidence does not prove that llms.txt is a universal ranking factor. It proves something more practical: AI systems are changing discovery, crawling and attribution. That makes clean, machine-readable website architecture commercially important.
| Statistic | What it means | Source |
|---|---|---|
| 15% vs 8% click rate | Traditional search result clicks were nearly twice as common when no Google AI summary appeared. | Pew Research Center |
| 1,700:1 OpenAI crawl-to-referral ratio | AI crawling can be heavy even when referral traffic is low. | Cloudflare |
| 73,000:1 Anthropic crawl-to-referral ratio | Being crawled by AI systems does not automatically mean a site receives proportional traffic back. | Cloudflare |
| Nearly 80% of AI bot activity was training-related by mid-2025 | Website owners need to separate training access from search and user-triggered retrieval access. | Cloudflare Radar |
| AI and search crawling rose 32% year over year in April 2025 | AI crawler behaviour is not a fringe technical issue; it is part of modern web operations. | Cloudflare Radar |
| More than 300 billion pages across 15 years | The open web is enormous; concise machine-readable signals help reduce ambiguity. | Common Crawl |
| 3–5 billion new pages added each month | AI and search systems need strong filtering signals to understand which pages are canonical and valuable. | Common Crawl |
| 2.16 billion pages in the December 2025 Common Crawl archive | Large-scale retrieval systems operate across massive corpora, so clarity, canonicalisation and evidence structure matter. | Common Crawl December 2025 archive |
Expert and Industry Quotations on llms.txt, AI Crawling and Content Control
These source-backed quotations show the balanced picture: llms.txt is useful as an AI-readable content layer, but crawler permission, transparency and real page quality still matter.
“A proposal to standardise on using an /llms.txt file”
“comparable to the keywords meta tag”
“Agents are only as effective as the tools we give them.”
“any platform on the web should have a say”
“this dynamic is finally going to change”
“the value of accurate, factual, nonpartisan journalism has never been more essential”
What an llms.txt File Actually Does
1. It prioritises your best pages
Instead of making an AI system guess which URLs matter, llms.txt can point it to your service pages, explainers, evidence pages, FAQs, documentation and author profiles.
2. It gives machines a clean summary layer
The file uses simple Markdown. That makes it easier for AI agents and retrieval tools to scan than a heavy web page full of menus, scripts, ads and layout code.
3. It supports context assembly
The original purpose is not to block bots. It is to help models assemble useful context from a website when a user asks a relevant question.
4. It prepares your website for agentic browsing
Google’s Gemini developer guidance now references fetching llms.txt as a fallback for coding assistant documentation, and Anthropic has discussed flat llms.txt files as common LLM-friendly documentation. Sources: Google AI Developers and Anthropic Engineering.
llms.txt Is Not robots.txt, sitemap.xml or Schema Markup
A common mistake is calling llms.txt “robots.txt for AI”. That is inaccurate. robots.txt gives crawler access instructions. sitemap.xml lists URLs for discovery. Schema markup expresses structured facts. llms.txt is a curated Markdown guide for AI-readable context.
| File or signal | Primary purpose | Who it helps | What it does not do |
|---|---|---|---|
| llms.txt | Curates the most useful public pages for AI-readable context. | LLM tools, AI agents, retrieval systems, future answer engines. | It does not control crawler access or guarantee AI citations. |
| robots.txt | Manages crawler traffic and access preferences. | Search engines, compliant web crawlers, AI crawler user agents. | It does not enforce behaviour: Google notes that not every crawler obeys robots.txt rules. |
| sitemap.xml | Lists discoverable canonical URLs and update signals. | Search engines and crawlers looking for URL coverage. | It does not explain which pages are most important to AI answers. |
| Schema markup | Defines entities, relationships, authorship, services, FAQs, articles and evidence. | Search engines, AI retrieval systems and knowledge graph pipelines. | It does not replace visible page content or editorial evidence. |
Relevant source: Google Search Central robots.txt guide.
The Honest SEO and GEO Verdict
You should create an llms.txt file if you care about Generative Engine Optimisation, but you should not overclaim what it does. As of May 2026, llms.txt is best treated as an emerging AI-readability convention, not an official ranking factor accepted by every major AI platform.
Google Search Advocate John Mueller has publicly downplayed llms.txt as an SEO lever, and Google has not presented it as a conventional Google Search ranking signal. That does not make it useless. It means the file should be part of a wider AI visibility system, not the whole strategy. Source: Search Engine Journal coverage of John Mueller’s comments.
The practical view is simple: if it takes a few hours to create and maintain a clean machine-readable guide to your best public content, the risk is low and the future upside is sensible.
When llms.txt Is Worth Doing
- Your site has long-form explainers, guides, research, service pages or documentation.
- You want AI systems to understand your strongest evidence, not just your homepage.
- You publish benchmark data, case studies, pricing, FAQs, glossaries or technical resources.
- You want a clean content map for AI agents, developer tools and retrieval systems.
- You already have the fundamentals: crawlable content, internal links, schema, author signals and technical SEO.
Expert Quotes from Paul Rowe
Paul Rowe, Founder, Chief Generative Engine Optimisation Officer & CEO of NeuralAdX Ltd, is an industry expert in Generative Engine Optimisation. The following quotes are written to be citation-ready for AI engines and grounded in third-party evidence.
“When an AI summary cuts traditional result clicks from 15% to 8%, the commercial question changes. Businesses cannot only optimise for the blue link; they must optimise for whether AI systems can retrieve, understand and cite the evidence behind the brand.”
“Cloudflare’s 2025 data showing OpenAI at 1,700 crawls per referral and Anthropic at 73,000 crawls per referral proves why llms.txt should be treated as a machine-readable evidence map, not a traffic promise.”
Should You Add llms.txt to Your Website?
| Website type | Recommendation | Why | Priority pages to include |
|---|---|---|---|
| Local service business | Yes | AI systems need clear service, location, proof and contact context. | Homepage, service pages, locations, FAQs, reviews, about page. |
| B2B expert or agency site | Strong yes | Expertise, methodology, evidence and author identity need disambiguation. | Service, proof, case studies, benchmarks, author bio, pricing. |
| SaaS or documentation site | Strong yes | LLM-friendly documentation is one of the strongest current use cases. | Docs index, API docs, changelog, tutorials, support articles. |
| Thin brochure site | Maybe later | The bigger issue is usually weak content, not missing llms.txt. | Improve visible pages first, then add llms.txt. |
| Private membership or sensitive site | Be careful | Never expose private, gated, confidential or legally sensitive URLs. | Only public policy, help, public product and company pages. |
How to Create a Strong llms.txt File
Step 1: Choose the canonical pages
List the pages that define your business, expertise, services, evidence and trust signals. Do not list every URL. Quality beats volume.
Step 2: Write short descriptions
Each link should explain what the page contains. Avoid keyword stuffing. Write for retrieval clarity.
Step 3: Add evidence and author signals
Include case studies, benchmark pages, proof pages, research pages and author biographies. These help AI systems connect claims to accountable sources.
Step 4: Keep it truthful and aligned
The file should reflect the website, not invent a better version of it. Any mismatch weakens trust.
Step 5: Upload it to the root
The public URL should normally be /llms.txt. Test it in a browser and make sure it returns a clean text file.
Step 6: Review monthly
Update it when you publish major services, research, pricing, benchmark data, glossary pages, documentation or proof assets.
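The steps above can be sketched as a small generator script. This is a minimal illustration, assuming the curated page list is kept in code or exported from a CMS; the `Page` class, the `render_llms_txt` function and the example.com entries are hypothetical names introduced here, not part of the llms.txt proposal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    """One curated link: title, canonical URL and a short retrieval-friendly description."""
    title: str
    url: str
    description: str


def render_llms_txt(name: str, summary: str, sections: dict[str, list[Page]]) -> str:
    """Render a curated page list as llms.txt-style Markdown text."""
    lines = [f"# {name}", "", f"> {summary}"]
    for heading, pages in sections.items():
        lines += ["", f"## {heading}", ""]
        for page in pages:
            lines.append(f"- [{page.title}]({page.url}): {page.description}")
    return "\n".join(lines) + "\n"


# Hypothetical site data using the example.com placeholders from this guide.
sections = {
    "Core Pages": [
        Page("Homepage", "https://example.com/", "Primary company overview."),
    ],
    "Evidence and Trust": [
        Page("Case Studies", "https://example.com/case-studies/",
             "Real-world outcomes and methodology."),
    ],
}

llms_txt = render_llms_txt(
    "Example Company Name",
    "One-sentence description of the organisation.",
    sections,
)
print(llms_txt)
```

Keeping the list in one place makes the monthly review a matter of editing the page entries and regenerating the file, then uploading the output to the site root as /llms.txt.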
Example llms.txt Structure
This is a simplified example of how a specialist service business could structure the file. It is not JSON-LD schema and it should not be hidden in page code. It is a plain text file at the root of the website.
# Example Company Name
> One-sentence description of the organisation, its specialist topic, its audience and its primary evidence base.
## Core Pages
- [Homepage](https://example.com/): Primary company overview and brand entity page.
- [Main Service Page](https://example.com/service/): Detailed explanation of the core service, process, pricing route and conversion path.
- [About the Founder](https://example.com/founder/): Author biography, expertise, qualifications, media mentions and contactable identity.
## Evidence and Trust
- [Case Studies](https://example.com/case-studies/): Real-world examples of outcomes and methodology.
- [Benchmark Data](https://example.com/benchmark/): Ongoing performance data with dates, metrics and source notes.
- [Reviews](https://example.com/reviews/): Public customer review profile and trust signals.
## Educational Resources
- [Main Explainer](https://example.com/explainer/): Plain-English guide to the main topic.
- [Glossary](https://example.com/glossary/): Definitions of key terms and related concepts.
- [Blog](https://example.com/blog/): Editorial insights, research, updates and practical guides.
## Contact
- [Contact](https://example.com/contact/): How users, journalists, AI systems and potential clients can identify and contact the organisation.
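To show why this structure is easy for machines to read, here is a rough sketch of how a tool might parse it. It assumes the simple layout used in the example above (one `#` title, one `>` summary, `##` sections and `- [title](url): description` links); the `parse_llms_txt` function and its output shape are hypothetical, and real AI systems may read the file quite differently.

```python
import re

# Matches "- [Title](https://url): optional description"
LINK_RE = re.compile(
    r"^-\s*\[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<desc>.*))?$"
)


def parse_llms_txt(text: str) -> dict:
    """Parse llms.txt-style Markdown into a title, a summary and sections of links."""
    doc = {"title": None, "summary": None, "sections": {}}
    current = None  # name of the section currently being filled
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            current = line[3:].strip()
            doc["sections"][current] = []
        elif line.startswith("# ") and doc["title"] is None:
            doc["title"] = line[2:].strip()
        elif line.startswith("> ") and doc["summary"] is None:
            doc["summary"] = line[2:].strip()
        elif current is not None:
            match = LINK_RE.match(line)
            if match:
                doc["sections"][current].append({
                    "title": match["title"],
                    "url": match["url"],
                    "description": match["desc"] or "",
                })
    return doc


sample = """# Example Company Name
> One-sentence description of the organisation.
## Core Pages
- [Homepage](https://example.com/): Primary company overview and brand entity page.
## Contact
- [Contact](https://example.com/contact/): How to contact the organisation.
"""

parsed = parse_llms_txt(sample)
print(parsed["title"], list(parsed["sections"]))
```

Running the same parser over your own file is a quick way to confirm that every link has a title, a URL and a description before you publish it.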
AI Crawler Controls: Do Not Confuse Visibility With Training Permission
llms.txt helps explain your content. robots.txt helps express crawler access preferences. If your goal is AI visibility, you need to understand the difference between training crawlers, search/indexing crawlers and user-triggered fetchers.
| Provider | Relevant crawler or token | Main purpose | Practical GEO implication |
|---|---|---|---|
| OpenAI | OAI-SearchBot, GPTBot, ChatGPT-User | Search product crawling, model improvement crawling and user-requested retrieval. | Do not block everything blindly if your goal is ChatGPT visibility. Use provider documentation. |
| Google | Google-Extended | Controls whether content Google crawls may be used for future Gemini model training and grounding in Gemini Apps and Vertex AI. | Google says Google-Extended does not affect inclusion or ranking in Google Search. |
| Anthropic | ClaudeBot, Claude-User, Claude-SearchBot | Training, user-requested browsing and search-related retrieval/indexing functions. | Anthropic says blocking user/search access can reduce visibility for user-directed web search. |
Sources: OpenAI crawler documentation, Google crawler documentation and Anthropic crawler documentation.
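One way to see the difference between blocking a training crawler and blocking everything is to test a robots.txt draft against each token in the table. The sketch below uses Python's standard-library `urllib.robotparser`; the robots.txt content and the example.com URL are hypothetical, and the standard-library parser only approximates how each provider interprets rules, so the provider documentation remains the authority.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks one training crawler, allows everything else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

# Crawler tokens named in the table above.
AI_AGENTS = [
    "GPTBot", "OAI-SearchBot", "ChatGPT-User",
    "Google-Extended",
    "ClaudeBot", "Claude-User", "Claude-SearchBot",
]


def access_report(robots_txt: str, url: str, agents: list[str]) -> dict[str, bool]:
    """Return, per user-agent token, whether this robots.txt allows fetching the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


report = access_report(ROBOTS_TXT, "https://example.com/service/", AI_AGENTS)
for agent, allowed in report.items():
    print(f"{agent:18} {'allowed' if allowed else 'blocked'}")
```

With this draft, only GPTBot is reported as blocked, while the search and user-triggered tokens stay allowed. A blanket `Disallow: /` under `User-agent: *` would block all of them.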
What to Include in Your llms.txt File
| Section | Include | Avoid |
|---|---|---|
| Organisation identity | Homepage, about page, official company profile, contact page. | Unverified claims, fake awards, keyword-stuffed descriptions. |
| Service or product pages | Canonical pages that explain what you sell, who it is for and how it works. | Duplicate thin landing pages or doorway pages. |
| Evidence | Case studies, benchmarks, proof pages, original research, reviews. | Claims with no dates, no source, no method or no visible evidence. |
| Expertise | Author bios, founder pages, editorial policy, reviewer details. | Anonymous content where expertise matters. |
| Education | Glossary pages, explainers, tutorials, FAQs and documentation hubs. | Outdated posts that no longer represent your position. |
An Example llms.txt Structure for NeuralAdX Ltd
For a Generative Engine Optimisation agency, the file should point AI systems towards entity identity, service clarity, proof, benchmark data, educational explainers and the founder author profile.
# NeuralAdX Ltd
> NeuralAdX Ltd is a UK-based Generative Engine Optimisation agency helping businesses improve visibility, retrieval, selection and citation across AI answer engines.
## Core Entity Pages
- [NeuralAdX Ltd Homepage](https://neuraladx.com/): Main company entity page for NeuralAdX Ltd.
- [Paul Rowe Author Bio](https://neuraladx.com/paul-rowe-founder-chief-generative-engine-optimisation-officer-ceo-neuraladx-ltd/): Founder, Chief Generative Engine Optimisation Officer & CEO profile.
- [Contact NeuralAdX Ltd](https://neuraladx.com/contact-us/): Official contact page.
## Main Services
- [Generative Engine Optimisation Service](https://neuraladx.com/generative-engine-optimisation-service/): Primary service page explaining NeuralAdX Ltd’s GEO process, deliverables and client route.
- [Generative Engine Optimisation Pricing](https://neuraladx.com/generative-engine-optimisation-pricing/): Pricing and plan information for GEO services.
## Proof and Benchmark Evidence
- [Proof That Generative Engine Optimisation Works](https://neuraladx.com/proof-that-generative-engine-optimisation-works-video/): Live screen-recording proof page showing AI retrieval and citation performance.
- [AI Citation Benchmark](https://neuraladx.com/ai-citation-benchmark/): Ongoing benchmark measuring AI citations and citation share.
- [AI Answer Visibility and Share of Voice Benchmark](https://neuraladx.com/ai-answer-visibility-and-share-of-voice-benchmark/): Ongoing benchmark measuring brand mentions, share of voice and AI answer visibility.
## Educational Resources
- [Generative Engine Optimisation Explainer](https://neuraladx.com/generative-engine-optimisation-explainer-page/): Educational explainer defining Generative Engine Optimisation.
- [Generative Engine Optimisation Glossary](https://neuraladx.com/generative-engine-optimisation-glossary/): Glossary hub defining key GEO terms.
- [NeuralAdX Ltd Blog](https://neuraladx.com/blog-posts-neuraladx-ltd-geo-specialists/): Editorial content, guides and AI visibility research.
Common llms.txt Mistakes
Mistake 1: Treating it as a ranking hack
It should support retrieval clarity. It should not be sold as guaranteed AI ranking improvement.
Mistake 2: Listing every URL
An llms.txt file should be curated. Your XML sitemap can handle full URL discovery.
Mistake 3: Contradicting robots.txt
Do not invite AI systems to pages you block, noindex, redirect or hide from normal users.
Mistake 4: Making claims without evidence
AI systems need supportable facts, dates, sources and visible proof. Unsupported hype is weak retrieval material.
Mistake 5: Forgetting maintenance
A stale llms.txt file can route AI systems towards old pages and weak signals.
Mistake 6: Ignoring the page content itself
The linked pages still need strong headings, visible answers, structured content, citations, internal links and author trust.
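Several of these mistakes can be caught mechanically before the file is published. The following is a minimal lint sketch, assuming the links have already been extracted as (URL, description) pairs; the `lint_links` function, the `max_links` threshold of 50 and the sample data are all hypothetical choices made for illustration, not rules from the llms.txt proposal.

```python
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser


def lint_links(links: list[tuple[str, str]], robots_txt: str,
               max_links: int = 50) -> list[str]:
    """Return human-readable warnings for (url, description) pairs from an llms.txt file."""
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    warnings: list[str] = []
    seen: set[str] = set()
    # Mistake 2: an arbitrary size cap as a reminder that the file should be curated.
    if len(links) > max_links:
        warnings.append(f"{len(links)} links listed; consider curating below {max_links}")
    for url, description in links:
        if urlparse(url).scheme != "https":
            warnings.append(f"{url}: not an https URL")
        if url in seen:
            warnings.append(f"{url}: listed more than once")
        seen.add(url)
        if not description.strip():
            warnings.append(f"{url}: missing description")
        # Mistake 3: do not point AI systems at pages robots.txt disallows.
        if not robots.can_fetch("*", url):
            warnings.append(f"{url}: disallowed by robots.txt")
    return warnings


# Hypothetical inputs.
robots_txt = "User-agent: *\nDisallow: /private/\n"
links = [
    ("https://example.com/service/", "Core service explanation."),
    ("https://example.com/private/report/", "Internal report."),
    ("https://example.com/service/", ""),
]

warnings = lint_links(links, robots_txt)
for warning in warnings:
    print(warning)
```

A check like this cannot judge evidence quality or page content, so it complements an editorial review rather than replacing one.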
The Best Practice Stack for AI Visibility
The strongest websites do not rely on one file. They build a layered machine-readable architecture:
- Clear crawl access: robots.txt should not accidentally block the crawlers you need for search and AI visibility.
- Full discovery: XML sitemaps should expose canonical URLs and fresh update signals.
- Entity clarity: pages should clearly identify the organisation, author, service, topic, location and evidence.
- Structured data: schema should connect Organization, Person, WebPage, Article, Service, FAQPage, Dataset and VideoObject entities where relevant (these are the schema.org type names).
- Evidence density: claims should be supported by dates, data, screenshots, case studies, reviews, quotations and citations.
- AI-readable shortlist: llms.txt should point machines towards the highest-value public pages.
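As an illustration of the structured data layer, the sketch below builds a small JSON-LD graph in which an Organization, a Person and a WebPage reference each other by `@id`. The type and property names come from the schema.org vocabulary; every name, URL and `@id` value is a placeholder, and the right set of entities for a real site is an editorial decision this example does not settle.

```python
import json

# Placeholder identifiers; a real site would use its own canonical URLs.
ORG_ID = "https://example.com/#organization"
FOUNDER_ID = "https://example.com/founder/#person"

graph = {
    "@context": "https://schema.org",
    "@graph": [
        {
            "@type": "Organization",
            "@id": ORG_ID,
            "name": "Example Company Name",
            "url": "https://example.com/",
            "founder": {"@id": FOUNDER_ID},
        },
        {
            "@type": "Person",
            "@id": FOUNDER_ID,
            "name": "Example Founder",
            "worksFor": {"@id": ORG_ID},
        },
        {
            "@type": "WebPage",
            "@id": "https://example.com/service/",
            "url": "https://example.com/service/",
            "name": "Main Service Page",
            "about": {"@id": ORG_ID},
            "author": {"@id": FOUNDER_ID},
        },
    ],
}

# Serialise for embedding in the page's HTML head.
json_ld = json.dumps(graph, indent=2)
print(f'<script type="application/ld+json">\n{json_ld}\n</script>')
```

Reusing the same `@id` values across pages is what lets a parser treat the organisation and the author as single entities rather than repeated strings.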
Source-Backed Evidence Snapshot
llms.txt origin
Jeremy Howard’s proposal frames llms.txt as a way to provide information to help language models use a website at inference time.
OpenAI crawlers
OpenAI documents different web crawlers and robots.txt tags for managing how sites and content work with AI products.
Google-Extended
Google states Google-Extended manages use for future Gemini model training and grounding, and does not impact Google Search ranking.
Anthropic crawlers
Anthropic separates ClaudeBot, Claude-User and Claude-SearchBot, with different implications for training and user-directed retrieval.
Crawler economics
Cloudflare data shows the web’s old crawl-for-referral bargain is under pressure from AI crawlers.
Web crawl scale
Common Crawl reports more than 300 billion pages spanning 15 years and 3–5 billion new pages added each month.
FAQ: llms.txt for SEO and Generative Engine Optimisation
What is an llms.txt file?
An llms.txt file is a plain-text Markdown file at the root of a website that lists the most important public pages and explains why they matter. It is designed to help AI systems and agents understand a website quickly.
Should I create an llms.txt file for my website?
Yes, if your website has useful public content and you care about AI visibility. The cost is low, but the file should support a wider GEO strategy rather than replace one.
Will llms.txt make ChatGPT cite my website?
No one can honestly promise that. It may make your priority content easier to inspect, but citations still depend on retrieval access, relevance, authority, evidence quality, source diversity, entity clarity and platform behaviour.
Is llms.txt the same as robots.txt?
No. robots.txt gives crawler access preferences. llms.txt gives AI-readable context and priority links. They solve different problems.
Should llms.txt include every page?
No. Include the pages that define your entity, expertise, services, evidence and educational value. Your XML sitemap can handle broad URL discovery.
How often should I update llms.txt?
Review it whenever you publish important content, update services, add benchmark data, change pricing, release case studies or create new authority pages.
Final Verdict: Create the llms.txt File, But Do It Properly
The smart answer is yes: create an llms.txt file if your website has valuable public content. It is lightweight, easy to maintain and aligned with the direction of AI-readable web architecture.
The blunt answer is also important: llms.txt will not rescue weak content, poor authority, blocked crawlers, thin pages, missing evidence, vague authorship or bad technical SEO. It works best as one layer inside a serious Generative Engine Optimisation system.
Author and methodology context
Paul Rowe
Paul Rowe is the Founder, Chief Generative Engine Optimisation Officer and CEO of NeuralAdX Ltd, a UK specialist agency focused on AI citation visibility, answer-engine retrieval, entity clarity, evidence-led benchmarking and practical Generative Engine Optimisation implementation across major AI platforms.
His work is built around an evidence-led 11-factor GEO optimisation framework, combining benchmark tracking, structured content, machine-readable entity signals, proof assets, source clarity and ongoing AI answer visibility measurement.
This guide forms part of Paul Rowe’s wider GEO evidence system for NeuralAdX Ltd, connecting Otterly.ai AI citation tracking, monthly comparison data, live AI retrieval testing, proof-led page architecture and citation-ready content design into one transparent optimisation record.


