What is llms.txt?
llms.txt is a lightweight, human-readable format for describing website content in a way that’s optimized for LLM understanding. It’s similar in spirit to robots.txt or sitemap.xml, but designed specifically for AI consumption. Learn more at llmstxt.org.
Format Overview
An llms.txt file is a Markdown document with a specific structure: an H1 site title, a blockquote description, H2 sections, and page entries as list items.
Specification Components
1. Header Section
Extract site title
The generator uses the homepage <title> tag as the site name. If the title is generic (“Home”, “Welcome”), it derives a title from the domain name.
formatter.py:75-84
2. Section Organization
Content is organized into sections based on URL structure.
Extract sections from URLs
Sections are derived from the first path segment:
formatter.py:124-128
Examples:
- https://example.com/docs/intro → “docs” section
- https://example.com/api/users → “api” section
- https://example.com/about → “about” section
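The mapping in the examples above can be sketched in a few lines. This is a minimal illustration, not the code in formatter.py: the function name `section_for_url` and the `"root"` fallback for URLs with no path segment are assumptions made for this example.

```python
from urllib.parse import urlparse


def section_for_url(url: str, default: str = "root") -> str:
    """Return the first non-empty path segment of `url`.

    `default` (an assumed fallback) is returned for URLs such as the
    homepage, which have no path segment at all.
    """
    segments = [s for s in urlparse(url).path.split("/") if s]
    return segments[0] if segments else default


print(section_for_url("https://example.com/docs/intro"))
print(section_for_url("https://example.com/api/users"))
```

Using `urlparse` rather than splitting the raw string keeps query strings and fragments out of the segment name.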
Clean section names
Section names are formatted for readability:
formatter.py:90-103
Transformations:
- api-reference → “API Reference”
- getting_started → “Getting Started”
- faq → “FAQ”
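One way to reproduce the transformations above is to split on hyphens and underscores, title-case each word, and upper-case known acronyms. The sketch below is an assumption about how this could work; the `ACRONYMS` set and the name `clean_section_name` are hypothetical and may not match formatter.py.

```python
import re

# Hypothetical acronym list; the real generator may use a different one.
ACRONYMS = {"api", "faq", "sdk", "cli", "ui"}


def clean_section_name(raw: str) -> str:
    """Turn a URL path segment into a readable section heading."""
    words = [w for w in re.split(r"[-_\s]+", raw.strip()) if w]
    return " ".join(
        w.upper() if w.lower() in ACRONYMS else w.capitalize() for w in words
    )


for segment in ("api-reference", "getting_started", "faq"):
    print(segment, "->", clean_section_name(segment))
```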
3. Page Entries
Each page is represented as a list item with:
- Title (linked to URL)
- Description
- Optional tags
Page Entry Format
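As a rough illustration, a page entry can be built as a Markdown list item with a linked title followed by a colon and the description, which is the link-list shape the llmstxt.org format uses. How the generator renders tags is not shown in this document, so the parenthesized tag suffix below is purely an assumption, as is the function name `format_page_entry`.

```python
from typing import Sequence


def format_page_entry(
    title: str, url: str, description: str = "", tags: Sequence[str] = ()
) -> str:
    """Build one Markdown list item for a page."""
    entry = f"- [{title}]({url})"
    if description:
        entry += f": {description}"
    if tags:
        # Assumed rendering: tags appended in parentheses.
        entry += " (" + ", ".join(tags) + ")"
    return entry


print(format_page_entry(
    "Intro", "https://example.com/docs/intro", "Start here.", ["Guide"]
))
```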
Format URLs
URLs are cleaned and prefer Markdown versions when available:
formatter.py:16-31
The generator checks if .md versions exist via HEAD requests and prefers them for better LLM parsing.
Extract and truncate descriptions
Descriptions are extracted from page metadata and truncated:
formatter.py:69-73
Default truncation:
- Section descriptions: 150 characters
- Site summary: 200 characters
4. Optional Section
Secondary pages are grouped at the end without descriptions:
formatter.py:146-173
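A possible shape for this step is sketched below: pages whose first path segment looks secondary are collected and rendered as bare links under an "Optional" heading. The segment list `SECONDARY_SEGMENTS` and both function names are hypothetical; the actual detection patterns live in formatter.py and are not reproduced here.

```python
from typing import List, Tuple
from urllib.parse import urlparse

# Hypothetical patterns for secondary content.
SECONDARY_SEGMENTS = {"legal", "privacy", "terms"}


def is_secondary(url: str) -> bool:
    """Return True if the first path segment is in the secondary set."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    return bool(segments) and segments[0].lower() in SECONDARY_SEGMENTS


def render_optional(pages: List[Tuple[str, str]]) -> List[str]:
    """Render (title, url) pairs that are secondary as links without descriptions."""
    secondary = sorted(
        (p for p in pages if is_secondary(p[1])), key=lambda p: p[0].lower()
    )
    if not secondary:
        return []
    return ["## Optional", ""] + [f"- [{title}]({url})" for title, url in secondary]


pages = [
    ("Terms", "https://example.com/terms"),
    ("Intro", "https://example.com/docs/intro"),
    ("Privacy", "https://example.com/privacy"),
]
print("\n".join(render_optional(pages)))
```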
Complete Example
llms.txt
Specification Compliance
The generator adheres to the official llmstxt.org specification:
✅ Markdown Format
All output is valid Markdown that can be parsed by standard Markdown processors.
- Uses standard heading syntax (#, ##)
- Uses standard link syntax ([text](url))
- Uses standard list syntax (-)
- Uses standard blockquote syntax (>)
✅ Hierarchical Structure
Content is organized in a clear hierarchy:
- Site title (H1)
- Site description (blockquote)
- Sections (H2)
- Pages (list items)
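The hierarchy above maps directly onto Markdown. The following is a minimal sketch of a renderer, assuming the entries have already been formatted as list-item strings; `render_llms_txt` is an illustrative name, not a function from the codebase.

```python
from typing import Dict, List


def render_llms_txt(title: str, summary: str, sections: Dict[str, List[str]]) -> str:
    """Assemble H1 title, blockquote summary, H2 sections, and list items."""
    lines = [f"# {title}", "", f"> {summary}", ""]
    for name, entries in sections.items():
        lines.append(f"## {name}")
        lines.append("")
        lines.extend(entries)
        lines.append("")
    # Collapse trailing blank lines into a single final newline.
    return "\n".join(lines).rstrip() + "\n"


print(render_llms_txt(
    "Example",
    "An example site.",
    {"Docs": ["- [Intro](https://example.com/docs/intro): Start here."]},
))
```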
✅ Semantic Organization
Pages are grouped logically:
- Primary content by URL structure
- Secondary content in “Optional” section
- Alphabetically sorted within sections
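The ordering rules above can be illustrated with a small grouping helper. It assumes each page already carries its section name, keeps primary sections in first-seen order, moves "Optional" to the end, and sorts by title within each section. Sorting by title (rather than URL) and the first-seen section order are assumptions; the document only states that entries are sorted alphabetically.

```python
from typing import Dict, Iterable, List, Tuple

Page = Tuple[str, str, str]  # (section, title, url)


def order_sections(pages: Iterable[Page]) -> Dict[str, List[Tuple[str, str]]]:
    """Group pages by section, sort each group by title, put Optional last."""
    grouped: Dict[str, List[Tuple[str, str]]] = {}
    for section, title, url in pages:
        grouped.setdefault(section, []).append((title, url))
    names = [name for name in grouped if name != "Optional"]
    if "Optional" in grouped:
        names.append("Optional")
    return {
        name: sorted(grouped[name], key=lambda page: page[0].casefold())
        for name in names
    }


pages = [
    ("Optional", "Terms", "/terms"),
    ("Docs", "Setup", "/docs/setup"),
    ("API", "Users", "/api/users"),
    ("Docs", "Intro", "/docs/intro"),
]
for section, entries in order_sections(pages).items():
    print(section, entries)
```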
✅ Clean URLs
URLs are normalized and cleaned:
- Query parameters removed
- Fragments removed
- Prefers .md versions when available
- Uses HTTPS when available
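A sketch of the first three rules, using only the standard library, might look like the following. It strips the query string and fragment, then probes for a Markdown variant with a HEAD request. Building the candidate by appending `.md` to the path is an assumption, as are the names `head_ok` and `clean_url`; the HTTPS upgrade is left out of this example.

```python
from typing import Callable
from urllib.parse import urlsplit, urlunsplit
from urllib.request import Request, urlopen


def head_ok(url: str, timeout: float = 5.0) -> bool:
    """Return True if a HEAD request to `url` succeeds with a 2xx status."""
    try:
        with urlopen(Request(url, method="HEAD"), timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:  # URLError and HTTPError are OSError subclasses
        return False


def clean_url(url: str, exists: Callable[[str], bool] = head_ok) -> str:
    """Drop query and fragment, then prefer a .md variant if `exists` confirms it."""
    scheme, netloc, path, _query, _fragment = urlsplit(url)
    cleaned = urlunsplit((scheme, netloc, path, "", ""))
    if not path or path.endswith("/") or path.endswith(".md"):
        return cleaned
    # Assumed convention: the Markdown version lives at <path>.md
    candidate = urlunsplit((scheme, netloc, path + ".md", "", ""))
    return candidate if exists(candidate) else cleaned


# Pass a stub checker to try the logic without network access.
print(clean_url("https://example.com/docs/intro?utm=1#top", exists=lambda u: True))
```

Taking the existence check as a parameter keeps the URL logic testable offline and lets callers cache or batch the HEAD requests.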
✅ Content Truncation
Descriptions are truncated at semantic boundaries:
- Truncates at word boundaries (not mid-word)
- Adds ellipsis when truncated
- Configurable length limits
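The truncation behavior listed above can be approximated as follows. This is a sketch under stated assumptions: the three-dot ellipsis, the whitespace normalization, and the choice to count the ellipsis toward the limit are illustrative and may differ from formatter.py.

```python
def truncate(text: str, limit: int = 150, ellipsis: str = "...") -> str:
    """Shorten `text` to at most `limit` characters, cutting at a word boundary."""
    text = " ".join(text.split())  # collapse runs of whitespace
    if len(text) <= limit:
        return text
    budget = limit - len(ellipsis)
    # Look one character past the budget so a cut that lands exactly
    # at the end of a word keeps that whole word.
    window = text[: budget + 1]
    head = window.rsplit(" ", 1)[0] if " " in window else text[:budget]
    return head.rstrip(" ,;:.") + ellipsis


print(truncate("The quick brown fox jumps over the lazy dog", limit=20))
```

With `limit=150` this matches the default given for section descriptions, and `limit=200` the one for the site summary.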
✅ Metadata Enrichment
Pages include contextual metadata:
- Content type tags (API, Guide, etc.)
- Complexity tags (Beginner, Advanced)
- Topic tags (Security, Performance, etc.)
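Tag detection of this kind is often done with keyword patterns matched against a page's title, URL, and description. The sketch below shows that general approach only; every pattern in `TAG_PATTERNS` is a made-up example, and the real rules in tagger.py may be quite different.

```python
import re
from typing import List

# Hypothetical patterns, one per tag named in the list above.
TAG_PATTERNS = {
    "API": re.compile(r"\b(api|endpoint|reference)\b", re.I),
    "Guide": re.compile(r"\b(guide|tutorial|how[- ]to)\b", re.I),
    "Beginner": re.compile(r"\b(getting started|quickstart|introduction)\b", re.I),
    "Advanced": re.compile(r"\b(advanced|internals)\b", re.I),
    "Security": re.compile(r"\b(security|authentication|oauth)\b", re.I),
    "Performance": re.compile(r"\b(performance|caching|benchmark)\b", re.I),
}


def detect_tags(title: str, url: str = "", description: str = "") -> List[str]:
    """Return every tag whose pattern matches the title, URL, or description."""
    haystack = " ".join((title, url, description))
    return [tag for tag, pattern in TAG_PATTERNS.items() if pattern.search(haystack)]


print(detect_tags("Authentication guide", "https://example.com/docs/auth"))
```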
Best Practices
Keep Descriptions Concise
Descriptions should be 100-200 characters. The generator enforces this automatically.
Use Semantic Sections
Organize content by user journey (Getting Started, Guides, API Reference) rather than technical structure.
Include Key Pages
Prioritize documentation, guides, and API references over marketing pages.
Update Regularly
Use auto-update to keep llms.txt synchronized with website changes.
Customization
While the specification is standardized, you can customize the generator’s behavior:
Section Patterns
Modify secondary content detection: formatter.py:7-14
Tag Patterns
Add custom tag detection: tagger.py:4-23
Truncation Limits
Adjust description lengths: formatter.py:135-136
Validation
Validate your llms.txt file:
FAQ
Why Markdown instead of JSON or XML?
Markdown is:
- Human-readable and editable
- LLM-friendly (models train on Markdown)
- Version control friendly
- Simpler than structured formats
How is this different from sitemap.xml?
Sitemaps are for search engine crawlers. llms.txt is optimized for LLM understanding:
- Includes descriptions and context
- Organized by user journey
- Includes content type hints
- Filters out irrelevant pages
Can I edit the generated file?
Yes! The generator provides a starting point. You can:
- Reorder sections
- Edit descriptions
- Add/remove pages
- Customize tags
Should I include all pages?
No. Focus on content valuable to LLMs:
- Documentation and guides
- API references
- Conceptual content
- Examples and tutorials
Exclude:
- Marketing pages
- Legal pages (or put in Optional)
- Duplicate content
- Internal tools/admin pages
Resources
llmstxt.org
Official specification and guidelines
Example Sites
Real-world llms.txt implementations
Formatter Code
Implementation details in the codebase
Web Interface
Generate your own llms.txt file
Next Steps
Generate Your First File
Create an llms.txt file in minutes
API Usage
Integrate programmatically
Configuration
Customize the generator behavior
Development
Contribute to the project