Welcome to llms.txt Generator
A production-grade web application that automatically generates and maintains llms.txt files for websites by analyzing their structure and content. Built for optimal LLM understanding with real-time WebSocket streaming, scheduled updates, and enterprise-scale infrastructure.
Live demo available at llmstxt.vercel.app
What is llms.txt?
The llms.txt file is a standardized format (from llmstxt.org) that helps Large Language Models better understand and index website content. Think of it as a sitemap, but optimized for AI consumption.
This tool automatically:
- Crawls your website using intelligent BFS traversal
- Extracts structured content from each page
- Generates a properly formatted llms.txt file
- Hosts it on a CDN with scheduled auto-updates
- Keeps it synchronized with your website changes
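Per the llmstxt.org specification, the generated file is a markdown document with an H1 title, a blockquote summary, and H2 sections containing link lists. An illustrative output (the site and URLs are hypothetical):

```markdown
# Example Docs

> Example Docs is the documentation site for the Example API, covering setup, authentication, and endpoint reference.

## Docs

- [Quick Start](https://example.com/docs/quickstart): Install the SDK and make your first request
- [Authentication](https://example.com/docs/auth): API keys and OAuth flows

## Optional

- [Changelog](https://example.com/changelog): Release notes
```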
Key Features
Intelligent Crawling
BFS traversal with configurable depth and page limits. Handles both static and JavaScript-rendered content via Playwright.
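The depth- and page-limited BFS can be sketched as below. The link graph is a stand-in for what the real crawler discovers by fetching and parsing pages; the function names and limits are illustrative, not the project's actual API:

```python
from collections import deque

def bfs_crawl(start_url, get_links, max_pages=100, max_depth=3):
    """Breadth-first crawl: visit pages level by level, honoring
    both a total page budget and a maximum link depth."""
    visited = []
    seen = {start_url}
    queue = deque([(start_url, 0)])  # (url, depth)
    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()
        visited.append(url)
        if depth >= max_depth:
            continue  # do not enqueue children beyond the depth limit
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited

# Toy link graph standing in for real HTML link extraction
site = {
    "/": ["/docs", "/blog"],
    "/docs": ["/docs/api", "/docs/cli"],
    "/blog": ["/blog/post-1"],
    "/docs/api": [], "/docs/cli": [], "/blog/post-1": [],
}
order = bfs_crawl("/", lambda u: site.get(u, []), max_pages=5, max_depth=2)
```

BFS (rather than depth-first) guarantees the shallowest, typically most important pages are captured before the page budget runs out.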
Real-time Streaming
WebSocket-based progress updates and logs. Watch your crawl happen in real-time with detailed status messages.
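Each progress update travels to the browser as a small JSON frame over the WebSocket. The field names below are illustrative, not the backend's actual wire format:

```python
import json

def progress_event(stage, pages_done, pages_total, message):
    """Build one JSON progress frame of the kind streamed to the
    browser over the WebSocket. Field names are illustrative."""
    return json.dumps({
        "type": "progress",
        "stage": stage,        # e.g. "crawling", "formatting", "uploading"
        "done": pages_done,
        "total": pages_total,
        "message": message,
    })

frame = progress_event("crawling", 12, 50, "Crawled /docs/quickstart")
```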
Automated Updates
Schedule periodic recrawls via AWS Lambda + EventBridge. Your llms.txt stays synchronized with website changes automatically.
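A setup like this typically pairs an EventBridge schedule rule (e.g. `rate(1 day)`) with a small Lambda handler. The event payload and the `trigger_recrawl` helper below are assumptions for illustration, not the project's actual code:

```python
def handler(event, context):
    """Hypothetical Lambda entry point invoked by an EventBridge
    schedule. The event detail carries which site to recrawl."""
    detail = event.get("detail", {})
    url = detail.get("url")
    if not url:
        return {"status": "skipped", "reason": "no url in event detail"}
    job_id = trigger_recrawl(url)
    return {"status": "queued", "url": url, "job_id": job_id}

def trigger_recrawl(url):
    # Placeholder: a real implementation would start an ECS task
    # or call the backend API to re-run the crawl for this URL.
    return f"job-{abs(hash(url)) % 10000}"

result = handler({"detail": {"url": "https://example.com"}}, None)
```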
LLM Enhancement
Optional AI-powered content optimization using Grok 4.1-Fast to improve descriptions and structure.
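The enhancement step amounts to sending the crawled page summaries to a chat model with an instruction to tighten descriptions. A minimal prompt-construction sketch; the prompt wording and message shape are assumptions, only the model choice comes from the docs above:

```python
def build_enhancement_messages(pages):
    """Assemble chat messages asking the model to rewrite page
    descriptions for clarity. Prompt text is illustrative."""
    listing = "\n".join(
        f"- {p['title']}: {p['description']}" for p in pages
    )
    return [
        {"role": "system",
         "content": "You improve llms.txt page descriptions: keep them "
                    "factual, one sentence each, no marketing language."},
        {"role": "user", "content": f"Rewrite these descriptions:\n{listing}"},
    ]

msgs = build_enhancement_messages(
    [{"title": "Quick Start", "description": "getting started page"}]
)
```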
Persistent Storage
R2 object storage with public CDN URLs. Your generated files are hosted and accessible worldwide.
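Public hosting only requires a deterministic object key per site. One possible naming scheme is sketched below; the CDN domain and key layout are assumptions, not the project's actual convention:

```python
from urllib.parse import urlparse

def r2_object_key(site_url):
    """Derive a stable object key for a site's llms.txt from its host,
    e.g. https://docs.example.com -> docs-example-com/llms.txt"""
    host = urlparse(site_url).netloc.replace(":", "-").replace(".", "-")
    return f"{host}/llms.txt"

def public_url(site_url, cdn_base="https://cdn.example.com"):
    # cdn_base is a placeholder for the R2 bucket's public domain
    return f"{cdn_base}/{r2_object_key(site_url)}"

link = public_url("https://docs.example.com")
```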
Spec Compliance
Adheres to official llmstxt.org specification with proper markdown formatting and structure.
Architecture Overview
The system consists of three main components working together:
Frontend Layer
- Next.js 15 with TypeScript
- Real-time WebSocket client
- Tailwind CSS for styling
- Deployed on Vercel
Backend Layer
- FastAPI (Python 3.11) WebSocket API
- Playwright for browser automation
- BeautifulSoup4 for HTML parsing
- Deployed on AWS ECS Fargate
Infrastructure Layer
- Supabase - PostgreSQL database
- Cloudflare R2 - Object storage & CDN
- AWS Lambda - Scheduled recrawl tasks
- Brightdata - Proxy for JS-heavy sites
How It Works
Submit URL
User enters a website URL in the web interface and configures crawl parameters (max pages, description length, etc.)
Intelligent Crawling
The crawler performs BFS traversal, extracting titles, descriptions, and content from each page. Optionally uses Brightdata proxy for JavaScript-rendered content.
Content Formatting
Pages are formatted according to the llmstxt.org specification with proper markdown structure, hierarchical headings, and blockquote excerpts.
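Given crawled page records, this formatting step can be sketched as a pure function that emits the spec's structure (H1 title, blockquote summary, H2 link sections). The field names are illustrative:

```python
def format_llms_txt(site_title, summary, sections):
    """Render crawled data as llms.txt markdown: an H1 title, a
    blockquote summary, then one H2 per section with a bullet list
    of links and short descriptions."""
    lines = [f"# {site_title}", "", f"> {summary}", ""]
    for name, pages in sections.items():
        lines.append(f"## {name}")
        lines.append("")
        for p in pages:
            lines.append(f"- [{p['title']}]({p['url']}): {p['description']}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"

text = format_llms_txt(
    "Example Docs",
    "Documentation for the Example API.",
    {"Docs": [{"title": "Quick Start",
               "url": "https://example.com/docs/quickstart",
               "description": "Install and make your first request"}]},
)
```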
LLM Enhancement (Optional)
If enabled, the content is processed through Grok 4.1-Fast to optimize descriptions and improve structure.
Storage & Hosting
The generated llms.txt file is uploaded to Cloudflare R2 and a public CDN URL is returned.
Use Cases
Documentation Sites
Make your docs easily discoverable and understandable by AI assistants and chatbots.
E-commerce
Help LLMs understand your product catalog, improving recommendations and search.
Blogs & Publications
Index your articles and content for better AI-powered content discovery.
Corporate Websites
Make your services, products, and information accessible to AI tools.
Tech Stack
Backend
- FastAPI - Modern async Python web framework
- Playwright - Browser automation
- BeautifulSoup4 - HTML parsing
Frontend
- Next.js 15 with TypeScript
- Tailwind CSS for styling
Infrastructure
- Supabase - PostgreSQL database
- Cloudflare R2 - Object storage
- Brightdata - Proxy for JS-heavy sites
What’s Next?
Quick Start
Get started in 5 minutes with local development
API Reference
Explore the WebSocket API and endpoints
Deployment Guide
Deploy to AWS with Terraform
Configuration
Configure environment variables and settings
Community & Support
Open Source
This project is open source and available on GitHub. Contributions, issues, and feature requests are welcome!