Welcome to llms.txt Generator

A production-grade web application that automatically generates and maintains llms.txt files for websites by analyzing their structure and content. Built for optimal LLM understanding with real-time WebSocket streaming, scheduled updates, and enterprise-scale infrastructure.
Live demo available at llmstxt.vercel.app

What is llms.txt?

The llms.txt file is a standardized format (from llmstxt.org) that helps Large Language Models better understand and index website content. Think of it as a sitemap, but optimized for AI consumption. This tool automatically:
  • Crawls your website using intelligent BFS traversal
  • Extracts structured content from each page
  • Generates a properly formatted llms.txt file
  • Hosts it on a CDN with scheduled auto-updates
  • Keeps it synchronized with your website changes
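
For reference, a minimal llms.txt following the llmstxt.org layout looks like the sketch below (the site name, sections, and links are illustrative, not output from this tool): an H1 title, a blockquote summary, then H2 sections containing link lists with short descriptions.

```markdown
# Example Project

> One-sentence summary of what the site covers.

## Documentation

- [Quick Start](https://example.com/docs/quickstart): Install and run locally
- [API Reference](https://example.com/docs/api): Endpoints and parameters
```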

Key Features

Intelligent Crawling

BFS traversal with configurable depth and page limits. Handles both static and JavaScript-rendered content via Playwright.
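
The traversal can be sketched as a plain breadth-first search over same-host links, bounded by page and depth limits. This is a minimal illustration, not the project's actual crawler; `fetch_links` stands in for whatever fetches a page and extracts its `<a href>` values (e.g. via Playwright or BeautifulSoup4).

```python
from collections import deque
from urllib.parse import urljoin, urlparse

def bfs_crawl(start_url, fetch_links, max_pages=50, max_depth=2):
    """Breadth-first traversal restricted to the start URL's host.

    fetch_links(url) -> iterable of href strings found on that page.
    Returns the list of URLs visited, in BFS order.
    """
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([(start_url, 0)])
    order = []
    while queue and len(order) < max_pages:
        url, depth = queue.popleft()
        order.append(url)
        if depth >= max_depth:
            continue  # don't expand links beyond the depth limit
        for href in fetch_links(url):
            nxt = urljoin(url, href)  # resolve relative links
            if urlparse(nxt).netloc == host and nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return order
```

Because `fetch_links` is injected, the traversal logic can be tested against an in-memory link graph without touching the network.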

Real-time Streaming

WebSocket-based progress updates and logs. Watch your crawl happen in real-time with detailed status messages.
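
A progress update of this kind is typically a small JSON message pushed over the socket after each page. The message shape below is an assumption for illustration, not the project's actual wire format:

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class CrawlProgress:
    """Hypothetical per-page progress event sent to the browser."""
    event: str        # e.g. "page_crawled"
    url: str          # page just processed
    pages_done: int
    pages_total: int

# Serialize one event as it might be sent over the WebSocket
msg = CrawlProgress("page_crawled", "https://example.com/docs", 3, 50)
payload = json.dumps(asdict(msg))
```

The frontend can then parse each message and update a progress bar and log pane as events arrive.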

Automated Updates

Schedule periodic recrawls via AWS Lambda + EventBridge. Your llms.txt stays synchronized with website changes automatically.
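
A scheduled Lambda of this kind boils down to "find sites whose last crawl is older than the interval and re-enqueue them". The handler below is a sketch under assumed field names (`url`, `last_crawled` as a Unix timestamp); the real implementation would read sites from the database and trigger actual recrawls.

```python
import time

RECRAWL_INTERVAL = 6 * 60 * 60  # seconds; matches the 6-hour schedule

def due_for_recrawl(sites, now=None):
    """Return URLs of sites whose last crawl is at least the interval old."""
    now = time.time() if now is None else now
    return [s["url"] for s in sites if now - s["last_crawled"] >= RECRAWL_INTERVAL]

def handler(event, context):
    # In the real Lambda, sites would come from the database (Supabase)
    # and each due site would be re-enqueued for crawling; sketched here.
    sites = event.get("sites", [])
    return {"due": due_for_recrawl(sites, event.get("now"))}
```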

LLM Enhancement

Optional AI-powered content optimization using Grok 4.1-Fast to improve descriptions and structure.

Persistent Storage

R2 object storage with public CDN URLs. Your generated files are hosted and accessible worldwide.

Spec Compliance

Adheres to official llmstxt.org specification with proper markdown formatting and structure.
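
The spec's layout (H1 title, blockquote summary, H2 sections of link lists) can be rendered with a small formatting function. This is a simplified sketch, not the project's actual formatter:

```python
def format_llms_txt(title, summary, sections):
    """Render pages into llms.txt markdown per the llmstxt.org layout.

    sections maps a section name to a list of (name, url, description) tuples.
    """
    lines = [f"# {title}", "", f"> {summary}", ""]
    for section, pages in sections.items():
        lines.append(f"## {section}")
        lines.append("")
        for name, url, desc in pages:
            lines.append(f"- [{name}]({url}): {desc}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"
```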

Architecture Overview

The system consists of three main components working together:

Frontend
  • Next.js 15 with TypeScript
  • Real-time WebSocket client
  • Tailwind CSS for styling
  • Deployed on Vercel

Backend
  • FastAPI (Python 3.11) WebSocket API
  • Playwright for browser automation
  • BeautifulSoup4 for HTML parsing
  • Deployed on AWS ECS Fargate

Infrastructure
  • Supabase - PostgreSQL database
  • Cloudflare R2 - Object storage & CDN
  • AWS Lambda - Scheduled recrawl tasks
  • Brightdata - Proxy for JS-heavy sites

How It Works

1. Submit URL

The user enters a website URL in the web interface and configures crawl parameters (max pages, description length, etc.).

2. Intelligent Crawling

The crawler performs BFS traversal, extracting titles, descriptions, and content from each page. It optionally uses the Brightdata proxy for JavaScript-rendered content.

3. Content Formatting

Pages are formatted according to the llmstxt.org specification with proper markdown structure, hierarchical headings, and blockquote excerpts.

4. LLM Enhancement (Optional)

If enabled, the content is processed through Grok 4.1-Fast to optimize descriptions and improve structure.

5. Storage & Hosting

The generated llms.txt file is uploaded to Cloudflare R2, and a public CDN URL is returned.

6. Scheduled Updates (Optional)

If auto-update is enabled, the site is enrolled in the recrawl queue. AWS Lambda checks for updates every 6 hours and regenerates the file when needed.
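
The end-to-end flow can be summarized as a small pipeline: crawl, format, optionally enhance, then store. The function below is an illustrative sketch; all callables (`crawl`, `format_pages`, `enhance`, `store`) are hypothetical stand-ins for the real components, which keeps the optional steps explicit.

```python
def run_pipeline(start_url, crawl, format_pages, enhance=None, store=None):
    """Sketch of the generation steps: crawl the site, format the pages
    per the spec, optionally run the LLM pass, then store the result.

    Returns (content, cdn_url); cdn_url is None when storage is skipped.
    """
    pages = crawl(start_url)                      # steps 1-2: crawl
    content = format_pages(pages)                 # step 3: format
    if enhance is not None:                       # step 4: optional LLM pass
        content = enhance(content)
    cdn_url = store(content) if store else None   # step 5: upload, get URL
    return content, cdn_url
```

Passing the steps as callables also makes the pipeline trivial to exercise with fakes in tests.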

Use Cases

Documentation Sites

Make your docs easily discoverable and understandable by AI assistants and chatbots.

E-commerce

Help LLMs understand your product catalog, improving recommendations and search.

Blogs & Publications

Index your articles and content for better AI-powered content discovery.

Corporate Websites

Make your services, products, and information accessible to AI tools.

Tech Stack

  • FastAPI - Modern async Python web framework
  • Playwright - Browser automation
  • BeautifulSoup4 - HTML parsing
  • Supabase - PostgreSQL database
  • Cloudflare R2 - Object storage
  • Brightdata - Proxy for JS-heavy sites

What’s Next?

Quick Start

Get started in 5 minutes with local development

API Reference

Explore the WebSocket API and endpoints

Deployment Guide

Deploy to AWS with Terraform

Configuration

Configure environment variables and settings
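
As a rough illustration, a local `.env` for a deployment like this might look as follows. Every variable name here is an assumption for illustration; check the Configuration page for the project's actual settings.

```shell
# Hypothetical names -- see the Configuration page for the real ones
SUPABASE_URL=https://your-project.supabase.co
SUPABASE_KEY=your-service-role-key
R2_ACCOUNT_ID=your-cloudflare-account-id
R2_BUCKET=llms-txt
BRIGHTDATA_PROXY_URL=http://user:pass@proxy.example:22225
XAI_API_KEY=your-key        # for the optional Grok enhancement step
```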

Community & Support

Open Source

This project is open source and available on GitHub. Contributions, issues, and feature requests are welcome!
This tool crawls websites and may generate significant traffic. Always respect robots.txt and rate limits. Use the Brightdata proxy feature for production crawling.
