Welcome to llms.txt Generator

A production-grade web application that automatically generates and maintains llms.txt files for websites by analyzing their structure and content. Built for optimal LLM understanding with real-time WebSocket streaming, scheduled updates, and enterprise-scale infrastructure.
Live demo available at llmstxt.vercel.app

What is llms.txt?

The llms.txt file is a standardized format (from llmstxt.org) that helps Large Language Models better understand and index website content. Think of it as a sitemap, but optimized for AI consumption. This tool automatically:
  • Crawls your website using intelligent BFS traversal
  • Extracts structured content from each page
  • Generates a properly formatted llms.txt file
  • Hosts it on a CDN with scheduled auto-updates
  • Keeps it synchronized with your website changes
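
For reference, a minimal llms.txt following the llmstxt.org layout looks like the sketch below (the site name, sections, and links are illustrative, not output from this tool): an H1 title, a blockquote summary, then H2 sections containing link lists with short descriptions.

```markdown
# Example Project

> One-sentence summary of what the site covers.

## Documentation

- [Quick Start](https://example.com/docs/quickstart): Install and run locally
- [API Reference](https://example.com/docs/api): Endpoints and parameters
```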

Key Features

Intelligent Crawling

BFS traversal with configurable depth and page limits. Handles both static and JavaScript-rendered content via Playwright.
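
The traversal can be sketched as a plain breadth-first search over same-host links, bounded by page and depth limits. This is a minimal illustration, not the project's actual crawler; `fetch_links` stands in for whatever fetches a page and extracts its `<a href>` values (e.g. via Playwright or BeautifulSoup4).

```python
from collections import deque
from urllib.parse import urljoin, urlparse

def bfs_crawl(start_url, fetch_links, max_pages=50, max_depth=2):
    """Breadth-first traversal restricted to the start URL's host.

    fetch_links(url) -> iterable of href strings found on that page.
    Returns the list of URLs visited, in BFS order.
    """
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([(start_url, 0)])
    order = []
    while queue and len(order) < max_pages:
        url, depth = queue.popleft()
        order.append(url)
        if depth >= max_depth:
            continue  # don't expand links beyond the depth limit
        for href in fetch_links(url):
            nxt = urljoin(url, href)  # resolve relative links
            if urlparse(nxt).netloc == host and nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return order
```

Because `fetch_links` is injected, the traversal logic can be tested against an in-memory link graph without touching the network.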

Real-time Streaming

WebSocket-based progress updates and logs. Watch your crawl happen in real-time with detailed status messages.
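
A progress update of this kind is typically a small JSON message pushed over the socket after each page. The message shape below is an assumption for illustration, not the project's actual wire format:

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class CrawlProgress:
    """Hypothetical per-page progress event sent to the browser."""
    event: str        # e.g. "page_crawled"
    url: str          # page just processed
    pages_done: int
    pages_total: int

# Serialize one event as it might be sent over the WebSocket
msg = CrawlProgress("page_crawled", "https://example.com/docs", 3, 50)
payload = json.dumps(asdict(msg))
```

The frontend can then parse each message and update a progress bar and log pane as events arrive.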

Automated Updates

Schedule periodic recrawls via AWS Lambda + EventBridge. Your llms.txt stays synchronized with website changes automatically.
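
A scheduled Lambda of this kind boils down to "find sites whose last crawl is older than the interval and re-enqueue them". The handler below is a sketch under assumed field names (`url`, `last_crawled` as a Unix timestamp); the real implementation would read sites from the database and trigger actual recrawls.

```python
import time

RECRAWL_INTERVAL = 6 * 60 * 60  # seconds; matches the 6-hour schedule

def due_for_recrawl(sites, now=None):
    """Return URLs of sites whose last crawl is at least the interval old."""
    now = time.time() if now is None else now
    return [s["url"] for s in sites if now - s["last_crawled"] >= RECRAWL_INTERVAL]

def handler(event, context):
    # In the real Lambda, sites would come from the database (Supabase)
    # and each due site would be re-enqueued for crawling; sketched here.
    sites = event.get("sites", [])
    return {"due": due_for_recrawl(sites, event.get("now"))}
```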

LLM Enhancement

Optional AI-powered content optimization using Grok 4.1-Fast to improve descriptions and structure.

Persistent Storage

R2 object storage with public CDN URLs. Your generated files are hosted and accessible worldwide.

Spec Compliance

Adheres to official llmstxt.org specification with proper markdown formatting and structure.
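
The spec's layout (H1 title, blockquote summary, H2 sections of link lists) can be rendered with a small formatting function. This is a simplified sketch, not the project's actual formatter:

```python
def format_llms_txt(title, summary, sections):
    """Render pages into llms.txt markdown per the llmstxt.org layout.

    sections maps a section name to a list of (name, url, description) tuples.
    """
    lines = [f"# {title}", "", f"> {summary}", ""]
    for section, pages in sections.items():
        lines.append(f"## {section}")
        lines.append("")
        for name, url, desc in pages:
            lines.append(f"- [{name}]({url}): {desc}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"
```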

Architecture Overview

The system consists of three main components working together:

Frontend
  • Next.js 15 with TypeScript
  • Real-time WebSocket client
  • Tailwind CSS for styling
  • Deployed on Vercel

Backend
  • FastAPI (Python 3.11) WebSocket API
  • Playwright for browser automation
  • BeautifulSoup4 for HTML parsing
  • Deployed on AWS ECS Fargate

Infrastructure
  • Supabase - PostgreSQL database
  • Cloudflare R2 - Object storage & CDN
  • AWS Lambda - Scheduled recrawl tasks
  • Brightdata - Proxy for JS-heavy sites

How It Works

1. Submit URL

The user enters a website URL in the web interface and configures crawl parameters (max pages, description length, etc.).

2. Intelligent Crawling

The crawler performs BFS traversal, extracting titles, descriptions, and content from each page. It optionally uses the Brightdata proxy for JavaScript-rendered content.

3. Content Formatting

Pages are formatted according to the llmstxt.org specification with proper markdown structure, hierarchical headings, and blockquote excerpts.

4. LLM Enhancement (Optional)

If enabled, the content is processed through Grok 4.1-Fast to optimize descriptions and improve structure.

5. Storage & Hosting

The generated llms.txt file is uploaded to Cloudflare R2, and a public CDN URL is returned.

6. Scheduled Updates (Optional)

If auto-update is enabled, the site is enrolled in the recrawl queue. AWS Lambda checks for updates every 6 hours and regenerates the file when needed.
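
The end-to-end flow can be summarized as a small pipeline: crawl, format, optionally enhance, then store. The function below is an illustrative sketch; all callables (`crawl`, `format_pages`, `enhance`, `store`) are hypothetical stand-ins for the real components, which keeps the optional steps explicit.

```python
def run_pipeline(start_url, crawl, format_pages, enhance=None, store=None):
    """Sketch of the generation steps: crawl the site, format the pages
    per the spec, optionally run the LLM pass, then store the result.

    Returns (content, cdn_url); cdn_url is None when storage is skipped.
    """
    pages = crawl(start_url)                      # steps 1-2: crawl
    content = format_pages(pages)                 # step 3: format
    if enhance is not None:                       # step 4: optional LLM pass
        content = enhance(content)
    cdn_url = store(content) if store else None   # step 5: upload, get URL
    return content, cdn_url
```

Passing the steps as callables also makes the pipeline trivial to exercise with fakes in tests.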

Use Cases

Documentation Sites

Make your docs easily discoverable and understandable by AI assistants and chatbots.

E-commerce

Help LLMs understand your product catalog, improving recommendations and search.

Blogs & Publications

Index your articles and content for better AI-powered content discovery.

Corporate Websites

Make your services, products, and information accessible to AI tools.

Tech Stack

  • FastAPI - Modern async Python web framework
  • Playwright - Browser automation
  • BeautifulSoup4 - HTML parsing
  • Supabase - PostgreSQL database
  • Cloudflare R2 - Object storage
  • Brightdata - Proxy for JS-heavy sites

What’s Next?

Quick Start

Get started in 5 minutes with local development

API Reference

Explore the WebSocket API and endpoints

Deployment Guide

Deploy to AWS with Terraform

Configuration

Configure environment variables and settings
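
As a rough illustration, a local `.env` for a deployment like this might look as follows. Every variable name here is an assumption for illustration; check the Configuration page for the project's actual settings.

```shell
# Hypothetical names -- see the Configuration page for the real ones
SUPABASE_URL=https://your-project.supabase.co
SUPABASE_KEY=your-service-role-key
R2_ACCOUNT_ID=your-cloudflare-account-id
R2_BUCKET=llms-txt
BRIGHTDATA_PROXY_URL=http://user:pass@proxy.example:22225
XAI_API_KEY=your-key        # for the optional Grok enhancement step
```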

Community & Support

Open Source

This project is open source and available on GitHub. Contributions, issues, and feature requests are welcome!
This tool crawls websites and may generate significant traffic. Always respect robots.txt and rate limits. Use the Brightdata proxy feature for production crawling.
