
Quick Start Guide

Get the llms.txt Generator running locally in under 5 minutes. This guide will walk you through setting up both the backend and frontend for development.

Prerequisites

Before you begin, ensure you have the following installed:

  • Python 3.11+ (required for the FastAPI backend)
  • Node.js 20+ (required for the Next.js frontend)
  • Git (for cloning the repository)
Docker is optional but recommended for simplified deployment. See the Docker Setup section below.

Installation

Step 1: Clone the Repository

Clone the project to your local machine:
git clone <your-repo-url>
cd llmstxt
Step 2: Backend Setup

Set up the Python environment and install dependencies:
cd backend
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt
The requirements.txt includes:
fastapi
uvicorn[standard]
httpx
beautifulsoup4
pydantic
pydantic-settings
python-dotenv
boto3
pytest
pytest-asyncio
supabase==2.10.0
playwright
PyJWT[crypto]
Install Playwright browsers:
playwright install chromium
Step 3: Backend Environment Configuration

Create your backend environment file:
cp .env.example .env
Edit .env with your configuration:
# CORS Configuration
CORS_ORIGINS=http://localhost:3000

# Cloudflare R2 Storage (Required)
R2_ENDPOINT=https://your-account-id.r2.cloudflarestorage.com
R2_ACCESS_KEY=your-access-key
R2_SECRET_KEY=your-secret-key
R2_BUCKET=llms-txt
R2_PUBLIC_DOMAIN=https://your-public-domain.com

# Supabase Database (Required)
SUPABASE_URL=https://your-project.supabase.co
SUPABASE_KEY=your-anon-key

# Cron Secret for scheduled updates
CRON_SECRET=your-secret-token

# Brightdata Proxy (Optional - for JS-heavy sites)
BRIGHTDATA_API_KEY=your-customer-id-here
BRIGHTDATA_ENABLED=true
BRIGHTDATA_ZONE=scraping_browser1
BRIGHTDATA_PASSWORD=your-zone-password-here
At minimum, you need to configure R2 storage and Supabase. The Brightdata proxy is optional and only needed for JavaScript-heavy websites.
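The backend loads these variables with pydantic-settings (listed in requirements.txt). Before starting the server, a quick sanity check can flag missing required settings. This is a minimal stdlib sketch, not project code: the variable names come from the .env example above, and `check_required_env` is an illustrative name.

```python
import os

# Required variables per the .env example above (Brightdata is optional)
REQUIRED_VARS = [
    "R2_ENDPOINT", "R2_ACCESS_KEY", "R2_SECRET_KEY",
    "R2_BUCKET", "R2_PUBLIC_DOMAIN",
    "SUPABASE_URL", "SUPABASE_KEY",
]

def check_required_env(environ=os.environ):
    """Return the names of required variables that are missing or empty."""
    return [name for name in REQUIRED_VARS if not environ.get(name)]
```

Run it against your shell environment (or a parsed .env) and refuse to start if the returned list is non-empty.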
Step 4: Frontend Setup

In a new terminal, navigate to the frontend directory:
cd frontend
npm install
Create the frontend environment file:
cp .env.example .env.local
Edit .env.local:
NEXT_PUBLIC_WS_URL=ws://localhost:8000/ws/crawl
Step 5: Start the Servers

Start both servers in separate terminals.

Terminal 1 (backend):
cd backend
source venv/bin/activate
uvicorn main:app --reload --port 8000

Terminal 2 (frontend):
cd frontend
npm run dev
You should see:
  • Backend: INFO: Uvicorn running on http://127.0.0.1:8000
  • Frontend: Ready on http://localhost:3000
Step 6: Access the Application

Open your browser and navigate to:
  • Frontend: http://localhost:3000
  • API docs (FastAPI Swagger UI): http://localhost:8000/docs

Generate Your First llms.txt

Now that everything is running, let’s generate your first llms.txt file:
Step 1: Open the Web Interface

Navigate to http://localhost:3000 in your browser.
Step 2: Enter a Website URL

Enter a website URL you want to crawl. For testing, try:
  • https://docs.python.org
  • https://fastapi.tiangolo.com
  • Your own documentation site
Step 3: Configure Crawl Parameters

Adjust the settings based on your needs:
  • Max Pages: Number of pages to crawl (default: 50)
  • Description Length: Character limit for page excerpts (default: 500)
  • Enable Auto-Update: Schedule periodic recrawls (optional)
  • Recrawl Interval: Minutes between updates (default: 360)
  • LLM Enhancement: AI-powered optimization (optional)
  • Use Brightdata: For JavaScript-heavy sites (optional)
Step 4: Start Crawling

Click “Generate llms.txt” and watch the real-time progress in the log window. You’ll see messages like:
Starting crawl of https://example.com
Crawling page 1/50...
Crawling page 2/50...
Found 25 pages
Checking for .md versions of pages...
Found 3 pages with .md versions
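The page-count lines follow a regular pattern, so a client could derive a progress bar from them. A minimal sketch, assuming the log format shown above; `parse_progress` is a hypothetical helper, not part of the project:

```python
import re

# Matches progress lines such as "Crawling page 2/50..."
PROGRESS_RE = re.compile(r"Crawling page (\d+)/(\d+)")

def parse_progress(line):
    """Return completion as a float in [0, 1], or None for non-progress lines."""
    m = PROGRESS_RE.search(line)
    if m is None:
        return None
    current, total = int(m.group(1)), int(m.group(2))
    return current / total if total else None
```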
Step 5: Get Your Results

Once complete, you’ll receive:
  • Generated llms.txt content (viewable in browser)
  • Download button for the file
  • Public CDN URL for hosting
  • Copy button for quick sharing

Understanding the WebSocket API

The backend uses WebSockets for real-time communication. Here’s how the protocol works:

Connection

const ws = new WebSocket('ws://localhost:8000/ws/crawl?api_key=YOUR_KEY');

Send Request

{
  "url": "https://example.com",
  "maxPages": 50,
  "descLength": 500,
  "enableAutoUpdate": false,
  "recrawlIntervalMinutes": 360,
  "llmEnhance": false,
  "useBrightdata": false
}
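All fields except url are optional and fall back to the defaults shown above. A small helper can assemble the payload and catch typos in option names. This is an illustrative sketch, not part of the project API:

```python
# Documented defaults for the /ws/crawl request payload
CRAWL_DEFAULTS = {
    "maxPages": 50,
    "descLength": 500,
    "enableAutoUpdate": False,
    "recrawlIntervalMinutes": 360,
    "llmEnhance": False,
    "useBrightdata": False,
}

def build_crawl_request(url, **overrides):
    """Build the crawl request dict, rejecting unknown option names."""
    unknown = set(overrides) - set(CRAWL_DEFAULTS)
    if unknown:
        raise ValueError(f"unknown options: {sorted(unknown)}")
    return {"url": url, **CRAWL_DEFAULTS, **overrides}
```

Serialize the result with json.dumps before sending it over the socket.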

Receive Messages

The server streams JSON messages, each tagged with a type: log (progress output), result (the generated llms.txt content), url (the hosted file URL), and error. For example, a log message:
{
  "type": "log",
  "content": "Crawling page 1/50..."
}
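A client can dispatch on the type field to accumulate log lines and capture the final outputs. A minimal sketch of such a dispatcher; `handle_message` is illustrative, not part of the project:

```python
import json

def handle_message(raw, state):
    """Dispatch one server message; mutates `state` and returns the type."""
    msg = json.loads(raw)
    kind = msg.get("type")
    if kind == "log":
        state.setdefault("logs", []).append(msg["content"])
    elif kind == "result":
        state["llms_txt"] = msg["content"]
    elif kind == "url":
        state["hosted_url"] = msg["content"]
    elif kind == "error":
        state["error"] = msg["content"]
    return kind
```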

Implementation Example

Here’s the core WebSocket handler from the backend:
import json

from fastapi import WebSocket

@app.websocket("/ws/crawl")
async def websocket_crawl(websocket: WebSocket):
    # Validate API key
    api_key = websocket.query_params.get("api_key")
    if api_key != settings.api_key:
        await websocket.close(code=1008, reason="Unauthorized")
        return

    await websocket.accept()

    try:
        # Receive configuration
        data = await websocket.receive_text()
        payload = json.loads(data)

        url = str(payload['url'])
        max_pages = payload.get('maxPages', 50)
        desc_length = payload.get('descLength', 500)

        # Log function for real-time updates
        async def log(message: str):
            await websocket.send_json({"type": "log", "content": message})

        # Start crawling
        crawler = LLMCrawler(
            url,
            max_pages,
            desc_length,
            log,
            brightdata_enabled=payload.get('useBrightdata', False)
        )
        pages = await crawler.run()

        # Map page URLs to any .md variants found during the crawl
        # (simplified here; the real handler populates this from the .md check)
        md_url_map = {}

        # Format output
        llms_txt = format_llms_txt(url, pages, md_url_map)

        # Send result
        await websocket.send_json({"type": "result", "content": llms_txt})

        # Save to storage
        hosted_url = await save_llms_txt(url, llms_txt, log)
        await websocket.send_json({"type": "url", "content": hosted_url})

    except Exception as e:
        await websocket.send_json({"type": "error", "content": str(e)})
    finally:
        await websocket.close()

Docker Setup (Optional)

For a simpler setup, use Docker Compose:
Step 1: Configure Environment Files

Create .env files as described in steps 3-4 above.
Step 2: Start Services

docker-compose up -d
This starts both backend and frontend:
version: "3.9"

services:
  backend:
    build: ./backend
    ports:
      - "8000:8000"
    env_file:
      - ./backend/.env
    restart: unless-stopped

  frontend:
    build: ./frontend
    ports:
      - "3000:3000"
    env_file:
      - ./frontend/.env.local
    environment:
      - NEXT_PUBLIC_WS_URL=ws://localhost:8000/ws/crawl
    restart: unless-stopped
    depends_on:
      - backend
Step 3: Access the Application

Same URLs as the manual setup:
  • Frontend: http://localhost:3000
  • Backend: http://localhost:8000

Troubleshooting

Backend won’t start: make sure you’ve activated the virtual environment and installed dependencies:
cd backend
source venv/bin/activate
pip install -r requirements.txt

Playwright errors: install the Chromium browser:
playwright install chromium

Frontend can’t connect to the backend: verify that:
  1. The backend is running on port 8000
  2. CORS_ORIGINS includes your frontend URL
  3. The API key is configured (if required)
  4. The browser console shows no error messages

Uploads fail: ensure your R2 credentials are correct:
  • Endpoint URL format: https://<account-id>.r2.cloudflarestorage.com
  • Access key and secret key are valid
  • Bucket exists and is accessible
  • Public domain is configured correctly

WebSocket won’t connect: check NEXT_PUBLIC_WS_URL in .env.local:
  • It should be ws://localhost:8000/ws/crawl for local development
  • Use wss:// for production with HTTPS
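Rather than hard-coding the scheme, a client can derive the WebSocket URL from the site origin. A hypothetical helper (not part of the project) that maps http to ws and https to wss:

```python
from urllib.parse import urlsplit, urlunsplit

def websocket_url(http_origin, path="/ws/crawl"):
    """Derive the WebSocket URL from an http(s) origin: http -> ws, https -> wss."""
    parts = urlsplit(http_origin)
    scheme = {"http": "ws", "https": "wss"}[parts.scheme]
    return urlunsplit((scheme, parts.netloc, path, "", ""))
```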

Next Steps

Configuration Guide

Learn about all configuration options and environment variables

API Reference

Explore the full API documentation and endpoints

Deployment

Deploy to AWS with Terraform for production use

Architecture

Deep dive into system architecture and components
For production deployment, see the Deployment Guide which covers AWS ECS, Lambda, and infrastructure setup with Terraform.
