The llms.txt Generator exposes a WebSocket API that allows programmatic generation of llms.txt files. This guide covers authentication, message formats, and implementation patterns.

Overview

The API uses WebSockets for real-time bidirectional communication, allowing you to:
  • Send crawl requests with custom parameters
  • Receive real-time progress updates
  • Get the generated llms.txt content
  • Retrieve hosted CDN URLs

Authentication

The API supports two authentication methods: a JWT token or a direct API key, both passed as query parameters on the WebSocket URL.

Setting Up API Key

Configure authentication in your backend .env file:
.env
API_KEY=your-generated-api-key
Generate a secure API key:
Generate Key
openssl rand -base64 32
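If openssl is not available, Python's standard library can produce an equivalent key. This is an alternative sketch, not a requirement of the generator:

```python
import secrets

# Alternative to `openssl rand -base64 32`:
# 32 random bytes, URL-safe base64 encoded (43 characters, no padding)
api_key = secrets.token_urlsafe(32)
print(api_key)
```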

WebSocket Endpoint

URL: wss://your-backend.com/ws/crawl
Query Parameters:
  • token (string, optional): JWT authentication token
  • api_key (string, optional): Direct API key authentication
If API_KEY is configured in the backend, one authentication method (token or api_key) is required; if it is not configured, connections are accepted without authentication.

Request Format

After establishing the WebSocket connection, send a JSON payload to initiate crawling:

Message Schema

url (string, required)
The base URL of the website to crawl. Must include the protocol (http:// or https://). Example: "https://example.com"

maxPages (integer, default: 50)
Maximum number of pages to crawl. Range: 1-200. Example: 50

descLength (integer, default: 500)
Character limit for page description excerpts. Range: 100-2000. Example: 500

enableAutoUpdate (boolean, default: false)
Enable scheduled recrawls for this site. Requires Supabase configuration.

recrawlIntervalMinutes (integer, default: 10080)
Minutes between scheduled recrawls (default: 7 days). Only used when enableAutoUpdate is true.

llmEnhance (boolean, default: false)
Enable LLM-powered content enhancement. Requires LLM_ENHANCEMENT_ENABLED=true in the backend config.

useBrightdata (boolean, default: true)
Use the Brightdata proxy for JavaScript rendering. Falls back to the backend's BRIGHTDATA_ENABLED setting if not specified.
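The documented ranges can be enforced client-side before opening the socket. make_crawl_payload is a hypothetical helper sketching that validation:

```python
def make_crawl_payload(url: str, max_pages: int = 50, desc_length: int = 500, **extra) -> dict:
    """Validate parameters against the documented ranges and build the request payload."""
    if not url.startswith(("http://", "https://")):
        raise ValueError("url must include the http:// or https:// protocol")
    if not 1 <= max_pages <= 200:
        raise ValueError("maxPages must be between 1 and 200")
    if not 100 <= desc_length <= 2000:
        raise ValueError("descLength must be between 100 and 2000")
    # Optional flags (enableAutoUpdate, llmEnhance, ...) pass through unchanged
    return {"url": url, "maxPages": max_pages, "descLength": desc_length, **extra}
```

Rejecting out-of-range values locally gives an immediate error instead of a round trip that ends with a type: "error" message.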

Example Request

{
  "url": "https://example.com",
  "maxPages": 50,
  "descLength": 500
}

Response Format

The server sends JSON messages, each containing a type field and a content field:

Message Types

log (string)
Progress updates and informational messages.
{
  "type": "log",
  "content": "Crawling page 5/50: API Documentation"
}
result (string)
The complete generated llms.txt content.
{
  "type": "result",
  "content": "# Example Site\n\n> A comprehensive platform...\n\n## Documentation\n..."
}
url (string)
The public CDN URL where the llms.txt file is hosted.
{
  "type": "url",
  "content": "https://pub-abc123.r2.dev/example-com-xyz789.txt"
}
error (string)
Error messages sent when something goes wrong.
{
  "type": "error",
  "content": "Failed to fetch https://example.com: Connection timeout"
}

Implementation Examples

JavaScript/TypeScript

import { useState, useCallback, useRef } from 'react';

interface CrawlPayload {
  url: string;
  maxPages: number;
  descLength: number;
  enableAutoUpdate?: boolean;
  recrawlIntervalMinutes?: number;
  llmEnhance?: boolean;
  useBrightdata?: boolean;
}

export function useLLMSTxtGenerator() {
  const [logs, setLogs] = useState<string[]>([]);
  const [result, setResult] = useState<string>("");
  const [hostedUrl, setHostedUrl] = useState<string>("");
  const [isGenerating, setIsGenerating] = useState(false);
  const wsRef = useRef<WebSocket | null>(null);

  const generate = useCallback(async (payload: CrawlPayload) => {
    setLogs(["Connecting..."]);
    setResult("");
    setHostedUrl("");
    setIsGenerating(true);

    try {
      // Get JWT token
      const tokenRes = await fetch('/api/auth/token', { method: 'POST' });
      const { token } = await tokenRes.json();

      // Connect to WebSocket
      const ws = new WebSocket(
        `wss://your-backend.com/ws/crawl?token=${token}`
      );
      wsRef.current = ws;

      ws.onopen = () => {
        setLogs(prev => [...prev, `Starting crawl of ${payload.url}...`]);
        ws.send(JSON.stringify(payload));
      };

      ws.onmessage = (event) => {
        const data = JSON.parse(event.data);

        switch (data.type) {
          case "log":
            setLogs(prev => [...prev, data.content]);
            break;
          case "result":
            setResult(data.content);
            break;
          case "url":
            setHostedUrl(data.content);
            break;
          case "error":
            setLogs(prev => [...prev, `ERROR: ${data.content}`]);
            break;
        }
      };

      ws.onerror = () => {
        setLogs(prev => [...prev, "Connection error"]);
        setIsGenerating(false);
      };

      ws.onclose = () => {
        setIsGenerating(false);
      };
    } catch (error) {
      setLogs(prev => [...prev, `Error: ${error}`]);
      setIsGenerating(false);
    }
  }, []);

  const cancel = useCallback(() => {
    wsRef.current?.close();
    wsRef.current = null;
    setIsGenerating(false);
  }, []);

  return { logs, result, hostedUrl, isGenerating, generate, cancel };
}

Python

Python Client
import asyncio
import json
import websockets
import httpx
from typing import Optional, Callable

class LLMSTxtGenerator:
    def __init__(self, backend_url: str, api_key: str):
        self.backend_url = backend_url
        self.api_key = api_key
        self.ws_url = backend_url.replace('https://', 'wss://').replace('http://', 'ws://')

    async def get_token(self) -> str:
        """Get JWT token for authentication."""
        async with httpx.AsyncClient() as client:
            response = await client.post(
                f"{self.backend_url}/auth/token",
                headers={"X-API-Key": self.api_key}
            )
            response.raise_for_status()
            return response.json()["token"]

    async def generate(
        self,
        url: str,
        max_pages: int = 50,
        desc_length: int = 500,
        enable_auto_update: bool = False,
        recrawl_interval_minutes: int = 10080,
        llm_enhance: bool = False,
        use_brightdata: bool = True,
        on_log: Optional[Callable[[str], None]] = None
    ) -> tuple[str, Optional[str]]:
        """Generate llms.txt for a website.
        
        Returns:
            (result, hosted_url) tuple
        """
        token = await self.get_token()
        ws_url = f"{self.ws_url}/ws/crawl?token={token}"

        result = None
        hosted_url = None

        async with websockets.connect(ws_url) as websocket:
            # Send crawl request
            await websocket.send(json.dumps({
                "url": url,
                "maxPages": max_pages,
                "descLength": desc_length,
                "enableAutoUpdate": enable_auto_update,
                "recrawlIntervalMinutes": recrawl_interval_minutes,
                "llmEnhance": llm_enhance,
                "useBrightdata": use_brightdata
            }))

            # Receive messages
            async for message in websocket:
                data = json.loads(message)
                msg_type = data.get("type")
                content = data.get("content")

                if msg_type == "log":
                    if on_log:
                        on_log(content)
                    else:
                        print(f"[LOG] {content}")
                elif msg_type == "result":
                    result = content
                elif msg_type == "url":
                    hosted_url = content
                elif msg_type == "error":
                    raise RuntimeError(f"Crawl error: {content}")

        return result, hosted_url

# Usage
async def main():
    generator = LLMSTxtGenerator(
        backend_url="https://your-backend.com",
        api_key="your-api-key"
    )

    result, hosted_url = await generator.generate(
        url="https://example.com",
        max_pages=50,
        desc_length=500,
        enable_auto_update=True
    )

    print("Generated llms.txt:")
    print(result)
    print(f"\nHosted at: {hosted_url}")

if __name__ == "__main__":
    asyncio.run(main())

Error Handling

Connection Errors

ws.onerror = (error) => {
  console.error('WebSocket error:', error);
  // Handle connection failures
};

ws.onclose = (event) => {
  if (event.code === 1008) {
    console.error('Authentication failed');
  } else if (event.code === 1006) {
    console.error('Connection closed abnormally');
  }
};

Server-Side Errors

The server sends error messages with type: "error":
Error Message
{
  "type": "error",
  "content": "Failed to fetch https://example.com: Connection timeout"
}
Common error messages:
  • "Failed to fetch <url>: Connection timeout" - Target site is unreachable
  • "Invalid URL format" - URL validation failed
  • "Max pages must be between 1 and 200" - Invalid parameter
  • "Crawl interrupted" - Unexpected crawl termination

Rate Limiting

The API does not currently implement rate limiting at the application level. Consider implementing rate limiting in your client code, or use a reverse proxy (Cloudflare, nginx) for production deployments.
Client-side rate limiting example:
Rate Limiting
class RateLimitedGenerator {
  constructor(generateFn, maxConcurrent = 3) {
    this.generateFn = generateFn; // the underlying generate function to wrap
    this.maxConcurrent = maxConcurrent;
    this.active = 0;
    this.queue = [];
  }

  async generate(config) {
    if (this.active >= this.maxConcurrent) {
      // Wait until a slot frees up
      await new Promise(resolve => this.queue.push(resolve));
    }

    this.active++;
    try {
      return await this.generateFn(config);
    } finally {
      this.active--;
      if (this.queue.length > 0) {
        this.queue.shift()(); // wake the next queued caller
      }
    }
  }
}
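The same throttling idea in Python can lean on asyncio.Semaphore. A sketch that wraps any async generate callable, such as LLMSTxtGenerator.generate from the example above:

```python
import asyncio

class ThrottledGenerator:
    """Cap the number of concurrent crawls issued by this client."""

    def __init__(self, generate_fn, max_concurrent: int = 3):
        self._generate = generate_fn
        self._sem = asyncio.Semaphore(max_concurrent)

    async def generate(self, **config):
        # Blocks while max_concurrent crawls are already in flight
        async with self._sem:
            return await self._generate(**config)
```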

Testing

Using wscat

Test the WebSocket API from the command line:
Install wscat
npm install -g wscat
Connect and Send
# Connect with API key
wscat -c "wss://your-backend.com/ws/crawl?api_key=your-key"

# Send crawl request (after connection)
{"url":"https://example.com","maxPages":10,"descLength":300}

Health Check

Verify the backend is running:
Health Endpoint
curl https://your-backend.com/health
Expected response:
{
  "status": "ok"
}
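In automation it can help to wait for the backend to report healthy before opening WebSocket connections. A sketch of the polling loop; the check callable is an assumption injected by the caller, and would typically GET /health (e.g. with httpx) and return whether status was "ok":

```python
import asyncio
from typing import Awaitable, Callable

async def wait_until_healthy(
    check: Callable[[], Awaitable[bool]],
    timeout: float = 30.0,
    interval: float = 1.0,
) -> bool:
    """Poll an async health check until it returns True or the timeout elapses."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while True:
        try:
            if await check():
                return True
        except Exception:
            pass  # backend not reachable yet; keep polling
        if loop.time() + interval > deadline:
            return False
        await asyncio.sleep(interval)
```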

Next Steps

Configuration

Learn about all environment variables and settings

Web Interface

Use the user-friendly web UI instead of the API

API Reference

View the complete API specification

Deployment

Deploy your own instance
