The llms.txt Generator exposes a WebSocket API that allows programmatic generation of llms.txt files. This guide covers authentication, message formats, and implementation patterns.
Overview
The API uses WebSockets for real-time bidirectional communication, allowing you to:
Send crawl requests with custom parameters
Receive real-time progress updates
Get the generated llms.txt content
Retrieve hosted CDN URLs
Authentication
The API supports two authentication methods:
JWT Token (Recommended)
API Key (Direct)
Generate a short-lived token from the /auth/token endpoint.
Request a token
curl -X POST https://your-backend.com/auth/token \
-H "X-API-Key: your-api-key"
Response:
{
  "token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...",
  "expires_in": 300
}
Tokens are valid for 5 minutes (300 seconds).
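Because tokens expire after 300 seconds, long-running clients should refresh them before reuse. A minimal sketch of a token cache (the `fetch_token` callable and the 30-second refresh margin are assumptions; wire `fetch_token` to your own wrapper around the /auth/token request above):

```python
import time

# Hypothetical helper: caches the JWT returned by /auth/token and re-fetches
# it shortly before the 300-second expiry. `fetch_token` is any callable
# returning a (token, expires_in) pair.
class TokenCache:
    def __init__(self, fetch_token, refresh_margin=30):
        self._fetch = fetch_token
        self._margin = refresh_margin  # refresh this many seconds early
        self._token = None
        self._expires_at = 0.0

    def get(self):
        # Re-fetch when no token is cached or the cached one is about to expire.
        if self._token is None or time.time() >= self._expires_at - self._margin:
            token, expires_in = self._fetch()
            self._token = token
            self._expires_at = time.time() + expires_in
        return self._token
```

Each call to `get()` returns a valid token, fetching a new one only when needed.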
Connect with token
const token = "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...";
const ws = new WebSocket(`wss://your-backend.com/ws/crawl?token=${token}`);
API Key (Direct)
Use your API key directly in the WebSocket query string:
const apiKey = "your-api-key";
const ws = new WebSocket(`wss://your-backend.com/ws/crawl?api_key=${apiKey}`);
This method exposes your API key in the connection URL. Use JWT tokens for production applications.
Setting Up API Key
Configure authentication in your backend .env file:
API_KEY=your-generated-api-key
Generate a secure API key (for example, with OpenSSL):
openssl rand -hex 32
WebSocket Endpoint
URL: wss://your-backend.com/ws/crawl
Query Parameters:
token (string, optional): JWT authentication token
api_key (string, optional): Direct API key authentication
One authentication method (token or api_key) is required whenever API_KEY is configured in the backend. If no API_KEY is configured, the endpoint accepts unauthenticated connections.
After establishing the WebSocket connection, send a JSON payload to initiate crawling:
Message Schema
url (string, required): The base URL of the website to crawl. Must include the protocol (http:// or https://). Example: "https://example.com"
maxPages (number, required): Maximum number of pages to crawl. Range: 1-200. Example: 50
descLength (number, required): Character limit for page description excerpts. Range: 100-2000. Example: 500
enableAutoUpdate (boolean, optional): Enable scheduled recrawls for this site. Requires Supabase configuration.
recrawlIntervalMinutes (number, optional): Minutes between scheduled recrawls (default: 10080, i.e. 7 days). Only used when enableAutoUpdate is true.
llmEnhance (boolean, optional): Enable LLM-powered content enhancement. Requires LLM_ENHANCEMENT_ENABLED=true in backend config.
useBrightdata (boolean, optional): Use Brightdata proxy for JavaScript rendering. Falls back to the backend's BRIGHTDATA_ENABLED setting if not specified.
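Validating a payload client-side before sending it avoids a round trip for obviously bad input. A sketch using the ranges listed above (the checks mirror the documented ranges; whether the server enforces exactly these is an assumption):

```python
# Client-side validation for the crawl payload, using the documented ranges.
# Field names match the message schema.
def validate_payload(payload: dict) -> dict:
    url = payload.get("url", "")
    if not url.startswith(("http://", "https://")):
        raise ValueError("url must include the protocol (http:// or https://)")
    max_pages = payload.get("maxPages", 50)
    if not 1 <= max_pages <= 200:
        raise ValueError("maxPages must be between 1 and 200")
    desc_length = payload.get("descLength", 500)
    if not 100 <= desc_length <= 2000:
        raise ValueError("descLength must be between 100 and 2000")
    return payload
```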
Example Request
Basic Request
With Auto-Update
Full Configuration
{
  "url": "https://example.com",
  "maxPages": 50,
  "descLength": 500
}
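The "With Auto-Update" and "Full Configuration" variants follow the same schema; example payloads (field values are illustrative):

```json
{
  "url": "https://example.com",
  "maxPages": 50,
  "descLength": 500,
  "enableAutoUpdate": true,
  "recrawlIntervalMinutes": 10080
}
```

```json
{
  "url": "https://example.com",
  "maxPages": 100,
  "descLength": 1000,
  "enableAutoUpdate": true,
  "recrawlIntervalMinutes": 10080,
  "llmEnhance": true,
  "useBrightdata": true
}
```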
The server sends JSON messages with a type and content field:
Message Types
log: Progress updates and informational messages.
{
  "type": "log",
  "content": "Crawling page 5/50: API Documentation"
}
result: The complete generated llms.txt content.
{
  "type": "result",
  "content": "# Example Site\n\n> A comprehensive platform...\n\n## Documentation\n..."
}
url: The public CDN URL where the llms.txt file is hosted.
{
  "type": "url",
  "content": "https://pub-abc123.r2.dev/example-com-xyz789.txt"
}
error: Error messages when something goes wrong.
{
  "type": "error",
  "content": "Failed to fetch https://example.com: Connection timeout"
}
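The four message types can be handled with a small dispatcher. A minimal sketch that operates on already-parsed message dicts (no network; the RuntimeError on "error" is one reasonable policy, not the only one):

```python
# Collects server messages by type: accumulates logs, captures the final
# result text and hosted URL, and raises if an error message arrives.
def collect_messages(messages):
    logs, result, hosted_url = [], None, None
    for msg in messages:
        kind, content = msg.get("type"), msg.get("content")
        if kind == "log":
            logs.append(content)
        elif kind == "result":
            result = content
        elif kind == "url":
            hosted_url = content
        elif kind == "error":
            raise RuntimeError(f"Crawl error: {content}")
    return logs, result, hosted_url
```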
Implementation Examples
JavaScript/TypeScript
import { useState, useCallback, useRef } from 'react';

interface CrawlPayload {
  url: string;
  maxPages: number;
  descLength: number;
  enableAutoUpdate?: boolean;
  recrawlIntervalMinutes?: number;
  llmEnhance?: boolean;
  useBrightdata?: boolean;
}

export function useLLMSTxtGenerator() {
  const [logs, setLogs] = useState<string[]>([]);
  const [result, setResult] = useState<string>("");
  const [hostedUrl, setHostedUrl] = useState<string>("");
  const [isGenerating, setIsGenerating] = useState(false);
  const wsRef = useRef<WebSocket | null>(null);

  const generate = useCallback(async (payload: CrawlPayload) => {
    setLogs(["Connecting..."]);
    setResult("");
    setHostedUrl("");
    setIsGenerating(true);

    try {
      // Get JWT token
      const tokenRes = await fetch('/api/auth/token', { method: 'POST' });
      const { token } = await tokenRes.json();

      // Connect to WebSocket
      const ws = new WebSocket(
        `wss://your-backend.com/ws/crawl?token=${token}`
      );
      wsRef.current = ws;

      ws.onopen = () => {
        setLogs(prev => [...prev, `Starting crawl of ${payload.url}...`]);
        ws.send(JSON.stringify(payload));
      };

      ws.onmessage = (event) => {
        const data = JSON.parse(event.data);
        switch (data.type) {
          case "log":
            setLogs(prev => [...prev, data.content]);
            break;
          case "result":
            setResult(data.content);
            break;
          case "url":
            setHostedUrl(data.content);
            break;
          case "error":
            setLogs(prev => [...prev, `ERROR: ${data.content}`]);
            break;
        }
      };

      ws.onerror = () => {
        setLogs(prev => [...prev, "Connection error"]);
        setIsGenerating(false);
      };

      ws.onclose = () => {
        setIsGenerating(false);
      };
    } catch (error) {
      setLogs(prev => [...prev, `Error: ${error}`]);
      setIsGenerating(false);
    }
  }, []);

  const cancel = useCallback(() => {
    wsRef.current?.close();
    wsRef.current = null;
    setIsGenerating(false);
  }, []);

  return { logs, result, hostedUrl, isGenerating, generate, cancel };
}
Python
import asyncio
import json
import websockets
import httpx
from typing import Optional, Callable


class LLMSTxtGenerator:
    def __init__(self, backend_url: str, api_key: str):
        self.backend_url = backend_url
        self.api_key = api_key
        self.ws_url = backend_url.replace('https://', 'wss://').replace('http://', 'ws://')

    async def get_token(self) -> str:
        """Get JWT token for authentication."""
        async with httpx.AsyncClient() as client:
            response = await client.post(
                f"{self.backend_url}/auth/token",
                headers={"X-API-Key": self.api_key}
            )
            response.raise_for_status()
            return response.json()["token"]

    async def generate(
        self,
        url: str,
        max_pages: int = 50,
        desc_length: int = 500,
        enable_auto_update: bool = False,
        recrawl_interval_minutes: int = 10080,
        llm_enhance: bool = False,
        use_brightdata: bool = True,
        on_log: Optional[Callable[[str], None]] = None
    ) -> tuple[str, Optional[str]]:
        """Generate llms.txt for a website.

        Returns:
            (result, hosted_url) tuple
        """
        token = await self.get_token()
        ws_url = f"{self.ws_url}/ws/crawl?token={token}"

        result = None
        hosted_url = None

        async with websockets.connect(ws_url) as websocket:
            # Send crawl request
            await websocket.send(json.dumps({
                "url": url,
                "maxPages": max_pages,
                "descLength": desc_length,
                "enableAutoUpdate": enable_auto_update,
                "recrawlIntervalMinutes": recrawl_interval_minutes,
                "llmEnhance": llm_enhance,
                "useBrightdata": use_brightdata
            }))

            # Receive messages
            async for message in websocket:
                data = json.loads(message)
                msg_type = data.get("type")
                content = data.get("content")

                if msg_type == "log":
                    if on_log:
                        on_log(content)
                    else:
                        print(f"[LOG] {content}")
                elif msg_type == "result":
                    result = content
                elif msg_type == "url":
                    hosted_url = content
                elif msg_type == "error":
                    raise Exception(f"Crawl error: {content}")

        return result, hosted_url


# Usage
async def main():
    generator = LLMSTxtGenerator(
        backend_url="https://your-backend.com",
        api_key="your-api-key"
    )

    result, hosted_url = await generator.generate(
        url="https://example.com",
        max_pages=50,
        desc_length=500,
        enable_auto_update=True
    )

    print("Generated llms.txt:")
    print(result)
    print(f"\nHosted at: {hosted_url}")


if __name__ == "__main__":
    asyncio.run(main())
Error Handling
Connection Errors
WebSocket Errors
Python Exception Handling
ws.onerror = (error) => {
  console.error('WebSocket error:', error);
  // Handle connection failures
};

ws.onclose = (event) => {
  if (event.code === 1008) {
    console.error('Authentication failed');
  } else if (event.code === 1006) {
    console.error('Connection closed abnormally');
  }
};
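On the Python side, the `websockets` library raises exceptions (such as `ConnectionClosedError`) rather than firing events, but the close codes carry the same meaning. A small helper mapping codes to messages (1008 and 1006 mirror the JavaScript handler above; the other descriptions follow RFC 6455, and using 1008 for failed auth is an assumption about the backend):

```python
# Maps WebSocket close codes to human-readable explanations.
# 1008 (policy violation) is assumed to indicate failed authentication;
# 1006 signals an abnormal closure with no close frame.
CLOSE_CODES = {
    1000: "Normal closure",
    1006: "Connection closed abnormally",
    1008: "Authentication failed (policy violation)",
    1011: "Server encountered an internal error",
}

def describe_close(code: int) -> str:
    return CLOSE_CODES.get(code, f"Closed with code {code}")
```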
Server-Side Errors
The server sends error messages with type: "error":
{
  "type": "error",
  "content": "Failed to fetch https://example.com: Connection timeout"
}
Common error messages:
"Failed to fetch <url>: Connection timeout" - Target site is unreachable
"Invalid URL format" - URL validation failed
"Max pages must be between 1 and 200" - Invalid parameter
"Crawl interrupted" - Unexpected crawl termination
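Transient failures such as connection timeouts are worth retrying with backoff, while validation errors are not. A generic sketch for wrapping any async generate call (the substring check for "timeout" is an assumption; adjust it to the error messages you actually observe):

```python
import asyncio

# Retries an async callable on exceptions whose message looks transient,
# backing off exponentially between attempts. Non-transient errors and the
# final failed attempt are re-raised.
async def generate_with_retry(generate, *args, attempts=3, base_delay=1.0, **kwargs):
    for attempt in range(attempts):
        try:
            return await generate(*args, **kwargs)
        except Exception as exc:
            transient = "timeout" in str(exc).lower()
            if not transient or attempt == attempts - 1:
                raise
            await asyncio.sleep(base_delay * 2 ** attempt)
```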
Rate Limiting
The API does not currently implement rate limiting at the application level. Consider implementing rate limiting in your client code or using a reverse proxy (CloudFlare, nginx) for production deployments.
Client-side rate limiting example:
class RateLimitedGenerator {
  constructor(maxConcurrent = 3) {
    this.maxConcurrent = maxConcurrent;
    this.active = 0;
    this.queue = [];
  }

  async generate(config) {
    // Wait for a free slot when at the concurrency limit
    if (this.active >= this.maxConcurrent) {
      await new Promise(resolve => this.queue.push(resolve));
    }
    this.active++;
    try {
      // actualGenerate is your real generation call (e.g. the hook above)
      return await actualGenerate(config);
    } finally {
      this.active--;
      // Wake the next queued caller, if any
      if (this.queue.length > 0) {
        this.queue.shift()();
      }
    }
  }
}
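The same pattern in Python falls out of `asyncio.Semaphore`; a sketch assuming an async `generate` coroutine like the one in the class above:

```python
import asyncio

# Limits concurrent generations to `max_concurrent`; extra calls wait until
# a slot frees up, mirroring the JavaScript queue-based limiter above.
class RateLimitedGenerator:
    def __init__(self, generate, max_concurrent=3):
        self._generate = generate
        self._semaphore = asyncio.Semaphore(max_concurrent)

    async def generate(self, *args, **kwargs):
        async with self._semaphore:
            return await self._generate(*args, **kwargs)
```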
Testing
Using wscat
Test the WebSocket API from the command line:
# Connect with API key
wscat -c "wss://your-backend.com/ws/crawl?api_key=your-key"
# Send crawl request (after connection)
{"url": "https://example.com", "maxPages": 10, "descLength": 300}
Health Check
Verify the backend is running:
curl https://your-backend.com/health
Expected response: HTTP 200 with a JSON status body.
Next Steps
Configuration: Learn about all environment variables and settings
Web Interface: Use the user-friendly web UI instead of the API
API Reference: View the complete API specification
Deployment: Deploy your own instance