The llms.txt Generator is configured through environment variables. This guide covers all available settings for both backend and frontend components.

Backend Configuration

Backend configuration is managed through a .env file in the backend/ directory.

Required Settings

CORS_ORIGINS
string
default:"http://localhost:3000"
Comma-separated list of allowed CORS origins.
.env
CORS_ORIGINS=http://localhost:3000,https://yourdomain.com
The frontend URL must be included in this list for WebSocket connections to work.
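As a rough illustration of how a comma-separated value like this could be turned into a list of origins, here is a minimal sketch. The `parse_cors_origins` helper is hypothetical and is not taken from the backend; it only assumes that whitespace around entries should be ignored and empty entries dropped.

```python
def parse_cors_origins(raw: str) -> list[str]:
    """Split a comma-separated CORS_ORIGINS value into a clean list.

    Hypothetical helper for illustration; the backend's own parsing may differ.
    """
    # Trim whitespace around each entry and skip empty entries
    return [origin.strip() for origin in raw.split(",") if origin.strip()]


origins = parse_cors_origins("http://localhost:3000, https://yourdomain.com")
print(origins)
```

Each entry is compared as a full origin, so the protocol, host, and port all have to match the URL the frontend is served from.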

Database Configuration

Required for auto-update functionality and site tracking.
SUPABASE_URL
string
required
Your Supabase project URL.
.env
SUPABASE_URL=https://your-project.supabase.co
Get this from: Supabase Dashboard → Settings → API → Project URL
SUPABASE_KEY
string
required
Your Supabase anonymous/public key.
.env
SUPABASE_KEY=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...
Get this from: Supabase Dashboard → Settings → API → anon public

Storage Configuration

Required for hosting generated llms.txt files on a public CDN.
R2_ENDPOINT
string
required
Cloudflare R2 endpoint URL.
.env
R2_ENDPOINT=https://abc123.r2.cloudflarestorage.com
Format: https://<account-id>.r2.cloudflarestorage.com
R2_ACCESS_KEY
string
required
R2 access key ID.
.env
R2_ACCESS_KEY=your-access-key-id
Generate from: Cloudflare Dashboard → R2 → Manage R2 API Tokens
R2_SECRET_KEY
string
required
R2 secret access key.
.env
R2_SECRET_KEY=your-secret-access-key
Keep this value secret and never commit it to version control.
R2_BUCKET
string
default:"llms-txt"
Name of the R2 bucket for storing llms.txt files.
.env
R2_BUCKET=llms-txt
Create this bucket in the Cloudflare R2 dashboard before using.
R2_PUBLIC_DOMAIN
string
required
Public domain for accessing R2 files.
.env
R2_PUBLIC_DOMAIN=https://pub-abc123.r2.dev
Options:
  • Use R2’s public domain: https://pub-<bucket-id>.r2.dev
  • Use a custom domain connected to R2
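To make the role of R2_PUBLIC_DOMAIN concrete, the sketch below joins the public domain with an object key to form the URL a reader would fetch. The `public_url` helper and the `example.com/llms.txt` key layout are assumptions made for illustration; the backend may name objects differently.

```python
from urllib.parse import quote


def public_url(public_domain: str, object_key: str) -> str:
    """Join R2_PUBLIC_DOMAIN and an object key into a public URL.

    Hypothetical helper; the object key layout is an assumption.
    """
    # Avoid doubled or missing slashes, and percent-encode unsafe characters
    # in the key ("/" is left intact by quote's default safe set)
    return f"{public_domain.rstrip('/')}/{quote(object_key.lstrip('/'))}"


print(public_url("https://pub-abc123.r2.dev", "example.com/llms.txt"))
```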

Brightdata Configuration

Optional proxy service for crawling JavaScript-heavy websites.
BRIGHTDATA_API_KEY
string
Your Brightdata customer ID.
.env
BRIGHTDATA_API_KEY=your-customer-id
Get this from: Brightdata Dashboard → Overview → Customer ID
BRIGHTDATA_PASSWORD
string
Zone password for your Brightdata zone.
.env
BRIGHTDATA_PASSWORD=your-zone-password
Get this from: Brightdata Dashboard → Zones → Zone Password
BRIGHTDATA_ZONE
string
default:"scraping_browser1"
Brightdata zone to use for crawling.
.env
BRIGHTDATA_ZONE=scraping_browser1
Options:
  • scraping_browser1 - Standard scraping browser
  • Custom zones you’ve created
BRIGHTDATA_ENABLED
boolean
default:true
Global toggle for Brightdata usage.
.env
BRIGHTDATA_ENABLED=true
Individual requests can override this with the useBrightdata parameter.
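One plausible reading of this override rule is sketched below: an explicit per-request value wins, and the global toggle applies only when the request does not specify one. The `should_use_brightdata` function is hypothetical, and whether a request may turn Brightdata on while it is globally disabled is an assumption not confirmed here.

```python
from typing import Optional


def should_use_brightdata(global_enabled: bool, use_brightdata: Optional[bool] = None) -> bool:
    """Resolve the effective Brightdata setting for a single request.

    Hypothetical helper; the backend's precedence rules may differ.
    """
    # An explicit per-request value takes precedence over BRIGHTDATA_ENABLED
    if use_brightdata is not None:
        return use_brightdata
    return global_enabled
```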

LLM Enhancement Configuration

Optional AI-powered content optimization using OpenRouter.
LLM_ENHANCEMENT_ENABLED
boolean
default:false
Enable LLM-powered content enhancement.
.env
LLM_ENHANCEMENT_ENABLED=false
Must be explicitly enabled. Requires OPENROUTER_API_KEY to be set.
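The two conditions above can be checked together before starting the backend. This is a minimal sketch: the `llm_enhancement_active` function and the set of accepted "true" spellings are assumptions, and the backend's own parser may accept other values.

```python
import os
from typing import Mapping

# Assumption: these spellings count as "true"; the backend may accept others
TRUTHY = {"1", "true", "yes", "on"}


def llm_enhancement_active(env: Mapping[str, str] = os.environ) -> bool:
    """Return True only when the flag is on AND an OpenRouter key is present."""
    enabled = env.get("LLM_ENHANCEMENT_ENABLED", "false").strip().lower() in TRUTHY
    has_key = bool(env.get("OPENROUTER_API_KEY", "").strip())
    return enabled and has_key
```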
OPENROUTER_API_KEY
string
OpenRouter API key for LLM enhancement.
.env
OPENROUTER_API_KEY=sk-or-v1-...
Get this from: https://openrouter.ai/keys
OPENROUTER_MODEL
string
default:"x-ai/grok-4.1-fast:free"
OpenRouter model to use for enhancement.
.env
OPENROUTER_MODEL=x-ai/grok-4.1-fast:free
Options:
  • x-ai/grok-4.1-fast:free - Grok 4.1 Fast (free tier)
  • anthropic/claude-3-5-sonnet - Claude 3.5 Sonnet
  • openai/gpt-4-turbo - GPT-4 Turbo
See OpenRouter docs for all models.
LLM_TIMEOUT_SECONDS
float
default:30.0
Timeout for LLM API requests in seconds.
.env
LLM_TIMEOUT_SECONDS=30.0
LLM_MAX_RETRIES
integer
default:3
Maximum retry attempts for failed LLM requests.
.env
LLM_MAX_RETRIES=3
LLM_TEMPERATURE
float
default:0.3
Temperature parameter for LLM generation (0.0-1.0).
.env
LLM_TEMPERATURE=0.3
  • Lower values (0.1-0.3): More consistent, deterministic output
  • Higher values (0.7-1.0): More creative, varied output
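To show how a retry limit such as LLM_MAX_RETRIES typically behaves, here is a generic retry loop with exponential backoff. It is a sketch under stated assumptions rather than the backend's implementation: `call_with_retries` and `base_delay` are hypothetical, and it treats the limit as the number of retries after the first attempt. The callable passed in would be responsible for applying LLM_TIMEOUT_SECONDS to each request.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(call: Callable[[], T], max_retries: int = 3, base_delay: float = 0.5) -> T:
    """Run `call`, retrying up to `max_retries` times after the first failure.

    Hypothetical helper; the backend's retry policy may differ.
    """
    if max_retries < 0:
        raise ValueError("max_retries must be >= 0")
    attempt = 0
    while True:
        try:
            return call()
        except Exception:
            # Out of retries: re-raise the most recent error
            if attempt >= max_retries:
                raise
            # Exponential backoff: base_delay, 2x, 4x, ...
            time.sleep(base_delay * 2 ** attempt)
            attempt += 1
```

With the default of 3, a request that keeps failing is attempted four times in total before the error propagates.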

Security Configuration

API_KEY
string
API key for authenticating WebSocket connections.
.env
API_KEY=your-secure-api-key
Generate a secure key:
openssl rand -base64 32
If API_KEY is not set, WebSocket connections do not require authentication. Leave it unset only in development.
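If openssl is not available, a comparable key can be produced with the Python standard library. This sketch mirrors `openssl rand -base64 32` by base64-encoding 32 random bytes; the `generate_key` name is only illustrative.

```python
import base64
import secrets


def generate_key(num_bytes: int = 32) -> str:
    """Return `num_bytes` of cryptographically secure randomness, base64-encoded."""
    return base64.b64encode(secrets.token_bytes(num_bytes)).decode("ascii")


print(generate_key())
```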
CRON_SECRET
string
Secret token for authenticating cron job requests.
.env
CRON_SECRET=your-cron-secret-token
Required for:
  • /internal/cron/recrawl - Scheduled recrawl endpoint
Generate a secure token:
openssl rand -base64 32

Frontend Configuration

Frontend configuration is managed through a .env.local file in the frontend/ directory.
NEXT_PUBLIC_WS_URL
string
required
WebSocket URL for the backend API.
.env.local
# Development
NEXT_PUBLIC_WS_URL=ws://localhost:8000/ws/crawl

# Production
NEXT_PUBLIC_WS_URL=wss://your-backend.com/ws/crawl
Use ws:// for local development and wss:// (secure WebSocket) for production.
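A small check like the following can catch a wrong scheme before deployment. It is an illustrative sketch only: `check_ws_url` is a hypothetical helper, not part of the project.

```python
from urllib.parse import urlparse


def check_ws_url(url: str, production: bool) -> list[str]:
    """Return a list of problems found in a NEXT_PUBLIC_WS_URL value."""
    problems = []
    parsed = urlparse(url)
    if parsed.scheme not in ("ws", "wss"):
        problems.append(f"scheme must be ws or wss, got {parsed.scheme!r}")
    elif production and parsed.scheme != "wss":
        problems.append("production should use wss:// (secure WebSocket)")
    if not parsed.netloc:
        problems.append("URL has no host")
    return problems


print(check_ws_url("ws://localhost:8000/ws/crawl", production=False))
```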
NEXT_PUBLIC_API_KEY
string
API key for authenticating with the backend (optional).
.env.local
NEXT_PUBLIC_API_KEY=your-api-key
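Only use this for testing: NEXT_PUBLIC_ variables are embedded in the client-side bundle and are visible to anyone who loads the page. Production should use JWT tokens via the /auth/token endpoint.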

Configuration File Reference

The backend uses Pydantic Settings for configuration management:
config.py
from pydantic_settings import BaseSettings

class Settings(BaseSettings):
    # CORS
    cors_origins: str = "http://localhost:3000"
    
    # R2 Storage
    r2_endpoint: str | None = None
    r2_access_key: str | None = None
    r2_secret_key: str | None = None
    r2_bucket: str | None = None
    r2_public_domain: str | None = None
    
    # Database
    supabase_url: str | None = None
    supabase_key: str | None = None
    
    # Security
    cron_secret: str | None = None
    api_key: str | None = None
    
    # Brightdata
    brightdata_api_key: str | None = None
    brightdata_enabled: bool = True
    brightdata_zone: str = "scraping_browser1"
    brightdata_password: str | None = None
    
    # LLM Enhancement
    openrouter_api_key: str | None = None
    openrouter_model: str = "x-ai/grok-4.1-fast:free"
    llm_enhancement_enabled: bool = False
    llm_timeout_seconds: float = 30.0
    llm_max_retries: int = 3
    llm_temperature: float = 0.3

    class Config:
        env_file = ".env"

Environment File Examples

Development Environment

# CORS - Allow local frontend
CORS_ORIGINS=http://localhost:3000

# Database (Optional for development)
# SUPABASE_URL=https://xxx.supabase.co
# SUPABASE_KEY=your-key

# Storage (Optional for development)
# R2_ENDPOINT=https://xxx.r2.cloudflarestorage.com
# R2_ACCESS_KEY=your-key
# R2_SECRET_KEY=your-secret
# R2_BUCKET=llms-txt
# R2_PUBLIC_DOMAIN=https://pub-xxx.r2.dev

# Brightdata (Optional)
# BRIGHTDATA_API_KEY=your-customer-id
# BRIGHTDATA_ENABLED=false
# BRIGHTDATA_ZONE=scraping_browser1
# BRIGHTDATA_PASSWORD=your-password

# Security (Optional for development)
# API_KEY=dev-key-only

Production Environment

# CORS - Production domains
CORS_ORIGINS=https://yourdomain.com,https://www.yourdomain.com

# Database - Required
SUPABASE_URL=https://your-project.supabase.co
SUPABASE_KEY=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...

# Storage - Required
R2_ENDPOINT=https://abc123.r2.cloudflarestorage.com
R2_ACCESS_KEY=your-access-key
R2_SECRET_KEY=your-secret-key
R2_BUCKET=llms-txt
R2_PUBLIC_DOMAIN=https://cdn.yourdomain.com

# Brightdata - Recommended
BRIGHTDATA_API_KEY=your-customer-id
BRIGHTDATA_ENABLED=true
BRIGHTDATA_ZONE=scraping_browser1
BRIGHTDATA_PASSWORD=your-zone-password

# LLM Enhancement - Optional
OPENROUTER_API_KEY=sk-or-v1-...
OPENROUTER_MODEL=x-ai/grok-4.1-fast:free
LLM_ENHANCEMENT_ENABLED=true
LLM_TIMEOUT_SECONDS=30.0
LLM_MAX_RETRIES=3
LLM_TEMPERATURE=0.3

# Security - Required
API_KEY=<generated-secure-key>
CRON_SECRET=<generated-secure-token>
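Before deploying, it can help to confirm that every variable marked as required above is actually present. The script below is a minimal sketch that reads the current process environment; the `FEATURE_VARS` grouping and `missing_vars` function are hypothetical names chosen for illustration, built from the variables listed in this guide.

```python
import os
from typing import Mapping

# Variables this guide marks as required for production, grouped by feature
FEATURE_VARS = {
    "database": ["SUPABASE_URL", "SUPABASE_KEY"],
    "storage": ["R2_ENDPOINT", "R2_ACCESS_KEY", "R2_SECRET_KEY", "R2_BUCKET", "R2_PUBLIC_DOMAIN"],
    "security": ["API_KEY", "CRON_SECRET"],
}


def missing_vars(env: Mapping[str, str] = os.environ) -> dict[str, list[str]]:
    """Map each feature to the variables that are unset or blank."""
    report = {}
    for feature, names in FEATURE_VARS.items():
        missing = [name for name in names if not env.get(name, "").strip()]
        if missing:
            report[feature] = missing
    return report


if __name__ == "__main__":
    for feature, names in missing_vars().items():
        print(f"{feature}: missing {', '.join(names)}")
```

The script only sees variables exported into its own environment, so values that live solely in a .env file need to be loaded first.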

Validation

Verify your configuration:
1. Test backend health

Health Check
curl http://localhost:8000/health
Expected response:
{"status": "ok"}
2. Test database connection

If Supabase is configured, the backend will log:
INFO: Supabase client initialized
3. Test R2 storage

Generate an llms.txt file and verify that the hosted URL is accessible.
4. Test authentication

Get Token
curl -X POST http://localhost:8000/auth/token \
  -H "X-API-Key: your-api-key"
The response should contain a JWT token.
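The health and authentication checks can also be scripted with the Python standard library. This is a sketch under stated assumptions: `backend_is_healthy` and `build_token_request` are hypothetical helpers that mirror the two curl commands, and because the shape of the token response is not documented here, the second helper only builds the request.

```python
import json
from urllib.request import Request, urlopen


def backend_is_healthy(base_url="http://localhost:8000", timeout=5.0, opener=urlopen) -> bool:
    """GET /health and check for the expected {"status": "ok"} body."""
    try:
        with opener(f"{base_url.rstrip('/')}/health", timeout=timeout) as response:
            payload = json.loads(response.read())
    except (OSError, ValueError):
        # Connection failures (URLError is an OSError) or a non-JSON body
        return False
    return isinstance(payload, dict) and payload.get("status") == "ok"


def build_token_request(base_url: str, api_key: str) -> Request:
    """Build the same POST /auth/token request as the curl command."""
    return Request(
        f"{base_url.rstrip('/')}/auth/token",
        method="POST",
        headers={"X-API-Key": api_key},
    )
```

Passing the result of `build_token_request(...)` to `urlopen` sends the request; `backend_is_healthy()` returns False rather than raising when the backend is unreachable.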

Troubleshooting

Symptom: WebSocket connection failed: CORS error
Solution:
  • Add your frontend URL to CORS_ORIGINS in backend .env
  • Ensure URLs match exactly (including protocol and port)
  • Restart the backend after changing .env
Example
CORS_ORIGINS=http://localhost:3000,https://yourdomain.com
Symptom: The generated llms.txt is shown, but without a public URL
Solution:
  • Verify all R2 configuration variables are set:
    • R2_ENDPOINT
    • R2_ACCESS_KEY
    • R2_SECRET_KEY
    • R2_BUCKET
    • R2_PUBLIC_DOMAIN
  • Verify the bucket exists in Cloudflare R2
  • Check bucket permissions allow public reads
Symptom: Sites not recrawling on schedule
Solution:
  • Verify SUPABASE_URL and SUPABASE_KEY are set
  • Check the crawl_sites table exists in Supabase
  • Verify the AWS Lambda function or cron job is configured
  • Check CRON_SECRET matches between backend and Lambda
Symptom: “Failed to connect to Brightdata” errors
Solution:
  • Verify BRIGHTDATA_API_KEY (customer ID) is correct
  • Verify BRIGHTDATA_PASSWORD matches your zone password
  • Check BRIGHTDATA_ZONE exists in your Brightdata account
  • Ensure you have credits/subscription active
Symptom: No LLM enhancement despite enabling it in the UI
Solution:
  • Set LLM_ENHANCEMENT_ENABLED=true in backend .env
  • Verify OPENROUTER_API_KEY is valid
  • Check OpenRouter account has credits
  • Review backend logs for LLM errors

Best Practices

Use Environment Variables

Never hardcode secrets in source code. Always use environment variables.

Rotate Keys Regularly

Change API keys, tokens, and secrets periodically, for example every 90 days.

Separate Environments

Use different credentials for development, staging, and production.

Validate on Deploy

Test all endpoints after deployment to catch configuration issues early.

Next Steps

Web Interface

Learn how to use the web UI with your configuration

API Usage

Integrate programmatically using the WebSocket API

Deployment

Deploy to production with proper configuration

Development

Set up a local development environment
