The llms.txt Generator provides an intuitive web interface for generating and managing your llms.txt files. This guide walks through all the features available in the UI.

Accessing the Web Interface

The web interface is available at:

Generating Your First llms.txt

1. Enter the target URL

In the URL input field, enter the website you want to generate an llms.txt file for.
https://example.com
The URL should include the protocol (http:// or https://) and should be the base URL of the site you want to crawl.
2. Configure crawl parameters

Adjust the settings based on your needs:

Max Pages: Number of pages to crawl (default: 50)
  • Lower values for faster generation
  • Higher values (up to 200) for comprehensive coverage
Description Length: Character limit for page excerpts (default: 500)
  • Shorter for concise summaries
  • Longer for detailed descriptions
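
Client-side validation of these settings might look roughly like the sketch below (the parameter names, defaults, and the 200-page cap come from this guide; the helper function itself is hypothetical):

```python
def normalize_crawl_params(max_pages=50, description_length=500):
    """Clamp crawl settings to the limits described in this guide.

    Defaults mirror the UI: 50 pages, 500-character excerpts.
    """
    max_pages = max(1, min(max_pages, 200))  # the UI allows up to 200 pages
    return {"maxPages": max_pages, "descriptionLength": description_length}

print(normalize_crawl_params(max_pages=500))  # maxPages clamped to 200
```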
3. Enable optional features

Toggle advanced features as needed:

LLM Enhancement
Use AI (Grok 4.1-Fast) to improve content quality and formatting.
  • Optimizes descriptions for better LLM understanding
  • Adds contextual improvements
  • Requires OPENROUTER_API_KEY to be configured
This feature must be explicitly enabled in the backend configuration with LLM_ENHANCEMENT_ENABLED=true.
Brightdata Proxy
Enable the Brightdata proxy for JavaScript-heavy websites.
  • Handles dynamic content loaded via JavaScript
  • Uses Playwright browser automation
  • Requires Brightdata API credentials
Enable this for modern SPAs (React, Vue, Angular) or sites with client-side rendering.
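
A minimal backend .env sketch enabling both features (OPENROUTER_API_KEY and LLM_ENHANCEMENT_ENABLED are named in this guide; the Brightdata variable name shown is an assumption and may differ in your deployment):

```shell
# Backend .env sketch -- optional features
LLM_ENHANCEMENT_ENABLED=true       # gate for the AI enhancement feature
OPENROUTER_API_KEY=sk-or-...       # required when LLM enhancement is on
# Brightdata credentials (exact variable name is an assumption):
BRIGHTDATA_API_KEY=your-brightdata-key
```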
4. Start the generation

Click the Generate llms.txt button to begin crawling. The interface will display real-time progress updates as pages are discovered and processed.

Understanding the Output

Real-time Logs

As the crawler runs, you’ll see logs showing:
Connecting to crawler...
Starting crawl of https://example.com...

Generated Output

Once complete, you’ll see the formatted llms.txt content in the results panel:
Example Output
# Example Site

> A comprehensive platform for building modern web applications with best practices and tools.

## Documentation

- [Getting Started](https://example.com/docs/getting-started): Learn the basics and set up your first project [Quickstart] [Beginner]
- [API Reference](https://example.com/docs/api): Complete API documentation with examples [API] [Reference]
- [Advanced Guides](https://example.com/docs/advanced): In-depth tutorials for complex scenarios [Guide] [Advanced]

## Optional

- [Privacy Policy](https://example.com/privacy)
- [Terms of Service](https://example.com/terms)
Content tags like [API], [Guide], [Quickstart] are automatically added based on page content analysis.
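
The tag assignment could work along these lines (a hypothetical keyword heuristic; the generator's actual content analysis may differ):

```python
# Hypothetical sketch: derive content tags like [API] or [Guide]
# from a page's title and excerpt via keyword matching.
TAG_KEYWORDS = {
    "API": ("api", "endpoint", "reference"),
    "Guide": ("guide", "tutorial", "walkthrough"),
    "Quickstart": ("quickstart", "getting started", "set up"),
}

def infer_tags(text):
    """Return the tags whose keywords appear in the text."""
    lowered = text.lower()
    return [tag for tag, words in TAG_KEYWORDS.items()
            if any(w in lowered for w in words)]

print(infer_tags("Getting Started: set up your first project"))
```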

Hosted URL

If R2 storage is configured, you’ll receive a public CDN URL:
https://pub-abc123.r2.dev/example-com-xyz789.txt
This URL can be:
  • Linked from your website’s footer or documentation
  • Submitted to llms.txt directories
  • Used as a reference for LLM indexing

Working with Results

Copy to Clipboard

Click the Copy button to copy the entire llms.txt content to your clipboard for pasting elsewhere.

Download File

Click the Download button to save the llms.txt file to your local machine.
The filename will be llms.txt; you can upload this directly to your website’s root directory.

View in Browser

If a hosted URL was generated, click View Hosted Version to open the public CDN link in a new tab.

Best Practices

Start Small

Begin with 25-50 pages to test the output quality before scaling up to larger crawls.

Review Output

Always review the generated llms.txt to ensure it accurately represents your site structure.

Enable Auto-Update

Set up scheduled recrawls for documentation sites that update frequently (weekly or monthly).

Use .md Links

The generator automatically prefers .md versions of pages when available for better LLM parsing.
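
This preference could be implemented roughly as follows (a hypothetical helper; it assumes the crawler already has the set of URLs it discovered):

```python
def prefer_md(url, discovered_urls):
    """Return the .md variant of a page URL when the crawl found one."""
    if url.endswith(".md"):
        return url
    md_url = url.rstrip("/") + ".md"
    return md_url if md_url in discovered_urls else url

found = {"https://example.com/docs/api", "https://example.com/docs/api.md"}
print(prefer_md("https://example.com/docs/api", found))
# -> https://example.com/docs/api.md
```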

Troubleshooting

WebSocket Connection Failed

This error means the WebSocket connection to the backend failed. Solutions:
  • Verify the backend is running at the configured URL
  • Check NEXT_PUBLIC_WS_URL in frontend environment variables
  • Ensure CORS origins include your frontend domain
  • Check firewall/network settings
Authentication Error

Token generation failed during the authentication step. Solutions:
  • Verify API_KEY is set in backend .env
  • Check that the frontend can reach the /api/auth/token endpoint
  • Review backend logs for authentication errors
No Pages Found

The crawler couldn’t discover any pages on the target site. Solutions:
  • Verify the URL is accessible and returns a valid HTML page
  • Enable Brightdata if the site requires JavaScript
  • Check if the site blocks automated crawlers (User-Agent)
  • Try a different starting URL (e.g., /docs instead of root)
Connection Closed Unexpectedly

The WebSocket connection was closed before completion. Solutions:
  • Check backend logs for errors or timeouts
  • Reduce maxPages to avoid long-running operations
  • Verify network stability between frontend and backend
  • Check if the backend container has sufficient memory

Advanced Features

Webhook Integration

When auto-update is enabled, you can trigger immediate recrawls using webhooks:
cURL Example
curl -X POST https://your-backend.com/internal/hooks/site-changed \
  -H "Content-Type: application/json" \
  -d '{
    "base_url": "https://example.com",
    "webhook_secret": "your-secret-token"
  }'
This is useful for CI/CD pipelines that want to update llms.txt immediately after documentation deploys.
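
For a CI step written in Python rather than shell, the same call might look like this (the endpoint path and field names are taken from the curl example above; urllib is used so no extra dependencies are needed):

```python
import json
import urllib.request

def build_site_changed_request(backend, base_url, secret):
    """Build the POST request that triggers an immediate recrawl."""
    payload = json.dumps({"base_url": base_url,
                          "webhook_secret": secret}).encode()
    return urllib.request.Request(
        f"{backend}/internal/hooks/site-changed",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_site_changed_request(
    "https://your-backend.com", "https://example.com", "your-secret-token")
print(req.full_url)
# Send with: urllib.request.urlopen(req)
```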

Sentinel URLs

The crawler automatically detects sitemap URLs as “sentinel” URLs for change detection:
  • Checks /sitemap.xml and /sitemap_index.xml
  • Uses Last-Modified headers to detect changes
  • Triggers recrawls only when content has changed
This ensures efficient recrawling by avoiding unnecessary regeneration when content hasn’t changed.
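
Change detection via Last-Modified could be sketched as follows (a hypothetical helper; the crawler’s actual logic may differ):

```python
from email.utils import parsedate_to_datetime

def sitemap_changed(previous_last_modified, current_last_modified):
    """Compare two HTTP Last-Modified header values.

    Returns True when the sitemap is newer than the stored baseline,
    or when no baseline exists yet.
    """
    if previous_last_modified is None:
        return True  # no baseline yet: treat as changed
    old = parsedate_to_datetime(previous_last_modified)
    new = parsedate_to_datetime(current_last_modified)
    return new > old

print(sitemap_changed("Mon, 01 Jan 2024 00:00:00 GMT",
                      "Tue, 02 Jan 2024 00:00:00 GMT"))
```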

Next Steps

API Usage

Learn how to integrate programmatically with the WebSocket API

Configuration

Explore all environment variables and configuration options

llms.txt Spec

Understand the llms.txt specification and formatting

Development

Set up a local development environment
