Accessing the Web Interface
The web interface is available at:
- Production: https://llmstxt.vercel.app/
- Local Development: http://localhost:3000
Generating Your First llms.txt
Enter the target URL
In the URL input field, enter the website you want to generate an llms.txt file for.
The URL should include the protocol (http:// or https://) and should be the base URL of the site you want to crawl.
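The protocol-and-base-URL requirement above can be sketched as a small validation step. This is a minimal sketch using Python's standard library; the interface's actual validation logic is not documented here:

```python
from urllib.parse import urlparse

def normalize_base_url(url: str) -> str:
    """Require an http(s) scheme and a hostname, and return the base URL."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        raise ValueError("URL must start with http:// or https://")
    if not parsed.netloc:
        raise ValueError("URL must include a hostname")
    # Drop any path/query so the crawler starts from the site root.
    return f"{parsed.scheme}://{parsed.netloc}"

print(normalize_base_url("https://docs.example.com/guide"))  # https://docs.example.com
```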
Configure crawl parameters
Adjust the settings based on your needs:
Max Pages: Number of pages to crawl (default: 50)
- Lower values for faster generation
- Higher values (up to 200) for comprehensive coverage
Description Length: How detailed each page summary should be
- Shorter for concise summaries
- Longer for detailed descriptions
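As a sketch of how these parameters might be assembled into a request payload: maxPages appears later in these docs, but every other detail here (the payload shape, the default bounds check) is an assumption:

```python
import json

def build_crawl_config(url: str, max_pages: int = 50) -> dict:
    """Assemble a hypothetical crawl-request payload.

    The 50-page default and 200-page cap mirror the documented settings;
    the field names are assumptions, not a confirmed API schema.
    """
    if not 1 <= max_pages <= 200:
        raise ValueError("maxPages must be between 1 and 200")
    return {"url": url, "maxPages": max_pages}

print(json.dumps(build_crawl_config("https://docs.example.com", 100)))
```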
Enable optional features
Toggle advanced features as needed:
Auto-Update (Recommended)
Enable scheduled recrawls to keep your llms.txt file synchronized with website changes.
- Recrawl Interval: Time between updates in minutes (default: 10080 = 7 days)
- Useful for documentation sites that update frequently
- Powered by AWS Lambda + EventBridge
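The interval is specified in minutes, so converting from days is simple arithmetic (a sketch to show where the 10080 default comes from):

```python
MINUTES_PER_DAY = 24 * 60  # 1440

def interval_minutes(days: float) -> int:
    """Convert a recrawl interval from days to the minutes the setting expects."""
    return round(days * MINUTES_PER_DAY)

print(interval_minutes(7))  # 10080, the documented 7-day default
```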
LLM Enhancement
Use AI (Grok 4.1-Fast) to improve content quality and formatting.
- Optimizes descriptions for better LLM understanding
- Adds contextual improvements
- Requires OPENROUTER_API_KEY to be configured
This feature must be explicitly enabled in the backend configuration with LLM_ENHANCEMENT_ENABLED=true.
Use Brightdata
Enable Brightdata proxy for JavaScript-heavy websites.
- Handles dynamic content loaded via JavaScript
- Uses Playwright browser automation
- Requires Brightdata API credentials
Understanding the Output
Real-time Logs
As the crawler runs, you’ll see real-time logs of the crawl progress.
Generated Output
Once complete, you’ll see the formatted llms.txt content in the results panel.
Example Output
Content tags like [API], [Guide], and [Quickstart] are automatically added based on page content analysis.
Hosted URL
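One plausible way such tags could be assigned is a keyword heuristic over each page's title and path. The generator's actual analysis rules are not documented here, so treat this sketch as illustrative only:

```python
def infer_tags(title: str, path: str) -> list[str]:
    """Assign content tags via a keyword heuristic (assumed, not the real rules)."""
    rules = {
        "API": ("api", "reference"),
        "Guide": ("guide", "tutorial"),
        "Quickstart": ("quickstart", "getting-started"),
    }
    text = f"{title} {path}".lower()
    return [tag for tag, keywords in rules.items()
            if any(k in text for k in keywords)]

print(infer_tags("Getting Started", "/docs/quickstart"))  # ['Quickstart']
```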
If R2 storage is configured, you’ll receive a public CDN URL that can be:
- Linked from your website’s footer or documentation
- Submitted to llms.txt directories
- Used as a reference for LLM indexing
Working with Results
Copy to Clipboard
Click the Copy button to copy the entire llms.txt content to your clipboard for pasting elsewhere.
Download File
Click the Download button to save the llms.txt file to your local machine.
View in Browser
If a hosted URL was generated, click View Hosted Version to open the public CDN link in a new tab.
Best Practices
Start Small
Begin with 25-50 pages to test the output quality before scaling up to larger crawls.
Review Output
Always review the generated llms.txt to ensure it accurately represents your site structure.
Enable Auto-Update
Set up scheduled recrawls for documentation sites that update frequently (weekly or monthly).
Use .md Links
The generator automatically prefers .md versions of pages when available for better LLM parsing.
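That preference can be sketched as a URL-selection helper. The .md-suffix heuristic below is an assumption about how the generator works, not its confirmed implementation:

```python
def prefer_markdown(url: str, available: set[str]) -> str:
    """Return the .md variant of a page URL when the site serves one."""
    if url.endswith(".md"):
        return url
    candidate = url.rstrip("/") + ".md"
    return candidate if candidate in available else url

pages = {"https://example.com/docs/intro.md"}
print(prefer_markdown("https://example.com/docs/intro", pages))
# https://example.com/docs/intro.md
```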
Troubleshooting
Connection error - is backend running?
This error means the WebSocket connection to the backend failed.
Solutions:
- Verify the backend is running at the configured URL
- Check NEXT_PUBLIC_WS_URL in frontend environment variables
- Ensure CORS origins include your frontend domain
- Check firewall/network settings
Failed to authenticate
Token generation failed during the authentication step.
Solutions:
- Verify API_KEY is set in the backend .env
- Check that the frontend can reach the /api/auth/token endpoint
- Review backend logs for authentication errors
No pages found
The crawler couldn’t discover any pages on the target site.
Solutions:
- Verify the URL is accessible and returns a valid HTML page
- Enable Brightdata if the site requires JavaScript
- Check if the site blocks automated crawlers (User-Agent)
- Try a different starting URL (e.g., /docs instead of root)
Crawl stops unexpectedly
The WebSocket connection was closed before completion.
Solutions:
- Check backend logs for errors or timeouts
- Reduce maxPages to avoid long-running operations
- Verify network stability between frontend and backend
- Check if the backend container has sufficient memory
Advanced Features
Webhook Integration
When auto-update is enabled, you can trigger immediate recrawls using webhooks.
cURL Example
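The original cURL snippet is not reproduced here, so the following Python sketch only shows the general shape of such a webhook call. The /api/recrawl path, the JSON body, and the Bearer auth header are all assumptions; consult your backend configuration for the real endpoint:

```python
import json
import urllib.request

def build_recrawl_request(base: str, api_key: str, url: str) -> urllib.request.Request:
    """Build (but do not send) a hypothetical recrawl-webhook POST request."""
    body = json.dumps({"url": url}).encode()
    return urllib.request.Request(
        f"{base}/api/recrawl",  # assumed path, not a documented endpoint
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
        method="POST",
    )

req = build_recrawl_request("http://localhost:8000", "secret", "https://docs.example.com")
print(req.full_url, req.get_method())  # http://localhost:8000/api/recrawl POST
```

Sending it would be `urllib.request.urlopen(req)` once the endpoint and credentials are confirmed.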
Sentinel URLs
The crawler automatically detects sitemap URLs as “sentinel” URLs for change detection:
- Checks /sitemap.xml and /sitemap_index.xml
- Uses Last-Modified headers to detect changes
- Triggers recrawls only when content has changed
This ensures efficient recrawling by avoiding unnecessary regeneration when content hasn’t changed.
Next Steps
API Usage
Learn how to integrate programmatically with the WebSocket API
Configuration
Explore all environment variables and configuration options
llms.txt Spec
Understand the llms.txt specification and formatting
Development
Set up a local development environment