# Quick Start Guide
Get the llms.txt Generator running locally in under 5 minutes. This guide walks you through setting up both the backend and the frontend for development.

## Prerequisites

Before you begin, ensure you have the following installed:

- **Python 3.11+**: required for the FastAPI backend
- **Node.js 20+**: required for the Next.js frontend
- **Git**: for cloning the repository

Docker is optional but recommended for simplified deployment. See the Docker Setup section below.
## Installation

### Backend Setup

Create a Python virtual environment, install the dependencies from `requirements.txt`, then install the Playwright browsers.
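The exact commands depend on the repository layout; a minimal sketch, assuming the backend lives in a `backend/` directory of the cloned repository:

```shell
cd backend                        # directory name is an assumption
python -m venv venv
source venv/bin/activate          # on Windows: venv\Scripts\activate
pip install -r requirements.txt
playwright install chromium       # fetch the browser Playwright drives
```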
### Backend Environment Configuration
Create your backend environment file, then edit `.env` with your configuration.
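A sketch of what `.env` might contain. `CORS_ORIGINS` matches the Troubleshooting section below; the remaining variable names are assumptions and may differ from the real configuration:

```ini
CORS_ORIGINS=http://localhost:3000
API_KEY=change-me                  # only if API-key auth is enabled (name assumed)
R2_ENDPOINT_URL=https://<account-id>.r2.cloudflarestorage.com
R2_ACCESS_KEY_ID=your-access-key
R2_SECRET_ACCESS_KEY=your-secret-key
R2_BUCKET=llms-txt
R2_PUBLIC_DOMAIN=files.example.com
```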
### Frontend Setup

In a new terminal, navigate to the frontend directory and create the frontend environment file, then edit `.env.local`.
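A sketch under the assumption that the frontend lives in a `frontend/` directory; `NEXT_PUBLIC_WS_URL` is the variable referenced in Troubleshooting below:

```shell
cd frontend                       # directory name is an assumption
npm install
echo 'NEXT_PUBLIC_WS_URL=ws://localhost:8000/ws/crawl' > .env.local
```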
### Start the Servers

Start both servers in separate terminals.
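Assuming the backend exposes its FastAPI app as `app` in `main.py` (a common layout, not confirmed here), the two terminals might look like:

```shell
# Terminal 1: backend
cd backend && source venv/bin/activate
uvicorn main:app --reload --port 8000

# Terminal 2: frontend
cd frontend && npm run dev
```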
You should see:

- Backend: `INFO: Uvicorn running on http://127.0.0.1:8000`
- Frontend: `Ready on http://localhost:3000`
### Access the Application

Open your browser and navigate to:

- Frontend: http://localhost:3000
- Backend API docs: http://localhost:8000/docs
## Generate Your First llms.txt

Now that everything is running, let's generate your first `llms.txt` file:
### Open the Web Interface
Navigate to http://localhost:3000 in your browser.
### Enter a Website URL

Enter a website URL you want to crawl. For testing, try:

- https://docs.python.org
- https://fastapi.tiangolo.com
- Your own documentation site
### Configure Crawl Parameters
Adjust the settings based on your needs:
- Max Pages: Number of pages to crawl (default: 50)
- Description Length: Character limit for page excerpts (default: 500)
- Enable Auto-Update: Schedule periodic recrawls (optional)
- Recrawl Interval: Minutes between updates (default: 360)
- LLM Enhancement: AI-powered optimization (optional)
- Use Brightdata: For JavaScript-heavy sites (optional)
### Start Crawling

Click "Generate llms.txt" and watch the real-time progress messages appear in the log window.
## Understanding the WebSocket API

The backend uses WebSockets for real-time communication. The protocol has three steps:

1. **Connection**: open a WebSocket connection to the backend.
2. **Send Request**: send the crawl request with your parameters.
3. **Receive Messages**: the server streams back different message types as the crawl progresses.
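The three steps can be sketched as a small client. This is illustrative only: it assumes the third-party `websockets` package, JSON text frames, and field names (`url`, `max_pages`, `description_length`) mirroring the UI parameters above; the real schema may differ:

```python
import asyncio
import json

def build_crawl_request(url: str, max_pages: int = 50,
                        description_length: int = 500) -> str:
    """Serialize a crawl request as a JSON text frame (field names assumed)."""
    return json.dumps({
        "url": url,
        "max_pages": max_pages,
        "description_length": description_length,
    })

async def run_crawl(url: str) -> None:
    import websockets  # third-party: pip install websockets

    async with websockets.connect("ws://localhost:8000/ws/crawl") as ws:
        await ws.send(build_crawl_request(url))
        async for message in ws:        # iterate until the server closes
            print(json.loads(message))  # progress / completion events

# Usage (with the backend running):
# asyncio.run(run_crawl("https://docs.python.org"))
```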
### Implementation Example

The backend's WebSocket handler accepts the connection, reads the crawl request, and streams status messages back until the crawl completes.
## Docker Setup (Optional)

For a simpler setup, use Docker Compose.
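Assuming the repository ships a `docker-compose.yml` (not shown here), the whole stack can be brought up with:

```shell
docker compose up --build        # add -d to run in the background
```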
### Access the Application

Same URLs as the manual setup:
- Frontend: http://localhost:3000
- Backend: http://localhost:8000
- API Docs: http://localhost:8000/docs
## Troubleshooting
### Backend won't start: ModuleNotFoundError
Make sure you’ve activated the virtual environment and installed dependencies:
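From the backend directory (assuming the `venv` created during setup):

```shell
source venv/bin/activate          # on Windows: venv\Scripts\activate
pip install -r requirements.txt
```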
### Playwright browser not found
Install Playwright browsers:
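With the virtual environment active, the Playwright CLI downloads them:

```shell
playwright install               # or: playwright install chromium
```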
### WebSocket connection fails
Verify that:

- The backend is running on port 8000
- `CORS_ORIGINS` includes your frontend URL
- The API key is configured (if required)

Also check the browser console for error messages.
### R2 storage errors
Ensure your R2 credentials are correct:

- Endpoint URL format: `https://<account-id>.r2.cloudflarestorage.com`
- Access key and secret key are valid
- Bucket exists and is accessible
- Public domain is configured correctly
### Frontend can't connect to backend
Check `NEXT_PUBLIC_WS_URL` in `.env.local`:

- It should be `ws://localhost:8000/ws/crawl` for local development
- Use `wss://` for production behind HTTPS
## Next Steps

- **Configuration Guide**: learn about all configuration options and environment variables
- **API Reference**: explore the full API documentation and endpoints
- **Deployment**: deploy to AWS with Terraform for production use
- **Architecture**: deep dive into system architecture and components
For production deployment, see the Deployment Guide which covers AWS ECS, Lambda, and infrastructure setup with Terraform.