Overview
The WebSocket endpoint provides real-time bidirectional communication for crawling websites and generating llms.txt files. It streams progress updates, logs, and results as the crawl progresses.
Endpoint
Authentication
The endpoint supports two authentication methods:
Method 1: JWT Token (Recommended)
Obtain a short-lived JWT token from the /auth/token endpoint and pass it as a query parameter.
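Attaching the credential as a query parameter can be sketched with Python's standard library. This is a minimal sketch, not the documented client: the endpoint URL `wss://example.invalid/ws` and the parameter name `token` are hypothetical placeholders, because this page does not show the real values. The same helper applies to the API key method below.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def with_auth_param(ws_url: str, param_name: str, credential: str) -> str:
    """Return ws_url with the credential appended as a URL-encoded query parameter."""
    parts = urlsplit(ws_url)
    extra = urlencode({param_name: credential})
    # Preserve any query string already present on the URL.
    query = f"{parts.query}&{extra}" if parts.query else extra
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, parts.fragment))


# HYPOTHETICAL values: the real endpoint URL and parameter name are not shown on this page.
print(with_auth_param("wss://example.invalid/ws", "token", "example-jwt"))
# wss://example.invalid/ws?token=example-jwt
```

URL-encoding the credential matters because API keys or tokens may contain characters such as `+` or `&` that would otherwise corrupt the query string.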
Method 2: API Key
Pass your API key directly as a query parameter.
Request Format
After connecting, send a JSON payload with the crawl configuration. The payload accepts the following fields:
- Site URL: The base URL of the website to crawl. Must be a valid HTTP/HTTPS URL. Example: "https://example.com"
- maxPages: Maximum number of pages to crawl, used to prevent excessive crawling. Range: 1-1000.
- Description length: Maximum length of description excerpts, in characters. Excerpts are truncated at semantic boundaries. Range: 100-2000.
- enableAutoUpdate: Enable automatic periodic recrawls for this site. Site metadata is stored in the database.
- Recrawl interval: Minutes between automatic recrawls (default: 10080, i.e. 7 days). Only used if enableAutoUpdate is true. Common values:
  - 360 (6 hours)
  - 1440 (1 day)
  - 10080 (1 week)
- LLM enhancement: Use AI (Grok 4.1-Fast) to enhance and optimize the generated llms.txt content. Requires OPENROUTER_API_KEY and LLM_ENHANCEMENT_ENABLED=true.
- useBrightdata: Use Brightdata’s Scraping Browser for JavaScript-heavy sites. Falls back to Playwright if Brightdata is unavailable.
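Checking settings on the client before sending catches out-of-range values early. The sketch below enforces only the ranges listed above; the server remains the authority. The positive-interval check is an extra assumption, since no range is documented for the recrawl interval, and the function and parameter names are illustrative.

```python
from urllib.parse import urlsplit

# Ranges copied from the field descriptions above.
MAX_PAGES_RANGE = (1, 1000)
DESC_LENGTH_RANGE = (100, 2000)
DEFAULT_RECRAWL_MINUTES = 7 * 24 * 60  # 10080 minutes, i.e. 7 days


def _check_range(name: str, value: int, bounds: tuple[int, int]) -> None:
    lo, hi = bounds
    if not lo <= value <= hi:
        raise ValueError(f"{name} must be between {lo} and {hi}, got {value}")


def validate_crawl_settings(url: str, max_pages: int, desc_length: int,
                            recrawl_minutes: int = DEFAULT_RECRAWL_MINUTES) -> None:
    """Raise ValueError if a setting is outside the documented limits."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not a valid HTTP/HTTPS URL: {url!r}")
    _check_range("maxPages", max_pages, MAX_PAGES_RANGE)
    _check_range("description length", desc_length, DESC_LENGTH_RANGE)
    # ASSUMPTION: no range is documented for the recrawl interval; this only
    # rejects non-positive values.
    if recrawl_minutes <= 0:
        raise ValueError(f"recrawl interval must be positive, got {recrawl_minutes}")


validate_crawl_settings("https://example.com", max_pages=50, desc_length=500)
```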
Example Request
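A request can be assembled as a dictionary and serialized with the standard `json` module. This sketch uses the field names that appear elsewhere on this page (maxPages, enableAutoUpdate, useBrightdata); the key `url` for the base URL is a HYPOTHETICAL name, and the description-length, recrawl-interval, and LLM-enhancement fields are left out because their names are not shown here.

```python
import json


def build_crawl_request(url: str, max_pages: int = 50, enable_auto_update: bool = False,
                        use_brightdata: bool = False) -> str:
    """Serialize a crawl request to the JSON text sent right after connecting."""
    payload = {
        "url": url,  # HYPOTHETICAL key name for the base URL field
        "maxPages": max_pages,
        "enableAutoUpdate": enable_auto_update,
        "useBrightdata": use_brightdata,
    }
    return json.dumps(payload)


print(build_crawl_request("https://example.com"))
# {"url": "https://example.com", "maxPages": 50, "enableAutoUpdate": false, "useBrightdata": false}
```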
Response Format
The server sends JSON messages of several types throughout the crawl:
Log Messages
- Type: always "log", used for progress updates
- Payload: a human-readable message describing the current operation
Result Message
- Type: always "result", carrying the generated llms.txt content
- Payload: the complete generated llms.txt file in Markdown format
URL Message
- Type: always "url", carrying the hosted file URL
- Payload: the public CDN URL where the llms.txt file is hosted (Cloudflare R2)
Error Message
- Type: always "error", used for error conditions
- Payload: a description of the error
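Handling every message type, as the Best Practices section recommends, comes down to dispatching on the type value. This sketch assumes HYPOTHETICAL JSON keys `type` and `content`: this page documents the type values ("log", "result", "url", "error") but not the keys that carry them.

```python
import json
from dataclasses import dataclass, field
from typing import Optional

# HYPOTHETICAL key names; adjust to the actual message schema.
TYPE_KEY = "type"
CONTENT_KEY = "content"


@dataclass
class CrawlState:
    logs: list[str] = field(default_factory=list)
    result: Optional[str] = None      # llms.txt Markdown
    hosted_url: Optional[str] = None  # CDN URL, only sent if R2 is configured
    error: Optional[str] = None


def handle_message(raw: str, state: CrawlState) -> None:
    """Parse one server message and record it in state according to its type."""
    msg = json.loads(raw)
    kind = msg.get(TYPE_KEY)
    content = msg.get(CONTENT_KEY)
    if kind == "log":
        state.logs.append(content)
    elif kind == "result":
        state.result = content
    elif kind == "url":
        state.hosted_url = content
    elif kind == "error":
        state.error = content
    else:
        # Unknown types are kept as log lines instead of being silently dropped.
        state.logs.append(f"[unhandled {kind!r}] {content}")
```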
Connection Flow
- Connect: Open a WebSocket connection with the authentication query parameter
- Authenticate: Server validates token/API key
- Send Request: Client sends crawl configuration JSON
- Receive Logs: Server streams progress updates in real-time
- Receive Result: Server sends the complete llms.txt content
- Receive URL: Server sends the hosted file URL (if R2 is configured)
- Close: Connection closes automatically after completion
Error Handling
Authentication Errors
Invalid Token
Runtime Errors
Runtime errors are sent as JSON error messages before the connection closes, for example:
- "Invalid URL format"
- "Failed to fetch page: [details]"
- "Crawl timeout exceeded"
- "Maximum pages limit reached"
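A client may want to decide whether a runtime error is worth retrying. The documentation lists these messages but does not say which are transient, so the split below is a HYPOTHETICAL policy for illustration: fetch failures and timeouts are treated as retryable, while invalid URLs and the page limit are not. Messages not listed here are treated as non-retryable.

```python
# HYPOTHETICAL retry policy; the documented messages do not state which are transient.
NON_RETRYABLE = ("Invalid URL format", "Maximum pages limit reached")
RETRYABLE_PREFIXES = ("Failed to fetch page:", "Crawl timeout exceeded")


def should_retry(error_text: str) -> bool:
    """Return True if the error looks transient under the assumed policy above."""
    if error_text in NON_RETRYABLE:
        return False
    # str.startswith accepts a tuple; unknown messages fall through to False.
    return error_text.startswith(RETRYABLE_PREFIXES)
```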
Example Implementation
JavaScript/TypeScript
Python
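The connection flow can be sketched as one coroutine: send the request, consume messages until the server closes the connection, and stop on an error message. To stay self-contained, it accepts any object with an async `send()` and async iteration over incoming text messages, and ships with an in-memory `FakeConnection` for local testing. The JSON keys `type` and `content` are HYPOTHETICAL, as is every function and class name here.

```python
import asyncio
import json
from typing import Any, AsyncIterator, Protocol

# HYPOTHETICAL JSON keys; only the type values are documented on this page.
TYPE_KEY, CONTENT_KEY = "type", "content"


class Connection(Protocol):
    """Anything with an async send() and async iteration over incoming messages."""
    async def send(self, message: str) -> None: ...
    def __aiter__(self) -> AsyncIterator[str]: ...


async def run_crawl(conn: Connection, payload: dict[str, Any]) -> dict[str, Any]:
    """Send the crawl request, then consume messages until the server closes the connection."""
    await conn.send(json.dumps(payload))
    outcome: dict[str, Any] = {"logs": [], "result": None, "url": None}
    async for raw in conn:  # the loop ends when the server closes the connection
        msg = json.loads(raw)
        kind, content = msg.get(TYPE_KEY), msg.get(CONTENT_KEY)
        if kind == "log":
            outcome["logs"].append(content)
            print("log:", content)
        elif kind in ("result", "url"):
            outcome[kind] = content
        elif kind == "error":
            raise RuntimeError(f"crawl failed: {content}")
    return outcome


class FakeConnection:
    """In-memory stand-in that replays canned server messages."""

    def __init__(self, replies: list[str]) -> None:
        self.sent: list[str] = []
        self._replies = replies

    async def send(self, message: str) -> None:
        self.sent.append(message)

    async def __aiter__(self):
        for reply in self._replies:
            yield reply


if __name__ == "__main__":
    fake = FakeConnection([
        json.dumps({TYPE_KEY: "log", CONTENT_KEY: "Crawling https://example.com"}),
        json.dumps({TYPE_KEY: "result", CONTENT_KEY: "# Example\n"}),
    ])
    print(asyncio.run(run_crawl(fake, {"maxPages": 10})))
```

Against a real server, a connection opened with the third-party `websockets` package (`async with websockets.connect(url) as ws:`) offers `send()` and async iteration, so it could be passed as `conn`. Iteration there ends normally on a clean close and raises on an abnormal one.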
Rate Limits
- No explicit rate limits are enforced on the WebSocket endpoint
- Crawling is limited by the maxPages parameter
- Consider backend resource usage when setting high maxPages values
- Use enableAutoUpdate to avoid repeated manual crawls
Best Practices
- Use JWT tokens instead of API keys for better security
- Set reasonable maxPages limits (50-100 for most sites)
- Enable auto-update for sites that change frequently
- Handle all message types in your client code
- Implement reconnection logic for production use
- Validate URLs before sending to prevent errors
- Use Brightdata (useBrightdata: true) for JavaScript-heavy sites
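Reconnection logic is recommended, but this page does not specify a policy. One common approach, shown here as an assumption rather than a documented requirement, is exponential backoff with "full jitter": each attempt waits a random time up to a cap that doubles per attempt, so many clients do not reconnect in lockstep. The function name and default values are illustrative.

```python
import random
from typing import Optional


def backoff_delays(attempts: int, base: float = 1.0, cap: float = 30.0,
                   rng: Optional[random.Random] = None) -> list[float]:
    """Return reconnect delays in seconds using exponential backoff with full jitter.

    Attempt n (starting at 0) waits a random time in [0, min(cap, base * 2**n)].
    """
    if rng is None:
        rng = random.Random()
    return [rng.uniform(0.0, min(cap, base * 2 ** n)) for n in range(attempts)]


# Seeded RNG so the sequence is reproducible while testing.
for attempt, delay in enumerate(backoff_delays(5, rng=random.Random(42))):
    print(f"attempt {attempt}: wait {delay:.2f}s")
```

A reconnect loop would sleep for each delay in turn (for example with `asyncio.sleep`) before opening a new connection, and give up once the list is exhausted.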
Related Endpoints
- Authentication - Obtain JWT tokens
- Webhooks - Trigger recrawls via webhook