Overview
The webhook endpoint lets external systems notify the llms.txt generator when a website’s content has changed, so that the site is queued for a recrawl right away. This is useful for keeping llms.txt files synchronized with content management systems, CI/CD pipelines, or other automated workflows.
Unlike the scheduled cron endpoint, which processes all due sites, this webhook targets one specific site and marks it as due immediately. The recrawl itself runs on the next cron pass.
Endpoint
Authentication
Webhooks support optional per-site authentication via webhook secrets:
- If a webhook_secret is configured for the site in the database, it must be provided in the request
- If no secret is configured, the webhook can be called without authentication (not recommended for production)
Optional secret token for authenticating webhook calls. Must match the webhook_secret stored in the database for the given site.
Request
The base URL of the site to recrawl. Must match a site enrolled in the auto-update system. Example: "https://docs.example.com"
Authentication secret (required only if configured for this site in the database).
Example Request
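A minimal sketch of how a caller might build this request in Python with only the standard library. The host generator.example.com is a placeholder, and the use of POST with a JSON body carrying base_url and webhook_secret is an assumption based on the field descriptions above, not a confirmed contract; adjust it to your deployment.

```python
import json
import urllib.request

# Hypothetical host; replace with the URL of your generator deployment.
WEBHOOK_URL = "https://generator.example.com/hooks/site-changed"


def build_site_changed_request(base_url, webhook_secret=None, webhook_url=WEBHOOK_URL):
    """Build (but do not send) a webhook request for one enrolled site."""
    payload = {"base_url": base_url}
    if webhook_secret is not None:
        # Only needed when a secret is configured for the site.
        payload["webhook_secret"] = webhook_secret
    return urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",  # assumed HTTP method
    )


request = build_site_changed_request("https://docs.example.com", "my-secret")
# To actually send it: urllib.request.urlopen(request, timeout=10)
```

Building the request separately from sending it makes the payload easy to inspect in tests before any network call is made.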
Response
Success Response (200)
Always "scheduled" when the recrawl is successfully queued.
The base URL that was scheduled for recrawl (echoed from request).
ISO 8601 timestamp when the recrawl will be processed. Set to current time for immediate processing.
Error Responses
Site Not Enrolled (404)
Returned when the base_url is not found in the crawl_sites table.
To enroll a site, crawl it first with enableAutoUpdate: true via the WebSocket endpoint.
Invalid Webhook Secret (401)
Returned when the provided webhook_secret doesn’t match the stored value.
Database Unavailable (503)
Returned when the Supabase connection fails.
Internal Error (500)
Returned for unexpected server errors.
How It Works
1. Validation
The endpoint performs these checks:
- Database connectivity: Ensures Supabase is available
- Site enrollment: Verifies that base_url exists in the crawl_sites table
- Secret validation: If a secret is stored, validates that the provided secret matches
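The checks above can be sketched as a small function. This is an illustration under stated assumptions, not the server's actual code: the sites dict is a hypothetical in-memory stand-in for the crawl_sites table, the check order follows the list above, and the constant-time comparison via hmac.compare_digest is a suggested practice rather than something the source confirms.

```python
import hmac


def validate_webhook(sites, base_url, provided_secret=None, db_available=True):
    """Return (status_code, reason) following the three documented checks.

    `sites` is a hypothetical stand-in for the crawl_sites table: a dict
    mapping base_url to a row dict with an optional "webhook_secret" key.
    """
    # 1. Database connectivity
    if not db_available:
        return 503, "Database unavailable"
    # 2. Site enrollment
    row = sites.get(base_url)
    if row is None:
        return 404, "Site not enrolled"
    # 3. Secret validation, only when a secret is stored for the site
    stored = row.get("webhook_secret")
    if stored:
        # Constant-time comparison avoids leaking the secret through timing.
        if provided_secret is None or not hmac.compare_digest(
            stored.encode("utf-8"), provided_secret.encode("utf-8")
        ):
            return 401, "Invalid webhook secret"
    return 200, "ok"
```

A site without a stored secret passes step 3 regardless of what the caller sends, which matches the optional authentication described earlier.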
2. Scheduling
If validation passes:
- Sets next_crawl_at to the current timestamp (immediate processing)
- Updates the updated_at timestamp
- Returns confirmation
3. Processing
The actual recrawl happens when:
- The scheduled cron job runs (checks for sites with next_crawl_at <= NOW())
- This webhook sets next_crawl_at to now, so the site will be picked up on the next cron run
The webhook schedules a recrawl but doesn’t execute it immediately. The cron job must be running to process scheduled recrawls.
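To make the split between scheduling and processing concrete, here is a small in-memory sketch. The dict of rows is a hypothetical stand-in for the crawl_sites table, and the function names are illustrative; only the next_crawl_at and updated_at fields and the next_crawl_at <= NOW() condition come from the description above.

```python
from datetime import datetime, timedelta, timezone


def schedule_recrawl(row, now):
    """Mark one site as due right away, as the webhook is described to do."""
    row["next_crawl_at"] = now
    row["updated_at"] = now


def due_sites(sites, now):
    """Return the base URLs a cron run would pick up (next_crawl_at <= now)."""
    return [url for url, row in sites.items() if row["next_crawl_at"] <= now]


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
sites = {
    "https://docs.example.com": {"next_crawl_at": now + timedelta(days=7), "updated_at": now},
    "https://blog.example.com": {"next_crawl_at": now + timedelta(days=3), "updated_at": now},
}

before = due_sites(sites, now)  # nothing is due yet
schedule_recrawl(sites["https://docs.example.com"], now)
after = due_sites(sites, now)   # only the webhook-triggered site is due
```

Nothing is crawled by schedule_recrawl itself; the site only becomes visible to the next call of due_sites, which mirrors the dependency on the cron job.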
Integration Examples
Mintlify CI/CD
Next.js API Route
Vercel Deploy Hook
WordPress Plugin
Security Configuration
Setting Up Webhook Secrets
Webhook secrets are stored per-site in the crawl_sites table:
Security Best Practices
- Always use webhook secrets in production
- Generate unique secrets per site if hosting multiple sites
- Use HTTPS for all webhook calls
- Rotate secrets periodically
- Store secrets securely (environment variables, secret managers)
- Validate webhook source in your CI/CD pipeline
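As one way to follow the first few practices, the sketch below generates a random per-site secret and reads it back from an environment variable on the caller side. The variable name LLMS_WEBHOOK_SECRET and both function names are hypothetical; how the secret is written into the crawl_sites table is not covered here.

```python
import os
import secrets


def generate_webhook_secret(num_bytes=32):
    """Return a URL-safe random string to store as a site's webhook_secret."""
    return secrets.token_urlsafe(num_bytes)


def load_webhook_secret(env_var="LLMS_WEBHOOK_SECRET"):
    """Read the secret from the environment (variable name is hypothetical)."""
    value = os.environ.get(env_var)
    if not value:
        raise RuntimeError(f"{env_var} is not set")
    return value
```

Generating a fresh value per site with generate_webhook_secret() keeps a leaked secret from affecting other sites, and loading it from the environment keeps it out of source control.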
Database Schema
Relevant fields in the crawl_sites table:
Error Codes
| Status Code | Description | Reason |
|---|---|---|
| 200 | Success | Recrawl scheduled successfully |
| 401 | Unauthorized | Invalid or missing webhook secret |
| 404 | Not Found | Site not enrolled in auto-update system |
| 503 | Service Unavailable | Database connection failed |
| 500 | Internal Server Error | Unexpected server error |
Rate Limiting
No explicit rate limits are enforced on this endpoint. However:
- Multiple calls for the same site will update next_crawl_at each time
- The cron job processes sites sequentially, so only one recrawl happens at a time
- Consider implementing rate limiting in your webhook caller to avoid excessive requests
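Caller-side rate limiting can be as simple as a per-site minimum interval. The class below is a hypothetical helper, not part of the generator; the 60-second default is an arbitrary illustration, and the clock is injectable so the behavior can be checked without waiting.

```python
import time


class PerSiteThrottle:
    """Allow at most one webhook call per site within `min_interval` seconds."""

    def __init__(self, min_interval=60.0, clock=time.monotonic):
        self.min_interval = min_interval
        self.clock = clock
        self._last_sent = {}  # base_url -> time of the last allowed call

    def allow(self, base_url):
        """Return True if a call for this site may be sent now."""
        now = self.clock()
        last = self._last_sent.get(base_url)
        if last is not None and now - last < self.min_interval:
            return False
        self._last_sent[base_url] = now
        return True
```

Because repeated calls only reset next_crawl_at to the current time, dropping calls that arrive within the interval loses nothing as long as the first one was accepted.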
Monitoring
Check the database to verify webhook calls:
Comparison with Cron Endpoint
| Feature | Webhook (/hooks/site-changed) | Cron (/cron/recrawl) |
|---|---|---|
| Scope | Single specific site | All due sites |
| Trigger | External webhook call | Scheduled timer |
| Auth | Per-site webhook secret | Global cron secret |
| Timing | Immediate (on next cron run) | Scheduled intervals |
| Use Case | Content changes, deployments | Periodic maintenance |
Best Practices
- Enroll sites first: Use the WebSocket endpoint with enableAutoUpdate: true
- Set webhook secrets: Always configure secrets for production sites
- Call after deploy: Trigger webhook after content is published, not before
- Handle errors: Implement retry logic for failed webhook calls
- Monitor database: Check that next_crawl_at is updated correctly
- Run cron frequently: Ensure the cron job runs often enough to pick up webhook triggers
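For the retry advice, a minimal sketch with exponential backoff might look like this. The send argument is a caller-supplied function that performs the webhook call and returns the HTTP status code. Treating 500 and 503 as retryable while returning 401 and 404 immediately is an assumption based on the error table, since retrying a wrong secret or an unenrolled site is unlikely to help.

```python
import time

# Assumed to be transient: internal error and database unavailable.
RETRYABLE_STATUSES = {500, 503}


def call_with_retry(send, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call `send()` until it returns a non-retryable status or attempts run out.

    Delays double after each failed attempt: base_delay, 2x, 4x, ...
    Returns the last status code seen.
    """
    status = None
    for attempt in range(max_attempts):
        status = send()
        if status not in RETRYABLE_STATUSES:
            return status
        if attempt < max_attempts - 1:
            sleep(base_delay * (2 ** attempt))
    return status
```

The sleep function is a parameter so that tests and CI scripts can substitute a no-op or a recorder instead of actually waiting.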
Troubleshooting
Webhook Returns 404 “Site not enrolled”
Cause: The site hasn’t been crawled with auto-update enabled.
Solution: Crawl the site via WebSocket with enableAutoUpdate: true:
Webhook Returns 401 “Invalid webhook secret”
Cause: The provided secret doesn’t match the database value.
Solution: Check the stored secret:
Recrawl Not Happening After Webhook
Cause: The cron job is not running or is running infrequently.
Solution:
- Verify the cron job is scheduled and running
- Check that next_crawl_at was updated:
- Check cron job logs for errors
Related Endpoints
- Cron Recrawl - Schedule automatic recrawls
- WebSocket Crawl - Initial crawl with auto-update enrollment