Rate Limits
Scribe Backend currently applies soft rate limits to ensure fair usage and system stability. This page documents current limits and best practices for high-volume usage.
Current Rate Limits
Scribe Backend does not enforce hard rate limits at the infrastructure level. However, operational limits exist to protect system resources.
Batch Operations
A maximum of 100 items per POST /api/queue/batch request.
Celery Worker Concurrency
Sequential Processing: The Celery worker runs with concurrency=1 in order to:
- Prevent API rate limits on external services (Exa, ArXiv, Anthropic)
- Optimize memory usage on resource-constrained hardware (Raspberry Pi)
- Ensure FIFO queue processing
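A minimal sketch of a worker configuration matching the behavior above. The app name, broker URL, and module layout are illustrative assumptions, not the project's actual values:

```python
from celery import Celery

# Illustrative Celery app; "scribe" and the Redis broker URL are assumptions.
app = Celery("scribe", broker="redis://localhost:6379/0")

app.conf.update(
    worker_concurrency=1,          # one task at a time: FIFO order, low memory use
    worker_prefetch_multiplier=1,  # don't reserve extra tasks ahead of the current one
    task_acks_late=True,           # re-queue a task if the worker dies mid-run
)
```

With `worker_prefetch_multiplier=1` and late acks, a crash on resource-constrained hardware loses at most the in-flight task, which is then redelivered.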
External API Limits
Scribe Backend interacts with external APIs that have their own rate limits:
| Service | Rate Limit | Impact |
|---|---|---|
| Anthropic Claude | Tier-dependent | LLM calls in template_parser and email_composer |
| Exa Search | 1000 requests/month (free tier) | Web scraping in web_scraper step |
| ArXiv API | 1 request/3 seconds | Academic paper fetching in arxiv_helper |
The pipeline implements automatic retry logic with exponential backoff for external API rate limits.
Rate Limit Responses
429 Too Many Requests
When rate limits are exceeded, the API returns an HTTP 429 response.
Best Practices
Batch Submissions
Split large batches into chunks
If you need to process more than 100 emails:
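A client-side sketch of splitting a large job into batch-sized chunks. The 100-item cap and the /api/queue/batch path come from this page; `submit_batch` is a placeholder for whatever client call performs the POST:

```python
BATCH_LIMIT = 100  # maximum items per POST /api/queue/batch request

def chunk(items, size=BATCH_LIMIT):
    """Yield successive size-length slices of items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def submit_all(emails, submit_batch):
    """Submit emails in batch-limit-sized chunks via the caller-supplied submit_batch."""
    for batch in chunk(emails):
        submit_batch(batch)  # e.g. POST /api/queue/batch with this chunk
```

Because the worker is sequential, submitting chunks back-to-back is safe; they simply queue up in FIFO order.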
Monitor queue status before submitting more
Check queue backlog before submitting additional batches:
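One way to gate submissions on backlog, sketched here under assumptions: `get_pending_count` stands in for a call to the queue-status endpoint returning the number of PENDING + PROCESSING items, and the 100-item threshold is illustrative, not a documented limit:

```python
import time

MAX_BACKLOG = 100  # illustrative threshold, not a documented limit

def wait_for_capacity(get_pending_count, poll_interval=2.0, max_backlog=MAX_BACKLOG):
    """Block until the queue backlog drops below max_backlog.

    get_pending_count is a placeholder for a queue-status request
    returning the count of PENDING + PROCESSING items.
    """
    while get_pending_count() >= max_backlog:
        time.sleep(poll_interval)
```

Call this before each batch submission so a slow worker never accumulates an unbounded backlog.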
Use exponential backoff for errors
Implement exponential backoff for 429 and 5xx errors:
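A minimal retry wrapper with exponential backoff and jitter. The `(status, body)` return shape of `call` is an assumption for this sketch, not part of the API:

```python
import random
import time

def with_backoff(call, max_retries=5, base_delay=1.0,
                 retryable=(429, 500, 502, 503, 504)):
    """Retry call() with exponential backoff and jitter on 429/5xx.

    call() performs one HTTP request and returns (status, body);
    that signature is an assumption for this sketch.
    """
    for attempt in range(max_retries):
        status, body = call()
        if status not in retryable:
            return status, body
        # Doubling delay per attempt, plus jitter to avoid synchronized retries.
        time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.5))
    return call()  # final attempt; result returned as-is
```

With the defaults, delays grow roughly 1s, 2s, 4s, 8s, 16s before the final attempt.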
Polling
Use appropriate polling intervals
Recommended intervals:
- Queue status: Poll every 2 seconds while items are pending/processing
- Task status: Poll every 2 seconds for the first 30 seconds, then every 5 seconds
- Stop polling after task completes or expires (1 hour)
Stop polling when complete
Always terminate polling loops when tasks complete:
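A polling loop that always terminates, either on a terminal state or at the 1-hour task expiry mentioned above. The state names and the `get_status` callable are assumptions standing in for the task-status endpoint:

```python
import time

TERMINAL_STATES = {"COMPLETED", "FAILED", "EXPIRED"}  # assumed state names
TASK_TTL = 3600  # tasks expire after 1 hour, per the guidance above

def poll_until_done(get_status, interval=2.0, timeout=TASK_TTL):
    """Poll get_status() until a terminal state or timeout.

    get_status stands in for a GET on the task-status endpoint and
    returns the task's current state as a string.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        state = get_status()
        if state in TERMINAL_STATES:
            return state
        time.sleep(interval)
    raise TimeoutError("task did not reach a terminal state before expiry")
```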
Optimizing for High Volume
Cache email templates
Store frequently-used templates in your database instead of regenerating
Batch by template
Group recipients by template to minimize context switching
Monitor queue depth
Track queue length and adjust submission rate accordingly
Use webhooks (future)
Avoid polling by implementing webhook notifications (planned feature)
Processing Time Estimates
Typical pipeline execution times per email:
| Template Type | Average Time | Notes |
|---|---|---|
| GENERAL | ~8-9 seconds | Fastest (skips ArXiv step) |
| BOOK | ~9-10 seconds | Skips ArXiv step |
| RESEARCH | ~10-12 seconds | Includes ArXiv paper fetching |
Processing time varies based on:
- Template complexity
- Web scraping content length
- External API response times
- LLM generation speed
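Because the worker processes one item at a time (concurrency=1), a batch's wall-clock time is roughly items × per-email time. A small estimator using the midpoints of the averages in the table above (the exact constants are rough assumptions):

```python
# Midpoints of the per-template averages from the table above (seconds).
AVG_SECONDS = {"GENERAL": 8.5, "BOOK": 9.5, "RESEARCH": 11.0}

def estimate_minutes(counts):
    """Estimate sequential wall-clock minutes for a batch.

    counts: mapping of template type -> number of emails.
    """
    total_seconds = sum(AVG_SECONDS[t] * n for t, n in counts.items())
    return total_seconds / 60
```

For example, a full 100-item GENERAL batch works out to roughly 14 minutes of sequential processing.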
Future Rate Limit Plans
The following features are planned for future releases:
Tiered Rate Limits
- Free Tier: 100 emails/day, 10 concurrent queue items
- Pro Tier: 1000 emails/day, 50 concurrent queue items
- Enterprise Tier: Unlimited emails, dedicated worker
Rate Limit Headers
Future API responses will include rate limit headers.
Webhook Notifications
Eliminate polling with webhook support.
Monitoring Rate Limits
Track your usage with queries like the following.
Daily Email Count
Count queue items created in the current day to gauge daily volume.
Active Queue Items
Count PENDING and PROCESSING items to gauge current load.
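Both counts can be sketched against the queue database. The table and column names (`queue_items`, `status`, `created_at`) are assumptions about the schema, not the project's actual definitions:

```python
import sqlite3

def daily_email_count(conn: sqlite3.Connection) -> int:
    """Items created today; schema names are assumed for this sketch."""
    row = conn.execute(
        "SELECT COUNT(*) FROM queue_items WHERE created_at >= date('now')"
    ).fetchone()
    return row[0]

def active_queue_items(conn: sqlite3.Connection) -> int:
    """PENDING + PROCESSING items, i.e. the current backlog."""
    row = conn.execute(
        "SELECT COUNT(*) FROM queue_items "
        "WHERE status IN ('PENDING', 'PROCESSING')"
    ).fetchone()
    return row[0]
```

The same counts could also be exposed through a status endpoint rather than read from the database directly.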
Queue System
Learn about the database-backed queue architecture
Real-Time Updates
Optimize polling strategies for queue status
