Overview
Helicone’s rate limiting feature allows you to control API usage by enforcing request or cost-based quotas. Set limits per user, organization, custom property, or globally to prevent overuse and manage budgets.
Rate limiting is ideal for:
- Preventing individual users from exceeding quotas
- Managing costs across teams or departments
- Enforcing fair usage in multi-tenant applications
- Protecting against runaway API consumption
Key Benefits
Flexible Policies
Rate limit by requests, cost (in cents), time windows, and custom segments
Fine-Grained Control
Apply limits per user, property, or globally across your entire organization
Cost Management
Set budget limits in cents to prevent unexpected spending
Instant Feedback
Users receive an immediate 429 response with retry information when a limit is exceeded
Quick Start
Apply rate limiting by adding the Helicone-RateLimit-Policy header:
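As a minimal Python sketch, the header can be attached alongside the usual authentication headers of a request routed through Helicone. The helper name `helicone_headers`, the placeholder keys, and the `Helicone-Auth` header are assumptions about a typical proxy setup rather than something this page specifies; only `Helicone-RateLimit-Policy` comes from the text above.

```python
def helicone_headers(provider_key: str, helicone_key: str, policy: str) -> dict[str, str]:
    """Build request headers that enable rate limiting for a single call."""
    return {
        "Authorization": f"Bearer {provider_key}",
        # Assumed: Helicone authenticates proxy traffic with this header.
        "Helicone-Auth": f"Bearer {helicone_key}",
        # The rate limit policy described on this page.
        "Helicone-RateLimit-Policy": policy,
        "Content-Type": "application/json",
    }


# 100 requests per 60-second window, counted per user (illustrative values).
headers = helicone_headers("PROVIDER_API_KEY", "HELICONE_API_KEY", "100;w=60;s=user")
```

Pass the resulting dictionary to whichever HTTP client or SDK you already use to send requests through Helicone.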
Policy Format
The Helicone-RateLimit-Policy header uses this format:
[quota];w=[time_window];u=[unit];s=[segment]
Required Parameters
- quota: Maximum number of requests or cents allowed
- w: Time window in seconds (minimum: 60, maximum: 31536000)
Optional Parameters
- u: Unit of measurement (request or cents, default: request)
- s: Segmentation type (user, custom property name, or omit for global)
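The parameters above can be assembled programmatically. The following is a sketch, not part of any Helicone SDK: `build_policy` is a hypothetical helper, and the assumption that `u` can be omitted when the unit is `request` follows from the stated default.

```python
from typing import Optional, Union

MIN_WINDOW_SECONDS = 60
MAX_WINDOW_SECONDS = 31_536_000  # 1 year


def build_policy(
    quota: Union[int, float],
    window: int,
    unit: str = "request",
    segment: Optional[str] = None,
) -> str:
    """Assemble a policy string of the form quota;w=window[;u=unit][;s=segment]."""
    if not MIN_WINDOW_SECONDS <= window <= MAX_WINDOW_SECONDS:
        raise ValueError(
            f"window must be between {MIN_WINDOW_SECONDS} and {MAX_WINDOW_SECONDS} seconds"
        )
    if unit not in ("request", "cents"):
        raise ValueError("unit must be 'request' or 'cents'")

    parts = [str(quota), f"w={window}"]
    if unit != "request":  # 'request' is the default, so it is left out
        parts.append(f"u={unit}")
    if segment:  # no segment means the limit is global
        parts.append(f"s={segment}")
    return ";".join(parts)
```

For example, `build_policy(100, 60, segment="user")` yields `100;w=60;s=user`, matching the policy string shown in the response header table.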
Policy Examples
Request-Based Limits
Cost-Based Limits
Custom Property Segments
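To make the three kinds of policy concrete, the sketch below lists one illustrative policy string per kind and decodes it into its parameters. The quotas, the `organization` property name, and the `parse_policy` helper are hypothetical; the parsing mirrors the format described under Policy Format and is not Helicone's own validation logic.

```python
# Illustrative policy strings (values chosen for the example only).
EXAMPLE_POLICIES = {
    "request-based": "100;w=60",               # 100 requests per minute
    "cost-based": "1000;w=3600;u=cents",       # 1000 cents per hour
    "custom property": "500;w=86400;s=organization",  # 500 requests per day per organization
}


def parse_policy(policy: str) -> dict:
    """Split a policy string into quota, w (window), u (unit), and s (segment)."""
    quota, *params = policy.split(";")
    parsed = {"quota": float(quota), "w": None, "u": "request", "s": None}
    for param in params:
        key, _, value = param.partition("=")
        if key not in ("w", "u", "s"):
            raise ValueError(f"unknown policy parameter: {key!r}")
        parsed[key] = int(value) if key == "w" else value
    if parsed["w"] is None:
        raise ValueError("the w (time window) parameter is required")
    return parsed


decoded = {name: parse_policy(policy) for name, policy in EXAMPLE_POLICIES.items()}
```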
Segmentation Types
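- Global (Default): omit s; the rate limit applies across all requests
- Per User: set s=user; the rate limit applies to each user separately
- Custom Property: set s to a custom property name; the rate limit applies to each value of that property separately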
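A per-user or per-property limit only works if each request identifies its segment. The sketch below pairs each segmentation type with an identifying header. It assumes Helicone's `Helicone-User-Id` and `Helicone-Property-<Name>` header conventions, which this page does not spell out, and `segmented_headers` is a hypothetical helper.

```python
from typing import Optional


def segmented_headers(
    policy: str,
    user_id: Optional[str] = None,
    properties: Optional[dict[str, str]] = None,
) -> dict[str, str]:
    """Combine a rate limit policy with the headers that identify its segment."""
    headers = {"Helicone-RateLimit-Policy": policy}
    if user_id is not None:
        # Assumed header for identifying the user behind a request.
        headers["Helicone-User-Id"] = user_id
    for name, value in (properties or {}).items():
        # Assumed header convention for custom properties.
        headers[f"Helicone-Property-{name}"] = value
    return headers


# Global: no segment, one shared quota.
global_headers = segmented_headers("10000;w=3600")
# Per user: s=user plus a user identifier.
per_user_headers = segmented_headers("100;w=60;s=user", user_id="user-123")
# Custom property: s names the property, and the request carries its value.
per_org_headers = segmented_headers(
    "500;w=3600;s=organization", properties={"organization": "acme"}
)
```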
Response Headers
When rate limiting is active, Helicone adds headers to every response:
| Header | Description | Example |
|---|---|---|
| X-RateLimit-Limit | Maximum quota for the time window | 100 |
| X-RateLimit-Remaining | Remaining quota in current window | 87 |
| X-RateLimit-Reset | Unix timestamp when window resets | 1678901234 |
| X-RateLimit-Policy | Active policy string | 100;w=60;s=user |
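A client can read these headers to track how much quota is left. The following sketch assumes the header names and value types listed in the table; `RateLimitStatus` and `read_rate_limit` are hypothetical names, and the lookup is case-insensitive because HTTP clients differ in how they normalize header names.

```python
import time
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class RateLimitStatus:
    limit: float
    remaining: float
    reset_at: int  # Unix timestamp at which the window resets
    policy: str

    def seconds_until_reset(self, now: Optional[float] = None) -> float:
        """Seconds left in the current window (never negative)."""
        current = time.time() if now is None else now
        return max(0.0, self.reset_at - current)


def read_rate_limit(headers: Mapping[str, str]) -> Optional[RateLimitStatus]:
    """Return the rate limit state, or None if no rate limit headers are present."""
    lowered = {name.lower(): value for name, value in headers.items()}
    if "x-ratelimit-limit" not in lowered:
        return None
    return RateLimitStatus(
        limit=float(lowered["x-ratelimit-limit"]),
        remaining=float(lowered["x-ratelimit-remaining"]),
        reset_at=int(lowered["x-ratelimit-reset"]),
        policy=lowered.get("x-ratelimit-policy", ""),
    )


# Using the example values from the table.
status = read_rate_limit({
    "X-RateLimit-Limit": "100",
    "X-RateLimit-Remaining": "87",
    "X-RateLimit-Reset": "1678901234",
    "X-RateLimit-Policy": "100;w=60;s=user",
})
```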
Rate Limit Exceeded (429 Response)
When limits are exceeded, requests receive a 429 response with retry information.
Common Time Windows
| Period | Seconds | Example Policy |
|---|---|---|
| 1 minute | 60 | 100;w=60 |
| 5 minutes | 300 | 500;w=300 |
| 1 hour | 3600 | 1000;w=3600 |
| 1 day | 86400 | 10000;w=86400 |
| 1 week | 604800 | 50000;w=604800 |
| 1 month (30 days) | 2592000 | 200000;w=2592000 |
Advanced Use Cases
Multi-tier rate limits
Implement different limits for different user tiers:
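One way to do this is to keep a table of policies keyed by tier and select the policy when building each request. The tier names, quotas, and the `Helicone-User-Id` header below are illustrative assumptions, not values prescribed by Helicone.

```python
# Hypothetical tiers: requests per hour, counted per user.
TIER_POLICIES = {
    "free": "100;w=3600;s=user",
    "pro": "1000;w=3600;s=user",
    "enterprise": "10000;w=3600;s=user",
}


def policy_for_tier(tier: str) -> str:
    """Look up the policy for a tier, falling back to the most restrictive one."""
    return TIER_POLICIES.get(tier, TIER_POLICIES["free"])


def headers_for_user(user_id: str, tier: str) -> dict[str, str]:
    """Headers for one user's request; the user header is an assumed convention."""
    return {
        "Helicone-RateLimit-Policy": policy_for_tier(tier),
        "Helicone-User-Id": user_id,
    }
```

Defaulting unknown tiers to the most restrictive policy keeps a misconfigured account from receiving an unintended quota.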
Departmental budget controls
Set cost limits per department:
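Because cost quotas are expressed in cents, a budget stated in dollars needs converting first. This sketch assumes a custom property named `department` and the `Helicone-Property-<Name>` header convention; the helper name and the budget figures are hypothetical.

```python
THIRTY_DAYS_SECONDS = 30 * 24 * 60 * 60  # 2592000, the "1 month" window


def department_budget_headers(department: str, monthly_budget_usd: float) -> dict[str, str]:
    """Build headers that cap a department's spend over a 30-day window."""
    budget_cents = round(monthly_budget_usd * 100)  # cost quotas are in cents
    return {
        "Helicone-RateLimit-Policy": (
            f"{budget_cents};w={THIRTY_DAYS_SECONDS};u=cents;s=department"
        ),
        # Assumed header convention for tagging a request with a custom property.
        "Helicone-Property-department": department,
    }


engineering_headers = department_budget_headers("engineering", 500)
```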
Graceful degradation
Handle rate limits with fallback logic:
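A minimal sketch of the idea: try the rate-limited call first, and switch to a fallback (a cached answer, a simpler code path, or a friendly message) when it returns 429. Both callables are hypothetical stand-ins for your own request code; `primary` is assumed to return a status code and a body.

```python
from typing import Callable, Tuple


def complete_with_fallback(
    primary: Callable[[], Tuple[int, str]],
    fallback: Callable[[], str],
) -> str:
    """Return the primary result, or the fallback result if rate limited."""
    status, body = primary()
    if status == 429:
        # Quota exhausted: degrade gracefully instead of surfacing an error.
        return fallback()
    return body


# Usage with stand-in callables.
answer = complete_with_fallback(
    primary=lambda: (429, ""),
    fallback=lambda: "Service is busy, showing a cached answer.",
)
```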
Testing with small budgets
Use decimal values for testing cost limits:
Best Practices
- Start conservative: Begin with lower limits and increase based on usage patterns
- Monitor metrics: Track rate limit hits in your Helicone dashboard
- Implement retry logic: Handle 429 responses gracefully with exponential backoff
- Use appropriate segments: Choose user, property, or global based on your use case
- Set realistic windows: Align time windows with your application’s usage patterns
- Combine with caching: Use caching to reduce requests and stay under limits
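The retry advice above can be sketched as a small wrapper that doubles its wait after each 429 response. The `send` callable is a hypothetical stand-in for your request code and is assumed to return a status code and a body; the delays and attempt count are illustrative defaults, and production code often adds random jitter as well.

```python
import time
from typing import Callable, Tuple


def call_with_backoff(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Retry on 429 with exponentially growing delays, capped at max_delay."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            # Delays grow as base_delay * 1, 2, 4, ... up to max_delay.
            sleep(min(max_delay, base_delay * 2 ** attempt))
    # Still rate limited after the final attempt: return the last 429 response.
    return status, body
```

The `sleep` parameter is injectable so the wrapper can be tested without actually waiting.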
Limitations
- Minimum time window: 60 seconds
- Maximum time window: 31,536,000 seconds (1 year)
- Policy validation happens on every request
- Cost-based limits use Helicone’s cost calculations
Related Features
Caching
Reduce requests with intelligent caching
Webhooks
Get notified when limits are exceeded
