Performance Overview
Cerbos is designed for high-performance authorization, with typical latencies:
- P50 latency: < 5ms for cached policy evaluations
- P95 latency: < 20ms for most workloads
- P99 latency: < 50ms with warm cache
- Throughput: 10,000+ requests/second per instance
Actual performance depends on policy complexity, storage backend, caching configuration, and hardware resources.
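When validating these targets against your own measurements, compare percentiles rather than averages. A minimal sketch using only the standard library:

```python
import statistics

def latency_percentiles(samples_ms):
    """Return (P50, P95, P99) from raw latency samples in milliseconds."""
    # quantiles() with n=100 yields 99 cut points; index k holds the (k+1)th percentile.
    qs = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return qs[49], qs[94], qs[98]
```

Feed it the raw per-request latencies collected during a load test; tail percentiles are what the SLOs below are expressed in.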
Caching Strategy
Cerbos maintains an in-memory cache of compiled policies for fast evaluation.
Policy Cache Metrics
Monitor cache effectiveness via the cache metrics exported by the server.
Cache Warming
Cerbos automatically caches policies on first access. For predictable performance:
- Pre-load common policies after startup
- Monitor cache miss rate during deployment
- Ensure sufficient cache capacity for all active policies
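To act on the cache-miss guidance above, one option is to compare successive scrapes of the hit/miss counters and alert when the miss rate between scrapes climbs. A hedged sketch (wiring to your metrics client is left out):

```python
class CacheMissMonitor:
    """Track the policy-cache miss rate between successive counter scrapes.

    The hit/miss values are assumed to come from monotonically increasing
    counters scraped from the server's metrics endpoint.
    """

    def __init__(self, threshold: float = 0.10):
        self.threshold = threshold
        self.last_hits = 0
        self.last_misses = 0

    def observe(self, hits: int, misses: int) -> bool:
        """Return True if the miss rate since the last scrape exceeds the threshold."""
        delta_hits = hits - self.last_hits
        delta_misses = misses - self.last_misses
        self.last_hits, self.last_misses = hits, misses
        total = delta_hits + delta_misses
        return total > 0 and (delta_misses / total) > self.threshold
```

A sustained miss rate above a few percent usually means the cache is too small for the active policy set or policies are being evicted under memory pressure.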
Policy Compilation Performance
Policy compilation happens once per policy update. To keep it fast:
- Keep policies modular and focused
- Avoid overly complex condition expressions
- Use derived roles to reduce duplication
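As a hedged sketch of the last point, a derived role can be defined once and reused across resource policies instead of repeating the same condition everywhere (role and attribute names hypothetical):

```yaml
# Hypothetical names throughout - a shared derived-roles file that multiple
# resource policies can import rather than duplicating the condition.
apiVersion: api.cerbos.dev/v1
derivedRoles:
  name: common_roles
  definitions:
    - name: owner
      parentRoles: ["user"]
      condition:
        match:
          expr: request.resource.attr.ownerId == request.principal.id
```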
Request Batching
Batch multiple authorization checks in a single request to reduce network overhead.
Batch Size Configuration
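For reference, a batched check sends one principal and several resources in a single request body. A sketch of the payload shape (IDs, roles, and resource kinds are illustrative; check the API reference for the exact schema):

```json
{
  "requestId": "batch-001",
  "principal": {"id": "alice", "roles": ["user"]},
  "resources": [
    {"actions": ["view"], "resource": {"kind": "document", "id": "doc-1"}},
    {"actions": ["view"], "resource": {"kind": "document", "id": "doc-2"}},
    {"actions": ["view", "edit"], "resource": {"kind": "document", "id": "doc-3"}}
  ]
}
```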
Batch Size Monitoring
Batching Best Practices
Optimal Batch Sizes
Small Batches (1-10 resources):
- Lower latency per decision
- Better for real-time UI authorization
- Use for critical path operations
Medium Batches:
- Balanced latency and throughput
- Ideal for list views, bulk operations
- Good default for most applications
Large Batches:
- Maximum throughput
- Higher total latency
- Use for background processing, reports
Client-Side Batching Strategy
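Client-side batching amounts to slicing the pending resource checks into fixed-size groups before calling the API. A minimal sketch:

```python
from typing import Iterator, List, TypeVar

T = TypeVar("T")

def chunked(items: List[T], batch_size: int) -> Iterator[List[T]]:
    """Split a list of pending resource checks into fixed-size batches."""
    for i in range(0, len(items), batch_size):
        yield items[i : i + batch_size]
```

Each yielded batch becomes one check request; choose the batch size using the guidance in the table above.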
Adaptive Batching
Dynamically adjust batch size based on latency:
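One way to do this is additive-increase/multiplicative-decrease on the batch size, keyed off a latency target. A sketch with illustrative constants:

```python
class AdaptiveBatcher:
    """Grow the batch size while latency stays under target; shrink when it doesn't.

    A simple additive-increase / multiplicative-decrease sketch; the target
    and step constants are illustrative and should be tuned per workload.
    """

    def __init__(self, target_ms: float = 20.0, min_size: int = 1, max_size: int = 100):
        self.target_ms = target_ms
        self.min_size = min_size
        self.max_size = max_size
        self.size = min_size

    def record(self, latency_ms: float) -> int:
        """Feed in the latency of the last batch; returns the next batch size."""
        if latency_ms > self.target_ms:
            self.size = max(self.min_size, self.size // 2)  # back off quickly
        else:
            self.size = min(self.max_size, self.size + 5)   # probe upwards slowly
        return self.size
```

Call `record()` after every batch with its observed latency and use the returned value to size the next batch.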
gRPC Configuration
Connection Settings
| Parameter | Default | Recommended | Impact |
|---|---|---|---|
| maxConcurrentStreams | 100 | 100-500 | Concurrent requests per connection |
| connectionTimeout | 120s | 60-120s | Time to establish connection |
| maxConnectionAge | - | 300s | Force connection refresh |
| maxRecvMsgSizeBytes | 4MB | 4-8MB | Maximum request size |
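Assuming these parameters live under the server's advanced gRPC configuration block (key paths inferred from the names above; verify against the configuration reference for your version), a fragment might look like:

```yaml
# Assumed key paths - verify against your server version's configuration reference.
server:
  advanced:
    grpc:
      maxConcurrentStreams: 200
      connectionTimeout: 60s
      maxConnectionAge: 300s
      maxRecvMsgSizeBytes: 4194304  # 4MB
```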
Client-Side Connection Pooling
HTTP Configuration
- readHeaderTimeout: should be less than typical connection establishment time
- readTimeout: must accommodate the largest request's processing time
- writeTimeout: must accommodate the largest response
- idleTimeout: balance connection reuse against resource consumption
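Assuming these timeouts sit under an advanced HTTP configuration block (key paths assumed; verify against the configuration reference), a fragment might look like:

```yaml
# Assumed key paths and illustrative values.
server:
  advanced:
    http:
      readHeaderTimeout: 15s
      readTimeout: 30s
      writeTimeout: 30s
      idleTimeout: 120s
```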
Storage Backend Optimization
Git Storage Performance
- Use shallow clones for large repositories
- Minimize policy file count via consolidation
- Use local caching proxies for remote repositories
Database Storage Performance
- Small deployment: maxOpen=10, maxIdle=5
- Medium deployment: maxOpen=25, maxIdle=10
- Large deployment: maxOpen=50, maxIdle=20
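As a hedged sketch of the medium-deployment settings, assuming the database storage driver exposes a connection-pool block with these field names (check the storage configuration reference for the exact schema):

```yaml
# Pool field names are assumptions - verify against the storage driver reference.
storage:
  driver: postgres
  postgres:
    url: "postgres://cerbos:password@db:5432/cerbos"  # placeholder DSN
    connPool:
      maxOpen: 25
      maxIdle: 10
```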
Blob Storage Performance
- Use regional endpoints to reduce latency
- Enable CloudFront or CDN for global deployments
- Implement retry logic with exponential backoff
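Retry with exponential backoff and full jitter can be sketched as follows; the `OSError` catch is a placeholder for whatever transient-error type your blob-storage client raises:

```python
import random
import time

def with_backoff(op, max_attempts: int = 5, base_delay: float = 0.1, max_delay: float = 5.0):
    """Retry a storage operation with exponential backoff and full jitter.

    `op` is a zero-argument callable; `OSError` stands in for your client's
    transient error type.
    """
    for attempt in range(max_attempts):
        try:
            return op()
        except OSError:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the error
            delay = min(max_delay, base_delay * (2 ** attempt))
            time.sleep(random.uniform(0, delay))  # full jitter avoids thundering herds
```

Full jitter (sleeping a uniform random fraction of the computed delay) spreads retries out so many clients recovering at once do not hammer the backend in lockstep.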
Resource Limits
Request Size Limits
Memory Management
Estimate memory requirements. For example, a deployment with:
- 500 policies
- 1000 req/s
- Audit enabled
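A back-of-the-envelope estimator for a deployment like the one above might look like this. Every constant here is an assumption to be replaced with measurements from your own memory profile:

```python
def estimate_memory_mib(policies: int, req_per_sec: int, audit_enabled: bool) -> float:
    """Rough memory estimate in MiB - all constants are illustrative assumptions:
    ~0.1 MiB per compiled cached policy, ~0.02 MiB of transient buffers per
    in-flight request (one second's worth), ~64 MiB for audit buffering,
    ~64 MiB runtime baseline.
    """
    base = 64.0
    policy_cache = policies * 0.1
    request_buffers = req_per_sec * 0.02
    audit = 64.0 if audit_enabled else 0.0
    return base + policy_cache + request_buffers + audit
```

For the example deployment (500 policies, 1000 req/s, audit enabled) this yields roughly 200 MiB, which is why the sizing table below requests 256Mi-512Mi for small workloads.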
Container Resource Limits
Kubernetes Resources
| Workload | CPU Request | CPU Limit | Memory Request | Memory Limit |
|---|---|---|---|---|
| Small (< 100 req/s) | 100m | 500m | 256Mi | 512Mi |
| Medium (100-1000 req/s) | 250m | 2000m | 512Mi | 2Gi |
| Large (> 1000 req/s) | 1000m | 4000m | 1Gi | 4Gi |
Horizontal Pod Autoscaling
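A standard Kubernetes `autoscaling/v2` HorizontalPodAutoscaler targeting CPU utilization fits this setup; the Deployment name, replica bounds, and utilization target below are illustrative:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: cerbos
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: cerbos  # assumes the Deployment is named "cerbos"
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70  # illustrative target
```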
Network Optimization
Keep-Alive Configuration
Enable TCP keep-alive to detect and recycle stale connections.
Connection Reuse
Clients should reuse connections instead of opening a new one per request.
Latency Optimization
Target Latency SLOs
Latency Troubleshooting
High P50 Latency (> 10ms)
Possible Causes:
- Cache misses (check cerbos_dev_cache_access_count)
- Complex policy conditions
- Slow storage backend
Solutions:
- Warm the cache during startup
- Simplify policy logic
- Optimize the storage backend (see the Storage Backend Optimization section)
- Ensure adequate CPU allocation
High P95/P99 Latency
Possible Causes:
- Occasional cache evictions
- GC pauses
- Network congestion
- Resource contention
Solutions:
- Increase memory allocation
- Monitor GC metrics (go_gc_duration_seconds)
- Check network latency to storage
- Review resource limits
Increasing Latency Over Time
Possible Causes:
- Memory pressure
- Growing policy count
- Storage performance degradation
Solutions:
- Monitor memory usage trends
- Implement policy archival strategy
- Investigate storage backend health
- Check for connection leaks
Policy Design for Performance
Efficient Policy Patterns
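For illustration, a policy whose condition is a single flat comparison, which is the kind of expression that evaluates cheapest (resource kind and attribute names hypothetical):

```yaml
# Illustrative policy - resource kind and attribute names are hypothetical.
apiVersion: api.cerbos.dev/v1
resourcePolicy:
  version: "default"
  resource: "document"
  rules:
    - actions: ["view"]
      effect: EFFECT_ALLOW
      roles: ["user"]
      condition:
        match:
          # One flat comparison; cheap to evaluate on every request.
          expr: request.resource.attr.ownerId == request.principal.id
```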
Good policies keep conditions simple: prefer flat, direct attribute comparisons over deeply nested expressions.
Derived Roles Optimization
Use derived roles to reduce policy duplication.
Benchmarking
Load Testing
Use ghz for gRPC load testing:
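A hedged example invocation; the request payload file, load parameters, and target address are yours to supply (3593 assumed as the gRPC port):

```shell
# Hypothetical invocation - adjust the target address, payload, and load shape.
ghz --insecure \
  --call cerbos.svc.v1.CerbosService/CheckResources \
  --data-file ./check_request.json \
  --concurrency 50 \
  --total 10000 \
  localhost:3593
```

ghz reports the latency distribution (including P50/P95/P99), which maps directly onto the SLOs discussed above.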
Performance Testing Checklist
- Test with production-like policy count
- Use realistic batch sizes
- Warm cache before benchmarking
- Test with concurrent clients
- Monitor resource usage during test
- Measure P50, P95, P99 latencies
- Verify cache hit rates
- Test storage backend separately