Overview
The system implements two-tier capacity management to control concurrent calls:
- Global capacity limit: Maximum concurrent calls across all tenants
- Per-tenant capacity limit: Maximum concurrent calls for a single tenant
Configuration
Global Capacity Limit
Set the maximum concurrent calls globally via an environment variable. The setting is defined at src/core/settings.py:84.
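As a rough illustration of how such a setting might be read, the sketch below loads a global limit from the environment. The variable name MAX_CONCURRENT_CALLS, the default of 10, and the helper load_max_concurrent_calls are assumptions made for this example; check src/core/settings.py for the real name and default.

```python
import os
from typing import Mapping, Optional

# Assumed names for illustration only; the real ones live in src/core/settings.py.
ENV_VAR = "MAX_CONCURRENT_CALLS"
DEFAULT_LIMIT = 10


def load_max_concurrent_calls(env: Optional[Mapping[str, str]] = None) -> int:
    """Read the global concurrent-call limit from the environment."""
    source = os.environ if env is None else env
    raw = source.get(ENV_VAR)
    if raw is None or raw.strip() == "":
        return DEFAULT_LIMIT
    limit = int(raw)  # raises ValueError on non-numeric input
    if limit < 1:
        raise ValueError(f"{ENV_VAR} must be a positive integer, got {limit}")
    return limit
```

Passing a mapping instead of relying on os.environ keeps the helper easy to exercise in tests without touching the process environment.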
Per-Tenant Capacity Limit
By default, the per-tenant limit inherits the global limit. You can configure a separate per-tenant limit if needed.
How Capacity Gating Works
The capacity gating logic is implemented in src/apps/calls/api/v1/endpoints/openai_webhook.py:237-297. Here’s how it works:
1. Calculate Current Usage
When an incoming call arrives, the system calculates the current usage, both globally and for the call's tenant.
2. Check Capacity Limits
The system rejects calls when either limit is reached.
3. Reserve or Reject
If capacity is available, the call is marked as pending.
Capacity State Management
The system maintains three capacity-related states:
Pending Calls
Calls that have been accepted but not yet started:
- pending_call_ids: Set of all pending call IDs
- pending_tenant_by_call_id: Map of call_id → tenant_id
- pending_by_tenant: Map of tenant_id → set of call_ids
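The three pending structures and the calculate, check, and reserve steps described earlier can be sketched together as follows. This is a minimal illustration, not the code in openai_webhook.py: PendingState and try_reserve are hypothetical names, and counting usage as active plus pending calls is an assumption. The per-tenant limit falls back to the global limit when unset, matching the default described under Configuration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class PendingState:
    """Hypothetical holder for the three pending-call structures."""
    pending_call_ids: Set[str] = field(default_factory=set)
    pending_tenant_by_call_id: Dict[str, str] = field(default_factory=dict)
    pending_by_tenant: Dict[str, Set[str]] = field(default_factory=dict)


def try_reserve(
    state: PendingState,
    call_id: str,
    tenant_id: str,
    active_total: int,
    active_for_tenant: int,
    global_limit: int,
    per_tenant_limit: Optional[int] = None,
) -> bool:
    """Reserve a pending slot and return True, or return False when full."""
    # Per-tenant limit inherits the global limit unless configured separately.
    tenant_limit = global_limit if per_tenant_limit is None else per_tenant_limit

    # Step 1: current usage (assumed here to be active + pending calls).
    total_in_use = active_total + len(state.pending_call_ids)
    tenant_in_use = active_for_tenant + len(
        state.pending_by_tenant.get(tenant_id, set())
    )

    # Step 2: reject when either limit is reached.
    if total_in_use >= global_limit or tenant_in_use >= tenant_limit:
        return False

    # Step 3: mark the call as pending in all three structures.
    state.pending_call_ids.add(call_id)
    state.pending_tenant_by_call_id[call_id] = tenant_id
    state.pending_by_tenant.setdefault(tenant_id, set()).add(call_id)
    return True
```

In a real handler the whole function body would need to run under the capacity lock so that the check and the reservation cannot interleave with another request. Deduplication of repeated webhooks is left out of this sketch.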
Accepted Calls
Calls that have been accepted (tracked for deduplication).
Active Calls
Managed by CallManager, representing calls in active sessions.
Releasing Capacity
Pending capacity is released in these scenarios:
1. Call Session Starts Successfully
See openai_webhook.py:555-563.
2. Call Rejected or Fails to Start
The _release_pending_capacity_state function (lines 39-49) cleans up the pending state for the call.
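A cleanup routine of this kind might look like the sketch below. It is a hypothetical stand-in named release_pending, not the actual _release_pending_capacity_state, and it assumes the three pending structures are plain sets and dicts. It is written to be idempotent, so releasing the same call twice is harmless.

```python
from typing import Dict, Set


def release_pending(
    call_id: str,
    pending_call_ids: Set[str],
    pending_tenant_by_call_id: Dict[str, str],
    pending_by_tenant: Dict[str, Set[str]],
) -> None:
    """Remove call_id from all pending structures; safe to call repeatedly."""
    # discard() and pop(..., None) do nothing when the call is already gone.
    pending_call_ids.discard(call_id)
    tenant_id = pending_tenant_by_call_id.pop(call_id, None)
    if tenant_id is None:
        return

    tenant_calls = pending_by_tenant.get(tenant_id)
    if tenant_calls is None:
        return
    tenant_calls.discard(call_id)
    # Drop empty per-tenant sets so the map does not grow without bound.
    if not tenant_calls:
        del pending_by_tenant[tenant_id]
```

Using discard and pop with a default, rather than remove and del, is what makes repeated cleanup calls safe when several code paths release the same call.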
3. Call Ends
See openai_webhook.py:694-696.
Monitoring Capacity
Track capacity rejections using metrics:
- rejected_calls_capacity: Count of calls rejected due to capacity limits
- active_calls: Current active call count
- accepted_calls: Total accepted calls
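One way to turn these counters into a single signal is a rejection rate. The sketch below assumes that rejected_calls_capacity and accepted_calls are cumulative counters read over the same time window, and that every incoming call ends up in exactly one of the two; the function name capacity_rejection_rate is hypothetical.

```python
def capacity_rejection_rate(rejected_calls_capacity: int, accepted_calls: int) -> float:
    """Fraction of incoming calls rejected for capacity, between 0.0 and 1.0."""
    if rejected_calls_capacity < 0 or accepted_calls < 0:
        raise ValueError("counters must be non-negative")
    total = rejected_calls_capacity + accepted_calls
    if total == 0:
        return 0.0  # no traffic in the window, so nothing was rejected
    return rejected_calls_capacity / total
```

For example, 5 rejections against 95 accepted calls gives a rate of 5 / 100, or 5 percent.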
Capacity Rejection Response
When a call is rejected due to capacity, the webhook returns:
Best Practices
1. Set Conservative Limits
Start with lower limits and increase based on observed performance.
2. Monitor Rejection Rates
High rejection rates indicate insufficient capacity.
3. Implement Tenant-Specific Limits
For multi-tenant deployments, prevent single tenants from monopolizing capacity.
4. Use Async Locks Properly
All capacity state modifications use capacity_lock to prevent race conditions.
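The sketch below shows why the check and the reservation need to sit inside the same locked region. It assumes capacity_lock is an asyncio.Lock, which the source does not confirm; reserve_many and the inner try_reserve are hypothetical helpers. Note that an asyncio.Lock serializes coroutines on a single event loop and does not protect state shared across OS threads.

```python
import asyncio
from typing import List, Set


async def reserve_many(call_ids: List[str], limit: int) -> Set[str]:
    """Attempt to reserve every call concurrently; return those that fit."""
    capacity_lock = asyncio.Lock()
    pending: Set[str] = set()

    async def try_reserve(call_id: str) -> bool:
        # Check and reserve happen under one lock acquisition, so no other
        # coroutine can slip in between the two steps.
        async with capacity_lock:
            if len(pending) >= limit:
                return False
            # Yield to the event loop to mimic real I/O between check and write.
            await asyncio.sleep(0)
            pending.add(call_id)
            return True

    await asyncio.gather(*(try_reserve(call_id) for call_id in call_ids))
    return pending
```

Without the lock, every coroutine could pass the length check before any of them added to the set, and the limit would be exceeded. A possible invocation is asyncio.run(reserve_many(["c1", "c2", "c3"], limit=2)).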
Thread Safety
The capacity gating implementation is thread-safe:
- All capacity state reads/writes are protected by capacity_lock
- Atomic check-and-reserve operations
- Idempotent cleanup operations
Troubleshooting
Calls Rejected Despite Low Active Count
Check pending calls.
Capacity Not Released
Check for exceptions in _start_call_session that prevent cleanup. Review the logs.