Minimum requirements
The absolute minimum configuration for development and testing:

| Component | Specification |
|---|---|
| CPU | 4 cores (8 threads recommended) |
| RAM | 16 GB (32 GB recommended) |
| Storage | 100 GB SSD |
| Network | 100 Mbps symmetric |
Production requirements
Single-region deployment (25 concurrent calls)
| Component | Specification |
|---|---|
| CPU | 16 cores (32 threads) |
| RAM | 64 GB ECC |
| Storage | 500 GB NVMe SSD |
| Network | 1 Gbps symmetric, <20ms latency |
| Network Interface | Dedicated interface for RTP traffic |
Multi-region deployment (100+ concurrent calls)
For horizontal scaling across multiple regions:

Per Backend App instance:
- CPU: 16 cores
- RAM: 32 GB
- Storage: 250 GB SSD
- Network: 1 Gbps dedicated

Shared infrastructure:
- MongoDB: 32 cores, 128 GB RAM, 1 TB SSD (replica set)
- Redis: 16 cores, 64 GB RAM, 250 GB SSD (cluster mode)
- Milvus: 16 cores, 64 GB RAM, 500 GB SSD
- S3 Storage: 5 TB minimum, expandable
Software dependencies
Required runtime
.NET 10 Runtime

Version: .NET 10.0 or later

The ASP.NET Core Runtime is required for all four services:
- Frontend Dashboard
- Backend Proxy
- Backend App
- Background Processor

Supported platforms:
- Linux (x64, ARM64)
- Windows Server 2019+
- macOS (development only)
Database systems
MongoDB - Primary metadata storage
Version: 6.0 or later (7.0 recommended)

Purpose: Stores all application metadata, including:
- User accounts and authentication
- Agent configurations and scripts
- Conversation history and logs
- Integration settings
- Billing and usage data

Performance tuning:
- Replica set (minimum 3 nodes for production)
- Transactions support enabled
- WiredTiger storage engine
- Minimum 50 GB storage allocation
- Enable compression: `storage.wiredTiger.collectionConfig.blockCompressor=snappy`
- Set cache size: `storage.wiredTiger.engineConfig.cacheSizeGB=16`
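The two WiredTiger settings above map onto the YAML `mongod.conf` format like this (a sketch; the replica set name is an example):

```yaml
# mongod.conf fragment - compression and cache tuning from the list above.
storage:
  wiredTiger:
    engineConfig:
      cacheSizeGB: 16          # size to roughly half of available RAM
    collectionConfig:
      blockCompressor: snappy  # enable snappy block compression
replication:
  replSetName: rs0             # example name; required for a replica set
```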
Redis - Session state and caching
Version: 7.0 or later

Purpose:
- Real-time session state for active calls
- Pub/Sub for inter-service communication
- Call queue management for outbound dialing
- L1 cache for TTS audio (Backend App only)

Deployment:
- Redis Cluster for production (minimum 6 nodes: 3 primaries + 3 replicas)
- Standalone acceptable for development
- Persistence enabled (RDB + AOF)
- Minimum 8 GB memory allocation
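A `redis.conf` sketch matching the persistence and memory points above (thresholds are illustrative defaults, not values from this document):

```conf
# RDB snapshots plus AOF, per the persistence recommendation above.
save 900 1            # snapshot after 900s if at least 1 key changed
save 300 10           # snapshot after 300s if at least 10 keys changed
appendonly yes        # enable AOF persistence
appendfsync everysec  # fsync the AOF once per second
maxmemory 8gb         # matches the minimum memory allocation above
```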
The Backend App requires a separate local Redis instance for TTS audio caching to minimize latency. This should run on 127.0.0.1 on the same machine.

Milvus - Vector database for RAG
Version: 2.4.0 or later

Purpose:
- Stores embeddings for knowledge base documents
- Enables semantic search and RAG (Retrieval-Augmented Generation)
- Powers conversation memory and context retrieval

Deployment modes:
- Standalone - single-node deployment for development and small-scale use
- Distributed - multi-node cluster for production

Resource requirements:
- Minimum 16 GB RAM (scales with vector count)
- GPU optional but recommended for large-scale deployments
- SSD storage for index files

Ports:
- 19530 - gRPC API
- 9091 - HTTP API (used by Iqra)

Performance tuning:
- Adjust collection memory limits based on embedding dimensions
- Configure index type (HNSW recommended for most use cases)
- Set an appropriate `CollectionStaleTimeoutMinutes` to unload unused collections
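As an illustration, an HNSW index definition in pymilvus (2.x-style API) might look like the following sketch; the parameter values (`M`, `efConstruction`), the metric type, and the collection/field names are illustrative assumptions, not prescribed by this document:

```python
# Illustrative HNSW index parameters for a Milvus collection.
# M and efConstruction trade index size/build time against recall; tune per workload.
index_params = {
    "index_type": "HNSW",
    "metric_type": "COSINE",  # assumes cosine-normalized embeddings
    "params": {"M": 16, "efConstruction": 200},
}

# Applying it requires a running Milvus instance, e.g.:
# from pymilvus import Collection
# Collection("knowledge_base").create_index("embedding", index_params)
print(index_params["index_type"])
```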
Object storage
Supported providers:
- RustFS (recommended)
- AWS S3
- MinIO

Purpose: S3-compatible storage for:
- Call recordings (audio files)
- TTS audio cache
- User-uploaded documents and knowledge base files
- Logo images and assets

Resource requirements:
- Minimum 500 GB storage
- 1 GB RAM per TB of storage
- Fast disk I/O for audio streaming
Operating system support
Supported platforms
- Linux (Recommended)
- Windows Server
- macOS (Development only)
Recommended distributions:
- Ubuntu 22.04 LTS or later
- Debian 12 or later
- CentOS Stream 9
- Red Hat Enterprise Linux 9
- Rocky Linux 9
Linux requirements:
- Linux kernel 5.15 or later
- iptables or nftables for firewall
- Network namespaces support

Why Linux is recommended:
- Superior network stack for real-time audio (RTP/UDP)
- Better performance under high concurrent load
- Easier to deploy with systemd
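For high volumes of concurrent RTP/UDP streams, UDP socket buffer ceilings are commonly raised via sysctl; a sketch (values are illustrative, not prescribed by this document):

```conf
# /etc/sysctl.d/99-rtp.conf - larger UDP buffers for concurrent RTP streams
net.core.rmem_max = 8388608      # max receive buffer (8 MB)
net.core.wmem_max = 8388608      # max send buffer (8 MB)
net.core.rmem_default = 262144   # default receive buffer (256 KB)
net.core.wmem_default = 262144   # default send buffer (256 KB)
```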
Network requirements
Bandwidth calculation
Per concurrent call:
- Audio codec (PCMU/PCMA): ~80 Kbps (40 Kbps upload + 40 Kbps download)
- Overhead (RTP/UDP headers): ~10 Kbps
- Total per call: ~90 Kbps

Total bandwidth by load:
- 25 concurrent calls: ~2.25 Mbps
- 100 concurrent calls: ~9 Mbps
- 500 concurrent calls: ~45 Mbps
Add 30% headroom for bursts and signaling traffic. A 25-call system should have minimum 3 Mbps symmetric bandwidth.
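The per-call figure and headroom rule above can be sketched as a small estimator (the helper name is hypothetical):

```python
def required_bandwidth_kbps(concurrent_calls, kbps_per_call=90, headroom_pct=30):
    """Symmetric bandwidth estimate: ~90 Kbps per call plus 30% headroom."""
    return concurrent_calls * kbps_per_call * (100 + headroom_pct) // 100

# 25 calls -> 2925 Kbps (~3 Mbps), matching the minimum stated above.
print(required_bandwidth_kbps(25))   # 2925
print(required_bandwidth_kbps(100))  # 11700
```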
Port requirements
| Service | Protocol | Port | Purpose |
|---|---|---|---|
| Frontend | TCP | 5000 | Dashboard HTTP |
| Frontend | TCP | 5001 | Dashboard HTTPS |
| Backend Proxy | TCP | 5060 | SIP signaling |
| Backend Proxy | UDP | 10000-20000 | RTP audio |
| Backend App | UDP | 20000-40000 | RTP audio |
| MongoDB | TCP | 27017 | Database |
| Redis | TCP | 6379 | Cache/Queue |
| Milvus | TCP | 19530 | Vector DB gRPC API |
| Milvus | TCP | 9091 | Vector DB HTTP API |
| RustFS/S3 | TCP | 9000 | Object storage |
Latency requirements
Recommended RTT (Round-Trip Time):
- Backend to MongoDB: <5ms
- Backend to Redis: <2ms (ideally localhost)
- Backend to Milvus: <10ms
- Backend to S3: <20ms
- User to Backend (RTP): <150ms for acceptable call quality
Network interface configuration
The Backend App and Proxy bind to a specific OS network interface for RTP.

Capacity planning
Concurrent call capacity
Estimating the number of concurrent calls a Backend App instance can handle:

| Hardware | Concurrent Calls | Notes |
|---|---|---|
| 4 cores, 16 GB RAM | 5-10 | Development only |
| 8 cores, 32 GB RAM | 10-25 | Small production |
| 16 cores, 64 GB RAM | 25-50 | Recommended production |
| 32 cores, 128 GB RAM | 50-100 | High-volume |
Actual capacity depends on:
- AI model latency (OpenAI, Anthropic response times)
- TTS provider speed (ElevenLabs, Deepgram)
- Conversation complexity and tool usage
- Network quality and latency
Storage growth estimation
MongoDB:
- Agent configuration: ~500 KB per agent
- Conversation log: ~50 KB per minute of call
- User data: ~10 KB per user

Object storage (S3):
- Call recording (compressed): ~500 KB per minute
- TTS audio cache: ~20 KB per utterance (with high reuse)
- Documents (RAG): Variable, typically 1-10 MB per document

Example growth estimate:
- MongoDB: ~3 GB
- S3 Recordings: ~30 GB
- TTS Cache: ~5 GB (with cache hits)
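The per-unit figures can be combined into a quick estimator; the 60,000 call-minute workload below is an assumption, chosen because it reproduces the example figures above:

```python
def storage_growth_gb(call_minutes, kb_per_minute):
    """Storage growth in GB for a volume of call minutes (1 GB = 1e6 KB)."""
    return call_minutes * kb_per_minute / 1_000_000

minutes = 60_000  # assumed workload (e.g. ~2,000 call-minutes/day over a month)
print(storage_growth_gb(minutes, 500))  # call recordings: 30.0 GB
print(storage_growth_gb(minutes, 50))   # conversation logs: 3.0 GB
```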
Scaling strategies
Vertical scaling
Add more CPU and RAM to existing Backend App servers (up to 100 concurrent calls per instance).
Horizontal scaling
Deploy additional Backend App instances in the same region. The Backend Proxy automatically load balances.
Multi-region deployment
Deploy separate infrastructure stacks in different geographic regions to reduce latency for global users.
Performance benchmarks
Tested on AWS c6i.4xlarge (16 vCPU, 32 GB RAM):
- Concurrent calls: 50
- Average latency (AI response): 1.2 seconds
- Average latency (TTS): 0.8 seconds
- RTP packet loss: <0.01%
- CPU utilization: 65%
- Memory utilization: 18 GB
Cloud provider recommendations
AWS
Recommended instance types:
- Backend App: c6i.4xlarge (compute-optimized)
- MongoDB: r6i.2xlarge (memory-optimized)
- Redis: r6g.xlarge (ARM, memory-optimized)
Google Cloud
Recommended instance types:
- Backend App: c2-standard-16
- MongoDB: n2-highmem-16
- Redis: e2-highmem-8
Azure
Recommended instance types:
- Backend App: F16s_v2
- MongoDB: E16s_v5
- Redis: D8s_v5
Bare metal / On-premise
Recommended specifications:
- Processor: Intel Xeon Scalable (Ice Lake or newer) or AMD EPYC
- RAM: ECC DDR4-3200 or faster
- Storage: NVMe SSDs with high IOPS
- Network: 10 Gbps network cards with SR-IOV support
Security requirements
- TLS/SSL certificates: required for HTTPS and secure WebRTC connections
- Firewall: iptables/nftables or cloud security groups configured
- SSH hardening: key-based authentication, disable password login
- Monitoring: Prometheus, Grafana, or cloud-native monitoring
Next steps
- Self-hosting guide - follow the step-by-step installation instructions
- Configuration reference - detailed configuration options for all services