This guide outlines the system requirements for running Iqra AI on your own infrastructure, including hardware specifications, software dependencies, and capacity planning recommendations.

Minimum requirements

The absolute minimum configuration for development and testing:

Component   Specification
CPU         4 cores (8 threads recommended)
RAM         16 GB (32 GB recommended)
Storage     100 GB SSD
Network     100 Mbps symmetric

These minimum specs support up to 5 concurrent calls. For production workloads, see the production requirements below.

Production requirements

Single-region deployment (25 concurrent calls)

Component           Specification
CPU                 16 cores (32 threads)
RAM                 64 GB ECC
Storage             500 GB NVMe SSD
Network             1 Gbps symmetric, <20 ms latency
Network interface   Dedicated interface for RTP traffic

Multi-region deployment (100+ concurrent calls)

For horizontal scaling across multiple regions, provision the following.

Per Backend App instance:
  • CPU: 16 cores
  • RAM: 32 GB
  • Storage: 250 GB SSD
  • Network: 1 Gbps dedicated
Shared infrastructure:
  • MongoDB: 32 cores, 128 GB RAM, 1 TB SSD (replica set)
  • Redis: 16 cores, 64 GB RAM, 250 GB SSD (cluster mode)
  • Milvus: 16 cores, 64 GB RAM, 500 GB SSD
  • S3 Storage: 5 TB minimum, expandable

Software dependencies

Required runtime

Version: .NET 10.0 or later
The ASP.NET Core Runtime is required for all four services:
  • Frontend Dashboard
  • Backend Proxy
  • Backend App
  • Background Processor
Installation:
# Ubuntu/Debian
wget https://dot.net/v1/dotnet-install.sh
chmod +x dotnet-install.sh
./dotnet-install.sh --channel 10.0 --runtime aspnetcore

# Verify installation
dotnet --version
Supported platforms:
  • Linux (x64, ARM64)
  • Windows Server 2019+
  • macOS (development only)

Database systems

MongoDB

Version: 6.0 or later (7.0 recommended)
Purpose: Stores all application metadata, including:
  • User accounts and authentication
  • Agent configurations and scripts
  • Conversation history and logs
  • Integration settings
  • Billing and usage data
Configuration requirements:
  • Replica set (minimum 3 nodes for production)
  • Transactions support enabled
  • WiredTiger storage engine
  • Minimum 50 GB storage allocation
Recommended setup:
# Install MongoDB 7.0
curl -fsSL https://www.mongodb.org/static/pgp/server-7.0.asc | \
  sudo gpg -o /usr/share/keyrings/mongodb-server-7.0.gpg --dearmor

echo "deb [ signed-by=/usr/share/keyrings/mongodb-server-7.0.gpg ] https://repo.mongodb.org/apt/ubuntu jammy/mongodb-org/7.0 multiverse" | \
  sudo tee /etc/apt/sources.list.d/mongodb-org-7.0.list

sudo apt-get update
sudo apt-get install -y mongodb-org
Performance tuning:
  • Enable compression: storage.wiredTiger.collectionConfig.blockCompressor=snappy
  • Set cache size: storage.wiredTiger.engineConfig.cacheSizeGB=16
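The tuning options above map onto the MongoDB configuration file roughly like this (a sketch; the cache size and replica set name `rs0` are illustrative values to adapt to your hardware and topology):

```yaml
# /etc/mongod.conf (fragment) - illustrative values
storage:
  wiredTiger:
    engineConfig:
      cacheSizeGB: 16          # cap the WiredTiger cache; leave headroom for the OS
    collectionConfig:
      blockCompressor: snappy  # on-disk block compression
replication:
  replSetName: rs0             # replica set membership, required for transactions
```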
Redis

Version: 7.0 or later
Purpose:
  • Real-time session state for active calls
  • Pub/Sub for inter-service communication
  • Call queue management for outbound dialing
  • L1 cache for TTS audio (Backend App only)
Configuration requirements:
  • Redis Cluster for production (minimum 6 nodes: 3 primary + 3 replicas)
  • Standalone acceptable for development
  • Persistence enabled (RDB + AOF)
  • Minimum 8 GB memory allocation
Memory policy:
# /etc/redis/redis.conf
maxmemory 8gb
maxmemory-policy allkeys-lru
save 900 1
save 300 10
save 60 10000
appendonly yes
appendfsync everysec
The Backend App requires a separate local Redis instance for TTS audio caching to minimize latency. This should run on 127.0.0.1 on the same machine.
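A minimal configuration sketch for that local cache instance (the port 6380 and 4 GB limit are illustrative assumptions; any free local port and a size suited to your TTS workload will do). Because this instance holds only re-creatable cached audio, persistence can stay off:

```conf
# /etc/redis/redis-tts.conf - local TTS cache instance (sketch)
bind 127.0.0.1
port 6380
maxmemory 4gb
maxmemory-policy allkeys-lru
save ""          # no RDB snapshots: cache-only data
appendonly no    # no AOF either
```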
Milvus

Version: 2.4.0 or later
Purpose:
  • Stores embeddings for knowledge base documents
  • Enables semantic search and RAG (Retrieval-Augmented Generation)
  • Powers conversation memory and context retrieval
Deployment modes:
  • Standalone - Single-node deployment for development/small-scale
  • Distributed - Multi-node cluster for production
Resource requirements:
  • Minimum 16 GB RAM (scales with vector count)
  • GPU optional but recommended for large-scale deployments
  • SSD storage for index files
Installation (Docker):
wget https://github.com/milvus-io/milvus/releases/download/v2.4.0/milvus-standalone-docker-compose.yml -O docker-compose.yml
docker-compose up -d
Ports:
  • 19530 - gRPC API
  • 9091 - HTTP API (used by Iqra)
Performance tuning:
  • Adjust collection memory limits based on embedding dimensions
  • Configure index type (HNSW recommended for most use cases)
  • Set appropriate CollectionStaleTimeoutMinutes to unload unused collections

Object storage

Operating system support

Supported platforms

Network requirements

Bandwidth calculation

Per concurrent call:
  • Audio codec (PCMU/PCMA): ~80 Kbps (40 Kbps upload + 40 Kbps download)
  • Overhead (RTP/UDP headers): ~10 Kbps
  • Total per call: ~90 Kbps
Example calculations:
  • 25 concurrent calls: 2.25 Mbps
  • 100 concurrent calls: 9 Mbps
  • 500 concurrent calls: 45 Mbps
Add 30% headroom for bursts and signaling traffic. A 25-call system should have minimum 3 Mbps symmetric bandwidth.
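The arithmetic above can be scripted for your own call volumes (a sketch; the ~90 Kbps per-call figure and 30% headroom are the estimates from this section):

```shell
#!/bin/sh
# Estimate symmetric bandwidth for N concurrent calls:
# ~90 Kbps per call (codec + RTP/UDP overhead), plus 30% headroom.
calls=25
per_call_kbps=90
raw_kbps=$(( calls * per_call_kbps ))
with_headroom_kbps=$(( raw_kbps * 130 / 100 ))
echo "${calls} calls: ${raw_kbps} Kbps raw, ${with_headroom_kbps} Kbps with headroom"
```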

Port requirements

Service         Protocol  Port         Purpose
Frontend        TCP       5000         Dashboard HTTP
Frontend        TCP       5001         Dashboard HTTPS
Backend Proxy   TCP       5060         SIP signaling
Backend Proxy   UDP       10000-20000  RTP audio
Backend App     UDP       20000-40000  RTP audio
MongoDB         TCP       27017        Database
Redis           TCP       6379         Cache/queue
Milvus          TCP       9091         Vector DB API
RustFS/S3       TCP       9000         Object storage
Ensure your firewall allows UDP traffic on the RTP port ranges. Blocking UDP will prevent all audio from flowing.
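As one example, the public-facing ports from the table can be opened with ufw (a sketch assuming ufw; translate to iptables/nftables or cloud security groups as needed). The internal service ports — MongoDB, Redis, Milvus, S3 — should remain restricted to the private network rather than exposed publicly:

```shell
sudo ufw allow 5001/tcp          # dashboard HTTPS
sudo ufw allow 5060/tcp          # SIP signaling
sudo ufw allow 10000:20000/udp   # Backend Proxy RTP
sudo ufw allow 20000:40000/udp   # Backend App RTP
```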

Latency requirements

Recommended RTT (Round-Trip Time):
  • Backend to MongoDB: <5ms
  • Backend to Redis: <2ms (ideally localhost)
  • Backend to Milvus: <10ms
  • Backend to S3: <20ms
  • User to Backend (RTP): <150ms for acceptable call quality

Network interface configuration

The Backend App and Proxy bind to a specific OS network interface for RTP:
# Find your interface name
ip addr
# Common names: eth0, ens5, ens160

# Test interface is active
ping -I eth0 8.8.8.8

Capacity planning

Concurrent call capacity

Estimating the number of concurrent calls a Backend App instance can handle:
Hardware               Concurrent calls  Notes
4 cores, 16 GB RAM     5-10              Development only
8 cores, 32 GB RAM     10-25             Small production
16 cores, 64 GB RAM    25-50             Recommended production
32 cores, 128 GB RAM   50-100            High-volume
Actual capacity depends on:
  • AI model latency (OpenAI, Anthropic response times)
  • TTS provider speed (ElevenLabs, Deepgram)
  • Conversation complexity and tool usage
  • Network quality and latency

Storage growth estimation

MongoDB:
  • Agent configuration: ~500 KB per agent
  • Conversation log: ~50 KB per minute of call
  • User data: ~10 KB per user
S3 Storage:
  • Call recording (compressed): ~500 KB per minute
  • TTS audio cache: ~20 KB per utterance (with high reuse)
  • Documents (RAG): Variable, typically 1-10 MB per document
Example: 1000 hours of calls per month
  • MongoDB: ~3 GB
  • S3 Recordings: ~30 GB
  • TTS Cache: ~5 GB (with cache hits)
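The MongoDB and recording figures in the example follow directly from the per-minute rates above, and can be recomputed for other volumes (a sketch using the ~50 KB/min and ~500 KB/min estimates from this section):

```shell
#!/bin/sh
# Storage growth for a given monthly call volume.
hours=1000
minutes=$(( hours * 60 ))
mongo_gb=$(( minutes * 50 / 1000000 ))        # ~50 KB of logs per call-minute
recordings_gb=$(( minutes * 500 / 1000000 ))  # ~500 KB of audio per call-minute
echo "MongoDB: ~${mongo_gb} GB, Recordings: ~${recordings_gb} GB"
```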

Scaling strategies

1. Vertical scaling

Add more CPU and RAM to existing Backend App servers (up to 100 concurrent calls per instance).

2. Horizontal scaling

Deploy additional Backend App instances in the same region. The Backend Proxy automatically load balances across them.

3. Multi-region deployment

Deploy separate infrastructure stacks in different geographic regions to reduce latency for global users.

4. Database scaling

Grow the MongoDB replica set, then move to a sharded cluster for massive scale (1M+ agents).

Performance benchmarks

Tested on AWS c6i.4xlarge (16 vCPU, 32 GB RAM):
  • Concurrent calls: 50
  • Average latency (AI response): 1.2 seconds
  • Average latency (TTS): 0.8 seconds
  • RTP packet loss: <0.01%
  • CPU utilization: 65%
  • Memory utilization: 18 GB

Cloud provider recommendations

AWS

Recommended instance types:
  • Backend App: c6i.4xlarge (compute-optimized)
  • MongoDB: r6i.2xlarge (memory-optimized)
  • Redis: r6g.xlarge (ARM, memory-optimized)

Google Cloud

Recommended instance types:
  • Backend App: c2-standard-16
  • MongoDB: n2-highmem-16
  • Redis: e2-highmem-8

Azure

Recommended instance types:
  • Backend App: F16s_v2
  • MongoDB: E16s_v5
  • Redis: D8s_v5

Bare metal / On-premise

Recommended specifications:
  • Processor: Intel Xeon Scalable (Ice Lake or newer) or AMD EPYC
  • RAM: ECC DDR4-3200 or faster
  • Storage: NVMe SSDs with high IOPS
  • Network: 10 Gbps network cards with SR-IOV support

Security requirements

TLS/SSL certificates

Required for HTTPS and secure WebRTC connections

Firewall

iptables/nftables or cloud security groups configured

SSH hardening

Key-based authentication, disable password login
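A minimal sshd_config fragment implementing this (standard OpenSSH directives; verify against your distribution's defaults before applying):

```conf
# /etc/ssh/sshd_config (fragment) - key-based authentication only
PasswordAuthentication no
PubkeyAuthentication yes
PermitRootLogin prohibit-password
```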

Monitoring

Prometheus, Grafana, or cloud-native monitoring

Next steps

Self-hosting guide

Follow the step-by-step installation instructions

Configuration reference

Detailed configuration options for all services
