Cluster Architecture
Components
Origin Servers
- Handle stream publishing
- Process incoming RTMP, WebRTC, SRT streams
- Perform transcoding and adaptive bitrate processing
- Store stream data
Edge Servers
- Handle stream playback/distribution
- Serve WebRTC, HLS, DASH viewers
- Pull streams from origin servers
- Scale independently based on viewer demand
Database
- MongoDB (recommended for cluster mode)
- Stores stream metadata
- Coordinates cluster nodes
- Maintains application settings
Load Balancer
- Distributes incoming requests
- Health checking
- SSL termination
- Session affinity for WebRTC
Cluster Node Management
Node Registration
Cluster nodes automatically register themselves (src/main/java/io/antmedia/cluster/ClusterNode.java).
Node Status
Nodes report status every 5 seconds (NODE_UPDATE_PERIOD):
- ALIVE: Last update within 20 seconds (4 × NODE_UPDATE_PERIOD)
- DEAD: No update for more than 20 seconds
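The status rule above can be sketched as a small function. This is an illustrative Python sketch, not the server's Java implementation: the name `node_status` and the millisecond timestamps are assumptions, while the 5-second period and the 4× window come from the text.

```python
NODE_UPDATE_PERIOD_MS = 5_000                 # nodes report every 5 seconds
ALIVE_WINDOW_MS = 4 * NODE_UPDATE_PERIOD_MS   # 20 seconds

def node_status(last_update_ms: int, now_ms: int) -> str:
    """Return "ALIVE" if the last update is within the window, else "DEAD"."""
    elapsed = now_ms - last_update_ms
    return "ALIVE" if elapsed <= ALIVE_WINDOW_MS else "DEAD"
```

A node that misses up to three consecutive reports is still treated as alive under this rule, which tolerates short network hiccups without flapping.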
Check Cluster Nodes
Cluster Configuration
MongoDB Setup
Configure MongoDB for cluster coordination.
Configure Ant Media Server
Edit <AMS-DIR>/webapps/<App-Name>/WEB-INF/red5-web.properties.
Server Settings for Clustering
Configure in conf/red5.properties.
Node Groups
Organize cluster nodes into groups for better management.
Purpose
- Organize nodes by region/data center
- Separate origin and edge nodes
- Route streams within node groups
- Improve latency by keeping streams local
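To illustrate how grouping keeps streams local, here is a minimal Python sketch. The node dictionaries, the `role`/`group`/`address` field names, and the fallback to any edge are all assumptions made for illustration; they do not reflect the server's actual data model.

```python
from collections import defaultdict

def group_nodes(nodes):
    """Map each node group name to the addresses of the nodes in it."""
    groups = defaultdict(list)
    for node in nodes:
        groups[node["group"]].append(node["address"])
    return dict(groups)

def edges_for_stream(nodes, origin_group):
    """Prefer edges in the same group as the stream's origin."""
    edges = [n for n in nodes if n["role"] == "edge"]
    local = [n["address"] for n in edges if n["group"] == origin_group]
    # Assumed fallback: use any edge when the group has none of its own.
    return local or [n["address"] for n in edges]
```

For example, with one origin and one edge in a hypothetical `eu` group and one edge in `us`, `edges_for_stream(nodes, "eu")` returns only the `eu` edge.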
Configure Node Groups
Load Balancing
Origin Server Load Balancing
Requirements:
- Session persistence for WebRTC publishing
- Health checks on port 5080
- Support for WebSocket connections
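Session persistence means the same publisher keeps reaching the same origin. One common way to get that is to hash a stable client identifier; the sketch below shows the idea in Python. The function name `pick_origin` and the use of a client ID string are assumptions, and production load balancers usually implement this with cookies or source-IP hashing instead.

```python
import hashlib

def pick_origin(client_id: str, origins: list[str]) -> str:
    """Deterministically map a client to one origin in the list."""
    if not origins:
        raise ValueError("no origin servers available")
    digest = hashlib.sha256(client_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(origins)
    return origins[index]
```

Note that this simple modulo scheme remaps many clients whenever the origin list changes; consistent hashing reduces that churn at the cost of more code.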
Edge Server Load Balancing
Requirements:
- Round-robin or least connections
- Health checks
- No session persistence required (for HLS/DASH)
- Session persistence for WebRTC playback
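The two balancing strategies named above can be sketched in a few lines of Python. The function names and the `{edge: active_connections}` mapping are hypothetical; a real deployment would configure these strategies in the load balancer rather than implement them by hand.

```python
import itertools

def round_robin(edges):
    """Return an endless iterator that cycles through the edges in order."""
    return itertools.cycle(edges)

def least_connections(active: dict[str, int]) -> str:
    """Return the edge that currently has the fewest active connections."""
    return min(active, key=active.get)
```

Round-robin suits short stateless requests such as HLS/DASH segment fetches, while least connections tends to balance better when sessions are long-lived.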
Scaling Strategies
Vertical Scaling
Increase resources on existing servers:
- CPU: More cores for encoding/transcoding
- Memory: Support more concurrent streams
- GPU: Hardware encoding for higher throughput
- Network: Higher bandwidth for more viewers
When to use:
- Small to medium deployments
- Cost-effective up to a point
- Simpler management
Horizontal Scaling
Add more servers to the cluster:
- Origin Scaling: Add origins for more publishers
- Edge Scaling: Add edges for more viewers
- Independent Scaling: Scale origins and edges separately
When to use:
- Large deployments
- Geographic distribution
- High availability requirements
- Better fault tolerance
Auto-Scaling
Implement auto-scaling based on metrics.
Metrics to Monitor
For Origin Scaling:
- CPU usage > 75%
- Active publishers approaching limit
- Encoder queue depth
- Memory usage
For Edge Scaling:
- CPU usage > 70%
- Active viewers approaching limit
- Network bandwidth utilization
- HLS viewer count
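A scale-out decision based on the metrics above might look like the following Python sketch. The CPU thresholds (75% for origins, 70% for edges) come from the lists above; the function name and the `near_limit_ratio` of 0.9 used to interpret "approaching limit" are assumptions.

```python
ORIGIN_CPU_THRESHOLD = 75.0   # percent, from the origin metrics above
EDGE_CPU_THRESHOLD = 70.0     # percent, from the edge metrics above

def should_scale_out(role: str, cpu_percent: float, active: int,
                     limit: int, near_limit_ratio: float = 0.9) -> bool:
    """Return True when CPU is over threshold or load nears the limit.

    `active` is publishers for origins and viewers for edges.
    `near_limit_ratio` is an assumed definition of "approaching limit".
    """
    threshold = ORIGIN_CPU_THRESHOLD if role == "origin" else EDGE_CPU_THRESHOLD
    return cpu_percent > threshold or active >= near_limit_ratio * limit
```

In practice the decision should be based on metrics averaged over a window, so that a brief spike does not trigger a new instance.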
Auto-Scaling Implementation
Cloud Provider Auto-Scaling
AWS Auto Scaling Group
Resource Limits and Capacity Planning
CPU Limits
The StatsCollector monitors CPU usage (src/main/java/io/antmedia/statistic/StatsCollector.java:75).
Memory Limits
Capacity Estimates
Origin Server (8 vCPU, 16GB RAM):
- ~50-100 concurrent publishers (WebRTC)
- ~200-500 concurrent publishers (RTMP, no transcoding)
- Depends on: resolution, bitrate, transcoding profiles
Edge Server:
- ~2,000-5,000 HLS viewers
- ~500-1,000 WebRTC viewers
- ~100-200 DASH viewers
- Depends on: bitrate, protocols, ABR profiles
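These estimates can be turned into a rough server count. The Python sketch below divides the expected peak load, plus headroom, by a per-server capacity. The function name is hypothetical, and the default 25% headroom is an assumed midpoint of a 20-30% safety margin; use the low end of each capacity range for a conservative result.

```python
import math

def servers_needed(peak_load: int, per_server_capacity: int,
                   headroom: float = 0.25) -> int:
    """Estimate how many servers cover the peak load plus headroom."""
    if per_server_capacity <= 0:
        raise ValueError("per_server_capacity must be positive")
    return math.ceil(peak_load * (1 + headroom) / per_server_capacity)
```

For example, 10,000 HLS viewers at 2,000 viewers per edge with 25% headroom gives `servers_needed(10_000, 2_000)`, which is 7 edge servers.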
High Availability
Database High Availability
Use MongoDB replica set.
Load Balancer High Availability
- Use multiple load balancers with failover
- DNS round-robin between load balancers
- Cloud provider managed load balancers (AWS ALB, GCP Load Balancing)
- Keepalived + HAProxy for self-hosted
Multi-Region Deployment
Deploy clusters in multiple regions.
Performance Optimization
Database Performance
Monitor database query times (src/main/java/io/antmedia/cluster/ClusterNode.java:28). If queries are slow:
- Add database indexes
- Use faster storage (SSD)
- Increase database resources
- Use read replicas
Network Optimization
- Use CDN for HLS/DASH delivery
- Enable QUIC/HTTP3 for lower latency
- Optimize MTU settings
- Use dedicated network for cluster communication
Monitoring Cluster Health
Best Practices
- Separate Origins and Edges: Use dedicated servers for publishing vs playback
- Monitor Node Health: Track CPU, memory, and database performance
- Use Node Groups: Organize nodes by region/function
- Database HA: Always use MongoDB replica set in production
- Load Balancer HA: Use redundant load balancers
- Auto-Scaling: Implement automated scaling based on metrics
- Capacity Planning: Plan for peak load + 20-30% headroom
- Regular Testing: Test failover scenarios regularly
- Resource Limits: Set appropriate CPU/memory limits
- Geographic Distribution: Deploy close to users for lower latency
