Scaling - Ant Media Server

Ant Media Server Enterprise Edition supports clustering to scale horizontally across multiple servers. This guide covers cluster architecture, configuration, and best practices for scaling your streaming infrastructure.

Cluster Architecture

Components

Origin Servers

Handle stream publishing
Process incoming RTMP, WebRTC, SRT streams
Perform transcoding and adaptive bitrate processing
Store stream data

Edge Servers

Handle stream playback/distribution
Serve WebRTC, HLS, DASH viewers
Pull streams from origin servers
Scale independently based on viewer demand

Database

MongoDB (recommended for cluster mode)
Stores stream metadata
Coordinates cluster nodes
Maintains application settings

Load Balancer

Distributes incoming requests
Health checking
SSL termination
Session affinity for WebRTC

Cluster Node Management

Node Registration

Cluster nodes automatically register themselves (src/main/java/io/antmedia/cluster/ClusterNode.java):

public class ClusterNode {
  private String id;           // Unique node identifier
  private String ip;           // Node IP address
  private long lastUpdateTime; // Last heartbeat timestamp
  private String memory;       // Memory usage
  private String cpu;          // CPU usage
  private int dbQueryAverageTimeMs; // Database performance
}

Node Status

Nodes report status every 5 seconds (NODE_UPDATE_PERIOD):

ALIVE: Last update within 20 seconds (4 × NODE_UPDATE_PERIOD)
DEAD: No update for more than 20 seconds
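
The status rule above can be sketched as a small shell function. This is only an illustration: `node_status` and `NODE_UPDATE_PERIOD_MS` are hypothetical names, and the sketch assumes `lastUpdateTime` is expressed in milliseconds since the epoch.

```shell
# Sketch of the ALIVE/DEAD rule; the names and millisecond units are assumptions.
NODE_UPDATE_PERIOD_MS=5000

# node_status LAST_UPDATE_MS NOW_MS
# Prints ALIVE if the last update is at most 4 x NODE_UPDATE_PERIOD old, else DEAD.
node_status() {
  age_ms=$(( $2 - $1 ))
  if [ "$age_ms" -le $(( 4 * NODE_UPDATE_PERIOD_MS )) ]; then
    echo "ALIVE"
  else
    echo "DEAD"
  fi
}

# Example: a node last seen 12 seconds ago
node_status 1000 13000
```

A node whose last update is exactly 20 seconds old still counts as ALIVE here, since the rule marks a node DEAD only after more than 20 seconds.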

Check Cluster Nodes

# Get all cluster nodes
curl -X GET "http://localhost:5080/rest/v2/cluster-nodes"

# Get specific node
curl -X GET "http://localhost:5080/rest/v2/cluster-nodes/{nodeId}"

# Get node count
curl -X GET "http://localhost:5080/rest/v2/cluster-nodes/count"

Cluster Configuration

MongoDB Setup

Configure MongoDB for cluster coordination:

# Install MongoDB (requires the official MongoDB apt repository to be configured)
sudo apt-get install -y mongodb-org

# Start MongoDB
sudo systemctl start mongod
sudo systemctl enable mongod

# Create database and user (on MongoDB 6.0 and later the shell is mongosh)
mongo
> use antmedia
> db.createUser({
    user: "antmedia",
    pwd: "strong_password",
    roles: [{role: "readWrite", db: "antmedia"}]
  })

Configure Ant Media Server

Edit <AMS-DIR>/webapps/<App-Name>/WEB-INF/red5-web.properties:

# Database configuration
db.type=mongodb
db.host=mongodb://antmedia:strong_password@mongodb-server:27017/antmedia

# Cluster mode
clusterMode=true

Server Settings for Clustering

Configure in conf/red5.properties:

# Use global IP for cluster communication
useGlobalIp=true

# Node group for organizing cluster
nodeGroup=default

# Server name/hostname
server.name=origin-01.example.com

Node Groups

Organize cluster nodes into groups for better management:

Purpose

Organize nodes by region/data center
Separate origin and edge nodes
Route streams within node groups
Improve latency by keeping streams local

Configure Node Groups

# In red5.properties
nodeGroup=us-east

Nodes in the same group are preferred for stream routing.

Load Balancing

Origin Server Load Balancing

Requirements:

Session persistence for WebRTC publishing
Health checks on port 5080
Support for WebSocket connections

Example NGINX Configuration:

upstream ams_origin {
    least_conn;  # Use least connections algorithm
    server origin-01.example.com:5080 max_fails=3 fail_timeout=30s;
    server origin-02.example.com:5080 max_fails=3 fail_timeout=30s;
    server origin-03.example.com:5080 max_fails=3 fail_timeout=30s;
}

server {
    listen 443 ssl http2;
    server_name publish.example.com;
    
    ssl_certificate /path/to/cert.pem;
    ssl_certificate_key /path/to/key.pem;
    
    location / {
        proxy_pass http://ams_origin;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        
        # Keep long-lived WebSocket (WebRTC signaling) connections open
        proxy_read_timeout 86400s;
        proxy_send_timeout 86400s;
    }
}

Edge Server Load Balancing

Requirements:

Round-robin or least connections
Health checks
No session persistence required (for HLS/DASH)
Session persistence for WebRTC playback

Example NGINX Configuration:

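upstream ams_edge {
    least_conn;
    server edge-01.example.com:5080 max_fails=3 fail_timeout=30s;
    server edge-02.example.com:5080 max_fails=3 fail_timeout=30s;
    server edge-03.example.com:5080 max_fails=3 fail_timeout=30s;
    server edge-04.example.com:5080 max_fails=3 fail_timeout=30s;
}

# ip_hash is only valid inside an upstream block, so WebRTC signaling
# gets its own upstream with session persistence
upstream ams_edge_webrtc {
    ip_hash;  # Session persistence
    server edge-01.example.com:5080 max_fails=3 fail_timeout=30s;
    server edge-02.example.com:5080 max_fails=3 fail_timeout=30s;
    server edge-03.example.com:5080 max_fails=3 fail_timeout=30s;
    server edge-04.example.com:5080 max_fails=3 fail_timeout=30s;
}

server {
    listen 443 ssl http2;
    server_name play.example.com;
    
    ssl_certificate /path/to/cert.pem;
    ssl_certificate_key /path/to/key.pem;
    
    # WebRTC playback (needs session persistence)
    location ~ ^/LiveApp/websocket {
        proxy_pass http://ams_edge_webrtc;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
    }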
    
    # HLS/DASH playback (no session persistence needed)
    location ~ ^/LiveApp/streams/ {
        proxy_pass http://ams_edge;
        proxy_http_version 1.1;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        
        # Enable caching for segments (the media_cache zone must be defined
        # with proxy_cache_path in the http block)
        proxy_cache media_cache;
        proxy_cache_valid 200 1s;
    }
}
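
When edges are added or removed, the `upstream` server list has to change with them. One possible approach, sketched below, is to regenerate the block from a list of hostnames. `render_upstream` is a hypothetical helper, not part of Ant Media Server or NGINX; writing the output to an included config file and reloading NGINX (for example with `nginx -s reload`) is left to the caller.

```shell
# render_upstream NAME HOST...
# Prints an NGINX upstream block for the given hosts on port 5080,
# using the same per-server options as the examples above.
render_upstream() {
  name=$1
  shift
  printf 'upstream %s {\n' "$name"
  printf '    least_conn;\n'
  for host in "$@"; do
    printf '    server %s:5080 max_fails=3 fail_timeout=30s;\n' "$host"
  done
  printf '}\n'
}

# Example: two edge hosts
render_upstream ams_edge edge-01.example.com edge-02.example.com
```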

Scaling Strategies

Vertical Scaling

Increase resources on existing servers:

CPU: More cores for encoding/transcoding
Memory: Support more concurrent streams
GPU: Hardware encoding for higher throughput
Network: Higher bandwidth for more viewers

When to Use:

Small to medium deployments
Cost-effective up to a point
Simpler management

Horizontal Scaling

Add more servers to the cluster:

Origin Scaling: Add origins for more publishers
Edge Scaling: Add edges for more viewers
Independent Scaling: Scale origins and edges separately

When to Use:

Large deployments
Geographic distribution
High availability requirements
Better fault tolerance

Auto-Scaling

Implement auto-scaling based on metrics:

Metrics to Monitor

For Origin Scaling:

CPU usage > 75%
Active publishers approaching limit
Encoder queue depth
Memory usage

For Edge Scaling:

CPU usage > 70%
Active viewers approaching limit
Network bandwidth utilization
HLS viewer count

Auto-Scaling Implementation

#!/bin/bash
# Example auto-scaling script

# Fetch system stats once so CPU and viewer counts come from the same sample
STATS=$(curl -s http://localhost:5080/rest/v2/system-resources-info)

# Get current CPU usage (drop any fractional part for integer comparison)
CPU_USAGE=$(echo "$STATS" | jq -r '.cpuUsage.systemCPULoad // 0')
CPU_USAGE=${CPU_USAGE%.*}

# Get viewer count (missing fields count as 0)
VIEWER_COUNT=$(echo "$STATS" | \
  jq -r '((.localWebRTCViewers // 0) + (.localHLSViewers // 0) + (.localDASHViewers // 0))')

VIEWER_LIMIT=1000

if [ "$CPU_USAGE" -gt 75 ] || [ "$VIEWER_COUNT" -gt "$VIEWER_LIMIT" ]; then
  echo "Scaling up: CPU=${CPU_USAGE}% Viewers=${VIEWER_COUNT}"
  # Trigger cloud provider to add instance
  # aws autoscaling set-desired-capacity ...
  # gcloud compute instance-groups managed resize ...
fi
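
A script like this usually needs a target instance count to hand to the cloud provider, not just a yes/no decision. The sketch below derives one from a viewer total and the per-node `VIEWER_LIMIT`, clamped to a minimum and maximum. `desired_capacity` is a hypothetical helper, and the viewer total is assumed to be a cluster-wide sum that the caller has already collected.

```shell
# desired_capacity TOTAL_VIEWERS VIEWER_LIMIT MIN_NODES MAX_NODES
# Prints the number of nodes needed, rounded up and clamped to [MIN, MAX].
desired_capacity() {
  viewers=$1
  limit=$2
  min_nodes=$3
  max_nodes=$4
  needed=$(( (viewers + limit - 1) / limit ))   # ceiling division
  if [ "$needed" -lt "$min_nodes" ]; then needed=$min_nodes; fi
  if [ "$needed" -gt "$max_nodes" ]; then needed=$max_nodes; fi
  echo "$needed"
}

# Example: 4,500 viewers, 1,000 viewers per node, between 2 and 10 nodes
desired_capacity 4500 1000 2 10
```

Clamping keeps a traffic spike or a bad metric sample from requesting an unbounded number of instances.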

Cloud Provider Auto-Scaling

AWS Auto Scaling Group:

# Create launch template with AMS pre-installed
# Configure auto-scaling based on CloudWatch metrics

aws autoscaling create-auto-scaling-group \
  --auto-scaling-group-name ams-edge-asg \
  --launch-template LaunchTemplateName=ams-edge \
  --min-size 2 \
  --max-size 10 \
  --desired-capacity 2 \
  --target-group-arns arn:aws:elasticloadbalancing:...

# Add a scaling policy (attach it to a CloudWatch alarm, e.g. on high CPU, so that it is triggered)
aws autoscaling put-scaling-policy \
  --auto-scaling-group-name ams-edge-asg \
  --policy-name scale-up-cpu \
  --scaling-adjustment 1 \
  --adjustment-type ChangeInCapacity \
  --cooldown 300

GCP Managed Instance Group:

# Create instance template
gcloud compute instance-templates create ams-edge-template \
  --machine-type=n1-standard-4 \
  --image-family=ubuntu-2004-lts \
  --image-project=ubuntu-os-cloud \
  --metadata-from-file=startup-script=install-ams.sh

# Create managed instance group with auto-scaling
gcloud compute instance-groups managed create ams-edge-mig \
  --base-instance-name=ams-edge \
  --template=ams-edge-template \
  --size=2 \
  --zone=us-central1-a

gcloud compute instance-groups managed set-autoscaling ams-edge-mig \
  --zone=us-central1-a \
  --max-num-replicas=10 \
  --min-num-replicas=2 \
  --target-cpu-utilization=0.70 \
  --cool-down-period=300

Resource Limits and Capacity Planning

CPU Limits

The StatsCollector monitors CPU usage (src/main/java/io/antmedia/statistic/StatsCollector.java:75):

# Configure CPU limit (10-100%)
server.cpu_limit=75

When CPU usage exceeds this limit, the server rejects new streams to prevent overload.

Memory Limits

# Memory limit percentage (10-100%)
server.memory_limit=75
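
The admission rule can be pictured as a simple threshold check. The sketch below is an illustration only, not the server's implementation: `can_accept_stream` is a hypothetical name, and it assumes the memory limit is enforced the same way as the CPU limit, which this guide states explicitly only for CPU.

```shell
# can_accept_stream CPU_PCT MEM_PCT CPU_LIMIT MEM_LIMIT
# Succeeds (exit status 0) only when both readings are at or below their limits.
can_accept_stream() {
  [ "$1" -le "$3" ] && [ "$2" -le "$4" ]
}

# Example: 60% CPU and 50% memory against limits of 75% each
if can_accept_stream 60 50 75 75; then
  echo "accept"
else
  echo "reject"
fi
```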

Capacity Estimates

Origin Server (8 vCPU, 16GB RAM):

~50-100 concurrent publishers (WebRTC)
~200-500 concurrent publishers (RTMP, no transcoding)
Depends on: resolution, bitrate, transcoding profiles

Edge Server (8 vCPU, 16GB RAM, 1Gbps network):

~2,000-5,000 HLS viewers
~500-1,000 WebRTC viewers
~100-200 DASH viewers
Depends on: bitrate, protocols, ABR profiles
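
Network bandwidth gives a quick upper bound on viewers per edge, independent of CPU. The sketch below divides link capacity by stream bitrate; `max_viewers_by_bandwidth` is a hypothetical helper, and the result ignores protocol overhead and any headroom, so treat it as a ceiling rather than a target.

```shell
# max_viewers_by_bandwidth LINK_MBPS STREAM_KBPS
# Prints how many streams of STREAM_KBPS fit in a LINK_MBPS link
# (1 Mbps = 1000 kbps; integer division rounds down).
max_viewers_by_bandwidth() {
  echo $(( $1 * 1000 / $2 ))
}

# Example: 1 Gbps link, 2 Mbps stream
max_viewers_by_bandwidth 1000 2000
```

For a 1 Gbps link and a 2 Mbps stream this prints 500, so at that bitrate the network, not the CPU, may be the first limit reached.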

High Availability

Database High Availability

Use MongoDB replica set:

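# Start mongod with the same replica set name on each of the three hosts
mongod --replSet rs0 --bind_ip localhost,mongodb-01   # run on mongodb-01
mongod --replSet rs0 --bind_ip localhost,mongodb-02   # run on mongodb-02
mongod --replSet rs0 --bind_ip localhost,mongodb-03   # run on mongodb-03

# Initialize the replica set once, from a shell connected to one member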
mongo
> rs.initiate({
    _id: "rs0",
    members: [
      {_id: 0, host: "mongodb-01:27017"},
      {_id: 1, host: "mongodb-02:27017"},
      {_id: 2, host: "mongodb-03:27017"}
    ]
  })

Update connection string:

db.host=mongodb://antmedia:strong_password@mongodb-01:27017,mongodb-02:27017,mongodb-03:27017/antmedia?replicaSet=rs0
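
If the member list changes often, the connection string can be assembled from its parts instead of edited by hand. The sketch below shows one way to do that; `mongo_replica_uri` is a hypothetical helper, and it assumes the password contains no characters that need percent-encoding in a MongoDB URI.

```shell
# mongo_replica_uri USER PASSWORD DATABASE REPLICA_SET HOST:PORT...
# Prints a replica set connection string in the format shown above.
# Assumes USER and PASSWORD need no percent-encoding.
mongo_replica_uri() {
  user=$1
  pass=$2
  db=$3
  rs=$4
  shift 4
  hosts=""
  for h in "$@"; do
    if [ -z "$hosts" ]; then hosts=$h; else hosts="$hosts,$h"; fi
  done
  printf 'mongodb://%s:%s@%s/%s?replicaSet=%s\n' "$user" "$pass" "$hosts" "$db" "$rs"
}

# Example: three members of replica set rs0
mongo_replica_uri antmedia strong_password antmedia rs0 \
  mongodb-01:27017 mongodb-02:27017 mongodb-03:27017
```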

Load Balancer High Availability

Use multiple load balancers with failover
DNS round-robin between load balancers
Cloud provider managed load balancers (AWS ALB, GCP Load Balancing)
Keepalived + HAProxy for self-hosted

Multi-Region Deployment

Deploy clusters in multiple regions:

Region 1 (US-East)          Region 2 (EU-West)
├── Origin Servers          ├── Origin Servers
├── Edge Servers            ├── Edge Servers
├── MongoDB                 ├── MongoDB
└── Load Balancer           └── Load Balancer
         ↓                           ↓
      Global DNS/CDN (GeoDNS Routing)

Performance Optimization

Database Performance

Monitor database query times (src/main/java/io/antmedia/cluster/ClusterNode.java:28):

# Check database performance per app
curl -s http://localhost:5080/rest/v2/system-resources-info | \
  jq '.dbAverageQueryTimeMs'

Optimizations:

Add database indexes
Use faster storage (SSD)
Increase database resources
Use read replicas

Network Optimization

Use CDN for HLS/DASH delivery
Enable QUIC/HTTP3 for lower latency
Optimize MTU settings
Use dedicated network for cluster communication

Monitoring Cluster Health

#!/bin/bash
# Cluster health check script

echo "=== Cluster Node Status ==="
curl -s http://localhost:5080/rest/v2/cluster-nodes | \
  jq -r '.[] | "\(.id): \(.status) (CPU: \(.cpu), Memory: \(.memory))"'

echo ""
echo "=== Total Streams and Viewers ==="
curl -s http://localhost:5080/rest/v2/system-resources-info | \
  jq '{streams: .totalLiveStreamSize, webrtc: .localWebRTCViewers, hls: .localHLSViewers}'

echo ""
echo "=== Database Performance ==="
curl -s http://localhost:5080/rest/v2/system-resources-info | \
  jq '{avgQueryTime: .dbAverageQueryTimeMs}'
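
For alerting, it can help to reduce the node listing to a single number. The sketch below counts dead nodes from lines of the form "ID STATUS". `count_dead_nodes` is a hypothetical helper, and that input shape is an assumption; it could be produced from the cluster-nodes response with a jq filter such as `.[] | "\(.id) \(.status)"`.

```shell
# count_dead_nodes: reads "ID STATUS" lines on stdin and prints how many
# have status DEAD (compared case-insensitively).
count_dead_nodes() {
  awk 'toupper($2) == "DEAD" { n++ } END { print n + 0 }'
}

# Example with sample input in the assumed "ID STATUS" shape
dead=$(printf '%s\n' 'node-a alive' 'node-b dead' | count_dead_nodes)
echo "dead nodes: $dead"
```

A monitoring job could exit with a non-zero status whenever the count is above zero.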

Best Practices

Separate Origins and Edges: Use dedicated servers for publishing vs playback
Monitor Node Health: Track CPU, memory, and database performance
Use Node Groups: Organize nodes by region/function
Database HA: Always use MongoDB replica set in production
Load Balancer HA: Use redundant load balancers
Auto-Scaling: Implement automated scaling based on metrics
Capacity Planning: Plan for peak load + 20-30% headroom
Regular Testing: Test failover scenarios regularly
Resource Limits: Set appropriate CPU/memory limits
Geographic Distribution: Deploy close to users for lower latency
