Monitoring - Ant Media Server

Ant Media Server exposes system resource, streaming, and health metrics through its REST API, Kafka topics, webhooks, and logs. This guide covers the available metrics, how to collect them, and best practices.

System Resource Monitoring

REST API Endpoints

Access system resource information through the REST API:

# Get comprehensive system resource info
curl -X GET "http://localhost:5080/rest/v2/system-resources-info"

# Get server uptime and start time
curl -X GET "http://localhost:5080/rest/v2/server-time"

# Get GPU information
curl -X GET "http://localhost:5080/rest/v2/gpu-info"

# Get version information
curl -X GET "http://localhost:5080/rest/v2/version"
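The same endpoints can be polled from a script. The following is a minimal Python sketch using only the standard library. The base URL matches the curl examples above; the helper names `endpoint_url` and `fetch_json` are illustrative, and the sketch assumes the endpoints return JSON and are reachable from the calling host without extra authentication.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5080/rest/v2"  # same host and port as the curl examples


def endpoint_url(name, base_url=BASE_URL):
    """Build the full URL for an endpoint such as 'system-resources-info'."""
    return f"{base_url.rstrip('/')}/{name.lstrip('/')}"


def fetch_json(name, timeout=5.0):
    """GET an endpoint and decode the body (assumes a JSON response)."""
    with urllib.request.urlopen(endpoint_url(name), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_json("system-resources-info")` returns the parsed response, which the later checks in this guide can consume.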

Key Metrics Available

The system resource endpoint provides detailed metrics:

CPU Metrics

System CPU load percentage
Process CPU load percentage
Process CPU time
System load average (last minute)

Memory Metrics

JVM memory (max, total, free, in-use)
System memory (total, free, in-use)
Native memory usage
Swap space (total, free, in-use)
Available memory

Storage Metrics

Total disk space
Free disk space
Usable disk space
In-use disk space

GPU Metrics (if available)

GPU utilization percentage
GPU memory utilization
GPU memory total/free/used
Encoder/decoder utilization
Device name and index

Thread Metrics

Thread count
Peak thread count
Dead-locked threads
Thread dumps with detailed state

Streaming Metrics

Total live streams
Local WebRTC live streams
Local WebRTC viewers
Local HLS viewers
Local DASH viewers
Database average query time

Monitoring Configuration

Configure monitoring settings in conf/red5.properties:

# CPU measurement period in milliseconds (default: 1000)
server.cpu_measurement_period_ms=1000

# CPU measurement window size for averaging (default: 5)
server.cpu_measurement_window_size=5

# Enable heartbeat monitoring (default: true)
server.heartbeatEnabled=true

# CPU limit percentage (default: 75)
server.cpu_limit=75

# Memory limit percentage (default: 75)
server.memory_limit=75

Resource Limits

Ant Media Server monitors resource usage and can reject new streams when limits are exceeded:

CPU Limits

The server calculates average CPU load over a configurable window:

Default CPU limit: 75%
Configurable range: 10% - 100%
Measurement: Rolling average over last 5 measurements (default)
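To illustrate how a rolling average behaves, here is a small Python sketch. It is not the server's implementation: the class name `RollingCpuAverage` is hypothetical, and comparing the average with a strict "less than" against the limit is an assumption.

```python
from collections import deque


class RollingCpuAverage:
    """Average of the most recent `window_size` CPU load samples (percent)."""

    def __init__(self, window_size=5, cpu_limit=75):
        # deque(maxlen=...) drops the oldest sample automatically.
        self.samples = deque(maxlen=window_size)
        self.cpu_limit = cpu_limit

    def add(self, load_percent):
        self.samples.append(load_percent)

    def average(self):
        # With no samples yet, report 0 so a fresh instance is not rejected.
        return sum(self.samples) / len(self.samples) if self.samples else 0.0

    def within_limit(self):
        # Assumption: the limit is exceeded once the average reaches it.
        return self.average() < self.cpu_limit
```

Averaging over a window smooths out short spikes: a single 100% sample among four low ones does not push the average over the limit.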

Memory Limits

Memory monitoring uses different strategies based on OS:

Linux Systems

Uses memory percentage limit (default: 75%)
Monitors available system memory

Other Systems

Uses minimum free RAM size (default: 50MB)
Checks absolute free memory amount
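The two strategies can be sketched as a single check. This is an illustrative Python sketch, not the server's code: the function name `memory_ok`, its parameters, and the exact formula for the in-use percentage on Linux are assumptions based on the description above.

```python
def memory_ok(is_linux, total_bytes, available_bytes, free_bytes,
              memory_limit_percent=75, min_free_ram_mb=50):
    """Return True if memory usage is within the configured limits."""
    if is_linux:
        # Assumption: in-use memory is total minus available system memory.
        in_use_percent = 100.0 * (total_bytes - available_bytes) / total_bytes
        return in_use_percent < memory_limit_percent
    # Other systems: require an absolute amount of free RAM.
    return free_bytes >= min_free_ram_mb * 1024 * 1024
```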

Checking Resource Availability

The StatsCollector provides the enoughResource() method to check if the server can accept new streams (src/main/java/io/antmedia/statistic/StatsCollector.java:911):

// Returns true if CPU and memory are within limits
boolean canAccept = statsCollector.enoughResource();

Kafka Integration

Stream metrics to Kafka for external monitoring and analytics:

Configure Kafka Brokers

# Set Kafka brokers in application configuration
kafka.brokers=localhost:9092

Available Kafka Topics

ams-instance-stats

System resource metrics
Published every 15 seconds (default)
Contains all system metrics in JSON format

ams-webrtc-stats

WebRTC client statistics
Per-stream and per-client metrics
Measured bitrate, send bitrate
Audio/video frame send periods
Packet counts

Example Kafka Consumer

# Consume instance stats
kafka-console-consumer.sh --bootstrap-server localhost:9092 \
  --topic ams-instance-stats --from-beginning

# Consume WebRTC stats
kafka-console-consumer.sh --bootstrap-server localhost:9092 \
  --topic ams-webrtc-stats --from-beginning
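The console consumer prints one message per line, so its output can be piped into a script for quick inspection. The Python sketch below parses such lines. It assumes each message is a JSON object and that instance stats carry the same `cpuUsage.systemCPULoad` field shown in the REST and webhook examples in this guide; the function names are illustrative.

```python
import json


def parse_stats_line(line):
    """Parse one consumer output line; return None for blank or non-JSON lines."""
    line = line.strip()
    if not line:
        return None
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return None
    return record if isinstance(record, dict) else None


def cpu_load(record):
    """Read cpuUsage.systemCPULoad if present (field names are an assumption)."""
    cpu = record.get("cpuUsage")
    return cpu.get("systemCPULoad") if isinstance(cpu, dict) else None


def read_cpu_loads(lines):
    """Yield the system CPU load from each parseable record in `lines`."""
    for line in lines:
        record = parse_stats_line(line)
        if record is None:
            continue
        load = cpu_load(record)
        if load is not None:
            yield load
```

To use it with the console consumer, pipe the consumer's output into a script that iterates over `read_cpu_loads(sys.stdin)`.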

GPU Monitoring

For systems with NVIDIA GPUs, Ant Media Server provides detailed GPU metrics:

GPU Metrics Available

{
  "index": 0,
  "gpuUtilization": 45,
  "memoryUtilization": 60,
  "encoderUtilization": 80,
  "decoderUtilization": 20,
  "memoryTotal": 8589934592,
  "memoryFree": 3435973836,
  "memoryUsed": 5153960756,
  "deviceName": "NVIDIA GeForce RTX 3080"
}
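A payload like this can be checked in a few lines. The Python sketch below derives the memory usage percentage from the byte counts and lists any utilization fields at or above a threshold. The field names come from the sample above; the function names and the default threshold of 90 are illustrative assumptions.

```python
SAMPLE_GPU = {
    "index": 0,
    "gpuUtilization": 45,
    "memoryUtilization": 60,
    "encoderUtilization": 80,
    "decoderUtilization": 20,
    "memoryTotal": 8589934592,
    "memoryFree": 3435973836,
    "memoryUsed": 5153960756,
    "deviceName": "NVIDIA GeForce RTX 3080",
}

UTILIZATION_FIELDS = (
    "gpuUtilization",
    "memoryUtilization",
    "encoderUtilization",
    "decoderUtilization",
)


def gpu_memory_used_percent(gpu):
    """Used GPU memory as a percentage of total GPU memory."""
    return 100.0 * gpu["memoryUsed"] / gpu["memoryTotal"]


def gpu_warnings(gpu, threshold=90):
    """Names of utilization fields at or above `threshold` percent."""
    return [name for name in UTILIZATION_FIELDS if gpu.get(name, 0) >= threshold]
```

When hardware encoding is in use, the encoder utilization is often the field that saturates first, so it is worth watching separately from overall GPU utilization.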

Check GPU Status

# Get GPU information
curl -X GET "http://localhost:5080/rest/v2/gpu-info"

# Command line GPU monitoring
nvidia-smi -l 1  # Update every second

WebRTC Client Statistics

Monitor individual WebRTC client performance:

Client Metrics

measuredBitrate: Actual measured bitrate
sendBitrate: Target send bitrate
videoFrameSendPeriod: Interval between sent video frames
audioFrameSendPeriod: Interval between sent audio frames
videoPacketCount: Total video packets sent
audioPacketCount: Total audio packets sent
clientId: Unique client identifier
clientInfo: User agent or client information
clientIp: Client IP address

Access Client Stats

# Get WebRTC stats for a specific stream
curl -X GET "http://localhost:5080/LiveApp/rest/v2/broadcasts/{streamId}/webrtc-client-stats"
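One way to use these statistics is to flag clients whose measured bitrate falls well below the target send bitrate, which can point to network problems on the client side. The Python sketch below assumes the endpoint returns a JSON array of objects with the fields listed under Client Metrics; the function name and the 0.8 ratio are illustrative choices, not server defaults.

```python
def underperforming_clients(clients, min_ratio=0.8):
    """Return clientIds whose measuredBitrate is below min_ratio * sendBitrate."""
    flagged = []
    for client in clients:
        send = client.get("sendBitrate") or 0
        measured = client.get("measuredBitrate") or 0
        # Skip clients with no target bitrate to avoid meaningless ratios.
        if send > 0 and measured < min_ratio * send:
            flagged.append(client.get("clientId"))
    return flagged
```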

Thread Monitoring

Monitor thread health and detect deadlocks:

Thread Metrics

# Get thread dump (available in system resources)
curl -X GET "http://localhost:5080/rest/v2/system-resources-info" | jq '.threadInfo'

Thread information includes:

Thread count and peak count
Dead-locked thread IDs
Thread state (RUNNABLE, WAITING, etc.)
Blocked time and count
CPU time per thread
Lock information

Vertx Worker Queue Monitoring

Monitor Vertx worker thread queues to detect processing bottlenecks:

vertx-worker-thread-queue-size: Main Vertx worker queue
webrtc-vertx-worker-thread-queue-size: WebRTC Vertx worker queue

A queue size that stays high indicates that tasks are arriving faster than the worker threads can process them.
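A simple check on these two values might look like the Python sketch below. It assumes both names appear as top-level keys in the system resources response, which this guide does not confirm, and the threshold of 100 is an arbitrary illustrative value.

```python
QUEUE_KEYS = (
    "vertx-worker-thread-queue-size",
    "webrtc-vertx-worker-thread-queue-size",
)


def backed_up_queues(info, max_queue_size=100):
    """Return the queues whose current size exceeds `max_queue_size`."""
    return {key: info[key] for key in QUEUE_KEYS if info.get(key, 0) > max_queue_size}
```

A single high reading can be a momentary burst, so alerting is usually more useful when the check fails on several consecutive samples.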

Webhooks for Monitoring

Configure webhooks to receive notifications for critical events:

# Set webhook URL in red5.properties
server.statusWebHookURL=https://your-monitoring-system.com/webhook

Webhook Events

High Resource Usage

Triggered when CPU or memory exceeds limits
Includes full resource information
Action: highResourceUsage

Unexpected Server Shutdown

Detected on server restart
Lists applications that didn’t shut down properly
Action: unexpectedServerShutdown

Webhook Payload Example

{
  "action": "highResourceUsage",
  "host": "stream-server-01.example.com",
  "resourceInfo": {
    "cpuUsage": {"systemCPULoad": 85},
    "jvmMemoryUsage": {"inUseMemory": 7516192768}
  }
}
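A receiver for these notifications can be very small. The Python sketch below uses the standard library's `http.server` and assumes the webhook arrives as an HTTP POST with a JSON body shaped like the example above; the function and class names, the port, and the message texts are illustrative.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def describe_event(payload):
    """Turn a webhook payload into a one-line, human-readable message."""
    action = payload.get("action")
    host = payload.get("host", "unknown host")
    if action == "highResourceUsage":
        cpu = payload.get("resourceInfo", {}).get("cpuUsage", {}).get("systemCPULoad")
        return f"{host}: high resource usage (system CPU load: {cpu})"
    if action == "unexpectedServerShutdown":
        return f"{host}: server restarted after an unexpected shutdown"
    return f"{host}: unhandled action {action!r}"


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
        except ValueError:  # malformed JSON or undecodable bytes
            payload = None
        if not isinstance(payload, dict):
            self.send_response(400)
            self.end_headers()
            return
        print(describe_event(payload))
        self.send_response(200)
        self.end_headers()


def serve(port=8000):
    """Listen on all interfaces and handle webhooks until interrupted."""
    HTTPServer(("", port), WebhookHandler).serve_forever()
```

In practice the `print` call would be replaced by whatever forwards the message to your alerting system.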

Log-based Monitoring

System Resource Logs

Ant Media Server logs resource metrics every 5 minutes (src/main/java/io/antmedia/statistic/StatsCollector.java:360):

INFO  System CPU:45% Process CPU:32% System Load Average:2.5 Memory:60% 
      Vertx worker queue size:0 WebRTCVertx worker queue size:2
INFO  DB Average Query Time:5ms and Query Count:1234 for app:LiveApp
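Log lines in this shape can be turned into numbers with a regular expression. The Python sketch below parses the resource line from the sample above. It assumes the field labels and order match the sample exactly; the function name and the keys of the returned dictionary are illustrative.

```python
import re

RESOURCE_LINE = re.compile(
    r"System CPU:(?P<system_cpu>\d+)%\s+"
    r"Process CPU:(?P<process_cpu>\d+)%\s+"
    r"System Load Average:(?P<load_average>\d+(?:\.\d+)?)\s+"
    r"Memory:(?P<memory>\d+)%"
)


def parse_resource_line(line):
    """Extract CPU, load average, and memory values; None if the line does not match."""
    match = RESOURCE_LINE.search(line)
    if match is None:
        return None
    return {
        "system_cpu": int(match["system_cpu"]),
        "process_cpu": int(match["process_cpu"]),
        "load_average": float(match["load_average"]),
        "memory": int(match["memory"]),
    }
```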

Log Levels

Configure logging in conf/red5.properties:

# Application log level (ALL, TRACE, DEBUG, INFO, WARN, ERROR, OFF)
logLevel=INFO

# Native log level for FFmpeg and WebRTC (ERROR recommended for production)
nativeLogLevel=ERROR

Best Practices

Set Appropriate Limits: Configure CPU and memory limits based on your server capacity
Enable Kafka: Stream metrics to Kafka for long-term storage and analysis
Configure Webhooks: Set up webhook notifications for critical events
Monitor Queue Sizes: Watch Vertx worker queue sizes for performance bottlenecks
Track GPU Usage: Monitor GPU utilization when using hardware encoding
Review Logs Regularly: Check logs for resource warnings and errors
Database Performance: Monitor database query times for performance issues
Client Statistics: Track WebRTC client stats to identify network issues

Health Check Endpoint

Implement health checks using the server time endpoint, or the system resources endpoint when you also want to validate resource usage:

# Simple health check
curl -f http://localhost:5080/rest/v2/server-time || exit 1

# Health check with resource validation
curl -s http://localhost:5080/rest/v2/system-resources-info | \
  jq -e '.cpuUsage.systemCPULoad < 90 and .jvmMemoryUsage.inUseMemory < .jvmMemoryUsage.maxMemory'
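The same resource validation can be expressed in Python when the check runs inside a larger monitoring script. This sketch mirrors the jq expression above and takes the already-parsed response as input. The function name is illustrative, and treating missing fields as unhealthy is a design choice of this sketch, not server behavior.

```python
def is_healthy(info, max_cpu_load=90):
    """Mirror the jq check: CPU load below the limit and JVM memory below its maximum."""
    cpu = info.get("cpuUsage", {}).get("systemCPULoad")
    jvm = info.get("jvmMemoryUsage", {})
    in_use = jvm.get("inUseMemory")
    maximum = jvm.get("maxMemory")
    if cpu is None or in_use is None or maximum is None:
        return False  # missing data is treated as unhealthy
    return cpu < max_cpu_load and in_use < maximum
```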

Monitoring Dashboard Integration

Integrate with popular monitoring tools:

Prometheus

Convert metrics from the Kafka topics or the REST API into the Prometheus exposition format with a custom exporter.
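As a rough sketch of what such a conversion involves, the Python function below renders two values from a system resources response as Prometheus text-format gauges. The metric names and the `ams` prefix are hypothetical, and only the fields that appear in this guide's examples are read.

```python
def to_prometheus(info, prefix="ams"):
    """Render selected metrics in the Prometheus text exposition format."""
    metrics = {
        f"{prefix}_system_cpu_load_percent": info.get("cpuUsage", {}).get("systemCPULoad"),
        f"{prefix}_jvm_memory_in_use_bytes": info.get("jvmMemoryUsage", {}).get("inUseMemory"),
    }
    lines = []
    for name, value in metrics.items():
        if value is None:
            continue  # skip metrics missing from the response
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n" if lines else ""
```

Serving this text from an HTTP endpoint that Prometheus scrapes would complete a minimal exporter.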

Grafana

Create dashboards using:

Kafka data source
REST API queries via JSON API plugin
Custom exporters

ELK Stack

Stream Kafka topics to Elasticsearch:

Use Logstash Kafka input plugin
Create Kibana dashboards
Set up alerts based on thresholds
