Stress Testing Overview

ML Defender has been validated with 36,000 events across progressive stress tests, demonstrating production-grade stability and graceful degradation under extreme load.

Events Processed:  36,000 total events
Peak Throughput:   364.9 events/second
Errors:            0 crypto/decompression errors

Test Design

Progressive Load Tests (Day 52)

From source/README.md:179-203: ML Defender underwent 4 progressive stress tests on Day 52:

Test  Events   Rate           CPU      Duration  Result
1     1,000    42.6/sec       N/A      ~23 sec   ✅ PASS
2     5,000    94.9/sec       N/A      ~53 sec   ✅ PASS
3     10,000   176.1/sec      41-45%   ~57 sec   ✅ PASS
4     20,000   364.9/sec      49-54%   ~55 sec   ✅ PASS
Total Results (36K events):
crypto_errors: 0              ← Perfect ChaCha20-Poly1305 pipeline
decompression_errors: 0       ← Perfect LZ4 pipeline
protobuf_parse_errors: 0      ← Perfect message parsing
ipset_successes: 118          ← First ~1000 IPs blocked
ipset_failures: 16,681        ← Capacity limit (not a bug)
max_queue_depth: 16,690       ← Backpressure handled gracefully
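A pass/fail gate over these counters is easy to script. The sketch below is illustrative only: the `key: value` stats format is an assumption (not the actual firewall-agent output), though the counter names match the summary above.

```shell
#!/usr/bin/env bash
# Illustrative gate: fail the run if any pipeline error counter is non-zero.
# The stats dump below is a stand-in for real firewall-agent output.
STATS="$(cat <<'EOF'
crypto_errors: 0
decompression_errors: 0
protobuf_parse_errors: 0
ipset_successes: 118
ipset_failures: 16681
EOF
)"

fail=0
for key in crypto_errors decompression_errors protobuf_parse_errors; do
  # Extract the value for this counter; missing counters count as failures.
  val=$(printf '%s\n' "$STATS" | awk -F': ' -v k="$key" '$1==k {print $2}')
  if [ "${val:-1}" -ne 0 ]; then
    echo "FAIL: $key=$val"
    fail=1
  fi
done
[ "$fail" -eq 0 ] && echo "PASS: no pipeline errors"
```

Note that capacity-related counters (`ipset_failures`) are deliberately excluded from the gate: per the results above, those are expected under overload.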

Key Discoveries

Production-Ready Components:
  • ✅ Crypto pipeline: 0 errors at 36K events
  • ✅ CPU efficiency: 54% max under extreme load
  • ✅ Memory stable: 127 MB RSS
  • ✅ Graceful degradation: No crashes when capacity exceeded
Capacity Planning Insights:
  • IPSet capacity is finite (realistic max: 500K IPs)
  • After ~1,000 IPs, insertions fail (expected behavior, not a bug)
  • System exhibits backpressure without crashes
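The degradation pattern can be modeled in a few lines. This is a toy sketch of the observed behavior, not the firewall-agent implementation: past a fixed capacity, inserts are rejected and counted rather than crashing the process.

```shell
#!/usr/bin/env bash
# Toy model of graceful degradation at the ipset capacity limit.
# CAPACITY and the counter names are illustrative.
CAPACITY=1000
successes=0
failures=0

try_block_ip() {   # $1 = IP to block (only the count matters in this sketch)
  if [ "$successes" -lt "$CAPACITY" ]; then
    successes=$((successes + 1))
  else
    failures=$((failures + 1))   # reject and count; never abort
  fi
}

for i in $(seq 1 1500); do
  try_block_ip "ip-$i"
done
echo "successes=$successes failures=$failures"
```

Running it yields 1,000 successes and 500 counted failures, mirroring the "capacity limit, not a bug" result above.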

8-Hour Stability Test

Test Configuration

From source/stress_test_8h.sh:
# Test parameters
TEST_DURATION_MINUTES=480  # 8 hours
TRAFFIC_RATE_PPS=75        # 75 packets/second target
MONITORING_INTERVAL=60     # Monitor every 60s

# Components tested
- sniffer (eBPF/XDP packet capture)
- ml-detector (RandomForest inference)
- Synthetic traffic generator
- Resource monitor

Test Phases

From source/stress_test_8h.sh:575-613 (traffic_generator_full.sh):

Phase 1: Warm-up (30 min)

  • Low load, gradual increase
  • HTTP/HTTPS requests (5/interval)
  • DNS queries (10/interval)
  • ICMP ping (3/interval)

Phase 2: Normal Load (2 hours)

  • Mixed protocols
  • HTTP/HTTPS: 10 requests/interval
  • DNS queries: 15/interval
  • Ping traffic: 5/interval
  • Simulates typical network behavior

Phase 3: Stress Testing (1.5 hours)

  • High sustained load (3-min cycles)
  • HTTP/HTTPS: 20 requests/interval
  • DNS: 30/interval
  • Stress bursts: 50 concurrent requests/minute
  • High-entropy traffic: 10/interval

Phase 4: Ransomware Simulation (1 hour)

  • Fake C2 connections: 10/interval
  • SMB lateral movement: 15/interval
  • Encrypted payloads: 20/interval
  • Tests detection accuracy under attack

Phase 5: Sustained Load (3 hours)

  • Continuous moderate traffic
  • HTTP/HTTPS: 12/interval
  • DNS: 20/interval
  • Tests long-term stability

Phase 6: Cool Down (30 min)

  • Gradual traffic reduction
  • Verify clean shutdown
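The schedule above reduces to a single elapsed-minutes dispatcher. The function below is a sketch, not code from stress_test_8h.sh; the boundaries are the cumulative phase durations listed (30, 150, 240, 300, 480, 510 min).

```shell
#!/usr/bin/env bash
# Map elapsed minutes to the active phase, per the schedule above.
# Function name and phase labels are illustrative.
phase_for() {
  local m=$1
  if   [ "$m" -lt 30 ];  then echo "warmup"       # Phase 1: 30 min
  elif [ "$m" -lt 150 ]; then echo "normal"       # Phase 2: 2 hours
  elif [ "$m" -lt 240 ]; then echo "stress"       # Phase 3: 1.5 hours
  elif [ "$m" -lt 300 ]; then echo "ransomware"   # Phase 4: 1 hour
  elif [ "$m" -lt 480 ]; then echo "sustained"    # Phase 5: 3 hours
  else                        echo "cooldown"     # Phase 6: 30 min
  fi
}

phase_for 200   # prints "stress"
```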

Running Stress Tests

Quick Stress Test (10 min)

# Start all components
make run-lab-dev

# In another terminal, run stress test
cd tools/build
./synthetic_ml_output_injector 10000 200
# 10,000 events at 200 events/sec

# Monitor firewall logs
tail -f /vagrant/logs/lab/firewall-agent.log | grep "events_processed"

Full 8-Hour Test

# Run full 8-hour stress test
./stress_test_8h.sh

# Test artifacts saved to:
/vagrant/stress_test_<timestamp>/
  ├── logs/
  │   ├── sniffer.log
  │   ├── ml_detector.log
  │   ├── traffic.log
  │   └── monitor.log
  ├── monitoring/
  │   ├── cpu_usage.csv
  │   ├── memory_usage.csv
  │   └── network_stats.csv
  └── REPORT.md

Load Profiles

Hospital Benchmark

From source/scripts/day11_hospital_benchmark/: ML Defender includes realistic hospital traffic profiles.

Electronic Health Records (scripts/day11_hospital_benchmark/traffic_profiles/ehr_load.sh) simulates typical EHR system traffic.
Characteristics:
  • HTTP/HTTPS requests to the EHR API
  • Database queries (simulated)
  • Average: 50 requests/sec
  • Peak: 120 requests/sec

Custom Traffic Mix

From source/stress_test_8h.sh:616-629:
# Traffic distribution
Protocol Distribution:
  HTTP/HTTPS:     40%
  DNS:            30%
  ICMP:           15%
  SMB (TCP 445):  10%
  Other:          5%

Payload Types:
  Normal text:    25%
  Encrypted:      50%
  Random:         20%
  PE executable:  5%
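A weighted mix like this is typically implemented as a cumulative-bucket lookup: draw a number in 0-99 and map it to the protocol whose cumulative share covers it. The function below is a sketch using the exact percentages above; the name is illustrative.

```shell
#!/usr/bin/env bash
# Map a number in 0..99 to a protocol per the 40/30/15/10/5 distribution.
protocol_for() {
  local n=$1
  if   [ "$n" -lt 40 ]; then echo "http"    # 40%: 0-39
  elif [ "$n" -lt 70 ]; then echo "dns"     # 30%: 40-69
  elif [ "$n" -lt 85 ]; then echo "icmp"    # 15%: 70-84
  elif [ "$n" -lt 95 ]; then echo "smb"     # 10%: 85-94
  else                       echo "other"   #  5%: 95-99
  fi
}

# Randomized use in a generator loop:
protocol_for $((RANDOM % 100))
```

The same pattern applies to the payload-type split (25/50/20/5) with different boundaries.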

Metrics Collection

Monitoring Script

From source/stress_test_monitor.sh:
#!/bin/bash
# Resource monitoring for stress tests

SNIFFER_PID=$1
ML_DETECTOR_PID=$2
OUTPUT_DIR=$3
INTERVAL=$4  # seconds

mkdir -p "$OUTPUT_DIR"

echo "timestamp,cpu_sniffer,cpu_detector,mem_sniffer_mb,mem_detector_mb" > "$OUTPUT_DIR/resources.csv"

# Exit automatically once either monitored process terminates,
# so the CSV never accumulates empty fields from a dead PID
while kill -0 "$SNIFFER_PID" 2>/dev/null && kill -0 "$ML_DETECTOR_PID" 2>/dev/null; do
  TIMESTAMP=$(date +%s)

  # CPU usage (%)
  CPU_SNIFFER=$(ps -p "$SNIFFER_PID" -o %cpu= | tr -d ' ')
  CPU_DETECTOR=$(ps -p "$ML_DETECTOR_PID" -o %cpu= | tr -d ' ')

  # Memory usage (RSS in MB)
  MEM_SNIFFER=$(ps -p "$SNIFFER_PID" -o rss= | awk '{print $1/1024}')
  MEM_DETECTOR=$(ps -p "$ML_DETECTOR_PID" -o rss= | awk '{print $1/1024}')

  echo "$TIMESTAMP,$CPU_SNIFFER,$CPU_DETECTOR,$MEM_SNIFFER,$MEM_DETECTOR" >> "$OUTPUT_DIR/resources.csv"

  # Network stats (one snapshot file per interval)
  ifconfig eth0 | grep -E 'RX packets|TX packets' > "$OUTPUT_DIR/network_$TIMESTAMP.txt"

  sleep "$INTERVAL"
done

Collected Metrics

  • CPU usage per component (%)
  • Memory usage (RSS MB)
  • Network throughput (packets/sec, bytes/sec)
  • Disk I/O (reads/writes)
  • Context switches
  • Events processed (total count)
  • Processing rate (events/sec)
  • Queue depth (current backlog)
  • Error counts (crypto, decompression, parse)
  • Latency (p50, p95, p99)
  • IPSet insertions (successes/failures)
  • IPs blocked (total count)
  • Iptables rule evaluations
  • Batch processing stats

Performance Benchmarking

17-Hour Sniffer Stability Test

From source/TESTING.md:164-182:
╔═══════════════════════════════════════════════════════════════╗
║  17-HOUR STABILITY TEST - FINAL RESULTS                       ║
╚═══════════════════════════════════════════════════════════════╝

Total Runtime:              17h 2m 10s (61,343 seconds)
Total Packets Processed:    2,080,549
Payloads Analyzed:          1,550,375 (74.5%)
Peak Throughput:            82.35 events/second
Average Throughput:         33.92 events/second
Memory Footprint:           4.5 MB (STABLE)
CPU Usage (load):           5-10%
CPU Usage (idle):           0%
Kernel Panics:              0
Segmentation Faults:        0
Memory Leaks:               0
Process Restarts:           0

Status: ✅ PRODUCTION-READY

Component Latency Breakdown

From source/TESTING.md:282-296:
Component               Latency   Cumulative
eBPF capture            <1 μs     1 μs
Ring buffer             <1 μs     2 μs
PayloadAnalyzer (fast)  1 μs      3 μs
FastDetector            <1 μs     4 μs
Protobuf serialize      ~10 μs    14 μs
ZMQ PUSH                ~50 μs    64 μs
End-to-end latency: ~64 μs (normal path)
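The cumulative column is a running sum of the per-stage figures. As a sanity check, the arithmetic can be reproduced (approximating each "<1 μs" entry as 1 μs and dropping the "~" prefixes):

```shell
#!/usr/bin/env bash
# Reproduce the ~64 us end-to-end figure from the per-stage latencies above.
TOTAL_US=$(awk '{ sum += $2 } END { print sum }' <<'EOF'
ebpf_capture        1
ring_buffer         1
payload_analyzer    1
fast_detector       1
protobuf_serialize  10
zmq_push            50
EOF
)
echo "end-to-end: ~${TOTAL_US} us"
```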

Capacity Planning

Resource Recommendations

Based on 36K event stress testing and 17-hour stability validation:
Small Deployment (Home/SMB):
  • CPU: 2 cores @ 2.5 GHz
  • RAM: 4 GB
  • Network: 100 Mbps
  • Capacity: 50-100 events/sec
Medium Deployment (Enterprise):
  • CPU: 4 cores @ 3.0 GHz
  • RAM: 8 GB
  • Network: 1 Gbps
  • Capacity: 200-500 events/sec
Large Deployment (Hospital/ISP):
  • CPU: 8 cores @ 3.5 GHz
  • RAM: 16 GB
  • Network: 10 Gbps
  • Capacity: 1000+ events/sec
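A rough tier selector follows directly from these figures. The cutoffs below come from the capacity column above; the function name and tier labels are illustrative.

```shell
#!/usr/bin/env bash
# Pick the smallest deployment tier covering a target sustained event rate.
tier_for() {   # $1 = expected events/sec
  local eps=$1
  if   [ "$eps" -le 100 ]; then echo "small"    # 2 cores / 4 GB
  elif [ "$eps" -le 500 ]; then echo "medium"   # 4 cores / 8 GB
  else                          echo "large"    # 8 cores / 16 GB
  fi
}

tier_for 300   # prints "medium"
```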

IPSet Capacity Planning

From source/README.md:196-203:
IPSet Limits:
  Default max:     65,536 IPs
  Realistic max:   500,000 IPs
  Tested capacity: 1,000 IPs (before failures)

Recommendations:
  - Implement multi-tier storage (IPSet → SQLite → Parquet)
  - Auto-eviction policy (LRU, time-based)
  - Capacity monitoring and alerts
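A capacity-monitoring check is straightforward to sketch: compute utilization against `maxelem` and alert at thresholds. The 75%/90% thresholds and function names below are illustrative assumptions, not part of ML Defender.

```shell
#!/usr/bin/env bash
# Utilization alert sketch for ipset capacity monitoring.
ipset_utilization() {   # $1 = current entries, $2 = maxelem
  echo $(( $1 * 100 / $2 ))
}

check_capacity() {      # $1 = current entries, $2 = maxelem
  local pct
  pct=$(ipset_utilization "$1" "$2")
  if   [ "$pct" -ge 90 ]; then echo "CRITICAL: ${pct}% of ipset capacity"
  elif [ "$pct" -ge 75 ]; then echo "WARNING: ${pct}% of ipset capacity"
  else                         echo "OK: ${pct}%"
  fi
}

# In production, the current count would come from e.g.
# `ipset list <setname> | wc -l` minus the header lines.
check_capacity 60000 65536
```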

Stress Test Analysis

Automated Report Generation

From source/stress_test_8h.sh:138-200:
# Generate test report
generate_report() {
    REPORT="${TEST_DIR}/REPORT.md"
    
    cat > "${REPORT}" <<EOF
# ML Defender - 8 Hour Stress Test Report

## Test Summary
- Start: $(cat ${TEST_DIR}/test_info.txt | grep "Start time" | cut -d: -f2-)
- Duration: ${HOURS}h ${MINUTES}m ${SECONDS}s
- Components: Sniffer + ML-Detector

## Results

### Event Processing
\`\`\`
Total events: $(grep "events_processed" ${LOGS_DIR}/ml_detector.log | tail -1 | awk '{print $NF}')
Average rate: $(echo "scale=2; $(grep "events_processed" ${LOGS_DIR}/ml_detector.log | tail -1 | awk '{print $NF}') / ${ACTUAL_RUNTIME}" | bc) events/sec
Peak rate: $(grep "events/sec" ${LOGS_DIR}/ml_detector.log | awk '{print $NF}' | sort -n | tail -1) events/sec
\`\`\`

### Resource Usage
\`\`\`
Peak CPU: $(awk -F',' 'NR>1 {print $2}' ${MONITORING_DIR}/resources.csv | sort -n | tail -1)%
Peak Memory: $(awk -F',' 'NR>1 {print $4}' ${MONITORING_DIR}/resources.csv | sort -n | tail -1) MB
Average CPU: $(awk -F',' 'NR>1 {sum+=$2; count++} END {print sum/count}' ${MONITORING_DIR}/resources.csv)%
Average Memory: $(awk -F',' 'NR>1 {sum+=$4; count++} END {print sum/count}' ${MONITORING_DIR}/resources.csv) MB
\`\`\`

### Errors
\`\`\`
Crashes: 0
Segfaults: 0
Memory leaks: 0
Crypto errors: $(grep "crypto_error" ${LOGS_DIR}/ml_detector.log | wc -l)
\`\`\`

## Conclusion
✅ Test PASSED - System stable under 8-hour stress test
EOF

    echo "✅ Report generated: ${REPORT}"
}

Next Steps

  • Testing Guide: run unit and integration tests
  • Performance Tuning: optimize system performance
  • Monitoring: set up production monitoring
  • Troubleshooting: debug performance issues
