Overview
The ML Detector is the intelligence core of ML Defender, running 4 embedded RandomForest models in C++20 for low-latency threat classification (under 5 ms per classification, with the first-stage model under 1 ms). Validated on the CTU-13 dataset, where the DDoS model reaches 97.6% accuracy, it provides real-time detection of DDoS attacks, ransomware, and network anomalies.
4 Embedded Models
- DDoS Detection (97.6% accuracy)
- Ransomware Detection
- Traffic Classification
- Internal Anomaly Detection
Performance
- <5ms classification latency
- 10K-50K events/sec throughput per quintuple (the 5-component pipeline)
- 83 features extracted per flow
- Zero-copy IPC with Unix sockets
Three-Layer Architecture
The ML Detector implements a three-tier decision cascade that progressively analyzes threats with increasing precision:
Level 1: General Attack Detection
Purpose: Fast binary classification (Attack vs. Benign)
- Model: Random Forest (23 features)
- Latency: <1ms
- Features: High-level flow statistics (packet counts, byte ratios, flag patterns)
Level 2: Specialized Classification
Purpose: Identify attack type (DDoS, Ransomware, Unknown)
- Model: Random Forest (82 features)
- Latency: <3ms
- Features: Deep protocol analysis, timing patterns, behavioral indicators
Level 3: Anomaly Detection
Purpose: Catch novel threats not seen in training
- Model: Statistical anomaly detection (4 features)
- Latency: <0.5ms
- Features: Internal/Web traffic patterns
Design Philosophy: Benign traffic exits early (Level 1 → Level 3 → END), which minimizes CPU usage during normal operation, while deep analysis (Level 2) activates only for suspected threats.
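The routing described above can be sketched as follows. This is a minimal illustration rather than the project's implementation: the `Verdict` enum, the `Cascade` struct, and the three callables standing in for the models are hypothetical names. Only the control flow (Level 1 first, Level 3 for benign traffic, Level 2 only for suspected attacks) is taken from the text.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical outcome labels for illustration; the real detector's types may differ.
enum class Verdict { Benign, Anomaly, DDoS, Ransomware, Unknown };

using Features = std::vector<float>;

// Stand-ins for the three models. Each callable is an assumption of this sketch.
struct Cascade {
    std::function<bool(const Features&)> level1_is_attack;       // fast binary check
    std::function<Verdict(const Features&)> level2_attack_type;  // DDoS / Ransomware / Unknown
    std::function<bool(const Features&)> level3_is_anomalous;    // statistical anomaly check

    Verdict classify(const Features& flow) const {
        // All three stages must be wired up before classification.
        assert(level1_is_attack && level2_attack_type && level3_is_anomalous);

        if (!level1_is_attack(flow)) {
            // Benign path: Level 1 -> Level 3 -> END, skipping the costlier Level 2 model.
            return level3_is_anomalous(flow) ? Verdict::Anomaly : Verdict::Benign;
        }
        // Threat path: only now pay for the specialized classifier.
        return level2_attack_type(flow);
    }
};
```

Keeping the stages behind plain callables makes the early-exit behavior easy to check in isolation: a counter inside the Level 2 stand-in should stay at zero for every flow that Level 1 marks benign.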
Model Performance
CTU-13 Dataset Validation
Tested on the Neris botnet captures from CTU-13 (real-world botnet traffic):
| Model | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|
| DDoS Detection | 97.6% | 96.8% | 97.2% | 97.0% |
| Ransomware Detection | 95.4% | 94.1% | 96.3% | 95.2% |
| Traffic Classification | 93.8% | 92.5% | 94.0% | 93.2% |
| Anomaly Detection | 91.2% | 89.7% | 92.1% | 90.9% |
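The F1-Score column can be cross-checked against the Precision and Recall columns, since F1 is their harmonic mean. The helper names `f1_score` and `round1` below are illustrative and not part of ML Defender.

```cpp
#include <cassert>
#include <cmath>

// F1 is the harmonic mean of precision and recall: F1 = 2PR / (P + R).
// Inputs and output are percentages, matching the table above.
inline double f1_score(double precision, double recall) {
    assert(precision + recall > 0.0);
    return 2.0 * precision * recall / (precision + recall);
}

// Round to one decimal place, the precision used in the table.
inline double round1(double x) {
    return std::round(x * 10.0) / 10.0;
}
```

For the DDoS row, `round1(f1_score(96.8, 97.2))` gives 97.0, which matches the table. The other three rows agree in the same way.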
Feature Extraction
The ML Detector receives 83 features per network flow from the Sniffer component:
Packet Statistics (20 features)
- total_forward_packets, total_backward_packets
- total_forward_bytes, total_backward_bytes
- forward_packet_length_mean, backward_packet_length_mean
- forward_packet_length_std, backward_packet_length_std
- forward_packet_length_max, backward_packet_length_max
- forward_packet_length_min, backward_packet_length_min
Flag Patterns (12 features)
- fin_flag_count, syn_flag_count, rst_flag_count
- psh_flag_count, ack_flag_count, urg_flag_count
- cwe_flag_count, ece_flag_count
- forward_psh_flags, backward_psh_flags
- forward_urg_flags, backward_urg_flags
Timing Analysis (18 features)
- flow_duration, flow_iat_mean, flow_iat_std
- flow_iat_max, flow_iat_min
- forward_iat_total, forward_iat_mean, forward_iat_std
- backward_iat_total, backward_iat_mean, backward_iat_std
- active_mean, active_std, active_max, active_min
- idle_mean, idle_std, idle_max, idle_min
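As an illustration of how inter-arrival time (IAT) features such as flow_iat_mean, flow_iat_std, flow_iat_max, and flow_iat_min could be derived, the sketch below computes them from packet timestamps. It assumes timestamps in microseconds, sorted ascending, and a population standard deviation. How the Sniffer actually computes these values is not specified here, and `IatStats` and `compute_iat_stats` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

struct IatStats {
    double mean = 0.0;
    double std_dev = 0.0;  // population standard deviation (an assumption of this sketch)
    double max = 0.0;
    double min = 0.0;
};

// Computes inter-arrival statistics from ascending packet timestamps (microseconds).
// Returns all zeros when fewer than two packets are available.
inline IatStats compute_iat_stats(const std::vector<std::uint64_t>& ts_us) {
    assert(std::is_sorted(ts_us.begin(), ts_us.end()));
    IatStats s;
    if (ts_us.size() < 2) return s;

    // Gaps between consecutive packets.
    std::vector<double> gaps;
    gaps.reserve(ts_us.size() - 1);
    for (std::size_t i = 1; i < ts_us.size(); ++i) {
        gaps.push_back(static_cast<double>(ts_us[i] - ts_us[i - 1]));
    }

    double sum = 0.0;
    for (double g : gaps) sum += g;
    s.mean = sum / static_cast<double>(gaps.size());

    double squared_dev = 0.0;
    for (double g : gaps) squared_dev += (g - s.mean) * (g - s.mean);
    s.std_dev = std::sqrt(squared_dev / static_cast<double>(gaps.size()));

    s.max = *std::max_element(gaps.begin(), gaps.end());
    s.min = *std::min_element(gaps.begin(), gaps.end());
    return s;
}
```

The forward and backward variants would follow by feeding the same function only the timestamps of packets travelling in one direction.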
Ransomware Indicators (20 features)
- external_ips_30s: Unique external IPs contacted
- smb_diversity: Lateral movement tracking
- dns_entropy: DGA domain detection
- failed_dns_ratio: C&C failures
- upload_download_ratio: Exfiltration patterns
- burst_connections: Rapid connection attempts
- payload_entropy: Encryption detection
- … (13 more behavioral features)
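Entropy-based indicators such as dns_entropy and payload_entropy are commonly built on Shannon entropy over byte frequencies: encrypted payloads and algorithmically generated domain names tend to score higher than plain text. The sketch below shows that calculation. Whether ML Defender uses exactly this definition is an assumption, and `shannon_entropy` is a hypothetical helper name.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <string_view>

// Shannon entropy in bits per byte: H = -sum(p_i * log2(p_i)) over observed byte values.
// Ranges from 0 (one repeated byte) to 8 (all 256 byte values equally likely).
inline double shannon_entropy(std::string_view data) {
    if (data.empty()) return 0.0;

    // Histogram of byte values.
    std::array<std::size_t, 256> counts{};
    for (unsigned char byte : data) ++counts[byte];

    double entropy = 0.0;
    const double n = static_cast<double>(data.size());
    for (std::size_t c : counts) {
        if (c == 0) continue;  // unused byte values contribute nothing
        const double p = static_cast<double>(c) / n;
        entropy -= p * std::log2(p);
    }
    return entropy;
}
```

A string of one repeated character yields 0 bits, while four distinct characters used equally often yield 2 bits. Any detection threshold applied on top of this value would need to be tuned on real traffic.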
Protocol-Specific (13 features)
- avg_packet_size, packet_length_variance
- down_up_ratio, avg_forward_segment_size
- avg_backward_segment_size
- subflow_forward_packets, subflow_backward_packets
- init_win_bytes_forward, init_win_bytes_backward
- min_seg_size_forward, min_seg_size_backward
ZeroMQ Integration
The ML Detector uses the ZeroMQ PULL/PUB pattern for high-performance inter-process communication:
- PULL Socket (Input)
- PUB Socket (Output)
Configuration
Model Metadata
Each RandomForest model includes metadata for feature mapping and thresholds:
Runtime Configuration
Create config/ml_detector.json:
Deployment
Prerequisites
Build
Run
Integration with Pipeline
Quintuple Co-located Architecture
The ML Detector is part of a specialized 5-component pipeline:
- Input: Sniffer
- Output: Firewall Agent
- Dual-NIC Support
Receives NetworkSecurityEvent protobuf messages with 83 features:
Model Training & Updates
Converting Scikit-learn to ONNX
Model Versioning
ML Defender supports hot-swapping models without downtime:
- Place new model in models/production/
- Update metadata in models/metadata/
- Update config with new path
- Send SIGHUP to ml-detector process
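The last step relies on the process reacting to SIGHUP. A common POSIX pattern is for the signal handler to do nothing but set a flag, which the main loop polls to perform the reload outside signal context. The sketch below shows that pattern only. How ml-detector handles the signal internally is not described here, and `g_reload_requested`, `on_sighup`, `install_sighup_handler`, and `consume_reload_request` are hypothetical names.

```cpp
#include <csignal>

// Set by the signal handler, polled by the main loop. Only async-signal-safe
// work (a flag write) happens inside the handler itself.
static volatile std::sig_atomic_t g_reload_requested = 0;

extern "C" void on_sighup(int) {
    g_reload_requested = 1;
}

// Registers the handler; returns false if registration failed.
inline bool install_sighup_handler() {
    return std::signal(SIGHUP, on_sighup) != SIG_ERR;
}

// Returns true once per pending request and clears the flag, so the caller
// can reload the model from its normal processing loop.
inline bool consume_reload_request() {
    if (g_reload_requested == 0) return false;
    g_reload_requested = 0;
    return true;
}
```

In a loop built this way, the detector would call `consume_reload_request()` between events and swap in the new model when it returns true. From a shell, the signal is typically sent with `kill -HUP <pid>`.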
Monitoring & Metrics
Real-time Statistics
Troubleshooting
Model Loading Fails
High Latency (>5ms)
Possible causes:
- Too many features: Reduce to essential features only
- Large model: Prune RandomForest trees (max_depth=10)
- CPU contention: Pin ml-detector to dedicated cores
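Before applying any of these remedies, it helps to confirm where the time goes. The sketch below times a single call with a monotonic clock and compares it against the 5 ms threshold named in this section. `measure_ms`, `exceeds_budget`, and `kLatencyBudgetMs` are hypothetical names, and the callable being timed stands in for whatever classification entry point the deployment exposes.

```cpp
#include <chrono>
#include <utility>

// Runs `fn` once and returns the elapsed time in milliseconds, using a
// monotonic clock so system time adjustments do not skew the result.
template <typename Fn>
double measure_ms(Fn&& fn) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<Fn>(fn)();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// The 5 ms figure is the latency threshold named in this section.
constexpr double kLatencyBudgetMs = 5.0;

inline bool exceeds_budget(double elapsed_ms, double budget_ms = kLatencyBudgetMs) {
    return elapsed_ms > budget_ms;
}
```

A single measurement is noisy, so in practice the call would be repeated many times and the distribution inspected, since CPU contention usually shows up as occasional slow outliers rather than a uniformly slow average.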
ZMQ Connection Refused
Decryption Errors
Next Steps
Firewall Agent
Configure autonomous blocking based on ML detections
RAG System
Query detections using natural language
Model Training
Train custom models on your network data
Performance Tuning
Optimize for 10Gbps+ environments