Overview
ML Defender is an enterprise-grade distributed network security system built on a microservices architecture. Each component is designed for autonomy, resilience, and composability. All components are production-validated, with real metrics from stress testing.
Component Architecture
Core Components
1. Sniffer (eBPF/XDP)
Purpose: High-performance packet capture and feature extraction
Technology Stack:
- eBPF/XDP for kernel-space filtering
- Ring buffer (4MB) for zero-copy data transfer
- Multi-threaded consumer pool
- libbpf CO-RE for portability
- 512-byte payload capture - First 512 bytes of L4 payload
- 83+ ML features - Comprehensive network behavior analysis
- Layered detection pipeline:
  - Layer 0: eBPF/XDP payload extraction
  - Layer 1.5: Payload analysis (entropy, PE headers, patterns)
  - Layer 1: Fast heuristics (10s sliding window)
  - Layer 2: Deep feature extraction (30s aggregation)
- Shannon entropy analysis (>7.0 bits/byte flagged as likely encrypted)
- PE executable detection (MZ/PE headers)
- 30+ ransomware signatures (.onion, crypto APIs, ransom notes)
- External IP tracking (C&C communication)
- SMB lateral movement detection
- Port scanning patterns
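The Layer 1.5 payload checks can be sketched in a few lines of Python. This is a minimal illustration, not the sniffer's C++/eBPF code: it assumes the payload arrives as a `bytes` object, truncates it to the 512-byte window, measures entropy in bits per byte, and treats the buffer as a PE executable when it starts with `MZ` and the 32-bit `e_lfanew` field at offset 0x3C points at a `PE\0\0` signature inside the window. The function names are hypothetical.

```python
import math
import struct
from collections import Counter

PAYLOAD_WINDOW = 512     # Layer 0 captures the first 512 bytes of L4 payload
ENTROPY_THRESHOLD = 7.0  # bits per byte; the maximum possible is 8.0


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())


def looks_like_pe(data: bytes) -> bool:
    """True if the buffer starts with an MZ header whose e_lfanew points at a PE signature."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)  # 32-bit little-endian offset
    return e_lfanew + 4 <= len(data) and data[e_lfanew:e_lfanew + 4] == b"PE\x00\x00"


def analyze_payload(payload: bytes) -> dict:
    """Hypothetical Layer 1.5 summary for one payload."""
    window = payload[:PAYLOAD_WINDOW]
    entropy = shannon_entropy(window)
    return {
        "entropy": entropy,
        "likely_encrypted": entropy > ENTROPY_THRESHOLD,
        "pe_executable": looks_like_pe(window),
    }
```

High entropy is also typical of compressed formats (ZIP, JPEG, video), so the entropy flag is one signal among several rather than a verdict. Likewise, a PE file whose `e_lfanew` points beyond the 512-byte window is missed by this check.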
Sniffer Configuration Example
2. ML Detector
Purpose: Real-time threat classification using embedded ML models
Technology Stack:
- C++20 for performance
- ONNX Runtime for inference
- 4 embedded RandomForest models
- ZeroMQ PULL/PUB pattern
- DDoS Detection - 97.6% accuracy on CTU-13 dataset
- Ransomware Detection - Behavioral pattern classification
- Traffic Classification - Protocol and flow analysis
- Anomaly Detection - Internal vs external threats
- Detection latency: <1 μs (sub-microsecond)
- Throughput: 1M+ packets/sec (synthetic traffic)
- Features extracted: 83 per flow
- Models: 4 concurrent evaluations
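The detector runs its RandomForest models through ONNX Runtime. To show what tree-ensemble inference amounts to, here is a plain-Python sketch that walks trees stored as flat node arrays and takes a majority vote per model. The node layout, the `stump` helper, and majority voting are assumptions made for illustration (scikit-learn forests, for example, average class probabilities instead of counting votes); this is not the ONNX Runtime code path.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Sequence

LEAF = -1  # hypothetical marker: a node whose feature index is -1 is a leaf


@dataclass(frozen=True)
class FlatTree:
    """Node i tests x[feature[i]] <= threshold[i]; leaves store a class label in value[i]."""
    feature: Sequence[int]
    threshold: Sequence[float]
    left: Sequence[int]
    right: Sequence[int]
    value: Sequence[int]

    def predict(self, x: Sequence[float]) -> int:
        i = 0
        while self.feature[i] != LEAF:
            go_left = x[self.feature[i]] <= self.threshold[i]
            i = self.left[i] if go_left else self.right[i]
        return self.value[i]


def stump(feature_idx: int, threshold: float) -> FlatTree:
    """Hypothetical helper: one split, class 0 on the left, class 1 on the right."""
    return FlatTree(
        feature=[feature_idx, LEAF, LEAF],
        threshold=[threshold, 0.0, 0.0],
        left=[1, -1, -1],
        right=[2, -1, -1],
        value=[0, 0, 1],
    )


def forest_predict(trees: List[FlatTree], x: Sequence[float]) -> int:
    """Majority vote across the trees of one forest."""
    votes = Counter(tree.predict(x) for tree in trees)
    return votes.most_common(1)[0][0]


def classify_flow(models: Dict[str, List[FlatTree]], x: Sequence[float]) -> Dict[str, int]:
    """Evaluate every model (e.g. DDoS, ransomware, traffic, anomaly) on the same feature vector."""
    return {name: forest_predict(trees, x) for name, trees in models.items()}
```

A real feature vector would carry 83 entries; the usage below uses two for brevity: `classify_flow({"ddos": [stump(0, 0.3), stump(0, 0.5), stump(0, 0.7)]}, [0.6, 0.2])` returns `{"ddos": 1}`, because two of the three trees vote for class 1.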
ML Detector Configuration Example
3. Crypto Pipeline
Purpose: Secure, compressed transmission of threat data
Technology Stack:
- ChaCha20-Poly1305 (AEAD encryption)
- LZ4 fast compression
- libsodium crypto primitives
- ✅ Authenticated encryption (AEAD)
- ✅ Perfect forward secrecy
- ✅ No cleartext transmission of threats
- ✅ 0 errors @ 36K events (production-validated)
- ML Detector → Protobuf serialization
- LZ4 compression (typical 50-70% reduction)
- ChaCha20-Poly1305 encryption + authentication tag
- ZeroMQ transmission
- Firewall Agent → Decryption + verification
- LZ4 decompression
- Protobuf deserialization
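The flow compresses before it encrypts, and the order matters: ChaCha20-Poly1305 output is statistically indistinguishable from random bytes, so compressing after encryption would save essentially nothing. The check below illustrates this with stand-ins, since neither LZ4 nor ChaCha20 is in the Python standard library: `zlib` plays the role of LZ4, and a stream of SHA-256 digests plays the role of ciphertext. The sample record format is hypothetical.

```python
import hashlib
import zlib

# Hypothetical, deliberately repetitive event stream standing in for serialized threat events.
plaintext = b"src=192.168.1.10 dst=10.0.0.5 dport=445 proto=tcp verdict=ransomware\n" * 100

# Stand-in for ciphertext: concatenated SHA-256 digests look random, like AEAD output.
ciphertext_like = b"".join(
    hashlib.sha256(i.to_bytes(4, "big")).digest()
    for i in range(len(plaintext) // 32 + 1)
)[: len(plaintext)]

compressed_plain = zlib.compress(plaintext)         # compress-then-encrypt: shrink first
compressed_cipher = zlib.compress(ciphertext_like)  # encrypt-then-compress: no redundancy left

print(f"input:          {len(plaintext)} bytes")
print(f"compress first: {len(compressed_plain)} bytes")
print(f"compress after: {len(compressed_cipher)} bytes")
```

The toy input is far more repetitive than real traffic, so its ratio says nothing about the 50-70% reduction quoted above; only the ordering argument carries over.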
4. etcd Server
Purpose: Distributed coordination and configuration management
Technology Stack:
- C++ implementation with etcd v3 API
- Key-value store with watch support
- Service discovery protocol
- ✅ Service registration & discovery
- ✅ Automatic crypto seed exchange
- ✅ Distributed configuration (JSON)
- ✅ Heartbeat mechanism (30s interval)
- ✅ Config versioning (master + active copies)
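Heartbeat-driven registration can be modeled as a registry that drops a service once it stops sending heartbeats. The sketch below is an in-memory illustration, not the etcd server: the lease TTL of three heartbeat intervals (90 s) is an assumed value, and `ServiceRegistry` with its methods is hypothetical. In upstream etcd v3, the comparable mechanism is a lease (`LeaseGrant`, kept alive with `LeaseKeepAlive`) attached to the service's key.

```python
import time
from typing import Callable, Dict, Optional

HEARTBEAT_INTERVAL_S = 30.0
LEASE_TTL_S = 3 * HEARTBEAT_INTERVAL_S  # assumption: tolerate two missed heartbeats


class ServiceRegistry:
    """Hypothetical in-memory registry: a service expires if no heartbeat arrives within the TTL."""

    def __init__(self, ttl: float = LEASE_TTL_S, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._endpoints: Dict[str, str] = {}
        self._last_seen: Dict[str, float] = {}

    def register(self, name: str, endpoint: str) -> None:
        self._endpoints[name] = endpoint
        self._last_seen[name] = self.clock()

    def heartbeat(self, name: str) -> bool:
        """Refresh the lease; returns False if the service already expired and must re-register."""
        self._expire()
        if name not in self._endpoints:
            return False
        self._last_seen[name] = self.clock()
        return True

    def discover(self, name: str) -> Optional[str]:
        self._expire()
        return self._endpoints.get(name)

    def _expire(self) -> None:
        now = self.clock()
        stale = [n for n, seen in self._last_seen.items() if now - seen > self.ttl]
        for name in stale:
            del self._endpoints[name]
            del self._last_seen[name]
```

Injecting the clock keeps expiry deterministic in tests; a production version would also need locking if heartbeats arrive on several threads.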
5. Firewall ACL Agent
Purpose: Autonomous network threat blocking
Technology Stack:
- C++20 with IPSet/IPTables integration
- ZeroMQ SUB pattern for threat consumption
- ChaCha20-Poly1305 decryption
- LZ4 decompression
- ⚡ Kernel-level blocking (IPSet)
- 🕒 Temporal rules (auto-expire after 1h)
- 🚦 Rate limiting per IP
- ✅ Whitelist/blacklist support
- 🔄 Graceful rollback on exit
- 📊 Metrics export
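Kernel-level blocking with auto-expiring rules maps naturally onto an IPSet `hash:ip` set created with a default `timeout`: the kernel drops each entry once its timeout elapses, so no cleanup thread is needed. The sketch below only builds the command argument lists (they could be run with `subprocess.run(cmd, check=True)` as root); the set name, the function names, and the whitelist format are hypothetical, while `timeout`, `-exist`, and `--match-set` are standard `ipset`/`iptables` options.

```python
import ipaddress
from typing import Iterable, List, Optional

BLACKLIST_SET = "ml_defender_blacklist"  # hypothetical name; the agent reads set names from JSON config
BLOCK_TIMEOUT_S = 3600                   # 1h auto-expiry, as described above


def ipset_create_cmd(set_name: str = BLACKLIST_SET) -> List[str]:
    # hash:ip with a default timeout; the default family is inet (IPv4 only)
    return ["ipset", "-exist", "create", set_name, "hash:ip", "timeout", str(BLOCK_TIMEOUT_S)]


def iptables_drop_cmd(set_name: str = BLACKLIST_SET, chain: str = "INPUT") -> List[str]:
    # Drop packets whose source is in the set; forwarded traffic traverses FORWARD, not INPUT
    return ["iptables", "-I", chain, "-m", "set", "--match-set", set_name, "src", "-j", "DROP"]


def ipset_block_cmd(ip: str, whitelist: Iterable[str],
                    set_name: str = BLACKLIST_SET) -> Optional[List[str]]:
    addr = ipaddress.ip_address(ip)  # raises ValueError on malformed input
    if any(addr in ipaddress.ip_network(net) for net in whitelist):
        return None  # whitelisted addresses are never blocked
    # -exist: re-adding an already-blocked IP refreshes its timeout instead of failing
    return ["ipset", "-exist", "add", set_name, str(addr), "timeout", str(BLOCK_TIMEOUT_S)]
```

Because re-adding with `-exist` resets the timeout, a host that keeps misbehaving stays blocked, while one that stops is released an hour after its last detection.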
| Test | Events | Rate | CPU | Result |
|---|---|---|---|---|
| 1 | 1,000 | 42.6/sec | N/A | ✅ PASS |
| 2 | 5,000 | 94.9/sec | N/A | ✅ PASS |
| 3 | 10,000 | 176.1/sec | 41-45% | ✅ PASS |
| 4 | 20,000 | 364.9/sec | 49-54% | ✅ PASS |
- All parameters from JSON (zero hardcoding)
- IPSet names from config (no singleton ambiguity)
- Logging paths from config
- Batch processor tuning via JSON
Firewall Agent Configuration Example
6. RAG Ingester
Purpose: Log parsing and vector embedding generation
Technology Stack:
- Python with ONNX Runtime
- FAISS vector indexing
- Multi-threaded embedding pipeline
- crypto-transport library for decryption
- Chronos Index (128-d temporal) - Time series queries
- SBERT Index (96-d semantic) - Behavioral pattern queries
- Entity Benign Index (64-d, 10% sampling) - Benign entity queries
- Entity Malicious Index (64-d, 100% coverage) - Malicious entity queries
- Best-effort commits (indices commit independently)
- Availability > Consistency (better 3/4 indices than 0/4)
- Health tracking with circuit breakers
- Single-threaded (Raspberry Pi safe): 1 embedding + 1 indexing worker
- Multi-threaded (server): 3 embedding + 4 indexing workers
- Minimal memory: ~310MB (Raspberry Pi compatible)
- Scales to 64-core servers with multi-threading
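The best-effort commit policy (each index commits on its own, with health tracked by a circuit breaker) can be sketched as follows. One breaker per index, the thresholds, and the `commit_best_effort` function are assumptions for illustration; the point is that a failing index is skipped and retried after a cooldown instead of blocking the other three.

```python
import time
from typing import Callable, Dict, List, Optional


class CircuitBreaker:
    """Opens after `max_failures` consecutive failures; permits a trial attempt after `cooldown_s`."""

    def __init__(self, max_failures: int = 3, cooldown_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.max_failures = max_failures
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True  # closed: normal operation
        # open: reject until the cooldown has elapsed, then go half-open
        return self.clock() - self.opened_at >= self.cooldown_s

    def record(self, ok: bool) -> None:
        if ok:
            self.failures, self.opened_at = 0, None  # success closes the breaker
        else:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()  # (re)open and restart the cooldown


def commit_best_effort(commits: Dict[str, Callable[[], None]],
                       breakers: Dict[str, CircuitBreaker]) -> List[str]:
    """Commit each index independently; returns the names that committed."""
    committed = []
    for name, commit in commits.items():
        if not breakers[name].allow():
            continue  # breaker open: skip this index, keep going with the others
        try:
            commit()
        except Exception:
            breakers[name].record(False)
        else:
            breakers[name].record(True)
            committed.append(name)
    return committed
```

With one index failing, each call still returns the other three names, which is the "3/4 beats 0/4" trade-off in code.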
7. RAG System
Purpose: Natural language forensic queries over threat data
Technology Stack:
- TinyLlama for language understanding
- FAISS for vector similarity search
- ZeroMQ for IPC
- etcd for distributed coordination
- 📋 Command whitelist (security control)
- 🤖 LLM integration (llama.cpp)
- 🔄 etcd client for config
- 🔐 Security context and audit logging
- 🎯 Command validator
- Whitelist-only command execution
- Regex pattern validation
- Restricted key access (no root/admin/password)
- Full audit trail of decisions
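The layered checks above (whitelist, regex validation, restricted keys, audit trail) can be sketched as a single validation function. The command names, regex patterns, and audit record format are hypothetical; what the sketch shows is the order of the checks and that every decision, allowed or denied, is recorded.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical whitelist: command -> regex the whole argument string must match.
ALLOWED_COMMANDS = {
    "show_config": re.compile(r"[a-z_]{1,32}"),
    "query_events": re.compile(r"[\w .:/-]{1,128}"),
}
# Substrings that may never appear in a requested key.
RESTRICTED_KEY_PARTS = ("root", "admin", "password")


@dataclass(frozen=True)
class Decision:
    allowed: bool
    reason: str


def validate(command: str, argument: str,
             audit_log: List[Tuple[str, str, bool, str]]) -> Decision:
    if command not in ALLOWED_COMMANDS:
        decision = Decision(False, "command not whitelisted")
    elif not ALLOWED_COMMANDS[command].fullmatch(argument):
        decision = Decision(False, "argument failed pattern check")
    elif any(part in argument.lower() for part in RESTRICTED_KEY_PARTS):
        decision = Decision(False, "restricted key")
    else:
        decision = Decision(True, "ok")
    audit_log.append((command, argument, decision.allowed, decision.reason))  # log every decision
    return decision
```

Using `fullmatch` rather than `search` matters here: a pattern must cover the entire argument, so trailing shell syntax such as `; rm -rf /` fails the check.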
Communication Patterns
ZeroMQ Topology
Pattern: Publisher-Subscriber with PUSH-PULL for backpressure
- Sniffer → ML Detector: tcp://127.0.0.1:5571 (PUSH → PULL)
- ML Detector → Firewall Agent: tcp://127.0.0.1:5572 (PUB → SUB)
- ML Detector → RAG Ingester: File-based (.pb files)
Protobuf Serialization
Why Protobuf?
- Compact binary format (50-70% smaller than JSON)
- Schema evolution (backwards compatible)
- Fast serialization/deserialization
- Cross-language support (C++, Python)
- Sniffer extracts features → Protobuf struct
- ML Detector adds predictions → Same Protobuf
- Crypto pipeline encrypts → Binary blob
- Firewall Agent decrypts → Protobuf struct
- RAG Ingester parses → Vector embeddings
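Part of Protobuf's compactness comes from its wire format: field names never appear on the wire, only a tag built from the field number and wire type, and integers are encoded as base-128 varints. A minimal encoder for unsigned integer fields illustrates this; the field numbers used are hypothetical, since the project's `.proto` schema is not shown here.

```python
def encode_varint(value: int) -> bytes:
    """Base-128 varint: 7 data bits per byte, least-significant group first, MSB = more bytes follow."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # continuation bit set
        else:
            out.append(low7)
            return bytes(out)


def encode_uint_field(field_number: int, value: int) -> bytes:
    """Tag varint ((field_number << 3) | wire type 0 = VARINT) followed by the value varint."""
    return encode_varint((field_number << 3) | 0) + encode_varint(value)


print(encode_varint(300).hex())         # ac02
print(encode_uint_field(3, 443).hex())  # 18bb03
```

A hypothetical `dst_port = 443` in field 3 takes 3 bytes on the wire, versus 17 bytes for the JSON text `{"dst_port": 443}`.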
Deployment Topology
Single-Node Deployment
Use case: Development, small networks (<100 hosts)
- RAM: 8GB minimum
- CPU: 6 cores recommended
- Disk: 50GB (20GB logs + 30GB models/indices)
- Network: 1Gbps NIC
Dual-NIC Gateway Deployment
Use case: Network gateway protecting the entire LAN
- Host-based mode (eth1): Packets TO gateway
- Gateway mode (eth3): Packets THROUGH gateway
- IP forwarding enabled
- eBPF programs on both interfaces
- Client (192.168.100.50) → eth3 (gateway mode)
- XDP captures transit traffic
- ML Detector classifies
- Firewall Agent blocks if malicious
- Legitimate traffic forwarded to Internet via eth1
Multi-Node Distributed Deployment
Use case: Large networks, high availability
- Horizontal scaling (add more sniffers)
- Load balancing across ML detectors
- etcd cluster for HA coordination
- Shared FAISS indices (NFS/Ceph)
- Prometheus + Grafana monitoring
Next Steps
- Component Deep Dives: Detailed documentation for each component
- Deployment Guide: Step-by-step deployment instructions
- Configuration Reference: Complete configuration options for all components
- Performance Tuning: Optimization tips and benchmarking