Overview
ML Defender uses JSON configuration files as the single source of truth for all components. All settings are externalized; nothing is hardcoded.
Via Appia Quality: Configuration files are the law. All runtime behavior is controlled via JSON, enabling deployment without recompilation.
Configuration File Locations
| Component | Config File | Description |
|---|---|---|
| Firewall Agent | firewall-acl-agent/config/firewall.json | IPSet, IPTables, decryption settings |
| ML Detector | ml-detector/config/ml_detector_config.json | Model paths, thresholds, inference settings |
| Sniffer | sniffer/config/sniffer.json | Interface, eBPF settings, feature extraction |
| etcd Server | etcd-server/config/etcd-server.json | Port, heartbeat, auto-restart |
| RAG Ingester | rag-ingester/config/rag-ingester.json | FAISS index, embeddings, log parsing |
Firewall ACL Agent Configuration
File: firewall-acl-agent/config/firewall.json
Component Metadata
Component identifier
Configuration schema version
Operational mode: packet_filtering or logging_only
Transport Layer
Enable LZ4 decompression of incoming ML detector events
Compression algorithm (only lz4 supported)
Enable ChaCha20-Poly1305 decryption
Encryption algorithm (AEAD cipher)
Require crypto token from etcd (fail if unavailable)
etcd Configuration
Enable etcd client for crypto key exchange
List of etcd server endpoints
etcd path where decryption keys are stored
Connection timeout in milliseconds
Heartbeat interval for service health monitoring
ZeroMQ Configuration
ZMQ endpoint to receive encrypted events from ml-detector
ZMQ PUB/SUB topic filter (empty = subscribe to all)
Receive timeout in milliseconds
Maximum messages queued before dropping
IPSet Configuration
IPSet name for blocked IPs
IPSet type: hash:ip, hash:net, hash:ip,port
Maximum IPs in IPSet. Production: Set to 5000-10000
IP entry timeout in seconds (0 = permanent)
Hash table size (power of 2 recommended)
Auto-create IPSet on startup if not exists
Clear IPSet on startup (use true for testing)
Example: Increase IPSet Capacity
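As a rough illustration of raising IPSet capacity, the sketch below adjusts a config dictionary in Python. The section and key names (`ipset`, `max_elements`, `hash_size`) are hypothetical placeholders, not confirmed names from firewall.json. The sizing follows the guidance in this document: plan for 2x the expected peak of blocked IPs and keep the hash table size at a power of 2.

```python
import json


def next_power_of_two(n: int) -> int:
    """Smallest power of two that is >= n (n must be positive)."""
    if n < 1:
        raise ValueError("n must be positive")
    return 1 << (n - 1).bit_length()


def increase_ipset_capacity(config: dict, expected_peak_blocked: int) -> dict:
    """Return a copy of config sized for 2x the expected peak of blocked IPs."""
    updated = json.loads(json.dumps(config))   # deep copy via JSON round-trip
    ipset = updated.setdefault("ipset", {})    # hypothetical section name
    ipset["max_elements"] = 2 * expected_peak_blocked                      # hypothetical key
    ipset["hash_size"] = next_power_of_two(ipset.get("hash_size", 1024))   # hypothetical key
    return updated


# Illustrative input only; real files will use the project's own key names.
example = increase_ipset_capacity(
    {"ipset": {"max_elements": 1000, "hash_size": 3000}},
    expected_peak_blocked=4000,
)
print(json.dumps(example, indent=2))
```

With an expected peak of 4000 blocked IPs, the capacity lands at 8000, inside the 5000-10000 range suggested for production.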
IPTables Configuration
IPTables chain name for ML Defender rules
Default policy for chain: ACCEPT or DROP
Log blocked packets to syslog
Prefix for syslog entries
Position in chain (1 = first rule, highest priority)
Batch Processor
Batch size before committing to IPSet
Max time to wait before flushing batch (ms)
Minimum ML confidence score to block (0.0-1.0)
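The three batch settings interact: an IP is queued only if its score meets the confidence floor, and the queue is committed when it reaches the batch size or when the oldest entry has waited past the flush timeout. The Python sketch below models that behavior under those assumptions. It is not the agent's actual implementation, and the class and field names are illustrative.

```python
import time
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BatchProcessor:
    batch_size: int          # entries to collect before committing
    flush_timeout_ms: float  # max wait before flushing a partial batch
    min_confidence: float    # minimum ML score (0.0-1.0) required to block
    _pending: List[str] = field(default_factory=list, init=False)
    _first_ts: Optional[float] = field(default=None, init=False)

    def add(self, ip: str, confidence: float, now: Optional[float] = None) -> List[str]:
        """Queue an IP if it meets the threshold; return a batch when one is due."""
        now = time.monotonic() if now is None else now
        if confidence >= self.min_confidence:
            if not self._pending:
                self._first_ts = now  # start the flush timer on the first entry
            self._pending.append(ip)
        return self.flush_if_due(now)

    def flush_if_due(self, now: float) -> List[str]:
        """Return and clear the pending batch if it is full or has timed out."""
        if not self._pending:
            return []
        waited_ms = (now - self._first_ts) * 1000.0
        if len(self._pending) >= self.batch_size or waited_ms >= self.flush_timeout_ms:
            batch, self._pending, self._first_ts = self._pending, [], None
            return batch
        return []
```

A caller would pass each returned batch to whatever commits entries to the IPSet, and call `flush_if_due` periodically so that a partial batch does not wait indefinitely.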
Logging
Log level: debug, info, warn, error
Log file path (single source of truth)
Max log file size before rotation
Number of rotated log files to keep
Metrics
Enable JSON metrics export
Metrics export interval in seconds
Metrics output file path
Metrics Schema
ML Detector Configuration
File: ml-detector/config/ml_detector_config.json
Model Configuration
Base directory for ML models
Level 1: Attack Detector
Enable Level 1 general attack detection
ONNX model file path (relative to models_base_dir)
Number of input features
Detection threshold (0.0-1.0)
Level 2: DDoS Detection (Embedded C++20)
Enable embedded RandomForest DDoS detector
Number of input features
Model type (embedded C++20, no ONNX required)
DDoS detection threshold
- syn_ack_ratio: SYN/ACK packet ratio
- packet_symmetry: Bidirectional traffic balance
- source_ip_dispersion: Unique source IPs
- protocol_anomaly_score: Protocol distribution anomaly
- packet_size_entropy: Shannon entropy of packet sizes
- traffic_amplification_factor: Amplification attack indicator
- flow_completion_rate: TCP handshake completion rate
- geographical_concentration: IP geolocation clustering
- traffic_escalation_rate: Rate of traffic increase
- resource_saturation_score: System resource pressure
Level 2: Ransomware Detection (Embedded C++20)
Enable embedded RandomForest ransomware detector
Number of input features
Ransomware detection threshold
- io_intensity: File I/O rate
- entropy: File entropy (encrypted data indicator)
- resource_usage: CPU/memory consumption
- network_activity: C2 communication patterns
- file_operations: File modification frequency
- process_anomaly: Unusual process behavior
- temporal_pattern: Time-series anomalies
- access_frequency: File access patterns
- data_volume: Data transfer volume
- behavior_consistency: Deviation from baseline
Level 3: Traffic Classification (Embedded C++20)
Enable traffic classification (Internet vs Internal)
Number of input features
Traffic classification threshold
Level 3: Internal Traffic Analysis (Embedded C++20)
Enable internal traffic anomaly detection
Number of input features
Internal traffic threshold
Transport Configuration
Enable LZ4 compression of outgoing events
LZ4 compression level (1-12, 1=fastest)
Enable ChaCha20-Poly1305 encryption
Automatic key rotation interval (hours)
ZeroMQ Output
ZMQ publisher endpoint (binds to this address)
Socket type: PUB (publish-subscribe)
RAG Logger
Enable RAG-compatible event logging
Base directory for RAG logs
Log only detection events (threshold exceeded)
Minimum score to log event
Sniffer Configuration
File: sniffer/config/sniffer.json
Deployment Mode
Deployment mode:
- host-only: Single NIC, host protection
- gateway-only: Single NIC, transit traffic inspection
- dual: Dual NIC, simultaneous host + gateway
- validation: PCAP replay testing
Network Interfaces
WAN-facing interface for host-based IDS
XDP mode: native, offloaded, skb (generic)
LAN-facing interface for gateway mode
Enable promiscuous mode (required for gateway)
eBPF Configuration
Compiled eBPF object file
XDP attach mode: native, offloaded, skb
eBPF ring buffer size (bytes)
Maximum concurrent flows tracked in kernel
Flow entry timeout (5 minutes)
Fast Detector (Heuristics)
Enable fast heuristic detection
Score for confirmed ransomware patterns
Trigger if >15 external IPs contacted in 30 seconds
Trigger if >10 unique SMB destinations (lateral movement)
Trigger if DNS query entropy >2.5 (DGA detection)
Trigger if upload/download ratio >3.0 (exfiltration)
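The four triggers above can be read as independent checks over a short observation window. The Python sketch below encodes them with the limits listed (15 external IPs, 10 SMB destinations, entropy 2.5, ratio 3.0). How the sniffer actually computes DNS entropy is not specified here, so the per-character Shannon entropy used below is an assumption, as are the function and trigger names.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Shannon entropy of a string, in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    return -sum((c / total) * math.log2(c / total) for c in Counter(text).values())


def fast_detector_triggers(external_ips_30s, smb_destinations, dns_query,
                           bytes_up, bytes_down):
    """Return the names of the heuristics that fire for one observation window."""
    fired = []
    if len(set(external_ips_30s)) > 15:       # >15 external IPs in 30 seconds
        fired.append("external_ips")
    if len(set(smb_destinations)) > 10:       # >10 unique SMB destinations
        fired.append("smb_fanout")
    if shannon_entropy(dns_query) > 2.5:      # DNS query entropy (DGA indicator)
        fired.append("dns_entropy")
    if bytes_down > 0 and bytes_up / bytes_down > 3.0:   # exfiltration indicator
        fired.append("upload_ratio")
    return fired
```

Each check counts unique addresses, so repeated contacts with the same host do not inflate the totals. Windows with no download traffic are skipped for the ratio check to avoid dividing by zero, which is a simplification.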
ML Defender Thresholds
DDoS detection threshold (conservative for PCAP validation)
Ransomware detection threshold
Traffic classification threshold
Internal traffic anomaly threshold
etcd Server Configuration
File: etcd-server/config/etcd-server.json
etcd client API port
Bind address (0.0.0.0 = all interfaces)
Number of worker threads
Heartbeat Monitoring
Enable service heartbeat monitoring
Heartbeat check interval
Service considered dead after interval * multiplier seconds
Auto-Restart
Auto-restart failed services
Maximum restart attempts
Delay between restart attempts
Command to restart sniffer
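The heartbeat and auto-restart settings combine into a simple policy: a service is declared dead once interval * multiplier seconds pass without a heartbeat, and it is then restarted up to the maximum number of attempts with a delay between tries. The Python sketch below is one way to express that policy. The function names and the strict "greater than" comparison are assumptions, not the etcd server's actual code.

```python
import time
from typing import Callable, Optional


def is_service_dead(last_heartbeat: float, now: float,
                    interval_s: float, multiplier: float) -> bool:
    """True once more than interval * multiplier seconds have passed silently."""
    return (now - last_heartbeat) > interval_s * multiplier


def restart_with_retries(restart: Callable[[], bool], max_attempts: int,
                         delay_s: float,
                         sleep: Callable[[float], None] = time.sleep) -> Optional[int]:
    """Call restart() until it succeeds; return the attempt number, or None on failure."""
    for attempt in range(1, max_attempts + 1):
        if restart():
            return attempt
        if attempt < max_attempts:
            sleep(delay_s)   # wait before the next attempt, but not after the last
    return None
```

In practice `restart` would wrap the configured restart command, for example by running it with `subprocess.run` and treating a zero return code as success.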
RAG Ingester Configuration
File: rag-ingester/config/rag-ingester.json
Service Identity
Unique service identifier
etcd endpoints for service registration
Partner ml-detector service ID for crypto key sync
Input Configuration
Input source: file_watch or zmq_stream
Directory to watch for new log files
File pattern to match (encrypted protobuf)
Files are encrypted (ChaCha20-Poly1305)
Files are compressed (LZ4)
Embedders
Enable Chronos temporal embeddings
ONNX model path
Input feature dimension
Embedding dimension
FAISS Configuration
FAISS index type: Flat, IVF, HNSW
Distance metric: L2, IP (inner product)
Directory for persistent FAISS indexes
Save index every N events
Example Configurations
Production Firewall Config
Production firewall.json
Gateway Mode Sniffer Config
Gateway sniffer.json
Conservative ML Thresholds
ml_detector_config.json
Configuration Best Practices
Single Source of Truth: Never hardcode values. All settings in JSON.
Test Before Production: Validate configs in lab with dry_run: true
Monitor Metrics: Always enable metrics export for capacity planning
IPSet Capacity: Plan for 2x expected peak blocked IPs
Threshold Tuning: Start conservative (0.8+), lower based on false positive rate
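One way to apply "test before production" is a small pre-deployment check run over a config file before it ships. The Python sketch below verifies that thresholds fall in the documented 0.0-1.0 range and that the log level is one of the four documented values. The section and key names (`thresholds`, `log_level`) are hypothetical and would need to be mapped to the real schema; the `dry_run` option itself is not modeled here.

```python
import json

VALID_LOG_LEVELS = ("debug", "info", "warn", "error")


def validate_config(config: dict) -> list:
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    for name, value in config.get("thresholds", {}).items():   # hypothetical section
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if not is_number or not 0.0 <= value <= 1.0:
            problems.append(f"threshold '{name}' must be in [0.0, 1.0], got {value!r}")
    level = config.get("log_level", "info")                    # hypothetical key
    if level not in VALID_LOG_LEVELS:
        problems.append(f"log_level must be one of {VALID_LOG_LEVELS}, got {level!r}")
    return problems


# Illustrative input with two deliberate mistakes.
sample = json.loads(
    '{"thresholds": {"ddos": 0.85, "ransomware": 1.2}, "log_level": "verbose"}'
)
for problem in validate_config(sample):
    print(problem)
```

Returning a list of messages instead of raising on the first error lets an operator fix every problem in one pass.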
Crypto Seed Exchange
ML Defender uses ChaCha20-Poly1305 authenticated encryption for all inter-component communication.
How It Works
- ml-detector generates a random 256-bit key on startup
- Key is stored in etcd at /crypto/ml-detector/tokens
- firewall-acl-agent reads key from etcd at /crypto/firewall/tokens
- Both components use the same key for encryption/decryption
- Keys rotate every 24 hours (configurable)
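The key lifecycle in the steps above can be sketched with the Python standard library: generate a random 256-bit key, derive a short fingerprint that can be logged or compared without exposing the key, and decide when rotation is due. This is an illustration only. The function names are hypothetical, and the etcd storage and the ChaCha20-Poly1305 encryption itself are left out.

```python
import hashlib
import secrets

KEY_BYTES = 32  # 256 bits, the ChaCha20-Poly1305 key size


def generate_key() -> bytes:
    """Generate a random 256-bit key from the OS cryptographic random source."""
    return secrets.token_bytes(KEY_BYTES)


def key_fingerprint(key: bytes) -> str:
    """Truncated SHA-256 digest, usable for comparing keys without revealing them."""
    return hashlib.sha256(key).hexdigest()[:16]


def rotation_due(created_at: float, now: float, rotation_hours: float = 24.0) -> bool:
    """True once the key is at least rotation_hours old (timestamps in seconds)."""
    return (now - created_at) >= rotation_hours * 3600.0
```

Comparing fingerprints computed independently on each side is one possible way to check that two components hold the same key.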
Verify Crypto Sync
Manual Key Rotation
Next Steps
- Quick Start: Launch ML Defender with default configs
- Deployment: Production deployment strategies
- Monitoring: Metrics, alerts, and observability
- Troubleshooting: Common issues and solutions