
Overview

ML Defender uses JSON configuration files as the single source of truth for all components. Every setting is externalized; no values are hardcoded.
Via Appia Quality: Configuration files are the law. All runtime behavior is controlled via JSON, enabling deployment without recompilation.
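
The parameters in this reference are written as dotted paths (for example ipsets.blacklist.max_elements), which correspond to nested JSON objects in the config files. A minimal sketch of resolving such a path, assuming a hypothetical helper named get_setting and an inline sample standing in for firewall.json:

```python
import json


def get_setting(config: dict, dotted_key: str, default=None):
    """Walk a nested dict using a dotted path such as 'ipsets.blacklist.timeout'."""
    node = config
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return default  # path does not exist in this config
        node = node[part]
    return node


# Inline sample (assumption: stands in for a real firewall.json on disk)
sample = json.loads(
    '{"ipsets": {"blacklist": {"max_elements": 1000, "timeout": 3600}}}'
)

max_elements = get_setting(sample, "ipsets.blacklist.max_elements")
log_level = get_setting(sample, "logging.level", default="info")
```

This is illustrative tooling only; the components themselves may read their configuration differently.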

Configuration File Locations

Component | Config File | Description
Firewall Agent | firewall-acl-agent/config/firewall.json | IPSet, IPTables, decryption settings
ML Detector | ml-detector/config/ml_detector_config.json | Model paths, thresholds, inference settings
Sniffer | sniffer/config/sniffer.json | Interface, eBPF settings, feature extraction
etcd Server | etcd-server/config/etcd-server.json | Port, heartbeat, auto-restart
RAG Ingester | rag-ingester/config/rag-ingester.json | FAISS index, embeddings, log parsing

Firewall ACL Agent Configuration

File: firewall-acl-agent/config/firewall.json

Component Metadata

component.name
string
default:"firewall-acl-agent"
Component identifier
component.version
string
default:"1.2.1"
Configuration schema version
component.mode
string
default:"packet_filtering"
Operational mode: packet_filtering or logging_only

Transport Layer

transport.compression.enabled
boolean
default:"true"
Enable LZ4 decompression of incoming ML detector events
transport.compression.algorithm
string
default:"lz4"
Compression algorithm (only lz4 supported)
transport.encryption.enabled
boolean
default:"true"
Enable ChaCha20-Poly1305 decryption
transport.encryption.algorithm
string
default:"chacha20-poly1305"
Encryption algorithm (AEAD cipher)
transport.encryption.etcd_token_required
boolean
default:"true"
Require crypto token from etcd (fail if unavailable)

etcd Configuration

etcd.enabled
boolean
default:"true"
Enable etcd client for crypto key exchange
etcd.endpoints
array
default:"[\"localhost:2379\"]"
List of etcd server endpoints
etcd.crypto_token_path
string
default:"/crypto/firewall/tokens"
etcd path where decryption keys are stored
etcd.connection_timeout_ms
integer
default:"5000"
Connection timeout in milliseconds
etcd.heartbeat_interval_seconds
integer
default:"30"
Heartbeat interval for service health monitoring

ZeroMQ Configuration

zmq.endpoint
string
default:"tcp://localhost:5572"
ZMQ endpoint to receive encrypted events from ml-detector
zmq.topic
string
default:""
ZMQ PUB/SUB topic filter (empty = subscribe to all)
zmq.recv_timeout_ms
integer
default:"1000"
Receive timeout in milliseconds
zmq.high_water_mark
integer
default:"1000"
Maximum messages queued before dropping

IPSet Configuration

Critical: IPSet capacity planning is essential. Once full, new IPs are queued but not blocked.
ipsets.blacklist.set_name
string
default:"ml_defender_blacklist_test"
IPSet name for blocked IPs
ipsets.blacklist.set_type
string
default:"hash:ip"
IPSet type: hash:ip, hash:net, hash:ip,port
ipsets.blacklist.max_elements
integer
default:"1000"
Maximum IPs in IPSet. Production: Set to 5000-10000
ipsets.blacklist.timeout
integer
default:"3600"
IP entry timeout in seconds (0 = permanent)
ipsets.blacklist.hash_size
integer
default:"1024"
Hash table size (power of 2 recommended)
ipsets.blacklist.create_if_missing
boolean
default:"true"
Auto-create IPSet on startup if not exists
ipsets.blacklist.flush_on_startup
boolean
default:"false"
Clear IPSet on startup (use true for testing)

Example: Increase IPSet Capacity

"ipsets": {
  "blacklist": {
    "set_name": "ml_defender_blacklist_prod",
    "max_elements": 10000,
    "timeout": 7200,
    "hash_size": 4096
  }
}
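
To see how these settings relate to the underlying tool, the sketch below builds an ipset create command line from the ipsets.blacklist block. How the agent actually creates the set is not documented here (it may not shell out at all), and ipset_create_argv is a hypothetical helper. The hashsize, maxelem, and timeout keywords are standard ipset create options.

```python
def ipset_create_argv(blacklist: dict) -> list[str]:
    """Translate an ipsets.blacklist config block into an `ipset create` argv."""
    return [
        "ipset", "create",
        blacklist["set_name"],
        blacklist["set_type"],
        "hashsize", str(blacklist["hash_size"]),
        "maxelem", str(blacklist["max_elements"]),
        "timeout", str(blacklist["timeout"]),
    ]


# Production values from the example above; set_type is the documented default
prod_blacklist = {
    "set_name": "ml_defender_blacklist_prod",
    "set_type": "hash:ip",
    "max_elements": 10000,
    "timeout": 7200,
    "hash_size": 4096,
}

argv = ipset_create_argv(prod_blacklist)
print(" ".join(argv))
```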

IPTables Configuration

iptables.chain_name
string
default:"ML_DEFENDER_TEST"
IPTables chain name for ML Defender rules
iptables.default_policy
string
default:"ACCEPT"
Default policy for chain: ACCEPT or DROP
iptables.log_blocked
boolean
default:"true"
Log blocked packets to syslog
iptables.log_prefix
string
default:"ML_DEFENDER_TEST_DROP: "
Prefix for syslog entries
iptables.insert_rule_position
integer
default:"1"
Position in chain (1 = first rule, highest priority)

Batch Processor

batch_processor.batch_size_threshold
integer
default:"10"
Batch size before committing to IPSet
batch_processor.batch_time_threshold_ms
integer
default:"1000"
Max time to wait before flushing batch (ms)
batch_processor.min_confidence
float
default:"0.5"
Minimum ML confidence score to block (0.0-1.0)
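
The three batch settings interact: events below min_confidence are discarded, and the remaining ones are committed when either the size or the time threshold is reached. A minimal sketch of that logic, assuming a hypothetical BatchProcessor class and an injectable clock (the agent's real implementation is not shown in this document):

```python
import time


class BatchProcessor:
    """Illustrative batching: flush on size threshold or elapsed time."""

    def __init__(self, size_threshold=10, time_threshold_ms=1000,
                 min_confidence=0.5, clock=time.monotonic):
        self.size_threshold = size_threshold
        self.time_threshold_ms = time_threshold_ms
        self.min_confidence = min_confidence
        self.clock = clock
        self._pending = []
        self._first_at = None  # time the oldest pending IP was queued

    def add(self, ip, confidence):
        """Queue an IP if confident enough; return a flushed batch or None."""
        if confidence < self.min_confidence:
            return None  # below threshold: never queued
        if not self._pending:
            self._first_at = self.clock()
        self._pending.append(ip)
        return self.flush_if_due()

    def flush_if_due(self):
        """Return the pending batch if either threshold is met, else None."""
        if not self._pending:
            return None
        age_ms = (self.clock() - self._first_at) * 1000
        if (len(self._pending) >= self.size_threshold
                or age_ms >= self.time_threshold_ms):
            batch, self._pending = self._pending, []
            return batch
        return None
```

A caller would invoke flush_if_due() periodically so that a partially filled batch is still committed once batch_time_threshold_ms has elapsed.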

Logging

logging.level
string
default:"debug"
Log level: debug, info, warn, error
logging.file
string
default:"/vagrant/logs/lab/firewall-agent.log"
Log file path (single source of truth)
logging.max_file_size_mb
integer
default:"10"
Max log file size before rotation
logging.backup_count
integer
default:"5"
Number of rotated log files to keep

Metrics

metrics.enable_export
boolean
default:"true"
Enable JSON metrics export
metrics.export_interval_sec
integer
default:"30"
Metrics export interval in seconds
metrics.export_file
string
default:"/vagrant/logs/lab/firewall-metrics.json"
Metrics output file path

Metrics Schema

{
  "timestamp": "2026-02-08T15:30:00Z",
  "zmq_messages_received": 36000,
  "crypto_errors": 0,
  "decompression_errors": 0,
  "protobuf_parse_errors": 0,
  "ipset_successes": 118,
  "ipset_failures": 16681,
  "max_queue_depth": 16690,
  "avg_processing_time_us": 450,
  "cpu_percent": 54.2,
  "memory_rss_mb": 127
}
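
For capacity planning, the exported counters can be reduced to a single failure ratio. The sketch below uses the ipset_successes and ipset_failures values from the sample above. The helper name ipset_failure_ratio is hypothetical, and reading a high ratio as a sign of a full IPSet is one plausible interpretation, not a documented rule.

```python
import json


def ipset_failure_ratio(metrics: dict) -> float:
    """Fraction of IPSet insert attempts that failed (0.0 if no attempts)."""
    attempts = metrics["ipset_successes"] + metrics["ipset_failures"]
    return metrics["ipset_failures"] / attempts if attempts else 0.0


# Subset of the sample metrics shown above
sample = json.loads(
    '{"ipset_successes": 118, "ipset_failures": 16681, "max_queue_depth": 16690}'
)

ratio = ipset_failure_ratio(sample)
print(f"ipset failure ratio: {ratio:.1%}")
```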

ML Detector Configuration

File: ml-detector/config/ml_detector_config.json

Model Configuration

ml.models_base_dir
string
default:"models/production"
Base directory for ML models

Level 1: Attack Detector

ml.level1.enabled
boolean
default:"true"
Enable Level 1 general attack detection
ml.level1.model_file
string
default:"level1/level1_attack_detector.onnx"
ONNX model file path (relative to models_base_dir)
ml.level1.features_count
integer
default:"23"
Number of input features
ml.thresholds.level1_attack
float
default:"0.65"
Detection threshold (0.0-1.0)

Level 2: DDoS Detection (Embedded C++20)

ml.level2.ddos.enabled
boolean
default:"true"
Enable embedded RandomForest DDoS detector
ml.level2.ddos.features_count
integer
default:"10"
Number of input features
ml.level2.ddos.model_type
string
default:"RandomForest-Embedded"
Model type (embedded C++20, no ONNX required)
ml.thresholds.level2_ddos
float
default:"0.7"
DDoS detection threshold
Features:
  • syn_ack_ratio: SYN/ACK packet ratio
  • packet_symmetry: Bidirectional traffic balance
  • source_ip_dispersion: Unique source IPs
  • protocol_anomaly_score: Protocol distribution anomaly
  • packet_size_entropy: Shannon entropy of packet sizes
  • traffic_amplification_factor: Amplification attack indicator
  • flow_completion_rate: TCP handshake completion rate
  • geographical_concentration: IP geolocation clustering
  • traffic_escalation_rate: Rate of traffic increase
  • resource_saturation_score: System resource pressure

Level 2: Ransomware Detection (Embedded C++20)

ml.level2.ransomware.enabled
boolean
default:"true"
Enable embedded RandomForest ransomware detector
ml.level2.ransomware.features_count
integer
default:"10"
Number of input features
ml.thresholds.level2_ransomware
float
default:"0.75"
Ransomware detection threshold
Features:
  • io_intensity: File I/O rate
  • entropy: File entropy (encrypted data indicator)
  • resource_usage: CPU/memory consumption
  • network_activity: C2 communication patterns
  • file_operations: File modification frequency
  • process_anomaly: Unusual process behavior
  • temporal_pattern: Time-series anomalies
  • access_frequency: File access patterns
  • data_volume: Data transfer volume
  • behavior_consistency: Deviation from baseline

Level 3: Traffic Classification (Embedded C++20)

ml.level3.web.enabled
boolean
default:"true"
Enable traffic classification (Internet vs Internal)
ml.level3.web.features_count
integer
default:"10"
Number of input features
ml.thresholds.level3_web
float
default:"0.6"
Traffic classification threshold

Level 3: Internal Traffic Analysis (Embedded C++20)

ml.level3.internal.enabled
boolean
default:"true"
Enable internal traffic anomaly detection
ml.level3.internal.features_count
integer
default:"10"
Number of input features
ml.thresholds.level3_internal
float
default:"0.65"
Internal traffic threshold
Use Cases: Lateral movement, data exfiltration, service discovery
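
The per-level thresholds above can be read as independent cut-offs on each model's score. A minimal sketch using the documented defaults; the detections function is hypothetical, and treating a score equal to the threshold as a detection (>= rather than >) is an assumption:

```python
# Defaults from the ml.thresholds.* entries above
DEFAULT_THRESHOLDS = {
    "level1_attack": 0.65,
    "level2_ddos": 0.7,
    "level2_ransomware": 0.75,
    "level3_web": 0.6,
    "level3_internal": 0.65,
}


def detections(scores: dict, thresholds: dict = DEFAULT_THRESHOLDS) -> list:
    """Return the sorted names of models whose score meets its threshold."""
    return sorted(
        name for name, score in scores.items()
        if name in thresholds and score >= thresholds[name]
    )


hits = detections({
    "level1_attack": 0.70,      # above 0.65
    "level2_ddos": 0.69,        # below 0.7
    "level2_ransomware": 0.75,  # exactly at threshold
})
```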

Transport Configuration

transport.compression.enabled
boolean
default:"true"
Enable LZ4 compression of outgoing events
transport.compression.level
integer
default:"1"
LZ4 compression level (1-12, 1=fastest)
transport.encryption.enabled
boolean
default:"true"
Enable ChaCha20-Poly1305 encryption
transport.encryption.key_rotation_hours
integer
default:"24"
Automatic key rotation interval (hours)

ZeroMQ Output

network.output_socket.endpoint
string
default:"tcp://0.0.0.0:5572"
ZMQ publisher endpoint (binds to this address)
network.output_socket.socket_type
string
default:"PUB"
Socket type: PUB (publish-subscribe)

RAG Logger

rag_logger.enabled
boolean
default:"true"
Enable RAG-compatible event logging
rag_logger.base_dir
string
default:"/vagrant/logs/rag"
Base directory for RAG logs
rag_logger.log_detections_only
boolean
default:"true"
Log only detection events (threshold exceeded)
rag_logger.min_score_threshold
float
default:"0.5"
Minimum score to log event

Sniffer Configuration

File: sniffer/config/sniffer.json

Deployment Mode

deployment.mode
string
default:"dual"
Deployment mode:
  • host-only: Single NIC, host protection
  • gateway-only: Single NIC, transit traffic inspection
  • dual: Dual NIC, simultaneous host + gateway
  • validation: PCAP replay testing

Network Interfaces

deployment.host_interface.name
string
default:"eth1"
WAN-facing interface for host-based IDS
deployment.host_interface.xdp_mode
string
default:"native"
XDP mode: native, offloaded, skb (generic)
deployment.gateway_interface.name
string
default:"eth2"
LAN-facing interface for gateway mode
deployment.gateway_interface.promiscuous
boolean
default:"true"
Enable promiscuous mode (required for gateway)

eBPF Configuration

kernel_space.ebpf_program
string
default:"sniffer.bpf.o"
Compiled eBPF object file
kernel_space.xdp_mode
string
default:"skb"
XDP attach mode: native, offloaded, skb
kernel_space.ring_buffer_size
integer
default:"1048576"
eBPF ring buffer size (bytes)
kernel_space.max_flows_in_kernel
integer
default:"500000"
Maximum concurrent flows tracked in kernel
kernel_space.flow_timeout_seconds
integer
default:"300"
Flow entry timeout (5 minutes)

Fast Detector (Heuristics)

Day 12 Enhancement: Fast-path heuristics flag obvious threats in under 1 μs, before ML inference runs.
fast_detector.enabled
boolean
default:"true"
Enable fast heuristic detection
fast_detector.ransomware.scores.high_threat
float
default:"0.95"
Score for confirmed ransomware patterns
fast_detector.ransomware.activation_thresholds.external_ips_30s
integer
default:"15"
Trigger if >15 external IPs contacted in 30 seconds
fast_detector.ransomware.activation_thresholds.smb_diversity
integer
default:"10"
Trigger if >10 unique SMB destinations (lateral movement)
fast_detector.ransomware.activation_thresholds.dns_entropy
float
default:"2.5"
Trigger if DNS query entropy >2.5 (DGA detection)
fast_detector.ransomware.activation_thresholds.upload_download_ratio
float
default:"3.0"
Trigger if upload/download ratio >3.0 (exfiltration)
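
The activation thresholds above are all strict "greater than" comparisons. The sketch below lists which ones an observation exceeds. The triggered function and the observation keys are hypothetical, and whether the real detector fires on any single threshold or requires a combination is not stated here.

```python
# Defaults from fast_detector.ransomware.activation_thresholds.*
DEFAULT_ACTIVATION = {
    "external_ips_30s": 15,
    "smb_diversity": 10,
    "dns_entropy": 2.5,
    "upload_download_ratio": 3.0,
}


def triggered(observed: dict, thresholds: dict = DEFAULT_ACTIVATION) -> list:
    """Return the threshold names strictly exceeded by the observed values."""
    return [
        name for name, limit in thresholds.items()
        if observed.get(name, 0) > limit
    ]


fired = triggered({
    "external_ips_30s": 16,  # > 15, fires
    "smb_diversity": 10,     # equal to the limit, does not fire
    "dns_entropy": 3.1,      # > 2.5, fires
})
```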

ML Defender Thresholds

ml_defender.thresholds.ddos
float
default:"0.85"
DDoS detection threshold (conservative for PCAP validation)
ml_defender.thresholds.ransomware
float
default:"0.90"
Ransomware detection threshold
ml_defender.thresholds.traffic
float
default:"0.80"
Traffic classification threshold
ml_defender.thresholds.internal
float
default:"0.85"
Internal traffic anomaly threshold

etcd Server Configuration

File: etcd-server/config/etcd-server.json
server.port
integer
default:"2379"
etcd client API port
server.host
string
default:"0.0.0.0"
Bind address (0.0.0.0 = all interfaces)
server.worker_threads
integer
default:"4"
Number of worker threads

Heartbeat Monitoring

heartbeat.enabled
boolean
default:"true"
Enable service heartbeat monitoring
heartbeat.interval_seconds
integer
default:"30"
Heartbeat check interval
heartbeat.timeout_multiplier
integer
default:"3"
Service considered dead after interval * multiplier seconds
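
With the defaults, a service is considered dead after 30 * 3 = 90 seconds without a heartbeat. A small sketch of that check, assuming a hypothetical is_dead helper and a strict comparison at the boundary:

```python
def is_dead(last_seen_s: float, now_s: float,
            interval_seconds: int = 30, timeout_multiplier: int = 3) -> bool:
    """True once more than interval * multiplier seconds have passed."""
    return (now_s - last_seen_s) > interval_seconds * timeout_multiplier


at_limit = is_dead(last_seen_s=0, now_s=90)    # exactly 90 s: still alive
past_limit = is_dead(last_seen_s=0, now_s=91)  # 91 s: considered dead
```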

Auto-Restart

auto_restart.enabled
boolean
default:"true"
Auto-restart failed services
auto_restart.max_attempts
integer
default:"3"
Maximum restart attempts
auto_restart.delay_seconds
integer
default:"5"
Delay between restart attempts
auto_restart.commands.sniffer
string
Command to restart sniffer
cd /vagrant/sniffer/build && sudo ./sniffer -c ../config/sniffer.json
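
The auto_restart settings describe a bounded retry loop. A minimal sketch, assuming a hypothetical restart_with_retries function where start is any callable that returns True on a successful restart (for example a wrapper around the command above):

```python
import time


def restart_with_retries(start, max_attempts: int = 3,
                         delay_seconds: float = 5, sleep=time.sleep):
    """Call start() up to max_attempts times, pausing between failures.

    Returns the attempt number that succeeded, or None if all failed.
    """
    for attempt in range(1, max_attempts + 1):
        if start():
            return attempt
        if attempt < max_attempts:
            sleep(delay_seconds)  # wait before the next attempt
    return None
```

The sleep parameter is injectable so the loop can be exercised without real delays.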

RAG Ingester Configuration

File: rag-ingester/config/rag-ingester.json

Service Identity

service.id
string
default:"rag-ingester-default"
Unique service identifier
service.etcd.endpoints
array
default:"[\"127.0.0.1:2379\"]"
etcd endpoints for service registration
service.etcd.partner_detector
string
default:"ml-detector-default"
Partner ml-detector service ID for crypto key sync

Input Configuration

ingester.input.source
string
default:"file_watch"
Input source: file_watch or zmq_stream
ingester.input.directory
string
Directory to watch for new log files
ingester.input.pattern
string
default:"*.pb.enc"
File pattern to match (encrypted protobuf)
ingester.input.encrypted
boolean
default:"true"
Files are encrypted (ChaCha20-Poly1305)
ingester.input.compressed
boolean
default:"true"
Files are compressed (LZ4)

Embedders

ingester.embedders.chronos.enabled
boolean
default:"true"
Enable Chronos temporal embeddings
ingester.embedders.chronos.onnx_path
string
default:"/vagrant/rag-ingester/models/onnx/chronos.onnx"
ONNX model path
ingester.embedders.chronos.input_dim
integer
default:"83"
Input feature dimension
ingester.embedders.chronos.output_dim
integer
default:"512"
Embedding dimension

FAISS Configuration

ingester.faiss.index_type
string
default:"Flat"
FAISS index type: Flat, IVF, HNSW
ingester.faiss.metric
string
default:"L2"
Distance metric: L2, IP (inner product)
ingester.faiss.persist_path
string
default:"/shared/faiss_indexes"
Directory for persistent FAISS indexes
ingester.faiss.checkpoint_interval_events
integer
default:"1000"
Save index every N events

Example Configurations

Production Firewall Config

Production firewall.json
{
  "ipsets": {
    "blacklist": {
      "set_name": "ml_defender_blacklist_prod",
      "max_elements": 10000,
      "timeout": 7200,
      "hash_size": 4096
    }
  },
  "batch_processor": {
    "batch_size_threshold": 50,
    "min_confidence": 0.7
  },
  "logging": {
    "level": "info",
    "file": "/var/log/ml-defender/firewall.log"
  },
  "metrics": {
    "export_interval_sec": 60
  }
}

Gateway Mode Sniffer Config

Gateway sniffer.json
{
  "deployment": {
    "mode": "dual",
    "host_interface": {
      "name": "eth0",
      "xdp_mode": "native"
    },
    "gateway_interface": {
      "name": "eth1",
      "xdp_mode": "native",
      "promiscuous": true
    },
    "network_settings": {
      "enable_ip_forwarding": true,
      "enable_nat": true
    }
  },
  "profile": "dual_nic"
}

Conservative ML Thresholds

ml_detector_config.json
{
  "ml": {
    "thresholds": {
      "level1_attack": 0.8,
      "level2_ddos": 0.85,
      "level2_ransomware": 0.9,
      "level3_web": 0.75,
      "level3_internal": 0.85
    }
  }
}

Configuration Best Practices

Single Source of Truth: Never hardcode values. Keep all settings in JSON.
Test Before Production: Validate configs in the lab with dry_run: true.
Monitor Metrics: Always enable metrics export for capacity planning.
IPSet Capacity: Plan for 2x the expected peak of blocked IPs.
Threshold Tuning: Start conservative (0.8+), then lower based on the false positive rate.

Crypto Seed Exchange

ML Defender uses ChaCha20-Poly1305 authenticated encryption for all inter-component communication.

How It Works

  1. ml-detector generates a random 256-bit key on startup
  2. Key is stored in etcd at /crypto/ml-detector/tokens
  3. firewall-acl-agent reads key from etcd at /crypto/firewall/tokens
  4. Both components use the same key for encryption/decryption
  5. Keys rotate every 24 hours (configurable)
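
Step 1 can be illustrated with the standard library: a 256-bit key is 32 random bytes, and its base64 form (the encoding shown for the etcd values in the next section) is 44 characters long. The generate_key_b64 function is hypothetical; how ml-detector actually generates and stores the key is not shown in this document.

```python
import base64
import secrets


def generate_key_b64() -> str:
    """Generate a random 256-bit key and return it base64-encoded."""
    raw = secrets.token_bytes(32)  # 32 bytes = 256 bits
    return base64.b64encode(raw).decode("ascii")


encoded = generate_key_b64()
decoded = base64.b64decode(encoded)
```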

Verify Crypto Sync

# Check etcd keys
etcdctl get --prefix /crypto

# Should show:
# /crypto/ml-detector/tokens: <base64-encoded-key>
# /crypto/firewall/tokens: <base64-encoded-key>

Manual Key Rotation

# Restart ml-detector to generate new key
sudo pkill ml-detector
cd ml-detector/build && ./ml-detector -c ../config/ml_detector_config.json

# Restart firewall-agent to sync new key
sudo pkill firewall-acl-agent
cd firewall-acl-agent/build && sudo ./firewall-acl-agent -c ../config/firewall.json

Next Steps

Quick Start

Launch ML Defender with default configs

Deployment

Production deployment strategies

Monitoring

Metrics, alerts, and observability

Troubleshooting

Common issues and solutions
