ML Defender

Mission

Democratize enterprise-grade cybersecurity for hospitals, schools, and small organizations that cannot afford commercial solutions. Built to last decades with scientific honesty and methodical development. Philosophy: Via Appia Quality – Systems built like Roman roads, designed to endure.

Key Features

eBPF/XDP Packet Capture

High-performance kernel-space packet filtering with 512-byte payload capture. Zero-copy design with ring buffer delivery to userspace. In validation, it captured over 2 million packets across 17 hours with zero crashes.

Multi-Layer ML Detection

4 embedded RandomForest models with 97.6% accuracy on real malware (CTU-13 Neris botnet). Sub-microsecond detection latency with 83-feature extraction pipeline.

Autonomous Blocking

Kernel-level blocking via IPSet/IPTables with sub-10ms response time. Tested at 364 events/sec with graceful degradation. Config-driven architecture with zero hardcoding.

Encrypted Pipeline

ChaCha20-Poly1305 authenticated encryption with LZ4 compression. Production-validated with 36,000 events and zero crypto errors.

RAG Intelligence

Natural language forensic queries powered by TinyLlama and FAISS vector search. Multi-index strategy with eventual consistency for high availability.

Distributed Coordination

etcd-based service discovery with automatic crypto seed exchange. Service registration, heartbeats, and distributed configuration management.

Production Metrics

These are real metrics from validation testing, not marketing claims.

Detection Accuracy

97.6% accuracy on CTU-13 Neris botnet (real botnet traffic)
36,000 events tested with zero crypto errors
17 hours continuous operation (61,343 seconds)
2,080,549 packets processed successfully

Performance

364 events/sec peak throughput under stress
<1 μs detection latency on normal traffic
54% peak CPU usage under extreme load (20K events)
127 MB RAM footprint under stress
0 crashes during validation period

Reliability

0 crypto errors @ 36K events
0 decompression errors @ 36K events
0 protobuf parse errors @ 36K events
Graceful degradation when capacity exceeded

System Architecture

Data Flow

Network Traffic → eBPF/XDP captures packets in kernel space
Sniffer → Extracts 512-byte payloads + 83 ML features
ML Detector → 4 RandomForest models classify threats
Crypto Pipeline → ChaCha20-Poly1305 encryption + LZ4 compression
etcd Server → Coordinates services, manages crypto keys
Firewall Agent → Autonomous blocking via IPSet/IPTables
RAG Ingester → Parses logs, generates embeddings
RAG System → Natural language queries over threat data

Quick Links

Quickstart

Get ML Defender running in 15 minutes

Architecture

Deep dive into system components and data flow

Philosophy

Via Appia Quality and design principles

Deployment Modes

Host-Based IDS

Captures packets destined for the defender host itself. This is the traditional intrusion detection system mode.

Interface: eth1 (WAN-facing)
XDP ifindex: 3
Use case: Server protection, endpoint security

Gateway Mode

Captures packets flowing through the defender as a network gateway. Dual-NIC deployment for network-wide protection.

Interface: eth3 (LAN-facing)
XDP ifindex: 5
Use case: Network gateway, firewall appliance
Topology: Client → Gateway (ML Defender) → Internet

Gateway mode requires a dual-NIC configuration with IP forwarding enabled.
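The two gateway prerequisites, a resolvable interface index and IP forwarding, can be checked before attaching XDP. The following is a minimal sketch, not ML Defender's actual startup code: the function names are hypothetical, the interface names and ifindex values listed above are specific to one environment, and the sketch assumes a Linux host where IPv4 forwarding is exposed at /proc/sys/net/ipv4/ip_forward.

```cpp
#include <net/if.h>   // if_nametoindex (POSIX)

#include <fstream>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical helper: resolve an interface name (e.g. "eth3") to its kernel
// ifindex. Returns std::nullopt when no such interface exists.
std::optional<unsigned int> resolve_ifindex(const std::string& name) {
    const unsigned int idx = if_nametoindex(name.c_str());
    if (idx == 0) return std::nullopt;   // 0 means "not found"
    return idx;
}

// Interpret the contents of the ip_forward file ("0\n" or "1\n").
// Kept separate from file I/O so it can be tested without touching /proc.
bool parse_ip_forward(const std::string& contents) {
    std::istringstream in(contents);
    int value = 0;
    return (in >> value) && value == 1;
}

// Read the live kernel setting; returns false if the file cannot be read.
bool ip_forwarding_enabled() {
    std::ifstream f("/proc/sys/net/ipv4/ip_forward");
    std::string contents;
    std::getline(f, contents);
    return parse_ip_forward(contents);
}
```

Resolving the index at startup, rather than hardcoding a value such as 5, keeps the configuration valid when interfaces are renumbered after a reboot or hardware change.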

Threat Coverage

Protected Against

DDoS Attacks

Volumetric, protocol, and application-layer DDoS detection using behavioral analysis. Detection is based on:

External IP velocity (>10 new IPs in 10s)
Port scanning patterns (>15 unique ports)
RST ratio analysis (>30% aggressive connections)
Packet rate anomalies
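The external IP velocity rule (>10 new IPs in 10s) can be sketched as a sliding-window counter. This is an illustrative sketch only: the class name and interface are hypothetical, and it assumes an IP counts as "new" when it has not been seen inside the current window, which may differ from how ML Defender defines it.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <unordered_set>
#include <utility>

// Hypothetical tracker: counts distinct IPv4 addresses first seen inside a
// sliding time window and reports when that count exceeds a threshold.
class NewIpVelocity {
public:
    using Clock = std::chrono::steady_clock;

    NewIpVelocity(std::chrono::seconds window, std::size_t threshold)
        : window_(window), threshold_(threshold) {}

    // Record one packet's source IP. Returns true while the number of new
    // IPs in the window is strictly greater than the threshold.
    bool observe(std::uint32_t ip, Clock::time_point now) {
        expire(now);
        if (seen_.insert(ip).second) {
            first_seen_.emplace_back(now, ip);   // only unseen IPs are counted
        }
        return first_seen_.size() > threshold_;
    }

    std::size_t new_ips_in_window() const { return first_seen_.size(); }

private:
    // Drop IPs whose first sighting is at least one full window old.
    void expire(Clock::time_point now) {
        while (!first_seen_.empty() &&
               now - first_seen_.front().first >= window_) {
            seen_.erase(first_seen_.front().second);
            first_seen_.pop_front();
        }
    }

    std::chrono::seconds window_;
    std::size_t threshold_;
    std::deque<std::pair<Clock::time_point, std::uint32_t>> first_seen_;
    std::unordered_set<std::uint32_t> seen_;
};
```

Constructing it as `NewIpVelocity(std::chrono::seconds(10), 10)` matches the threshold quoted above. Passing the timestamp in explicitly, instead of reading the clock inside the class, makes the window logic deterministic to test.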

Ransomware

Three-layer ransomware detection system, with an intermediate entropy stage (Layer 1.5):

Layer 0: 512-byte payload capture in eBPF
Layer 1: Fast heuristics (10s window for C&C, SMB lateral movement)
Layer 1.5: Shannon entropy analysis (>7.0 bits per byte suggests encrypted content)
Layer 2: 20 ransomware features (30s aggregation)

Validated on CTU-13 Neris botnet with 97.6% accuracy.
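The entropy check in Layer 1.5 can be illustrated with a short sketch. It assumes entropy is measured in bits per byte over the captured payload (up to 512 bytes) and compared against the 7.0 threshold quoted above. The function names are hypothetical, and the real feature extractor may compute entropy differently.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Shannon entropy of a byte buffer in bits per byte (0.0 to 8.0).
double shannon_entropy(const std::vector<std::uint8_t>& data) {
    if (data.empty()) return 0.0;

    // Histogram of byte values.
    std::array<std::size_t, 256> counts{};
    for (std::uint8_t b : data) ++counts[b];

    // H = -sum(p_i * log2(p_i)) over byte values that actually occur.
    const double n = static_cast<double>(data.size());
    double h = 0.0;
    for (std::size_t c : counts) {
        if (c == 0) continue;
        const double p = static_cast<double>(c) / n;
        h -= p * std::log2(p);
    }
    return h;
}

// Threshold taken from the text above; the helper name is an assumption.
constexpr double kEntropyThreshold = 7.0;

bool looks_encrypted(const std::vector<std::uint8_t>& payload) {
    return shannon_entropy(payload) > kEntropyThreshold;
}
```

High entropy alone does not prove encryption: compressed archives and media files also score close to 8 bits per byte, which is presumably why this stage sits alongside the heuristic and feature-based layers rather than replacing them.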

Port Scanning

Detection of reconnaissance activities via:

Unique port tracking per source IP
Connection attempt velocity
SYN flood patterns
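Unique port tracking per source IP can be sketched as a map from source address to the set of destination ports it has touched. The class below is a hypothetical illustration, not the project's implementation. The limit of 15 comes from the port scanning threshold listed under DDoS Attacks, and the reset policy is an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <unordered_set>

// Hypothetical tracker: remembers which destination ports each source IPv4
// address has attempted and flags sources that exceed a port limit.
class PortScanTracker {
public:
    explicit PortScanTracker(std::size_t max_unique_ports)
        : limit_(max_unique_ports) {}

    // Record one connection attempt. Returns true once the source has
    // touched strictly more than the configured number of distinct ports.
    bool observe(std::uint32_t src_ip, std::uint16_t dst_port) {
        auto& ports = ports_by_src_[src_ip];
        ports.insert(dst_port);          // repeated ports do not grow the set
        return ports.size() > limit_;
    }

    std::size_t unique_ports(std::uint32_t src_ip) const {
        const auto it = ports_by_src_.find(src_ip);
        return it == ports_by_src_.end() ? 0 : it->second.size();
    }

    // Forget all state, for example at the end of an aggregation window.
    void reset() { ports_by_src_.clear(); }

private:
    std::size_t limit_;
    std::unordered_map<std::uint32_t, std::unordered_set<std::uint16_t>> ports_by_src_;
};
```

Without a periodic `reset()` or per-entry expiry, the map grows with every source address seen, so a production version would need bounded memory.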

Malicious IPs

Autonomous blocking with temporal rules:

IPSet kernel-level enforcement
Configurable expiration (default: 1 hour)
Whitelist/blacklist support
Rate limiting per IP
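A temporal block rule maps naturally onto an ipset entry with a timeout. The sketch below only builds the command string for the ipset CLI. It assumes the target set was created with timeout support (for example `ipset create <name> hash:ip timeout 3600`), and the set name and function names are hypothetical. The actual Firewall Agent may talk to the kernel through a different interface.

```cpp
#include <arpa/inet.h>    // inet_pton (POSIX)
#include <netinet/in.h>   // in_addr
#include <sys/socket.h>   // AF_INET

#include <cctype>
#include <chrono>
#include <optional>
#include <string>

// Accept only well-formed dotted-quad IPv4 addresses.
bool is_ipv4(const std::string& ip) {
    in_addr addr{};
    return inet_pton(AF_INET, ip.c_str(), &addr) == 1;
}

// Restrict set names to characters that are safe to place in a command line.
bool is_safe_set_name(const std::string& name) {
    if (name.empty()) return false;
    for (unsigned char c : name) {
        if (!std::isalnum(c) && c != '_' && c != '-') return false;
    }
    return true;
}

// Build "ipset add <set> <ip> timeout <seconds> -exist".
// Returns std::nullopt instead of a command when any input is invalid, so
// untrusted strings never reach the shell.
std::optional<std::string> ipset_add_command(const std::string& set_name,
                                             const std::string& ip,
                                             std::chrono::seconds ttl) {
    if (!is_safe_set_name(set_name) || !is_ipv4(ip) || ttl.count() <= 0) {
        return std::nullopt;
    }
    return "ipset add " + set_name + " " + ip +
           " timeout " + std::to_string(ttl.count()) + " -exist";
}
```

For the default one-hour expiration, `ipset_add_command("ml_defender_block", "203.0.113.7", std::chrono::hours(1))` yields a command with `timeout 3600`. The `-exist` flag makes re-adding an already blocked address a no-op rather than an error.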

Known Limitations

Scientific honesty: We document what we don’t protect against.

Zero-day exploits: No signature-based detection
Encrypted malware payloads: TLS/SSL content is opaque
Insider threats: No authentication/authorization layer
Physical attacks: Out of scope
IPSet capacity: Maximum ~500K IPs (larger blocklists would require multi-tier storage)

Technology Stack

Core Technologies

C++20: Modern C++ for performance-critical components
eBPF/XDP: Kernel-space packet capture
RandomForest: 4 embedded ML models (97.6% accuracy)
ZeroMQ: Inter-process communication (PUB/SUB pattern)
Protobuf: Message serialization

Security

ChaCha20-Poly1305: AEAD encryption for threat data
LZ4: Fast compression (zero errors @ 36K events)
IPSet/IPTables: Kernel-level packet filtering
libsodium: Crypto primitives

Distributed Systems

etcd: Service discovery, configuration management
etcd v3 API: Key-value store with watch support

AI/ML

TinyLlama: Lightweight language model for queries
FAISS: Vector similarity search
ONNX Runtime: Neural network inference

Validation & Testing

Datasets

CTU-13 Neris Botnet: Real botnet traffic (97.6% accuracy)
Synthetic Traffic: Custom DDoS pattern generator
MAWI Dataset: Real network captures for gateway mode
17-hour stress test: 2.08M packets, zero crashes

Test Coverage

Unit tests: Core algorithms and data structures
Integration tests: End-to-end pipeline validation
Stress tests: 36K events across 4 progressive load tests
Chaos tests: Component failure scenarios

All test results are documented with actual metrics, not aspirational goals.

Community & Support

Contributing

ML Defender welcomes contributions! We practice transparent AI collaboration. Guidelines:

Scientific honesty - Report real results, acknowledge limitations
AI transparency - Credit AI assistants used in development
Testing required - All changes must include tests
Documentation - Update docs with code changes
Via Appia Quality - Build for decades, not quarters

AI Co-Authors

This project practices “Consejo de Sabios” (Council of Wise Ones):

Claude (Anthropic) - Architecture design, code review, documentation
DeepSeek - Algorithm optimization, debugging
Grok (xAI) - Performance analysis, XDP expertise
ChatGPT (OpenAI) - Research assistance
Qwen (Alibaba) - Documentation review, routing verification

All AI contributions are explicitly credited in code comments and commit messages.

License

MIT License - See LICENSE for details.

Next Steps

Get Started

Follow the Quickstart Guide to deploy ML Defender in 15 minutes.

Learn the Architecture

Read the Architecture Guide to understand system components and data flow.

Understand the Philosophy

Explore Via Appia Quality and our design principles.

Deploy to Production

Review component documentation and deployment guides in the Components section.

Via Appia Quality 🏛️ - Built to last decades

“The road to security is long, but we build it to endure.”
