Skip to main content
ML Defender Hero Light

Mission

Democratize enterprise-grade cybersecurity for hospitals, schools, and small organizations that cannot afford commercial solutions. Built to last decades with scientific honesty and methodical development. Philosophy: Via Appia Quality – Systems built like Roman roads, designed to endure.

Key Features

eBPF/XDP Packet Capture

High-performance kernel-space packet filtering with 512-byte payload capture. Zero-copy design with ring buffer delivery to userspace. Captures 2M+ packets over 17 hours with zero crashes.

Multi-Layer ML Detection

4 embedded RandomForest models with 97.6% accuracy on real malware (CTU-13 Neris botnet). Sub-microsecond detection latency with 83-feature extraction pipeline.

Autonomous Blocking

Kernel-level blocking via IPSet/IPTables with sub-10ms response time. Tested at 364 events/sec with graceful degradation. Config-driven architecture with zero hardcoding.

Encrypted Pipeline

ChaCha20-Poly1305 authenticated encryption with LZ4 compression. Production-validated with 36,000 events and zero crypto errors.

RAG Intelligence

Natural language forensic queries powered by TinyLlama and FAISS vector search. Multi-index strategy with eventual consistency for high availability.

Distributed Coordination

etcd-based service discovery with automatic crypto seed exchange. Service registration, heartbeats, and distributed configuration management.

Production Metrics

These are real metrics from validation testing, not marketing claims.

Detection Accuracy

  • 97.6% accuracy on CTU-13 Neris botnet (real ransomware)
  • 36,000 events tested with zero crypto errors
  • 17 hours continuous operation (61,343 seconds)
  • 2,080,549 packets processed successfully

Performance

  • 364 events/sec peak throughput under stress
  • <1 μs normal traffic latency
  • 54% CPU maximum under extreme load (20K events)
  • 127 MB RAM memory footprint under stress
  • 0 crashes during validation period

Reliability

  • 0 crypto errors @ 36K events
  • 0 decompression errors @ 36K events
  • 0 protobuf parse errors @ 36K events
  • Graceful degradation when capacity exceeded

System Architecture

Data Flow

  1. Network Traffic → eBPF/XDP captures packets in kernel space
  2. Sniffer → Extracts 512-byte payloads + 83 ML features
  3. ML Detector → 4 RandomForest models classify threats
  4. Crypto Pipeline → ChaCha20-Poly1305 encryption + LZ4 compression
  5. etcd Server → Coordinates services, manages crypto keys
  6. Firewall Agent → Autonomous blocking via IPSet/IPTables
  7. RAG Ingester → Parses logs, generates embeddings
  8. RAG System → Natural language queries over threat data

Quickstart

Get ML Defender running in 15 minutes

Architecture

Deep dive into system components and data flow

Philosophy

Via Appia Quality and design principles

Deployment Modes

Host-Based IDS

Captures packets destined to the defender host itself. Traditional intrusion detection system mode.
  • Interface: eth1 (WAN-facing)
  • XDP ifindex: 3
  • Use case: Server protection, endpoint security

Gateway Mode

Captures packets flowing through the defender as a network gateway. Dual-NIC deployment for network-wide protection.
  • Interface: eth3 (LAN-facing)
  • XDP ifindex: 5
  • Use case: Network gateway, firewall appliance
  • Topology: Client → Gateway (ML Defender) → Internet
Gateway mode requires dual-NIC configuration and IP forwarding enabled.

Threat Coverage

Protected Against

Volumetric, protocol, and application-layer DDoS detection using behavioral analysis. Detection based on:
  • External IP velocity (>10 new IPs in 10s)
  • Port scanning patterns (>15 unique ports)
  • RST ratio analysis (>30% aggressive connections)
  • Packet rate anomalies
Three-layer ransomware detection system:
  • Layer 0: 512-byte payload capture in eBPF
  • Layer 1.5: Shannon entropy analysis (>7.0 bits = encrypted)
  • Layer 1: Fast heuristics (10s window for C&C, SMB lateral movement)
  • Layer 2: 20 ransomware features (30s aggregation)
Validated on CTU-13 Neris botnet with 97.6% accuracy.
Detection of reconnaissance activities via:
  • Unique port tracking per source IP
  • Connection attempt velocity
  • SYN flood patterns
Autonomous blocking with temporal rules:
  • IPSet kernel-level enforcement
  • Configurable expiration (default: 1 hour)
  • Whitelist/blacklist support
  • Rate limiting per IP

Known Limitations

Scientific honesty: We document what we don’t protect against.
  • Zero-day exploits: No signature-based detection
  • Encrypted malware payloads: TLS/SSL content is opaque
  • Insider threats: No authentication/authorization layer
  • Physical attacks: Out of scope
  • IPSet capacity: Maximum ~500K IPs (requires multi-tier storage)

Technology Stack

Core Technologies

  • C++20: Modern C++ for performance-critical components
  • eBPF/XDP: Kernel-space packet capture
  • RandomForest: 4 embedded ML models (97.6% accuracy)
  • ZeroMQ: Inter-process communication (PUB/SUB pattern)
  • Protobuf: Message serialization

Security

  • ChaCha20-Poly1305: AEAD encryption for threat data
  • LZ4: Fast compression (zero errors @ 36K events)
  • IPSet/IPTables: Kernel-level packet filtering
  • libsodium: Crypto primitives

Distributed Systems

  • etcd: Service discovery, configuration management
  • etcd v3 API: Key-value store with watch support

AI/ML

  • TinyLlama: Lightweight language model for queries
  • FAISS: Vector similarity search
  • ONNX Runtime: Neural network inference

Validation & Testing

Datasets

  • CTU-13 Neris Botnet: Real ransomware behavior (97.6% accuracy)
  • Synthetic Traffic: Custom DDoS pattern generator
  • MAWI Dataset: Real network captures for gateway mode
  • 17-hour stress test: 2.08M packets, zero crashes

Test Coverage

  • Unit tests: Core algorithms and data structures
  • Integration tests: End-to-end pipeline validation
  • Stress tests: 36K events across 4 progressive load tests
  • Chaos tests: Component failure scenarios
All test results are documented with actual metrics, not aspirational goals.

Community & Support

Contributing

ML Defender welcomes contributions! We practice transparent AI collaboration. Guidelines:
  1. Scientific honesty - Report real results, acknowledge limitations
  2. AI transparency - Credit AI assistants used in development
  3. Testing required - All changes must include tests
  4. Documentation - Update docs with code changes
  5. Via Appia Quality - Build for decades, not quarters

AI Co-Authors

This project practices “Consejo de Sabios” (Council of Wise Ones):
  • Claude (Anthropic) - Architecture design, code review, documentation
  • DeepSeek - Algorithm optimization, debugging
  • Grok (xAI) - Performance analysis, XDP expertise
  • ChatGPT (OpenAI) - Research assistance
  • Qwen (Alibaba) - Documentation review, routing verification
All AI contributions are explicitly credited in code comments and commit messages.

License

MIT License - See LICENSE for details.

Next Steps

1

Get Started

Follow the Quickstart Guide to deploy ML Defender in 15 minutes.
2

Learn the Architecture

Read the Architecture Guide to understand system components and data flow.
3

Understand the Philosophy

Explore Via Appia Quality and our design principles.
4

Deploy to Production

Review component documentation and deployment guides in the Components section.

Via Appia Quality 🏛️ - Built to last decades “The road to security is long, but we build it to endure.”

Build docs developers (and LLMs) love