
Via Appia Quality

Via Appia Quality is our north star: building systems like Roman roads, designed to endure for decades, not quarters. The Appian Way (Via Appia) was constructed in 312 BC and remains largely intact today, over 2,300 years later. This is the standard we aspire to in software engineering.

Core Tenets

Scientific Honesty

Report actual results, not inflated claims. Document limitations as prominently as capabilities. No marketing speak in technical docs.

Methodical Development

Validate each component before proceeding. No moving goalposts. Real metrics from stress testing, not aspirational targets.

Transparent AI Collaboration

Credit all AI systems as co-authors. Disclose methodology for academic integrity. AI as partners, not mere tools.

User Privacy

No telemetry, no tracking, no data exfiltration. User data stays on user infrastructure. Open source for auditability.

What This Means in Practice

We measure success by longevity, not velocity:
  • Would this code survive a 10-year maintenance gap?
  • Can a new developer understand it in 6 months?
  • Will this architecture scale to 100x traffic?
We document failures as openly as successes:
  • Known limitations section in every component README
  • Actual test results (not cherry-picked)
  • Post-mortems for all incidents
We build for the long tail:
  • Compatibility over novelty
  • Standard protocols over custom formats
  • Graceful degradation over fail-fast
“The code you write today should be readable by your successor in 2035.” - Via Appia Principle #1

Scientific Honesty

Real Metrics, Not Marketing

ML Defender reports actual performance from validation testing. We report:
  • 97.6% accuracy on CTU-13 Neris botnet (real ransomware dataset)
  • 17 hours continuous operation without crashes
  • 36,000 events tested with 0 crypto errors
  • 364 events/sec peak throughput (stress test)
  • IPSet capacity limit hit at ~1,000 IPs (documented limitation)
We don’t claim:
  • “99.9% accuracy” (no dataset specified)
  • “Millions of events/sec” (unrealistic without hardware details)
  • “Enterprise-scale” (without defining scale)
  • “AI-powered” (without describing the models)

Transparent Limitations

Every component's documentation includes a Known Limitations section. Example from the Firewall Agent:
Known Limitations (Day 52):
- IPSet capacity finite (max realistic: 500K IPs)
- No persistence layer yet (evicted IPs lost on restart)
- Single-node deployment (no HA/failover)
- Manual capacity management required
Example from ML Detector:
Does NOT Protect Against:
- Zero-day exploits (no signature-based detection)
- Encrypted malware payloads (TLS/SSL content is opaque)
- Insider threats (no authentication/authorization layer)
- Physical attacks (out of scope)
If a component doesn’t document its limitations, it’s not production-ready.

Reproducible Results

All performance claims include:
  1. Hardware specs: CPU cores, RAM, network
  2. Test methodology: Synthetic vs organic, duration, load profile
  3. Raw data: Logs, screenshots, packet captures
  4. Scripts: Reproducible test harnesses in /scripts/
Example - 17-Hour Stability Test:
# Exact test configuration
Vagrant VM: Debian 12, 6 CPU, 8GB RAM
Kernel: 6.1.0
Test script: /vagrant/scripts/stress_test_8h.sh
Dataset: Mixed synthetic + organic traffic
Duration: 17h 2m 10s (61,343 seconds)
Results: /vagrant/logs/stress_test_20251120_101056/
Anyone can reproduce our tests and verify claims.

Collaborative AI Development

“Consejo de Sabios” (Council of Wise Ones)

ML Defender is built using transparent AI collaboration:
1. Multiple AI Systems

Claude (Anthropic), DeepSeek, Grok (xAI), ChatGPT (OpenAI), Qwen (Alibaba) all contribute to development.

2. Peer Review Process

Each AI reviews code written by others. Cross-validation catches bugs and design flaws early.

3. Explicit Attribution

All AI contributions credited in code comments, commit messages, and documentation.

4. Academic Integrity

Transparent methodology for research papers. AI as co-authors, not ghostwriters.

AI Co-Authors

Claude (Anthropic):
  • Architecture design
  • Code review and debugging
  • Documentation synthesis
  • Integration work across components
DeepSeek:
  • Algorithm optimization
  • Performance profiling
  • Debugging race conditions
  • Vagrantfile automation
Grok (xAI):
  • XDP/eBPF expertise
  • Performance analysis
  • Chaos testing (chaos_monkey.sh)
  • Gateway mode validation
ChatGPT (OpenAI):
  • Research assistance
  • Literature review
  • Hospital stress test specs
  • Dataset recommendations
Qwen (Alibaba):
  • Documentation review
  • Routing verification (rp_filter edge case)
  • Configuration validation
  • Multi-language support (Spanish)

Why This Matters

For Research:
  • Reproducibility - Others can use same AI workflow
  • Academic honesty - No hidden AI contributions
  • Methodology transparency - Peer reviewers understand development process
For Users:
  • Trust - Nothing hidden about how code was developed
  • Quality - Multiple AI perspectives catch more bugs
  • Education - Learn how to collaborate with AI effectively
“If an AI writes code, and you don’t credit them, that’s plagiarism. Simple.” - ML Defender Contributor Guidelines

Example: Git Commit Attribution

# Good commit message
feat(firewall): Add IPSet capacity monitoring

Implemented circuit breaker pattern to prevent queue overflow
when IPSet reaches capacity. Graceful degradation logs warnings
but continues processing.

Co-authored-by: Claude (Anthropic)
Co-authored-by: DeepSeek
Validated-by: Grok (xAI)

# Bad commit message
feat: add monitoring
# (Who wrote this? What does it monitor? Any AI help?)

User Privacy & Accessibility

No Telemetry, Ever

ML Defender does not:
  • ❌ Phone home with usage statistics
  • ❌ Send crash reports to external servers
  • ❌ Track user behavior
  • ❌ Require registration or license keys
  • ❌ Upload threat data to cloud
ML Defender does:
  • ✅ Keep all data on user infrastructure
  • ✅ Use standard syslog/file logging
  • ✅ Allow users to delete logs anytime
  • ✅ Support air-gapped deployments
  • ✅ Open source for auditability

Accessibility

Documentation for Non-Experts:
  • Natural language explanations, not just technical jargon
  • Examples before theory
  • Troubleshooting guides with actual error messages
  • “Why” before “How” in architecture docs
Example - RAG System Queries:
Instead of: "Execute vector similarity search with FAISS IVF index"
We say:     "Ask in plain language: '¿Qué ha ocurrido en las últimas 24h?'"
            ("What has happened in the last 24 hours?")
Deployment Options:
  • Single-node deployment (small organizations)
  • Raspberry Pi compatible (low-budget schools)
  • Multi-node clustering (enterprises)
  • Docker/Kubernetes (cloud-native)
Good security software should be accessible to small hospitals, not just Fortune 500 companies.

Design Principles

1. Config-Driven Architecture

Principle: “JSON is law”. Zero hardcoded values.
Bad (hardcoded):
const std::string log_path = "/var/log/firewall.log";
const int batch_size = 100;
const std::string ipset_name = "ml_defender_blacklist";
Good (config-driven):
// all tunables come from the JSON config parsed at startup (e.g. a jsoncpp Json::Value)
const auto log_path = config["logging"]["file"].asString();
const auto batch_size = config["batch_processor"]["batch_size"].asInt();
const auto ipset_name = config["ipsets"]["blacklist_test"]["name"].asString();
Benefits:
  • Change behavior without recompilation
  • A/B testing via config changes
  • Multi-environment deployment (dev/staging/prod)
  • Clear documentation of all tunables

2. Graceful Degradation

Principle: “Better 3/4 systems working than 0/4”.
Example - RAG Ingester:
  • 4 FAISS indices (Chronos, SBERT, Entity Benign, Entity Malicious)
  • If 1 index fails → Continue with 3 indices
  • Circuit breaker prevents cascade failures
  • Health monitoring tracks coefficient of variation (CV)
Example - Firewall Agent:
  • IPSet capacity exceeded → Log warnings, continue processing
  • Crypto error on 1 message → Skip message, continue with next
  • etcd unavailable → Use cached config, retry with exponential backoff
Fail-fast is good for development. Graceful degradation is mandatory for production.

3. Observability First

Principle: “If you can’t measure it, you can’t improve it”.
Every component exports:
  • Structured logs (JSON format)
  • Metrics (packets processed, errors, latency)
  • Health checks (for orchestration)
  • Performance stats (every 30 seconds)
Example - Sniffer Statistics:
=== ESTADÍSTICAS ===
Paquetes procesados: 2080549
Paquetes enviados: 0
Tiempo activo: 61343 segundos
Tasa: 33.92 eventos/seg
Payloads sospechosos: 1550375
===================
(English: packets processed: 2,080,549; packets sent: 0; uptime: 61,343 s; rate: 33.92 events/s; suspicious payloads: 1,550,375)
Future:
  • Prometheus metrics exporter
  • Grafana dashboards
  • Alert manager integration
  • Distributed tracing (OpenTelemetry)

4. Testing as First-Class Citizen

Principle: “Untested code is legacy code at birth”.
Test Pyramid:
  1. Unit Tests (fast, isolated)
    • 25+ tests for sniffer components
    • Test coverage >80% for core logic
  2. Integration Tests (realistic, end-to-end)
    • Full pipeline tests (sniffer → ml → firewall)
    • Crypto pipeline validation
  3. Stress Tests (production-like load)
    • 36,000 events across 4 progressive tests
    • 17-hour stability test
  4. Chaos Tests (failure scenarios)
    • Component crashes
    • Network partitions
    • Resource exhaustion
All PRs require:
  • Passing unit tests
  • Integration test coverage
  • Performance benchmarks (no regressions)

5. Via Appia Refactoring

Principle: “Refactor for the next decade, not the next sprint”.
When we refactor:
  • Not for style (unless affecting readability)
  • Not for novelty (latest C++ features)
  • Only for:
    • Eliminating hardcoded values
    • Fixing architectural debt
    • Improving observability
    • Reducing coupling
Example - Day 52 Refactoring:
Problem: Hardcoded logger paths, duplicate config, IPSet singleton
Solution: 
  - Logger path from config.logging.file
  - IPSet names from config.ipsets map
  - Removed BatchProcessor struct defaults
  - Single source of truth (JSON)

Mission: Democratizing Cybersecurity

The Problem

Hospitals, schools, and small organizations face:
  • Ransomware attacks (average: $4.5M per incident)
  • Expensive commercial security products ($50K-$500K/year)
  • Lack of in-house security expertise
  • Compliance requirements (HIPAA, FERPA)
They need:
  • Enterprise-grade protection
  • Affordable (ideally free/open source)
  • Easy to deploy and maintain
  • Transparent (no black boxes)

Our Solution

ML Defender provides:
  • ✅ Open source (MIT license)
  • ✅ Production-ready (36K events tested)
  • ✅ Self-hosted (no cloud dependencies)
  • ✅ Documented honestly (real metrics)
  • ✅ Raspberry Pi compatible (~$100 hardware)
Target Deployments:
  1. Small Hospital (50-200 beds)
    • Single-node deployment
    • Protects electronic health records (EHR)
    • Blocks ransomware C&C communication
    • Cost: $500 hardware + IT staff time
  2. School District (5,000 students)
    • Gateway mode deployment
    • Protects entire network
    • Parental transparency (no student tracking)
    • Cost: $2,000 hardware + IT staff time
  3. Non-Profit (Remote clinic)
    • Raspberry Pi deployment
    • Offline operation (no Internet required)
    • Low power consumption
    • Cost: $100 hardware + volunteer time
“If only Fortune 500 companies can afford security, we’ve failed as an industry.” - ML Defender Mission Statement

Academic Integrity

For Research Papers

When publishing research about ML Defender:
  1. AI Collaboration Disclosure:
    This work was developed in collaboration with multiple AI systems
    (Claude, DeepSeek, Grok, ChatGPT, Qwen). All AI contributions are
    explicitly credited in code comments and commit history.
    
  2. Methodology Transparency:
    • Disclose which components were AI-assisted
    • Document human review process
    • Provide reproducible test harnesses
  3. Dataset Attribution:
    • CTU-13 dataset (Czech Technical University)
    • MAWI dataset (WIDE Project)
    • Synthetic traffic (custom generator, open sourced)
  4. Code Availability:
    • GitHub repository link
    • Docker images for reproducibility
    • VM images (Vagrant) for exact environment

For Academic Use

Students and researchers may:
  • Use ML Defender in thesis/dissertation work
  • Extend components for research projects
  • Benchmark against other IDS systems
  • Publish comparative analyses
We request:
  • Cite the project with AI co-author disclosure
  • Share improvements back (via PRs)
  • Document limitations honestly
  • Use real metrics (not cherry-picked results)

Future Vision

5-Year Roadmap

Year 1 (Current):
  • Production-ready core (sniffer, ml-detector, firewall)
  • 97.6% accuracy on ransomware
  • Single-node deployment validated
Year 2:
  • Multi-node clustering
  • Kubernetes deployment
  • Prometheus/Grafana observability
  • 10+ hospital pilots
Year 3:
  • Federated learning (privacy-preserving model updates)
  • Community threat intel sharing
  • 100+ production deployments
  • Published research papers
Year 4:
  • Hardware appliance (turnkey deployment)
  • Professional support offering
  • Certification programs (training)
  • International deployments
Year 5:
  • Standard reference architecture (NIST)
  • Integration with major EHR systems
  • 1,000+ deployments
  • Self-sustaining community
Via Appia Quality means planning for the long term. We’re building for 2030, not 2026.

Contributing to Via Appia Quality

How You Can Help

Code Contributions:
  • Follow existing patterns (config-driven, graceful degradation)
  • Include tests (unit + integration)
  • Document limitations honestly
  • Disclose AI assistance (if any)
Documentation:
  • Natural language for accessibility
  • Real examples from production
  • Troubleshooting guides
  • Translations (Spanish, others)
Testing:
  • Deploy in your environment
  • Report actual performance metrics
  • Share failure scenarios
  • Contribute test datasets
Community:
  • Help others on GitHub Discussions
  • Write blog posts about deployments
  • Present at conferences
  • Mentor students

Contribution Guidelines

1. Scientific Honesty

Report real results. Acknowledge limitations. No marketing speak.

2. AI Transparency

Credit AI assistants used. Disclose methodology.

3. Testing Required

All changes must include tests. No exceptions.

4. Documentation

Update docs with code changes. Examples required.

5. Via Appia Quality

Build for decades, not quarters. Think long-term.

Via Appia Quality 🏛️ - Built to last decades
“The road to security is long, but we build it to endure.”
