Via Appia Quality
Via Appia Quality is our north star: building systems like Roman roads, designed to endure for decades, not quarters.
The Appian Way (Via Appia) was constructed in 312 BC and remains largely intact today—over 2,300 years later. This is the standard we aspire to in software engineering.
Core Tenets
Scientific Honesty
Report actual results, not inflated claims. Document limitations as prominently as capabilities. No marketing speak in technical docs.
Methodical Development
Validate each component before proceeding. No moving goalposts. Real metrics from stress testing, not aspirational targets.
Transparent AI Collaboration
Credit all AI systems as co-authors. Disclose methodology for academic integrity. AI as partners, not mere tools.
User Privacy
No telemetry, no tracking, no data exfiltration. User data stays on user infrastructure. Open source for auditability.
What This Means in Practice
We measure success by longevity, not velocity:
- Would this code survive a 10-year maintenance gap?
- Can a new developer understand it in 6 months?
- Will this architecture scale to 100x traffic?
We commit to:
- Known limitations section in every component README
- Actual test results (not cherry-picked)
- Post-mortems for all incidents
- Compatibility over novelty
- Standard protocols over custom formats
- Graceful degradation over fail-fast
“The code you write today should be readable by your successor in 2035.” - Via Appia Principle #1
Scientific Honesty
Real Metrics, Not Marketing
ML Defender reports actual performance from validation testing.

✅ We report:
- 97.6% accuracy on the CTU-13 Neris botnet capture (real malware traffic dataset)
- 17 hours continuous operation without crashes
- 36,000 events tested with 0 crypto errors
- 364 events/sec peak throughput (stress test)
- IPSet capacity limit hit at ~1,000 IPs (documented limitation)
❌ We avoid:
- “99.9% accuracy” (no dataset specified)
- “Millions of events/sec” (unrealistic without hardware details)
- “Enterprise-scale” (without defining scale)
- “AI-powered” (without describing the models)
Transparent Limitations
Every component's documentation includes a Known Limitations section - for example, the Firewall Agent documents its ~1,000-IP IPSet capacity limit.

Reproducible Results
All performance claims include:
- Hardware specs: CPU cores, RAM, network
- Test methodology: Synthetic vs organic, duration, load profile
- Raw data: Logs, screenshots, packet captures
- Scripts: Reproducible test harnesses in /scripts/
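One way to keep such claims reproducible - sketched here with Python's standard library, not taken from the actual /scripts/ harnesses - is to store each result set alongside the hardware and software environment that produced it:

```python
import json
import os
import platform


def record_run_metadata(results, out_path):
    """Bundle raw test results with the environment that produced them,
    so a performance claim can always be traced back to its hardware."""
    environment = {
        "cpu_count": os.cpu_count(),
        "machine": platform.machine(),
        "system": platform.system(),
        "python": platform.python_version(),
    }
    payload = {"environment": environment, "results": results}
    with open(out_path, "w") as f:
        json.dump(payload, f, indent=2)
    return payload
```

Publishing this JSON file next to the logs and packet captures lets reviewers check whether, say, a throughput number came from a 4-core VM or a 32-core server.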
Collaborative AI Development
“Consejo de Sabios” (Council of Wise Ones)
ML Defender is built using transparent AI collaboration:

Multiple AI Systems
Claude (Anthropic), DeepSeek, Grok (xAI), ChatGPT (OpenAI), Qwen (Alibaba) all contribute to development.
Peer Review Process
Each AI reviews code written by others. Cross-validation catches bugs and design flaws early.
Explicit Attribution
All AI contributions credited in code comments, commit messages, and documentation.
AI Co-Authors
Contributions from the co-author AIs (Claude, DeepSeek, Grok, ChatGPT, and Qwen) include:
- Architecture design
- Code review and debugging
- Documentation synthesis
- Integration work across components
- Algorithm optimization
- Performance profiling
- Debugging race conditions
- Vagrantfile automation
- XDP/eBPF expertise
- Performance analysis
- Chaos testing (chaos_monkey.sh)
- Gateway mode validation
- Research assistance
- Literature review
- Hospital stress test specs
- Dataset recommendations
- Documentation review
- Routing verification (rp_filter edge case)
- Configuration validation
- Multi-language support (Spanish)
Why This Matters
For Research:
- Reproducibility - Others can use same AI workflow
- Academic honesty - No hidden AI contributions
- Methodology transparency - Peer reviewers understand development process
For the Community:
- Trust - Nothing hidden about how code was developed
- Quality - Multiple AI perspectives catch more bugs
- Education - Learn how to collaborate with AI effectively
“If an AI writes code, and you don’t credit them, that’s plagiarism. Simple.” - ML Defender Contributor Guidelines
Example: Git Commit Attribution
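A hypothetical commit message using Git's standard `Co-authored-by` trailer (the subject line, names, and addresses here are illustrative; real ML Defender commits may word this differently):

```text
Add circuit breaker to RAG ingester index queries

Continue serving from healthy FAISS indices when one fails.

Co-authored-by: Claude (Anthropic) <noreply@anthropic.com>
Co-authored-by: DeepSeek <noreply@deepseek.com>
```

Platforms such as GitHub parse these trailers and display every listed co-author on the commit.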
User Privacy & Accessibility
No Telemetry, Ever
ML Defender does not:
- ❌ Phone home with usage statistics
- ❌ Send crash reports to external servers
- ❌ Track user behavior
- ❌ Require registration or license keys
- ❌ Upload threat data to cloud
Instead, it does:
- ✅ Keep all data on user infrastructure
- ✅ Use standard syslog/file logging
- ✅ Allow users to delete logs anytime
- ✅ Support air-gapped deployments
- ✅ Open source for auditability
Accessibility
Documentation for Non-Experts:
- Natural language explanations, not just technical jargon
- Examples before theory
- Troubleshooting guides with actual error messages
- “Why” before “How” in architecture docs
Deployment Flexibility:
- Single-node deployment (small organizations)
- Raspberry Pi compatible (low-budget schools)
- Multi-node clustering (enterprises)
- Docker/Kubernetes (cloud-native)
Design Principles
1. Config-Driven Architecture
Principle: “JSON is law” - zero hardcoded values.

Benefits:
- Change behavior without recompilation
- A/B testing via config changes
- Multi-environment deployment (dev/staging/prod)
- Clear documentation of all tunables
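A minimal sketch of what “JSON is law” implies - every tunable loaded from config, none compiled in. The key names below are illustrative, not ML Defender's actual schema:

```python
import json


def load_config(path):
    """Load all tunables from JSON - no hardcoded fallbacks anywhere."""
    with open(path) as f:
        cfg = json.load(f)
    # Fail fast if a required tunable is missing, rather than silently
    # substituting a compiled-in default. (Key names are illustrative.)
    for key in ("zmq_endpoint", "batch_size", "detection_threshold"):
        if key not in cfg:
            raise KeyError(f"config missing required key: {key}")
    return cfg
```

Because behavior lives entirely in the JSON file, switching dev/staging/prod or running an A/B test means swapping config files, never rebuilding the binary.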
2. Graceful Degradation
Principle: “Better 3/4 systems working than 0/4”

Example - RAG Ingester:
- 4 FAISS indices (Chronos, SBERT, Entity Benign, Entity Malicious)
- If 1 index fails → Continue with 3 indices
- Circuit breaker prevents cascade failures
- Health monitoring tracks coefficient of variation (CV)
Other examples:
- IPSet capacity exceeded → Log warnings, continue processing
- Crypto error on 1 message → Skip message, continue with next
- etcd unavailable → Use cached config, retry with exponential backoff
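The degradation pattern above can be sketched as follows (a hypothetical helper, not the actual RAG Ingester code): query every index, isolate per-index failures, and keep serving as long as at least one index answers.

```python
def query_all(indices, vector):
    """Query every index; degrade gracefully when some fail.

    `indices` maps an index name to a search callable. A failing index
    is recorded and skipped so the healthy ones still serve results.
    """
    results, failed = {}, []
    for name, search in indices.items():
        try:
            results[name] = search(vector)
        except Exception as exc:  # isolate per-index faults
            failed.append((name, exc))
    if not results:
        # Only give up when *every* index has failed.
        raise RuntimeError("all indices failed")
    return results, failed
```

With four indices configured, one crashed index degrades the answer quality slightly instead of taking the whole pipeline down - 3/4 working beats 0/4.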
3. Observability First
Principle: “If you can’t measure it, you can’t improve it”

Every component exports:
- Structured logs (JSON format)
- Metrics (packets processed, errors, latency)
- Health checks (for orchestration)
- Performance stats (every 30 seconds)
Monitoring stack:
- Prometheus metrics exporter
- Grafana dashboards
- Alert manager integration
- Distributed tracing (OpenTelemetry)
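A minimal sketch of the structured-logging side (field names are illustrative, not ML Defender's actual log schema): each event becomes one machine-parseable JSON line.

```python
import json
import time


def log_event(component, event, **fields):
    """Emit one structured JSON log line.

    Unlike free-form text logs, every line is machine-parseable, so
    dashboards and alerting rules can filter on any field directly.
    """
    record = {"ts": time.time(), "component": component, "event": event, **fields}
    line = json.dumps(record, sort_keys=True)
    print(line)
    return line
```

A call like `log_event("sniffer", "packet_batch", packets=512, errors=0)` produces a line that `jq`, Loki, or a syslog pipeline can query without regex scraping.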
4. Testing as First-Class Citizen
Principle: “Untested code is legacy code at birth”

Test Pyramid:
- Unit Tests (fast, isolated)
  - 25+ tests for sniffer components
  - Test coverage >80% for core logic
- Integration Tests (realistic, end-to-end)
  - Full pipeline tests (sniffer → ml → firewall)
  - Crypto pipeline validation
- Stress Tests (production-like load)
  - 36,000 events across 4 progressive tests
  - 17-hour stability test
- Chaos Tests (failure scenarios)
  - Component crashes
  - Network partitions
  - Resource exhaustion
Every merge requires:
- Passing unit tests
- Integration test coverage
- Performance benchmarks (no regressions)
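A sketch of the fast, isolated unit-test layer, assuming a hypothetical `is_malicious` decision helper (not ML Defender's actual API): each test covers one behavior, including input validation.

```python
import unittest


def is_malicious(score, threshold):
    """Hypothetical detector decision: flag events at or above threshold."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be in [0, 1]")
    return score >= threshold


class ThresholdTest(unittest.TestCase):
    def test_flags_high_score(self):
        self.assertTrue(is_malicious(0.98, threshold=0.9))

    def test_passes_low_score(self):
        self.assertFalse(is_malicious(0.10, threshold=0.9))

    def test_rejects_bad_threshold(self):
        # Invalid configuration should fail loudly, not silently misclassify.
        with self.assertRaises(ValueError):
            is_malicious(0.5, threshold=1.5)


if __name__ == "__main__":
    unittest.main()
```

Tests like these run in milliseconds, which is what makes it practical to gate every merge on them.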
5. Via Appia Refactoring
Principle: “Refactor for the next decade, not the next sprint”

When we refactor:
- Not for style (unless affecting readability)
- Not for novelty (latest C++ features)
- Only for:
  - Eliminating hardcoded values
  - Fixing architectural debt
  - Improving observability
  - Reducing coupling
Mission: Democratizing Cybersecurity
The Problem
Hospitals, schools, and small organizations face:
- Ransomware attacks (average: $4.5M per incident)
- Expensive commercial security products ($500K/year)
- Lack of in-house security expertise
- Compliance requirements (HIPAA, FERPA)
They need:
- Enterprise-grade protection
- Affordable (ideally free/open source)
- Easy to deploy and maintain
- Transparent (no black boxes)
Our Solution
ML Defender provides:
- ✅ Open source (MIT license)
- ✅ Production-ready (36K events tested)
- ✅ Self-hosted (no cloud dependencies)
- ✅ Documented honestly (real metrics)
- ✅ Raspberry Pi compatible (~$100 hardware)
Use cases:
- Small Hospital (50-200 beds)
  - Single-node deployment
  - Protects electronic health records (EHR)
  - Blocks ransomware C&C communication
  - Cost: $500 hardware + IT staff time
- School District (5,000 students)
  - Gateway mode deployment
  - Protects entire network
  - Parental transparency (no student tracking)
  - Cost: $2,000 hardware + IT staff time
- Non-Profit (Remote clinic)
  - Raspberry Pi deployment
  - Offline operation (no Internet required)
  - Low power consumption
  - Cost: $100 hardware + volunteer time
“If only Fortune 500 companies can afford security, we’ve failed as an industry.” - ML Defender Mission Statement
Academic Integrity
For Research Papers
When publishing research about ML Defender:
- AI Collaboration Disclosure
- Methodology Transparency:
  - Disclose which components were AI-assisted
  - Document human review process
  - Provide reproducible test harnesses
- Dataset Attribution:
  - CTU-13 dataset (Czech Technical University)
  - MAWI dataset (WIDE Project)
  - Synthetic traffic (custom generator, open sourced)
- Code Availability:
  - GitHub repository link
  - Docker images for reproducibility
  - VM images (Vagrant) for exact environment
For Academic Use
Students and researchers may:
- Use ML Defender in thesis/dissertation work
- Extend components for research projects
- Benchmark against other IDS systems
- Publish comparative analyses
We ask that you:
- Cite the project with AI co-author disclosure
- Share improvements back (via PRs)
- Document limitations honestly
- Use real metrics (not cherry-picked results)
Future Vision
5-Year Roadmap
Year 1 (Current):
- Production-ready core (sniffer, ml-detector, firewall)
- 97.6% accuracy on ransomware
- Single-node deployment validated
Years 2-5 (planned):
- Multi-node clustering
- Kubernetes deployment
- Prometheus/Grafana observability
- 10+ hospital pilots
- Federated learning (privacy-preserving model updates)
- Community threat intel sharing
- 100+ production deployments
- Published research papers
- Hardware appliance (turnkey deployment)
- Professional support offering
- Certification programs (training)
- International deployments
- Standard reference architecture (NIST)
- Integration with major EHR systems
- 1,000+ deployments
- Self-sustaining community
Contributing to Via Appia Quality
How You Can Help
Code Contributions:
- Follow existing patterns (config-driven, graceful degradation)
- Include tests (unit + integration)
- Document limitations honestly
- Disclose AI assistance (if any)
Documentation Contributions:
- Natural language for accessibility
- Real examples from production
- Troubleshooting guides
- Translations (Spanish, others)
Testing Contributions:
- Deploy in your environment
- Report actual performance metrics
- Share failure scenarios
- Contribute test datasets
Community Contributions:
- Help others on GitHub Discussions
- Write blog posts about deployments
- Present at conferences
- Mentor students
Contribution Guidelines
Via Appia Quality 🏛️ - Built to last decades

“The road to security is long, but we build it to endure.”