Overview
ML Defender is a production-ready system with proven stability (17-hour stress test, 2M+ packets processed). However, like all security systems, it has specific architectural limitations that must be understood for proper deployment planning.
IPSet Capacity Limits
Maximum Realistic Capacity
ML Defender uses IPSet hash tables for kernel-level IP blocking.
Theoretical limits:
- Linux kernel IPSet: 65,536 entries per set by default (maxelem, tunable at set creation)
- ML Defender default: 1,000 IPs (conservative for testing)
- Maximum realistic: 500,000 IPs (with performance tuning)
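These limits surface directly at the ipset layer. A sketch (the set name `mld-blacklist` and the values are illustrative, not ML Defender's actual configuration; requires root):

```shell
# Create a hash:ip set sized beyond the 65,536 default by passing
# maxelem explicitly; entries expire after `timeout` seconds.
ipset create mld-blacklist hash:ip maxelem 500000 timeout 3600

# Print only the set header, confirming the configured capacity.
ipset list mld-blacklist -t
```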
Capacity Exceeded Behavior
Stress test results (Day 52):
- ✅ Graceful degradation - No crashes or memory leaks
- ✅ Error logging - Each failure logged with metrics
- ⚠️ Oldest entries evicted - FIFO (First In, First Out) with timeout
- ⚠️ No persistence - Evicted IPs are not stored anywhere
Mitigation Strategies
Priority 1 roadmap includes multi-tier storage to address capacity limitations.
- Increase IPSet size (up to 500K entries)
- Aggressive timeouts (evict IPs faster)
- Whitelist critical IPs (prevent accidental blocks)
- Multi-tier storage architecture:
  - Tier 1: IPSet (hot storage, <10 ms blocking)
  - Tier 2: SQLite (warm storage, query historical blocks)
  - Tier 3: Parquet (cold storage, forensic analysis)
- Automatic eviction policy: LRU with recidivism tracking
- Capacity monitoring: Prometheus alerts at 80% capacity
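At the ipset level, the first three mitigations might look like the following sketch (set names are assumptions; note that `maxelem` is fixed at creation time, so growing a set means creating a larger one and swapping it in; requires root):

```shell
# Increase IPSet size: build a larger set, then atomically swap it in.
ipset create mld-blacklist-big hash:ip maxelem 500000 timeout 3600
ipset swap mld-blacklist-big mld-blacklist
ipset destroy mld-blacklist-big    # now holds the old, smaller set

# Aggressive timeout: add entries with a shorter per-entry timeout.
ipset add mld-blacklist 203.0.113.7 timeout 600

# Whitelist critical IPs: keep an allow set that iptables checks
# before any blacklist rule.
ipset create mld-whitelist hash:ip
ipset add mld-whitelist 198.51.100.10
iptables -I INPUT -m set --match-set mld-whitelist src -j ACCEPT
```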
Performance Impact of Large IPSets
Benchmarks (estimated, not yet tested):
| IPSet Size | Lookup Latency | Memory Usage | Performance |
|---|---|---|---|
| 1,000 IPs | <1 μs | 128 KB | ✅ Optimal |
| 10,000 IPs | <5 μs | 1.2 MB | ✅ Good |
| 100,000 IPs | <50 μs | 12 MB | ⚠️ Acceptable |
| 500,000 IPs | <250 μs | 60 MB | ⚠️ Maximum |
Single-Node Deployment
No High Availability (Yet)
ML Defender currently operates as a single-node system.
Limitations:
- ❌ No automatic failover if the node crashes
- ❌ No load balancing across multiple instances
- ❌ No distributed state synchronization
- ❌ Single point of failure (SPOF)
Tested single-node stability:
- Uptime tested: 17+ hours continuous operation
- Crash rate: 0 crashes in 2M+ packets processed
- Memory stability: 4.5 MB footprint (zero growth over 17h)
- Recovery time: manual restart required (~5 seconds)
Workarounds for High Availability
1. Process monitoring with systemd: automatic restart on crash.
2. Active-passive pair with keepalived:
  - Deploy ML Defender on two nodes
  - Use a virtual IP (VIP) with keepalived
  - Manual failover via VIP migration (~10 seconds)
  - Shared etcd cluster for configuration sync
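For the systemd workaround, a minimal unit with automatic restart might look like this sketch (unit name, binary path, and timings are assumptions):

```ini
# /etc/systemd/system/ml-defender.service (illustrative path)
[Unit]
Description=ML Defender firewall ACL agent
After=network-online.target

[Service]
ExecStart=/usr/local/bin/firewall-acl-agent
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
```

`RestartSec=5` mirrors the ~5-second manual recovery time measured above, but turns it into an automatic restart.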
Roadmap for High Availability (Priority 3)
Planned features:
- Multi-node clustering with etcd-based leader election
- Distributed IPSet synchronization (gossip protocol)
- Health check endpoints for Kubernetes liveness/readiness probes
- Horizontal scaling with consistent hashing
- Stateless firewall agents (all state in etcd)
No Persistence Layer
Evicted IPs Are Lost
Current behavior:
- IPSet entries have a timeout (default: 1 hour)
- After the timeout expires, the IP is automatically removed from the blacklist
- No historical record of blocked IPs (except logs)
- Restarting firewall-acl-agent clears all IPSet entries
Impact on Security Operations
Forensic analysis challenges:
- ❌ Cannot query “all IPs blocked in last 7 days”
- ❌ Cannot identify repeat offenders (recidivism analysis)
- ❌ Cannot correlate blocked IPs with incidents
- ⚠️ Log parsing required for historical queries
Operational gaps:
- ⚠️ No protection against “slow burn” attacks (IP rotates every hour)
- ⚠️ Cannot implement a “permanent ban” for known malicious IPs
- ⚠️ Manual IPSet management required after restart
Workarounds
1. RAG ingester for log-based queries
Roadmap for Persistence (Priority 1.1)
Multi-tier storage architecture:
- ✅ Unlimited historical storage
- ✅ Fast queries for recent blocks
- ✅ Automatic eviction policy
- ✅ Recidivism tracking (repeat offenders)
- ✅ Compliance and audit trails
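To illustrate the Tier 2 idea, a warm store could record each block before the IPSet entry expires. This is a hypothetical sketch (class name and schema invented for illustration), not the planned implementation:

```python
import sqlite3
import time


class WarmBlockStore:
    """Hypothetical SQLite-backed record of IPs blocked via IPSet."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS blocks (ip TEXT, blocked_at REAL)"
        )

    def record(self, ip, ts=None):
        # Called once per block action, before the IPSet entry can expire.
        self.db.execute(
            "INSERT INTO blocks VALUES (?, ?)", (ip, ts or time.time())
        )

    def blocked_since(self, days):
        # Answers "all IPs blocked in the last N days".
        cutoff = time.time() - days * 86400
        rows = self.db.execute(
            "SELECT DISTINCT ip FROM blocks WHERE blocked_at >= ?", (cutoff,)
        )
        return sorted(r[0] for r in rows)

    def recidivists(self, min_blocks=2):
        # Repeat offenders: IPs blocked at least min_blocks times.
        rows = self.db.execute(
            "SELECT ip FROM blocks GROUP BY ip HAVING COUNT(*) >= ?",
            (min_blocks,),
        )
        return sorted(r[0] for r in rows)
```

Recording every block as it happens makes queries like “all IPs blocked in last 7 days” and recidivism lookups trivial, at the cost of one insert per block action.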
Manual Capacity Management
No Automatic Monitoring
Current state:
- ❌ No built-in capacity alerts
- ❌ No automatic scaling of IPSet size
- ❌ No proactive eviction strategies
- ⚠️ Manual monitoring required via logs/metrics
Manual Capacity Management Tasks
1. Periodic capacity checks
Roadmap for Automatic Management (Priority 1.3)
Planned features:
- Prometheus metrics exporter
- Grafana dashboards with alerts:
  - Warning at 80% capacity
  - Critical at 90% capacity
  - Auto-page on-call engineer
- Auto-eviction policies:
  - LRU (Least Recently Used)
  - LFU (Least Frequently Used)
  - Confidence-based (evict low-confidence blocks first)
  - Recidivism-aware (keep repeat offenders longer)
- Dynamic capacity scaling:
  - Monitor utilization trends
  - Auto-resize IPSet when consistently above 80%
  - Graceful migration of entries
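Until the exporter and alerts exist, a periodic capacity check can be scripted by parsing the terse `ipset list -t` output. A sketch (the sample header mirrors ipset's terse listing format, but verify the field names against your ipset version):

```python
import re


def ipset_utilization(terse_listing):
    """Return entries/maxelem utilization from `ipset list -t` output."""
    maxelem = int(re.search(r"maxelem (\d+)", terse_listing).group(1))
    entries = int(
        re.search(r"Number of entries: (\d+)", terse_listing).group(1)
    )
    return entries / maxelem


# Sample terse listing (illustrative set name and counts).
sample = """Name: mld-blacklist
Type: hash:ip
Header: family inet hashsize 1024 maxelem 65536 timeout 3600
Size in memory: 1176
Number of entries: 52429
"""
# 52429 / 65536 ≈ 0.80 → right at the proposed 80% warning threshold
```

Run from cron, such a check can log a warning or page someone when utilization crosses the 80% and 90% thresholds listed above.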
Other Known Limitations
Performance Constraints
CPU overhead with crypto enabled:
- Baseline (no crypto): ~45% CPU @ 364 events/sec
- With crypto: ~54% CPU @ 364 events/sec
- Overhead: +9% (acceptable, but limits headroom)
Memory footprint:
- Normal operation: 4.5 MB (ml-detector + firewall-acl-agent)
- With crypto: 7.5 MB (includes libsodium + buffers)
- IPSet storage: ~128 KB per 1,000 IPs
Protocol Limitations
IPv6 support:
- ❌ Not currently implemented
- ⚠️ IPv4-only IPSet configuration
- Roadmap: Priority 3 (IPv6 support)
Encapsulated and encrypted traffic:
- ⚠️ Limited visibility into encapsulated traffic (GRE, IPsec, WireGuard)
- ⚠️ Cannot analyze encrypted VPN payloads
- Mitigation: deploy ML Defender inside the VPN termination point
Proxied traffic:
- ⚠️ X-Forwarded-For headers not inspected (only the load balancer's IP is seen)
- Mitigation: deploy behind the load balancer and recover client IPs from its logs
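When client IPs are recovered from load-balancer access logs, the left-most X-Forwarded-For entry is the original client. A minimal, hypothetical helper (the header is client-supplied and spoofable, so validate the proxy chain before trusting it):

```python
def client_ip_from_xff(header_value):
    """Return the left-most (original client) address from an
    X-Forwarded-For value such as 'client, proxy1, proxy2'.

    Returns None for a missing/empty header."""
    if not header_value:
        return None
    return header_value.split(",")[0].strip()
```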
Configuration Complexity
JSON-only configuration:
- ❌ No GUI for configuration management
- ⚠️ Manual editing required (error-prone)
- Roadmap: Priority 2 (web-based admin interface)
Restart required for changes:
- ⚠️ Most config changes require a component restart
- ⚠️ No hot reload (except etcd-based config sync)
- Roadmap: Priority 2 (runtime config updates)
Deployment Recommendations
Capacity Planning Guidelines
Small deployment (home/lab):
- IPSet size: 1,000 IPs
- Expected load: <50 events/sec
- Hardware: 2 CPU cores, 4 GB RAM
Medium deployment:
- IPSet size: 10,000 IPs
- Expected load: <200 events/sec
- Hardware: 4 CPU cores, 8 GB RAM
Large deployment:
- IPSet size: 100,000 IPs
- Expected load: <500 events/sec
- Hardware: 8 CPU cores, 16 GB RAM
- Multi-node with load balancing (when available)
Monitoring Requirements
Minimum monitoring (production):
- IPSet capacity utilization
- Crypto error rate
- Component health (etcd, firewall-acl-agent, ml-detector)
- Disk space for logs
- Network latency (ZMQ connections)
Recommended tooling:
- Prometheus + Grafana dashboards
- SIEM integration (Splunk, ELK)
- Incident response automation (SOAR)
- Forensic analysis (RAG ingester)
Roadmap Summary
Priority 1: Production Scale (2 weeks)
- P1.1 - Multi-tier storage (IPSet → SQLite → Parquet)
- P1.2 - Async queue + worker pool (1K+ events/sec)
- P1.3 - Capacity monitoring + auto-eviction
Priority 2: Observability (1 week)
- P2.1 - Prometheus metrics exporter
- P2.2 - Grafana dashboards
- P2.3 - Health check endpoints (Kubernetes)
- P2.4 - Runtime config via etcd
Priority 3: High Availability (2 weeks)
- P3.1 - Multi-node clustering (etcd-based)
- P3.2 - Distributed IPSet synchronization
- P3.3 - Load balancing and failover
- P3.4 - IPv6 support
Conclusion
ML Defender is production-ready within its design constraints. Understanding these limitations allows you to:
- Plan capacity appropriately for your deployment
- Deploy monitoring to track utilization
- Architect HA solutions for critical environments
- Integrate complementary security tools (TLS inspection, EDR, SIEM)