## Overview
The SOC architecture implements a multi-layered data flow that provides comprehensive visibility, efficient processing, and rapid response to security events. All data flows are bidirectional where necessary, allowing for feedback loops and automated responses.
## Primary Data Flow Paths

### 1. Network Traffic Detection Flow

**Network Traffic Detection Pipeline**
Flow: Endpoints → IDS (Snort/Suricata) → Logstash → Elasticsearch → Wazuh
- Endpoints generate network traffic
- IDS Systems (Snort/Suricata) monitor all traffic for suspicious patterns
- Logstash collects and processes IDS alerts and logs
- Elasticsearch stores processed events for analysis
- Wazuh correlates events and displays security insights
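The Logstash step above is typically driven by a pipeline configuration. The sketch below is a minimal, hypothetical example assuming Suricata writes EVE JSON to `/var/log/suricata/eve.json`; the file path, Elasticsearch host, and index name are illustrative assumptions, not values from this architecture.

```conf
# Hypothetical Logstash pipeline for Suricata EVE JSON alerts.
# Paths, host, and index name are assumptions for illustration only.
input {
  file {
    path  => "/var/log/suricata/eve.json"
    codec => "json"
  }
}
filter {
  # Keep only alert events; drop flow/stats records
  if [event_type] != "alert" { drop { } }
}
output {
  elasticsearch {
    hosts => ["http://elasticsearch:9200"]
    index => "ids-alerts-%{+YYYY.MM.dd}"
  }
}
```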
### 2. Infrastructure Monitoring Flow

**Infrastructure & Performance Monitoring Pipeline**
Flow: Endpoints → Firewall → Zabbix → Prometheus → Wazuh
- Endpoints send traffic through the firewall
- Firewall provides network metrics and connection data
- Zabbix monitors infrastructure availability and performance
- Prometheus collects real-time metrics and generates alerts
- Wazuh aggregates monitoring data with security events
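The Prometheus alerting step can be sketched as a rule file. This is a minimal example using the standard `up` metric; the group name, threshold, and severity label are assumptions for illustration.

```yaml
# Hypothetical Prometheus alerting rule; names and thresholds
# are illustrative assumptions, not values from this architecture.
groups:
  - name: infrastructure
    rules:
      - alert: HostDown
        expr: up == 0
        for: 2m
        labels:
          severity: high
        annotations:
          summary: "Host {{ $labels.instance }} is unreachable"
```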
### 3. Incident Response Flow

**Incident Management & Response Pipeline**
Flow: Wazuh → TheHive → Cortex → Automated Response
- Wazuh detects security event and triggers alert
- TheHive creates incident case for investigation
- Cortex analyzes the incident and determines response
- Automated Response executes predefined playbooks
- Actions applied to affected endpoints or infrastructure
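The Wazuh → TheHive handoff amounts to translating a Wazuh alert into a TheHive alert payload. The sketch below is a hypothetical mapping: the Wazuh field names (`rule.level`, `rule.description`, `agent.name`) follow Wazuh's JSON alert format, but the severity mapping and payload shape are assumptions, not part of this architecture.

```python
# Sketch: translate a Wazuh alert into a TheHive-style alert payload.
# The rule-level -> severity mapping below is an assumed example.
import json

SEVERITY = {3: 1, 7: 2, 10: 3}  # Wazuh rule level threshold -> severity (assumed)

def build_thehive_alert(wazuh_event: dict) -> dict:
    level = wazuh_event["rule"]["level"]
    # Pick the highest severity whose level threshold is reached
    severity = max((v for k, v in SEVERITY.items() if level >= k), default=1)
    return {
        "title": wazuh_event["rule"]["description"],
        "type": "wazuh",
        "source": wazuh_event.get("agent", {}).get("name", "unknown"),
        "sourceRef": wazuh_event["id"],
        "severity": severity,
        "description": json.dumps(wazuh_event["rule"]),
    }

event = {"id": "1700000000.123",
         "rule": {"level": 10, "description": "Multiple failed logins"},
         "agent": {"name": "web-01"}}
alert = build_thehive_alert(event)
print(alert["severity"])  # 3 for rule level 10 under the assumed mapping
```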
### 4. Future: Honeypot & VPN Flow

This data flow is planned for long-term implementation and represents future capabilities.

**Deception & Secure Access Pipeline (Long-term)**
Flow: Honeypots/VPN → Logstash/Wazuh → Analysis
- Honeypots attract and log attacker activities
- Tailscale VPN provides secure remote access with logging
- Logstash processes honeypot and VPN logs
- Wazuh correlates deception data with other security events
- Threat Intelligence is enriched with real attack patterns
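Honeypot log processing can be illustrated with a small parser. The sketch below assumes one JSON object per line with `src_ip` and `eventid` fields, in the style of Cowrie's output; the exact field names are an assumption here.

```python
# Sketch: summarize attacker activity from honeypot JSON logs.
# The line format (one JSON object per line with "src_ip"/"eventid")
# mirrors Cowrie-style output but is an assumption for illustration.
import json
from collections import Counter

def top_attackers(log_lines, n=5):
    hits = Counter()
    for line in log_lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed lines
        if "src_ip" in event:
            hits[event["src_ip"]] += 1
    return hits.most_common(n)

sample = [
    '{"eventid": "cowrie.login.failed", "src_ip": "198.51.100.7"}',
    '{"eventid": "cowrie.login.failed", "src_ip": "198.51.100.7"}',
    '{"eventid": "cowrie.session.connect", "src_ip": "203.0.113.9"}',
]
print(top_attackers(sample))  # [('198.51.100.7', 2), ('203.0.113.9', 1)]
```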
## Event Processing Pipeline

### Stage 1: Collection
Data Sources:
- Network traffic (via IDS/IPS)
- Endpoint logs (via Wazuh agents)
- Firewall logs
- Infrastructure metrics (Zabbix, Prometheus)
- Application logs
### Stage 2: Aggregation & Normalization
Processing:
- Log collection from multiple sources
- Data format normalization
- Field extraction and enrichment
- Initial filtering and routing
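The normalization step maps source-specific field names onto a common schema before indexing. The sketch below is illustrative: the raw field names and the dotted common-schema keys are assumptions, not this architecture's actual mappings.

```python
# Sketch of field normalization across heterogeneous log sources.
# All field names here are illustrative assumptions.
FIELD_MAPS = {
    "firewall": {"srcaddr": "source.ip", "dstaddr": "destination.ip", "act": "event.action"},
    "ids":      {"src_ip": "source.ip", "dest_ip": "destination.ip", "alert": "event.action"},
}

def normalize(record: dict, source_type: str) -> dict:
    mapping = FIELD_MAPS[source_type]
    out = {"event.module": source_type}
    for raw_key, common_key in mapping.items():
        if raw_key in record:
            out[common_key] = record[raw_key]
    return out

fw = normalize({"srcaddr": "10.0.0.5", "act": "deny"}, "firewall")
ids = normalize({"src_ip": "10.0.0.5", "alert": "ET SCAN"}, "ids")
print(fw["source.ip"] == ids["source.ip"])  # True: both map to the same field
```

Once both sources share `source.ip`, downstream correlation rules only need to be written once.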
### Stage 3: Storage & Indexing
Storage:
- Indexed storage for fast search
- Long-term retention
- Data compression
- Backup and archival
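Retention and archival of this kind are commonly expressed as an Elasticsearch index lifecycle (ILM) policy. The sketch below is a hypothetical policy matching a hot/warm/delete pattern; the rollover size and phase ages are illustrative assumptions.

```json
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": { "max_age": "1d", "max_primary_shard_size": "50gb" }
        }
      },
      "warm": {
        "min_age": "90d",
        "actions": { "shrink": { "number_of_shards": 1 } }
      },
      "delete": {
        "min_age": "365d",
        "actions": { "delete": {} }
      }
    }
  }
}
```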
### Stage 4: Analysis & Correlation
Analysis:
- Event correlation across sources
- Pattern matching and anomaly detection
- Threat intelligence integration
- Risk scoring
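Cross-source correlation often reduces to counting related events per entity within a time window. The sketch below is a minimal illustration of that idea; the 60-second window and the 5-event threshold are assumed values, not rules from this architecture.

```python
# Sketch: flag source IPs that generate a burst of events inside a
# sliding time window. Window size and threshold are assumptions.
from collections import defaultdict

WINDOW = 60     # seconds (assumed)
THRESHOLD = 5   # events per window (assumed)

def correlate(events):
    """events: iterable of (timestamp, source_ip) pairs.
    Returns {ip: peak_count} for IPs crossing THRESHOLD in any window."""
    by_ip = defaultdict(list)
    for ts, ip in sorted(events):
        by_ip[ip].append(ts)
    flagged = {}
    for ip, times in by_ip.items():
        start = 0
        for end in range(len(times)):
            # shrink window until it spans at most WINDOW seconds
            while times[end] - times[start] > WINDOW:
                start += 1
            count = end - start + 1
            if count >= THRESHOLD:
                flagged[ip] = max(flagged.get(ip, 0), count)
    return flagged

events = [(t, "10.0.0.5") for t in range(0, 50, 10)] + [(100, "10.0.0.9")]
print(correlate(events))  # {'10.0.0.5': 5}
```

The peak count per IP could then feed a risk score rather than a binary alert.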
### Stage 5: Alerting & Visualization
Presentation:
- Security dashboards
- Real-time alerts
- Compliance reporting
- Custom visualizations
### Stage 6: Response & Remediation
Response:
- Incident case creation
- Automated playbook execution
- Endpoint isolation or remediation
- Notification workflows
## Data Flow Characteristics

### Latency Expectations
| Flow Path | Expected Latency | Priority |
|---|---|---|
| IDS → Wazuh | < 5 seconds | High |
| Metrics → Prometheus | < 10 seconds | Medium |
| Alert → TheHive | < 30 seconds | High |
| Cortex Response | < 2 minutes | Critical |
### Data Volume Planning
Elasticsearch sizing should account for:
- Log ingestion rate: 10,000-50,000 events/second (estimated)
- Retention period: 90 days (hot), 1 year (warm)
- Replication factor: 2x for high availability
## Integration Points

For detailed information about how specific components integrate with each other, see the Integrations page.

## Complete Flow Diagram
Dotted lines represent long-term planned data flows that are not part of the initial core architecture.
