Understanding the data flow is key to how the SOC Architecture detects, processes, analyzes, and responds to security events. This page details the complete data pipeline, from endpoints to automated response.

Overview

The SOC Architecture implements a multi-layered data flow that ensures comprehensive visibility, efficient processing, and rapid response to security events.
All data flows are designed to be bidirectional where necessary, allowing for feedback loops and automated responses.

Primary Data Flow Paths

1. Network Traffic Detection Flow

Network Traffic Detection Pipeline

Flow: Endpoints → IDS (Snort/Suricata) → Logstash → Elasticsearch → Wazuh
  1. Endpoints generate network traffic
  2. IDS Systems (Snort/Suricata) monitor all traffic for suspicious patterns
  3. Logstash collects and processes IDS alerts and logs
  4. Elasticsearch stores processed events for analysis
  5. Wazuh correlates events and displays security insights
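At step 3, raw IDS alerts are typically normalized into a common field schema before indexing. A minimal Python sketch, assuming Suricata's EVE JSON alert output and ECS-style target field names (in practice this mapping would live in a Logstash filter):

```python
import json

def normalize_eve_alert(raw: str) -> dict:
    """Flatten a Suricata EVE JSON alert into ECS-style fields for indexing."""
    event = json.loads(raw)
    alert = event.get("alert", {})
    return {
        "@timestamp": event.get("timestamp"),
        "source.ip": event.get("src_ip"),
        "destination.ip": event.get("dest_ip"),
        "rule.name": alert.get("signature"),
        "event.severity": alert.get("severity"),
    }

# Illustrative alert; field names follow Suricata's EVE JSON format.
sample = json.dumps({
    "timestamp": "2024-01-01T00:00:00.000000+0000",
    "event_type": "alert",
    "src_ip": "10.0.0.5",
    "dest_ip": "192.0.2.10",
    "alert": {"signature": "ET SCAN Nmap OS Detection Probe", "severity": 2},
})
print(normalize_eve_alert(sample)["rule.name"])
```

A consistent schema is what lets Wazuh correlate IDS alerts with endpoint and firewall events in step 5.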

2. Infrastructure Monitoring Flow

Infrastructure & Performance Monitoring Pipeline

Flow: Endpoints → Firewall → Zabbix → Prometheus → Wazuh
  1. Endpoints send traffic through the firewall
  2. Firewall provides network metrics and connection data
  3. Zabbix monitors infrastructure availability and performance
  4. Prometheus collects real-time metrics and generates alerts
  5. Wazuh aggregates monitoring data with security events
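The monitoring step boils down to comparing collected metrics against thresholds, which is what Zabbix triggers and Prometheus alerting rules do. A toy sketch of that evaluation (the metric names and threshold values are illustrative, not recommendations):

```python
def evaluate_metrics(metrics: dict, thresholds: dict) -> list:
    """Return (metric, value, limit) tuples for every metric over its threshold."""
    return [
        (name, value, thresholds[name])
        for name, value in metrics.items()
        if name in thresholds and value > thresholds[name]
    ]

current = {"cpu_percent": 97.0, "disk_used_percent": 61.0, "icmp_loss_percent": 0.0}
limits = {"cpu_percent": 90.0, "disk_used_percent": 85.0, "icmp_loss_percent": 5.0}
print(evaluate_metrics(current, limits))  # → [('cpu_percent', 97.0, 90.0)]
```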

3. Incident Response Flow

Incident Management & Response Pipeline

Flow: Wazuh → TheHive → Cortex → Automated Response
  1. Wazuh detects security event and triggers alert
  2. TheHive creates incident case for investigation
  3. Cortex runs analyzers against the incident's observables to determine the response
  4. Automated Response executes predefined playbooks
  5. Actions applied to affected endpoints or infrastructure
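Conceptually, steps 3 and 4 map case severity to a predefined playbook. A hypothetical sketch of that dispatch logic (the playbook names and severity levels are assumptions; in a real deployment these would be Cortex responders or SOAR workflows):

```python
# Hypothetical severity-to-playbook mapping.
PLAYBOOKS = {
    "critical": ["isolate_endpoint", "collect_forensics", "page_oncall"],
    "high": ["block_source_ip", "notify_oncall"],
    "medium": ["open_case_for_review"],
}

def select_playbook(severity: str) -> list:
    """Pick response actions for a case; unknown severities default to review."""
    return PLAYBOOKS.get(severity, ["open_case_for_review"])

print(select_playbook("high"))  # → ['block_source_ip', 'notify_oncall']
```

Defaulting unknown severities to a human-review playbook keeps automation conservative: the pipeline never executes a destructive action for an event it cannot classify.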

4. Future: Honeypot & VPN Flow

This data flow is planned for long-term implementation and represents future capabilities.

Deception & Secure Access Pipeline (Long-term)

Flow: Honeypots/VPN → Logstash/Wazuh → Analysis
  1. Honeypots attract and log attacker activities
  2. Tailscale VPN provides secure remote access with logging
  3. Logstash processes honeypot and VPN logs
  4. Wazuh correlates deception data with other security events
  5. Threat Intelligence is enriched with real attack patterns

Event Processing Pipeline

Data Sources:
  • Network traffic (via IDS/IPS)
  • Endpoint logs (via Wazuh agents)
  • Firewall logs
  • Infrastructure metrics (Zabbix, Prometheus)
  • Application logs
Technologies: Snort, Suricata, Wazuh Agents, Zabbix, Prometheus
Processing:
  • Log collection from multiple sources
  • Data format normalization
  • Field extraction and enrichment
  • Initial filtering and routing
Technologies: Logstash, Fluentd
Storage:
  • Indexed storage for fast search
  • Long-term retention
  • Data compression
  • Backup and archival
Technologies: Elasticsearch
Analysis:
  • Event correlation across sources
  • Pattern matching and anomaly detection
  • Threat intelligence integration
  • Risk scoring
Technologies: Wazuh (SIEM/XDR)
Presentation:
  • Security dashboards
  • Real-time alerts
  • Compliance reporting
  • Custom visualizations
Technologies: Wazuh Dashboard, Prometheus, Zabbix
Response:
  • Incident case creation
  • Automated playbook execution
  • Endpoint isolation or remediation
  • Notification workflows
Technologies: TheHive, Cortex
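End to end, the stages above compose into one pipeline: collect, normalize, correlate, then present and respond. A skeleton of that composition, with each stage body standing in for the real tool (Wazuh agents, Logstash filters, Wazuh rules):

```python
def collect(sources):
    # Stand-in for Wazuh agents / IDS outputs / Logstash inputs: flatten sources.
    return [event for src in sources for event in src]

def normalize(event):
    # Stand-in for Logstash filters: lowercase keys, drop empty fields.
    return {k.lower(): v for k, v in event.items() if v is not None}

def correlate(events):
    # Stand-in for Wazuh correlation rules: group events by source address.
    groups = {}
    for e in events:
        groups.setdefault(e.get("src"), []).append(e)
    return groups

raw = [[{"SRC": "10.0.0.5", "Sig": "scan", "Extra": None}],
       [{"SRC": "10.0.0.5", "Sig": "brute-force", "Extra": None}]]
correlated = correlate([normalize(e) for e in collect(raw)])
print(len(correlated["10.0.0.5"]))  # → 2
```

Two low-severity events from the same source, seen together, are exactly the kind of signal correlation surfaces that individual stages would miss.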

Data Flow Characteristics

Latency Expectations

| Flow Path            | Expected Latency | Priority |
|----------------------|------------------|----------|
| IDS → Wazuh          | < 5 seconds      | High     |
| Metrics → Prometheus | < 10 seconds     | Medium   |
| Alert → TheHive      | < 30 seconds     | High     |
| Cortex Response      | < 2 minutes      | Critical |

Data Volume Planning

Elasticsearch sizing should account for:
  • Log ingestion rate: 10,000-50,000 events/second (estimated)
  • Retention period: 90 days (hot), 1 year (warm)
  • Replication factor: 2x for high availability
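These parameters translate directly into storage estimates. A back-of-the-envelope calculation, assuming an average event size of 500 bytes (an assumption; measure your own ingest before sizing hardware):

```python
def daily_storage_gb(events_per_sec: int, avg_event_bytes: int, replication: int) -> float:
    """Raw daily index growth in GB (before compression)."""
    return events_per_sec * avg_event_bytes * 86_400 * replication / 1e9

# Upper-bound ingest (50,000 events/s), assumed 500 B/event, 2x replication.
per_day = daily_storage_gb(50_000, 500, 2)
hot_tier = per_day * 90  # 90-day hot retention
print(round(per_day), round(hot_tier))  # → 4320 388800
```

At the upper bound this is roughly 4.3 TB/day and ~389 TB for the hot tier alone, which is why the compression and warm-tier archival listed under Storage matter.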

Integration Points

For detailed information about how specific components integrate with each other, see the Integrations page.

Complete Flow Diagram

Dotted lines represent long-term planned data flows that are not part of the initial core architecture.
