## Overview
The SOC architecture implements a multi-layered data flow that provides comprehensive visibility, efficient processing, and rapid response to security events. All data flows are bidirectional where necessary, allowing for feedback loops and automated responses.
## Primary Data Flow Paths

### 1. Network Traffic Detection Flow

**Network Traffic Detection Pipeline**
Flow: Endpoints → IDS (Snort/Suricata) → Logstash → Elasticsearch → Wazuh
- Endpoints generate network traffic
- IDS Systems (Snort/Suricata) monitor all traffic for suspicious patterns
- Logstash collects and processes IDS alerts and logs
- Elasticsearch stores processed events for analysis
- Wazuh correlates events and displays security insights
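The Logstash step above is typically driven by a pipeline configuration. The sketch below is a minimal, hypothetical example assuming Suricata writes EVE JSON to `/var/log/suricata/eve.json`; the file path, Elasticsearch host, and index name are illustrative assumptions, not values from this architecture.

```conf
# Hypothetical Logstash pipeline for Suricata EVE JSON alerts.
# Paths, host, and index name are assumptions for illustration only.
input {
  file {
    path  => "/var/log/suricata/eve.json"
    codec => "json"
  }
}
filter {
  # Keep only alert events; drop flow/stats records
  if [event_type] != "alert" { drop { } }
}
output {
  elasticsearch {
    hosts => ["http://elasticsearch:9200"]
    index => "ids-alerts-%{+YYYY.MM.dd}"
  }
}
```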
### 2. Infrastructure Monitoring Flow

**Infrastructure & Performance Monitoring Pipeline**
Flow: Endpoints → Firewall → Zabbix → Prometheus → Wazuh
- Endpoints send traffic through the firewall
- Firewall provides network metrics and connection data
- Zabbix monitors infrastructure availability and performance
- Prometheus collects real-time metrics and generates alerts
- Wazuh aggregates monitoring data with security events
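The Prometheus alerting step can be sketched as a rule file. This is a minimal example using the standard `up` metric; the group name, threshold, and severity label are assumptions for illustration.

```yaml
# Hypothetical Prometheus alerting rule; names and thresholds
# are illustrative assumptions, not values from this architecture.
groups:
  - name: infrastructure
    rules:
      - alert: HostDown
        expr: up == 0
        for: 2m
        labels:
          severity: high
        annotations:
          summary: "Host {{ $labels.instance }} is unreachable"
```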
### 3. Incident Response Flow

**Incident Management & Response Pipeline**
Flow: Wazuh → TheHive → Cortex → Automated Response
- Wazuh detects security event and triggers alert
- TheHive creates incident case for investigation
- Cortex analyzes the incident and determines response
- Automated Response executes predefined playbooks
- Actions applied to affected endpoints or infrastructure
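The Wazuh → TheHive handoff amounts to translating a Wazuh alert into a TheHive alert payload. The sketch below is a hypothetical mapping: the Wazuh field names (`rule.level`, `rule.description`, `agent.name`) follow Wazuh's JSON alert format, but the severity mapping and payload shape are assumptions, not part of this architecture.

```python
# Sketch: translate a Wazuh alert into a TheHive-style alert payload.
# The rule-level -> severity mapping below is an assumed example.
import json

SEVERITY = {3: 1, 7: 2, 10: 3}  # Wazuh rule level threshold -> severity (assumed)

def build_thehive_alert(wazuh_event: dict) -> dict:
    level = wazuh_event["rule"]["level"]
    # Pick the highest severity whose level threshold is reached
    severity = max((v for k, v in SEVERITY.items() if level >= k), default=1)
    return {
        "title": wazuh_event["rule"]["description"],
        "type": "wazuh",
        "source": wazuh_event.get("agent", {}).get("name", "unknown"),
        "sourceRef": wazuh_event["id"],
        "severity": severity,
        "description": json.dumps(wazuh_event["rule"]),
    }

event = {"id": "1700000000.123",
         "rule": {"level": 10, "description": "Multiple failed logins"},
         "agent": {"name": "web-01"}}
alert = build_thehive_alert(event)
print(alert["severity"])  # 3 for rule level 10 under the assumed mapping
```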
### 4. Future: Honeypot & VPN Flow

This data flow is planned for long-term implementation and represents future capabilities.

**Deception & Secure Access Pipeline (Long-term)**
Flow: Honeypots/VPN → Logstash/Wazuh → Analysis
- Honeypots attract and log attacker activities
- Tailscale VPN provides secure remote access with logging
- Logstash processes honeypot and VPN logs
- Wazuh correlates deception data with other security events
- Threat Intelligence is enriched with real attack patterns
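Honeypot log processing can be illustrated with a small parser. The sketch below assumes one JSON object per line with `src_ip` and `eventid` fields, in the style of Cowrie's output; the exact field names are an assumption here.

```python
# Sketch: summarize attacker activity from honeypot JSON logs.
# The line format (one JSON object per line with "src_ip"/"eventid")
# mirrors Cowrie-style output but is an assumption for illustration.
import json
from collections import Counter

def top_attackers(log_lines, n=5):
    hits = Counter()
    for line in log_lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed lines
        if "src_ip" in event:
            hits[event["src_ip"]] += 1
    return hits.most_common(n)

sample = [
    '{"eventid": "cowrie.login.failed", "src_ip": "198.51.100.7"}',
    '{"eventid": "cowrie.login.failed", "src_ip": "198.51.100.7"}',
    '{"eventid": "cowrie.session.connect", "src_ip": "203.0.113.9"}',
]
print(top_attackers(sample))  # [('198.51.100.7', 2), ('203.0.113.9', 1)]
```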
## Event Processing Pipeline

### Stage 1: Collection
Data Sources:
- Network traffic (via IDS/IPS)
- Endpoint logs (via Wazuh agents)
- Firewall logs
- Infrastructure metrics (Zabbix, Prometheus)
- Application logs
### Stage 2: Aggregation & Normalization
Processing:
- Log collection from multiple sources
- Data format normalization
- Field extraction and enrichment
- Initial filtering and routing
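The normalization step maps source-specific field names onto a common schema before indexing. The sketch below is illustrative: the raw field names and the dotted common-schema keys are assumptions, not this architecture's actual mappings.

```python
# Sketch of field normalization across heterogeneous log sources.
# All field names here are illustrative assumptions.
FIELD_MAPS = {
    "firewall": {"srcaddr": "source.ip", "dstaddr": "destination.ip", "act": "event.action"},
    "ids":      {"src_ip": "source.ip", "dest_ip": "destination.ip", "alert": "event.action"},
}

def normalize(record: dict, source_type: str) -> dict:
    mapping = FIELD_MAPS[source_type]
    out = {"event.module": source_type}
    for raw_key, common_key in mapping.items():
        if raw_key in record:
            out[common_key] = record[raw_key]
    return out

fw = normalize({"srcaddr": "10.0.0.5", "act": "deny"}, "firewall")
ids = normalize({"src_ip": "10.0.0.5", "alert": "ET SCAN"}, "ids")
print(fw["source.ip"] == ids["source.ip"])  # True: both map to the same field
```

Once both sources share `source.ip`, downstream correlation rules only need to be written once.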
### Stage 3: Storage & Indexing
Storage:
- Indexed storage for fast search
- Long-term retention
- Data compression
- Backup and archival
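Retention and archival of this kind are commonly expressed as an Elasticsearch index lifecycle (ILM) policy. The sketch below is a hypothetical policy matching a hot/warm/delete pattern; the rollover size and phase ages are illustrative assumptions.

```json
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": { "max_age": "1d", "max_primary_shard_size": "50gb" }
        }
      },
      "warm": {
        "min_age": "90d",
        "actions": { "shrink": { "number_of_shards": 1 } }
      },
      "delete": {
        "min_age": "365d",
        "actions": { "delete": {} }
      }
    }
  }
}
```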
### Stage 4: Analysis & Correlation
Analysis:
- Event correlation across sources
- Pattern matching and anomaly detection
- Threat intelligence integration
- Risk scoring
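Cross-source correlation often reduces to counting related events per entity within a time window. The sketch below is a minimal illustration of that idea; the 60-second window and the 5-event threshold are assumed values, not rules from this architecture.

```python
# Sketch: flag source IPs that generate a burst of events inside a
# sliding time window. Window size and threshold are assumptions.
from collections import defaultdict

WINDOW = 60     # seconds (assumed)
THRESHOLD = 5   # events per window (assumed)

def correlate(events):
    """events: iterable of (timestamp, source_ip) pairs.
    Returns {ip: peak_count} for IPs crossing THRESHOLD in any window."""
    by_ip = defaultdict(list)
    for ts, ip in sorted(events):
        by_ip[ip].append(ts)
    flagged = {}
    for ip, times in by_ip.items():
        start = 0
        for end in range(len(times)):
            # shrink window until it spans at most WINDOW seconds
            while times[end] - times[start] > WINDOW:
                start += 1
            count = end - start + 1
            if count >= THRESHOLD:
                flagged[ip] = max(flagged.get(ip, 0), count)
    return flagged

events = [(t, "10.0.0.5") for t in range(0, 50, 10)] + [(100, "10.0.0.9")]
print(correlate(events))  # {'10.0.0.5': 5}
```

The peak count per IP could then feed a risk score rather than a binary alert.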
### Stage 5: Alerting & Visualization
Presentation:
- Security dashboards
- Real-time alerts
- Compliance reporting
- Custom visualizations
### Stage 6: Response & Remediation
Response:
- Incident case creation
- Automated playbook execution
- Endpoint isolation or remediation
- Notification workflows
## Data Flow Characteristics

### Latency Expectations
| Flow Path | Expected Latency | Priority |
|---|---|---|
| IDS → Wazuh | < 5 seconds | High |
| Metrics → Prometheus | < 10 seconds | Medium |
| Alert → TheHive | < 30 seconds | High |
| Cortex Response | < 2 minutes | Critical |
### Data Volume Planning
Elasticsearch sizing should account for:
- Log ingestion rate: 10,000-50,000 events/second (estimated)
- Retention period: 90 days (hot), 1 year (warm)
- Replication factor: 2x for high availability
## Integration Points

For detailed information about how specific components integrate with each other, see the Integrations page.

## Complete Flow Diagram
Dotted lines represent long-term planned data flows that are not part of the initial core architecture.
