Infrastructure Monitoring
Infrastructure monitoring provides visibility into the health, performance, and availability of all systems within the SOC environment. This layer ensures operational reliability and detects performance-based security anomalies.
Zabbix and Prometheus work together to provide comprehensive monitoring: Zabbix for traditional infrastructure monitoring and Prometheus for cloud-native metrics and alerting.
Architecture Overview
Zabbix
Enterprise infrastructure monitoring for availability and performance
Prometheus
Time-series metrics collection and alerting for modern infrastructure
Zabbix Monitoring Platform
Core Capabilities
Zabbix provides comprehensive monitoring for enterprise infrastructure, spanning its monitoring methods, supported platforms, and key features.
Data Collection Techniques:
- Agent-based: Zabbix agents on monitored hosts
- Agentless: SNMP, IPMI, JMX monitoring
- Active vs passive checks: Agent-initiated (active) or server-initiated (passive)
- Web monitoring: HTTP/HTTPS endpoint checks
- Database monitoring: Native database queries
- Log file monitoring: Pattern matching in logs
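Log file monitoring, the last technique in the list above, comes down to scanning log lines for patterns. The following is a minimal Python sketch of that idea, not Zabbix's own implementation; the function name `match_log_lines` and the sample log lines are hypothetical.

```python
import re
from typing import Iterable, List


def match_log_lines(lines: Iterable[str], pattern: str) -> List[str]:
    """Return the lines that match a regular expression (case-insensitive)."""
    regex = re.compile(pattern, re.IGNORECASE)
    return [line for line in lines if regex.search(line)]


# Hypothetical log lines, used only for illustration.
sample = [
    "Jan 10 10:00:01 web01 sshd[811]: Accepted publickey for deploy",
    "Jan 10 10:00:07 web01 sshd[812]: Failed password for root",
    "Jan 10 10:00:09 web01 kernel: Out of memory: Killed process 4242",
]
hits = match_log_lines(sample, r"failed password|out of memory")
```

A real log item would also need to track the read position between checks so that only new lines are evaluated; the sketch leaves that out.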
Monitoring Templates
Operating System Templates
Linux Monitoring:
- CPU utilization and load average
- Memory usage (used, free, cached)
- Disk space and I/O metrics
- Network interface statistics
- Process monitoring
- System logs
Windows Monitoring:
- Performance counters
- Windows services
- Event log monitoring
- Active Directory health
- IIS web server metrics
Network Device Templates
SNMP Monitoring:
- Interface status and bandwidth
- CPU and memory on network devices
- Routing table monitoring
- Temperature sensors
- Power supply status
- Fan speed monitoring
Application Templates
Common Applications:
- Apache/Nginx web servers
- MySQL/PostgreSQL databases
- Elasticsearch clusters
- Docker containers
- Kubernetes clusters
- Redis, MongoDB, RabbitMQ
Alert Configuration
Zabbix Agent Configuration
- Linux Agent
- Windows Agent
- Custom Metrics
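As an illustration of a custom metric, the sketch below counts established TCP connections on a Linux host and prints a single number, which is the shape of output an agent item expects. It is an assumption-laden example: the function names are hypothetical, and it relies on the Linux `/proc/net/tcp` layout, where the fourth column holds the connection state and `01` means ESTABLISHED.

```python
from typing import Iterable

TCP_ESTABLISHED = "01"  # state code for ESTABLISHED in /proc/net/tcp


def count_established(proc_net_tcp_lines: Iterable[str]) -> int:
    """Count rows whose state column (4th field) is ESTABLISHED."""
    count = 0
    for line in proc_net_tcp_lines:
        fields = line.split()
        # Skip the header row and any malformed lines.
        if len(fields) < 4 or fields[0] == "sl":
            continue
        if fields[3] == TCP_ESTABLISHED:
            count += 1
    return count


def main() -> None:
    # Print one numeric value so a monitoring agent can consume it.
    with open("/proc/net/tcp", encoding="ascii") as handle:
        print(count_established(handle))
```

A script that calls `main()` could be wired to the agent with a `UserParameter` line such as `UserParameter=custom.tcp.established,python3 /usr/local/bin/tcp_established.py`, where both the item key and the script path are made up for this example.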
Prometheus Metrics System
Architecture and Concepts
Prometheus follows a pull-based model for metrics collection, scraping targets over HTTP at regular intervals. Its core components:
Time-Series Database
Efficient storage of metrics with labels for multi-dimensional data
PromQL Query Language
Powerful query language for data aggregation and analysis
Service Discovery
Automatic target discovery for dynamic environments
Alertmanager
Flexible alert routing and notification management
Exporters
Exporters expose metrics from third-party systems in Prometheus format. Commonly used exporters:
System and Infrastructure:
- Node Exporter: Linux/Unix system metrics
- Windows Exporter: Windows system metrics
- Blackbox Exporter: Endpoint probing (HTTP, DNS, TCP)
- SNMP Exporter: Network device metrics
Databases and Applications:
- MySQL Exporter: Database metrics
- PostgreSQL Exporter: Database performance
- Redis Exporter: Redis statistics
- Elasticsearch Exporter: Cluster health
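When no existing exporter fits, a custom one only has to serve plain text in the Prometheus exposition format on an HTTP endpoint. The sketch below uses the Python standard library to show the idea; the metric name `queue_depth`, its values, the `collect` function, and port 9105 are all hypothetical. In practice the official client libraries (for Python, `prometheus_client`) handle formatting and escaping, so treat this as an illustration of the format rather than a recommended implementation.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Dict, List, Tuple

# (metric name, labels, value)
Sample = Tuple[str, Dict[str, str], float]


def render_metrics(samples: List[Sample]) -> str:
    """Render gauge samples in the Prometheus text exposition format.

    Samples of the same metric are expected to be passed contiguously,
    and label values are not escaped in this sketch.
    """
    lines = []
    seen = set()
    for name, labels, value in samples:
        if name not in seen:
            lines.append(f"# TYPE {name} gauge")
            seen.add(name)
        label_text = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
        selector = "{" + label_text + "}" if label_text else ""
        lines.append(f"{name}{selector} {value}")
    return "\n".join(lines) + "\n"


def collect() -> List[Sample]:
    # Hypothetical static values; a real exporter would query the target system.
    return [
        ("queue_depth", {"queue": "alerts"}, 3.0),
        ("queue_depth", {"queue": "events"}, 0.0),
    ]


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(collect()).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 9105) -> None:
    """Block forever, serving /metrics on the given port."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling `serve()` starts the endpoint; Prometheus would then scrape `http://<host>:9105/metrics` like any other target.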
Prometheus Configuration
prometheus.yml Configuration
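A minimal `prometheus.yml` needs little more than a global scrape interval and a list of scrape jobs with their targets. The Python sketch below renders such a file from a dictionary, which can help when target lists are generated from an inventory. The helper name `render_prometheus_yml`, the job name, and the target addresses are assumptions for illustration; the keys it emits (`global`, `scrape_interval`, `scrape_configs`, `job_name`, `static_configs`, `targets`) are standard Prometheus configuration fields.

```python
from typing import Dict, List


def render_prometheus_yml(jobs: Dict[str, List[str]], scrape_interval: str = "15s") -> str:
    """Render a minimal prometheus.yml with static targets per job."""
    lines = [
        "global:",
        f"  scrape_interval: {scrape_interval}",
        "scrape_configs:",
    ]
    for job_name, targets in jobs.items():
        lines.append(f"  - job_name: {job_name}")
        lines.append("    static_configs:")
        lines.append("      - targets:")
        for target in targets:
            # Quote targets so host:port pairs are always read as strings.
            lines.append(f'          - "{target}"')
    return "\n".join(lines) + "\n"


# Hypothetical Node Exporter targets on the default port 9100.
config = render_prometheus_yml({"node": ["10.0.0.11:9100", "10.0.0.12:9100"]})
```

Writing `config` to disk and validating it with `promtool check config` before reloading Prometheus is a sensible precaution.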
PromQL Queries
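PromQL queries can be run in the Prometheus UI or through the HTTP API at `/api/v1/query`. The sketch below builds a query URL and parses an instant-vector response in Python. The CPU expression is a widely used Node Exporter query; the helper names and the server address in the usage note are hypothetical, and no network call is made here.

```python
import json
from typing import Dict
from urllib.parse import urlencode

# CPU utilization (percent) per instance, derived from Node Exporter's idle counter.
CPU_USAGE = (
    "100 - (avg by (instance) "
    '(rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)'
)


def build_query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    query_string = urlencode({"query": promql})
    return base_url.rstrip("/") + "/api/v1/query?" + query_string


def parse_instant_vector(payload: str) -> Dict[str, float]:
    """Map each series' instance label to its current value."""
    body = json.loads(payload)
    if body.get("status") != "success":
        raise ValueError("query failed")
    return {
        series["metric"].get("instance", ""): float(series["value"][1])
        for series in body["data"]["result"]
    }
```

For example, `build_query_url("http://prometheus.example:9090", CPU_USAGE)` yields a URL that can be fetched with any HTTP client, and the JSON body of the response can be passed to `parse_instant_vector`.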
Alert Rules
Alert Rule Examples
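A Prometheus alert rule pairs an expression (`expr`) with an optional hold time (`for`): the alert is pending while the expression is true and becomes firing only once it has stayed true for the full hold time. The Python sketch below simulates that state machine for a simple threshold rule. It is an illustration of the semantics under simplified assumptions (one series, a fixed threshold), and the function name and sample values are hypothetical.

```python
from typing import List, Optional, Tuple


def alert_states(
    samples: List[Tuple[float, float]], threshold: float, for_seconds: float
) -> List[str]:
    """Return inactive, pending, or firing for each (timestamp, value) evaluation."""
    states = []
    active_since: Optional[float] = None
    for timestamp, value in samples:
        if value <= threshold:
            # Condition no longer holds: the alert resets.
            active_since = None
            states.append("inactive")
            continue
        if active_since is None:
            active_since = timestamp
        held = timestamp - active_since
        states.append("firing" if held >= for_seconds else "pending")
    return states


# Hypothetical CPU readings taken every 60 seconds.
readings = [(0, 50.0), (60, 95.0), (120, 96.0), (180, 97.0), (240, 40.0)]
states = alert_states(readings, threshold=90.0, for_seconds=120)
```

The hold time is what filters out short spikes: a single reading above the threshold never gets past pending.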
Visualization and Dashboards
Zabbix Dashboards
Network Maps
Visual topology with real-time status indicators
Custom Widgets
Graphs, gauges, and tables for key metrics
Screens
Multi-graph displays for comprehensive views
Reports
Scheduled PDF/CSV reports for stakeholders
Grafana Integration
Grafana provides unified visualization for both Zabbix and Prometheus.
Integration with SOC Architecture
Security-Relevant Metrics
Infrastructure monitoring contributes to security operations through performance anomaly detection, availability monitoring, and resource monitoring.
Security Indicators:
- Sudden CPU/memory spikes (possible cryptomining)
- Unusual network traffic patterns
- Unexpected process creation
- Abnormal disk I/O (possible data exfiltration)
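Indicators like these are usually caught by comparing a metric against its own recent baseline rather than a fixed limit. The sketch below flags sudden spikes with a trailing-window z-score in Python. It is one simple approach among many, not a method prescribed by Zabbix or Prometheus; the function name, window size, thresholds, and sample data are assumptions.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def spike_indices(
    values: Sequence[float],
    window: int = 5,
    z_threshold: float = 3.0,
    min_sigma: float = 1.0,
) -> List[int]:
    """Flag samples far above the mean of the preceding `window` samples."""
    flagged = []
    for i in range(window, len(values)):
        baseline = values[i - window:i]
        mu = mean(baseline)
        # Floor the deviation so a flat baseline does not flag tiny changes.
        sigma = max(pstdev(baseline), min_sigma)
        if (values[i] - mu) / sigma > z_threshold:
            flagged.append(i)
    return flagged


# Hypothetical CPU utilization samples (percent); index 6 is the spike.
cpu = [12, 14, 13, 15, 12, 13, 96, 14]
spikes = spike_indices(cpu)
```

A spike flagged this way is only a lead: it still has to be correlated with process, network, and log data before it is treated as a security event.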
Forwarding to Wazuh
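One possible way to forward monitoring alerts, which this page does not confirm as the deployed method, is to write each alert as a JSON line to a file that a Wazuh agent reads through a `localfile` entry with the `json` log format. The Python sketch below flattens an Alertmanager webhook payload into such lines. The function names, the output field names, and the log path in the usage note are hypothetical; the input fields (`alerts`, `status`, `labels`, `annotations`, `startsAt`) follow the Alertmanager webhook format.

```python
import json
from typing import Any, Dict, List


def alerts_to_json_lines(webhook_payload: Dict[str, Any]) -> List[str]:
    """Flatten an Alertmanager webhook payload into one JSON line per alert."""
    lines = []
    for alert in webhook_payload.get("alerts", []):
        labels = alert.get("labels", {})
        event = {
            "source": "prometheus",
            "status": alert.get("status"),
            "alertname": labels.get("alertname"),
            "instance": labels.get("instance"),
            "severity": labels.get("severity"),
            "summary": alert.get("annotations", {}).get("summary"),
            "starts_at": alert.get("startsAt"),
        }
        lines.append(json.dumps(event, sort_keys=True))
    return lines


def append_to_log(lines: List[str], path: str) -> None:
    """Append JSON lines to a file monitored by the log collector."""
    with open(path, "a", encoding="utf-8") as handle:
        for line in lines:
            handle.write(line + "\n")
```

A small webhook receiver could call `append_to_log(alerts_to_json_lines(payload), "/var/log/monitoring/alerts.json")`, with custom Wazuh rules then matching on fields such as `alertname` and `severity`.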
Best Practices
Monitoring Strategy
- Monitor what matters: Focus on business-critical services
- Set meaningful thresholds: Avoid alert fatigue
- Use dependencies: Prevent alert storms
- Document runbooks: Link alerts to resolution procedures
Performance
- Optimize check intervals: Balance freshness vs overhead
- Use active checks: Offload polling work from the Zabbix server in high-volume environments
- Database partitioning: Implement in Zabbix for large deployments
- Retention policies: Keep only necessary historical data
High Availability
- Cluster Zabbix servers: For redundancy
- Prometheus federation: Hierarchical monitoring
- Back up configurations: Keep Grafana dashboards under version control
- Monitor the monitors: Ensure monitoring systems are healthy
Security
- Encrypt communications: Use TLS for all traffic
- Restrict agent commands: Disable remote commands unless required
- Authentication: Strong passwords and API tokens
- Network segmentation: Isolate monitoring network
Official Documentation
Zabbix Documentation
Complete Zabbix installation and configuration guide
Prometheus Documentation
Official Prometheus documentation and best practices
Grafana Documentation
Grafana setup, datasources, and dashboard creation
PromQL Guide
PromQL query language reference
Next Steps
- Configure SIEM Platform to receive monitoring alerts
- Set up Incident Response workflows for infrastructure issues
- Review Operations Guide for day-to-day monitoring procedures
