Agent Role
The agent role is designed to run on every node in your infrastructure to collect local data. Agents are lightweight and run close to the data source.
Characteristics
- Deployed per-node: One instance per host/node
- Local data collection: Reads logs, metrics, and traces from the local system
- Lightweight: Minimal resource footprint
- Edge processing: Can perform light transformations before forwarding
- Resilient: Operates independently, doesn’t depend on centralized services
Common Use Cases
- Collecting Kubernetes pod logs via the `kubernetes_logs` source
- Gathering host metrics from the local system
- Tailing log files from the local filesystem
- Collecting application metrics via StatsD or Prometheus
- Forwarding data to aggregators or directly to sinks
Example Agent Configuration
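A minimal sketch of what an agent configuration might look like, assuming Vector's YAML config format. The component IDs (`pod_logs`, `drop_debug`, `to_aggregator`), the `.level` field, and the aggregator address `vector-aggregator:6000` are illustrative assumptions, not values taken from this guide.

```yaml
# Agent sketch: collect locally, filter lightly, forward to an aggregator.
# All component IDs and the address below are hypothetical.
sources:
  pod_logs:
    type: kubernetes_logs        # reads logs from pods on this node

transforms:
  drop_debug:
    type: filter                 # light edge processing before forwarding
    inputs: [pod_logs]
    condition: '.level != "debug"'   # assumes events carry a `level` field

sinks:
  to_aggregator:
    type: vector                 # Vector-to-Vector forwarding
    inputs: [drop_debug]
    address: "vector-aggregator:6000"   # assumed service name and port
```

Other local sources, such as `host_metrics` or `file`, could be added alongside `pod_logs` in the same way.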
Aggregator Role
The aggregator role receives data from multiple agents, performs centralized processing, and routes data to final destinations.
Characteristics
- Centralized deployment: One or more instances serving multiple agents
- Data aggregation: Receives data from multiple sources
- Heavy processing: Performs complex transformations, enrichment, and routing
- Buffering: Provides buffering and backpressure handling
- Scalable: Can be horizontally scaled based on throughput needs
Common Use Cases
- Receiving data from multiple Vector agents
- Centralized parsing and transformation
- Data enrichment (GeoIP, external lookups)
- Routing to multiple destinations
- Aggregating metrics across the infrastructure
- Implementing complex filtering and sampling logic
Example Aggregator Configuration
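A minimal sketch of what an aggregator configuration might look like, assuming the agents send log events over Vector's native protocol. The component IDs, the listen port `6000`, the `environment` and `level` fields, and the destination URL are illustrative assumptions.

```yaml
# Aggregator sketch: receive from agents, enrich, route to destinations.
# Assumes incoming events are logs; all IDs and addresses are hypothetical.
sources:
  from_agents:
    type: vector
    address: "0.0.0.0:6000"      # agents connect to this port

transforms:
  enrich:
    type: remap                  # centralized transformation in VRL
    inputs: [from_agents]
    source: |
      .environment = "production"

  by_severity:
    type: route                  # one named output per condition
    inputs: [enrich]
    route:
      errors: '.level == "error"'

sinks:
  error_store:
    type: http
    inputs: [by_severity.errors]
    uri: "https://logs.example.com/ingest"   # placeholder destination
    encoding:
      codec: json

  everything_else:
    type: console
    inputs: [by_severity._unmatched]
    encoding:
      codec: json
```

Events that match no route condition are available on the `_unmatched` output, which is why the second sink reads from `by_severity._unmatched`.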
Hybrid Deployments
Many organizations use both roles together, with agents on each node forwarding to a central aggregator tier.
Benefits of Hybrid Architecture
- Reduced load: Agents handle local collection; aggregators handle heavy processing
- Network efficiency: Local filtering reduces data transfer
- Reliability: Agents can buffer locally if aggregators are unavailable
- Flexibility: Centralized configuration changes without touching edge nodes
- Security: Single egress point for external destinations
Stateless Aggregator
For stateless workloads that don’t require persistent storage, you can deploy aggregators without persistent volumes. This suits:
- Simple transformation and forwarding
- Kubernetes Deployments that can scale quickly
- Cost-sensitive environments
- High-availability setups with load balancing
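As a rough illustration, assuming deployment through the Vector Helm chart, a stateless aggregator might be selected with values along these lines. The replica count is an arbitrary example, and the option names should be checked against the chart version in use.

```yaml
# values.yaml sketch for the Vector Helm chart (assumed deployment method).
# "Stateless-Aggregator" deploys as a Deployment without persistent volumes.
role: "Stateless-Aggregator"
replicas: 2          # arbitrary example; size for your throughput
```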
Choosing the Right Role
| Factor | Agent | Aggregator |
|---|---|---|
| Deployment | Per-node (DaemonSet, systemd per host) | Centralized (Deployment, StatefulSet) |
| Resource Usage | Low (100-500 MB RAM) | High (1-8 GB+ RAM) |
| Data Volume | Local node only | Aggregate from many sources |
| Processing | Light transforms, filtering | Heavy transforms, enrichment, routing |
| State | Minimal local state | May require persistent storage |
| Network | Outbound connections | Inbound + outbound |
| Scaling | Automatic (per-node) | Manual/HPA based on load |
Best Practices
For Agents
- Keep configuration simple and focused on collection
- Use local buffering to handle temporary network issues
- Implement resource limits to prevent resource exhaustion
- Use the `kubernetes_logs` source for Kubernetes environments
- Enable internal metrics for observability
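Two of the agent practices above, local buffering and internal metrics, can be sketched in configuration as follows. The component IDs and addresses are hypothetical, and the buffer size shown is simply the smallest disk buffer Vector accepts, not a recommendation.

```yaml
# Agent sketch: disk buffer on the forwarding sink plus self-monitoring.
# IDs and addresses are hypothetical.
sources:
  pod_logs:
    type: kubernetes_logs
  vector_metrics:
    type: internal_metrics       # Vector's own operational metrics

sinks:
  to_aggregator:
    type: vector
    inputs: [pod_logs]
    address: "vector-aggregator:6000"   # assumed service name and port
    buffer:
      type: disk                 # survives restarts and short outages
      max_size: 268435488        # bytes; minimum allowed disk buffer size
      when_full: block           # apply backpressure instead of dropping

  metrics_endpoint:
    type: prometheus_exporter    # exposes internal metrics for scraping
    inputs: [vector_metrics]
    address: "0.0.0.0:9598"
```

Choosing `when_full: block` favors delivery over freshness; `drop_newest` is the alternative when the agent must never stall its sources.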
For Aggregators
- Size appropriately for expected throughput
- Use persistent storage for stateful operations
- Implement health checks and readiness probes
- Configure appropriate buffer sizes
- Use connection pooling for sinks
- Monitor queue sizes and backpressure
- Deploy multiple replicas for high availability
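One way to implement the health-check item above is to enable Vector's API, which serves a `/health` endpoint, and point Kubernetes probes at it. This is a sketch under the assumption that the aggregator runs in Kubernetes; the port and probe timings are illustrative.

```yaml
# Vector config fragment: enable the API so /health is served.
api:
  enabled: true
  address: "0.0.0.0:8686"
---
# Kubernetes container spec fragment (hypothetical timings).
readinessProbe:
  httpGet:
    path: /health
    port: 8686
  initialDelaySeconds: 5
  periodSeconds: 10
livenessProbe:
  httpGet:
    path: /health
    port: 8686
  periodSeconds: 30
```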
Security Considerations
- Use TLS for agent-to-aggregator communication
- Implement authentication between agents and aggregators
- Restrict network access using firewalls or network policies
- Run with minimal privileges (see systemd hardened configuration)
- Regularly update Vector to get security patches
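The TLS recommendation can be sketched as a pair of config fragments, one for the aggregator's `vector` source and one for the agent's `vector` sink. The certificate paths and the hostname are placeholders, and the exact set of `tls` options (including whether the sink needs anything further to turn TLS on) varies by Vector version, so treat this as an outline to verify against the TLS reference rather than a drop-in configuration.

```yaml
# Aggregator side: require TLS and verify client certificates (mutual TLS).
# File paths are placeholders.
sources:
  from_agents:
    type: vector
    address: "0.0.0.0:6000"
    tls:
      enabled: true
      crt_file: /etc/vector/tls/aggregator.crt
      key_file: /etc/vector/tls/aggregator.key
      ca_file: /etc/vector/tls/ca.crt
      verify_certificate: true   # reject agents without a valid client cert
---
# Agent side: present a client certificate and trust the same CA.
# A real config would also define the `pod_logs` source; omitted for brevity.
sinks:
  to_aggregator:
    type: vector
    inputs: [pod_logs]
    address: "vector-aggregator.example.internal:6000"   # placeholder host
    tls:
      crt_file: /etc/vector/tls/agent.crt
      key_file: /etc/vector/tls/agent.key
      ca_file: /etc/vector/tls/ca.crt
```

Using client certificates this way covers both the encryption and the agent-to-aggregator authentication items in the list above.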