Pipeline Architecture
Core Concepts
A Vector pipeline consists of three component types connected in a graph:
Sources
Ingest data from external systems (files, syslog, APIs, etc.)
Transforms
Process, parse, filter, and route events through the pipeline
Sinks
Send data to external systems (databases, object storage, SaaS platforms)
How Pipelines Work
Component Tasks
Vector runs each component as an asynchronous Tokio task:
- Sources generate events and send them to downstream components
- Transforms receive events, process them, and forward results
- Sinks receive events and deliver them to external systems
Event Flow
When Vector starts, it:
- Parses and validates the configuration file
- Builds each component (source, transform, sink)
- Creates channels between components based on `inputs` declarations
- Spawns each component as an independent async task
- Wires components together via message-passing channels
- Begins processing events
Example Pipeline
Here’s a simple pipeline that reads Apache logs, parses them, and routes to multiple destinations.
Topology Graph
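A sketch of what such a pipeline might look like (file paths, component names, and sink choices are assumptions, not part of the original example):

```yaml
# Topology:
#
#   apache_logs --> parse_apache --+--> to_console
#                                  +--> to_s3
sources:
  apache_logs:
    type: file
    include:
      - /var/log/apache2/*.log

transforms:
  parse_apache:
    type: remap
    inputs: [apache_logs]
    # Parse each line as an Apache "common" format log
    source: |
      . = parse_apache_log!(.message, format: "common")

sinks:
  to_console:
    type: console
    inputs: [parse_apache]
    encoding:
      codec: json
  to_s3:
    type: aws_s3
    inputs: [parse_apache]
    bucket: my-log-archive
    region: us-east-1
    encoding:
      codec: json
```

Both sinks list the same transform in `inputs`, so each parsed event is delivered to both destinations.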
Directed Acyclic Graph (DAG)
Vector enforces a DAG structure, which means:
- Events flow in one direction (no cycles)
- Components can have multiple inputs and outputs
- The same event can be sent to multiple destinations
- No component can receive data from its own output (directly or indirectly)
Vector validates your configuration at startup to ensure no cycles exist. Circular dependencies will cause Vector to fail to start with a clear error message.
Component Inputs
Every transform and sink specifies its `inputs` as a list of upstream components.
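For example, a sketch in which a sink consumes events from two upstream transforms (all names are illustrative):

```yaml
sinks:
  to_elasticsearch:
    type: elasticsearch
    # This sink receives events from both upstream components
    inputs:
      - parse_logs
      - enrich_metadata
```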
Component Outputs
Most components have a single, unnamed output, while some support multiple named outputs. The route transform, for example, sends events to different named outputs.
Fanout (Multiple Destinations)
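As a sketch, fanout requires no special configuration: two sinks that list the same transform in their `inputs` both receive every event it emits (names and options are assumptions):

```yaml
sinks:
  to_console:
    type: console
    inputs: [parse_logs]   # same upstream as to_loki
    encoding:
      codec: json
  to_loki:
    type: loki
    inputs: [parse_logs]   # same upstream as to_console
    endpoint: http://loki:3100
    encoding:
      codec: json
    labels:
      job: vector
```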
Vector uses “fanout” to send the same event to multiple downstream components.
Backpressure and Flow Control
Vector implements intelligent backpressure to handle downstream slowness.
How Backpressure Works
- When a sink is slow or a buffer fills up, it stops accepting new events
- This backpressure propagates upstream through transforms
- Eventually reaches sources, which slow down or stop reading new data
- When the slow component catches up, flow resumes automatically
Backpressure Strategies
Vector components can be configured with different backpressure behaviors:
- Block (default): wait for space to become available. This ensures no data loss but may cause slowdowns.
- Drop Newest: discard incoming events once the buffer is full, trading some data loss for sustained throughput.
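These behaviors are selected per sink buffer via `when_full`. A sketch (sink type, URI, and sizes are illustrative):

```yaml
sinks:
  to_http:
    type: http
    inputs: [parse_logs]
    uri: http://example.com/ingest
    encoding:
      codec: json
    buffer:
      type: disk            # persist buffered events across restarts
      max_size: 268435488   # bytes; disk buffers enforce a minimum size
      when_full: block      # or: drop_newest
```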
Concurrency and Parallelism
Transform Concurrency
Some transforms support concurrent processing. For these, Vector:
- Spawns multiple tasks for CPU-intensive transforms
- Maintains event ordering when required
- Distributes work across available CPU cores
Sink Concurrency
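Request concurrency and batching are configured per sink. A sketch (sink type, endpoint, and limits are assumptions):

```yaml
sinks:
  to_http:
    type: http
    inputs: [parse_logs]
    uri: http://example.com/ingest
    encoding:
      codec: json
    batch:
      max_events: 1000      # events per request
      timeout_secs: 1       # flush partial batches after this long
    request:
      concurrency: adaptive # or a fixed number of in-flight requests
```

With `adaptive`, Vector tunes the number of in-flight requests based on downstream response times.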
Sinks can process batches in parallel.
Dynamic Topology Changes
Vector supports live configuration reloads without dropping events.
Reload Process
- Vector receives a reload signal (SIGHUP or API call)
- Parses and validates the new configuration
- Computes a diff between old and new topologies
- Gracefully shuts down removed components
- Starts new components
- Reconfigures existing components that changed
- Rewires connections between components
What Can Be Changed
Safe changes (no disruption)
- Adding new sources, transforms, or sinks
- Removing components
- Changing transform logic
- Modifying sink destinations
- Adjusting buffer sizes
Changes requiring restart
- Changing global data directory
- Modifying API settings (address, TLS)
- Some source types that maintain persistent connections
Persistent components
- Disk buffers survive reloads and restarts
- Checkpoint data (file positions, offsets) is preserved
- In-flight events are drained before component shutdown
Topology Validation
Vector performs extensive validation on your configuration.
Startup Validation
- Syntax validation: YAML/TOML parsing and structure
- Schema validation: Component types and required fields
- Reference validation: All inputs must reference existing components
- Cycle detection: No circular dependencies in the graph
- Type compatibility: Metrics can’t flow into log-only sinks
- Output validation: Named outputs must exist on referenced components
Example Validation Errors
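For instance, a configuration sketch like the following contains a cycle (a depends on b, and b on a), so cycle detection will reject it at startup:

```yaml
# Invalid: a -> b -> a forms a cycle in the topology graph
transforms:
  a:
    type: remap
    inputs: [b]
    source: ".processed = true"
  b:
    type: remap
    inputs: [a]
    source: ".processed = true"
```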
Advanced Patterns
Sampling and Load Shedding
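A sketch using the sample transform (component names and rate are illustrative):

```yaml
transforms:
  sample_logs:
    type: sample
    inputs: [parse_logs]
    rate: 10   # keep roughly 1 out of every 10 events
```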
Reduce data volume with sampling.
Aggregation and Reduction
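A sketch using the reduce transform (the grouping field and merge strategies are assumptions):

```yaml
transforms:
  merge_related:
    type: reduce
    inputs: [parse_logs]
    group_by: [request_id]        # merge events sharing a request_id
    merge_strategies:
      message: concat_newline     # join messages with newlines
    expire_after_ms: 30000        # flush a group after 30s of inactivity
```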
Combine multiple events into aggregates.
Conditional Routing
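A sketch using the route transform; downstream components consume its named outputs, such as by_level.errors (all names and conditions are illustrative):

```yaml
transforms:
  by_level:
    type: route
    inputs: [parse_logs]
    route:
      errors: '.level == "error"'
      warnings: '.level == "warn"'

sinks:
  alerting:
    type: console               # stand-in for an alerting destination
    inputs: [by_level.errors]   # reference the named output
    encoding:
      codec: json
```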
Route events based on content.
Observability
Vector emits internal metrics about pipeline health.
Built-in Metrics
- component_received_events_total - Events received by each component
- component_sent_events_total - Events sent by each component
- component_errors_total - Errors encountered
- buffer_received_events_total - Events entering buffers
- buffer_sent_events_total - Events leaving buffers
Monitoring Your Pipeline
Expose these metrics with a prometheus_exporter sink; by default it serves them at http://localhost:9598/metrics.
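A minimal sketch wiring the internal_metrics source to a prometheus_exporter sink:

```yaml
sources:
  internal:
    type: internal_metrics

sinks:
  prometheus:
    type: prometheus_exporter
    inputs: [internal]
    address: 0.0.0.0:9598   # scrape at http://localhost:9598/metrics
```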
Best Practices
Design for failure
- Configure appropriate buffers for all sinks
- Use disk buffers for critical data paths
- Monitor buffer utilization and backpressure
- Set up health checks for downstream systems
Optimize topology
- Parse and transform data as early as possible
- Place expensive operations after filtering
- Use route transforms to split traffic before expensive sinks
- Consider sampling high-volume, low-value data
Test configuration changes
- Use `vector validate` to check configs before deployment
- Test transforms with the `vector vrl` REPL
- Monitor metrics after configuration reloads
- Keep previous configurations for rollback
Maintain clear data flow
- Use descriptive component names
- Document complex routing logic
- Avoid deeply nested transforms when possible
- Group related components with naming conventions
Related Topics
- Data Model - Understanding event types
- Sources - Ingesting data
- Transforms - Processing events
- Sinks - Sending data to destinations
- Buffering - Managing backpressure and reliability