Overview
The reduce transform collapses multiple log events into a single event based on grouping and merge strategies. It’s ideal for combining related log lines into transactions, assembling multi-line logs, or aggregating events over time windows.
This transform is essential for:
- Combining multi-line stack traces
- Aggregating related log events into transactions
- Grouping events by request ID, session ID, or correlation ID
- Reducing log volume by combining repetitive events
- Building event sequences before downstream processing
Key features:
- Flexible field-based grouping
- Multiple merge strategies per field
- Time-based expiration
- Conditional flush on start/end markers
- Maximum event limits per group
- Task transform for stateful aggregation
Configuration
expire_after_ms
The maximum period of time to wait after the last event is received, in milliseconds, before a combined event should be considered complete. After this period elapses, the reduced event is flushed downstream.
flush_period_ms
The interval to check for and flush any expired events, in milliseconds. This determines how frequently the transform checks for groups that have exceeded expire_after_ms.
end_every_period_ms
Optional. If supplied, every time this interval elapses for a given grouping, the reduced value for that grouping is flushed. Checked every flush_period_ms. Useful for periodic aggregation.
max_events
The maximum number of events to group together. When this limit is reached, the reduced event is immediately flushed.
group_by
An ordered list of fields by which to group events. Each group with matching values for the specified keys is reduced independently. When not specified, all events are reduced in a single group.
merge_strategies
A map of field names to custom merge strategies. For each field specified, the given strategy is used for combining events rather than the default behavior.
Default behaviors:
- Strings: First value is kept
- Timestamps: First value is kept, [field]_end is added with last value
- Numbers: Values are summed
Available strategies:
- discard: Discard field value (remove from output)
- retain: Keep only the first value
- sum: Sum numeric values
- max: Keep maximum numeric value
- min: Keep minimum numeric value
- array: Collect all values into an array
- concat: Concatenate string values with space separator
- concat_newline: Concatenate strings with newline separator
- concat_raw: Concatenate strings with no separator
- shortest_array: Keep the shortest array
- longest_array: Keep the longest array
- flat_unique: Flatten arrays/objects and keep unique values
starts_when
A condition used to distinguish the first event of a transaction. If this condition resolves to true for an event, the previous transaction is flushed (without this event) and a new transaction is started.
ends_when
A condition used to distinguish the final event of a transaction. If this condition resolves to true for an event, the current transaction is immediately flushed with this event.
Inputs
inputs
List of upstream component IDs. The reduce transform only accepts log events. Metrics and traces are ignored.
Outputs
The reduce transform has a single output that emits reduced log events when:
- expire_after_ms time has passed since the last event in a group
- max_events limit is reached for a group
- ends_when condition matches
- starts_when condition matches (flushes previous group)
- end_every_period_ms interval elapses
- Vector shuts down (flushes all buffered groups)
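The flush conditions listed above can be illustrated with a small Python model. This is only a sketch of the described behavior, not Vector's implementation: the class name `ReduceModel`, its methods, and the use of plain callables for starts_when/ends_when are assumptions made for illustration, and merging is left out (each flushed group is returned as a list of its raw events).

```python
class ReduceModel:
    """Toy model of the flush triggers (hypothetical, not Vector's code)."""

    def __init__(self, expire_after_ms, group_by=(), max_events=None,
                 starts_when=None, ends_when=None, end_every_period_ms=None):
        self.expire_after_ms = expire_after_ms
        self.group_by = tuple(group_by)
        self.max_events = max_events
        self.starts_when = starts_when      # callable(event) -> bool, or None
        self.ends_when = ends_when          # callable(event) -> bool, or None
        self.end_every_period_ms = end_every_period_ms
        self.groups = {}                    # key -> {"events", "first", "last"}

    def _flush(self, key):
        return self.groups.pop(key)["events"]

    def push(self, event, now_ms):
        """Feed one event; return the groups flushed as a result."""
        flushed = []
        key = tuple(event.get(f) for f in self.group_by)
        # starts_when: flush the previous group WITHOUT this event.
        if self.starts_when and self.starts_when(event) and key in self.groups:
            flushed.append(self._flush(key))
        group = self.groups.setdefault(
            key, {"events": [], "first": now_ms, "last": now_ms})
        group["events"].append(event)
        group["last"] = now_ms
        # ends_when / max_events: flush immediately, WITH this event.
        hit_end = bool(self.ends_when and self.ends_when(event))
        hit_max = bool(self.max_events
                       and len(group["events"]) >= self.max_events)
        if hit_end or hit_max:
            flushed.append(self._flush(key))
        return flushed

    def tick(self, now_ms):
        """Periodic check, assumed to run every flush_period_ms."""
        flushed = []
        for key in list(self.groups):
            g = self.groups[key]
            expired = now_ms - g["last"] >= self.expire_after_ms
            periodic = (self.end_every_period_ms is not None
                        and now_ms - g["first"] >= self.end_every_period_ms)
            if expired or periodic:
                flushed.append(self._flush(key))
        return flushed

    def shutdown(self):
        """Flush every buffered group."""
        return [self._flush(key) for key in list(self.groups)]
```

Calling `push` for each incoming event and `tick` on a timer reproduces the order in which groups would leave the transform under these assumptions.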
Examples
Combine Multi-line Stack Traces
Aggregate by Request ID
Session Aggregation
Periodic Time-based Aggregation
Combine Duplicate Events
Transaction Processing
Kubernetes Pod Logs
Database Query Logs
Merge Strategies Explained
discard
Removes the field from the output event entirely.
retain
Keeps only the first value, discards subsequent values.
sum
Sums numeric values. Non-numeric values are ignored.
max / min
Keeps the maximum or minimum numeric value.
array
Collects all values into an array.
concat / concat_newline / concat_raw
Concatenates string values with different separators:
- concat: Space separator
- concat_newline: Newline separator
- concat_raw: No separator
shortest_array / longest_array
Keeps the shortest or longest array value.
flat_unique
Flattens arrays and objects, collecting unique scalar values.
Default Merge Behavior
When no merge strategy is specified:

| Field Type | Default Behavior |
|---|---|
| String | Keep first value (retain) |
| Number | Sum all values |
| Timestamp | Keep first, add <field>_end with last |
| Array | No default (must specify strategy) |
| Object | No default (must specify strategy) |
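
To make the strategies above concrete, here is a minimal Python sketch of how each one might combine the values a single field takes across a group's events. The function `merge_field` and helper `_leaves` are hypothetical names, not part of Vector. Two details are assumptions rather than documented behavior: flat_unique is shown flattening object values only (keys are dropped) and preserving first-seen order, and max/min skip non-numeric values the same way sum does.

```python
SEPARATORS = {"concat": " ", "concat_newline": "\n", "concat_raw": ""}


def _leaves(value):
    """Yield scalar leaves of nested lists/dicts (dict values only)."""
    if isinstance(value, dict):
        for v in value.values():
            yield from _leaves(v)
    elif isinstance(value, list):
        for v in value:
            yield from _leaves(v)
    else:
        yield value


def _is_number(v):
    return isinstance(v, (int, float)) and not isinstance(v, bool)


def merge_field(strategy, values):
    """Combine the values one field took across a group's events."""
    if strategy == "discard":
        return None  # the caller would drop the field from the output
    if strategy == "retain":
        return values[0]
    if strategy in ("sum", "max", "min"):
        numbers = [v for v in values if _is_number(v)]
        return {"sum": sum, "max": max, "min": min}[strategy](numbers)
    if strategy == "array":
        return list(values)
    if strategy in SEPARATORS:
        return SEPARATORS[strategy].join(str(v) for v in values)
    if strategy == "shortest_array":
        return min(values, key=len)
    if strategy == "longest_array":
        return max(values, key=len)
    if strategy == "flat_unique":
        unique = []
        for leaf in _leaves(values):
            if leaf not in unique:
                unique.append(leaf)
        return unique
    raise ValueError(f"unknown merge strategy: {strategy}")
```

For example, `merge_field("concat_newline", lines)` models how a multi-line stack trace could be reassembled into one message field.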
Grouping Behavior
Events are grouped by exact matches on all group_by fields. When group_by is empty, all events are reduced into a single group.
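The grouping rule can be sketched in a few lines of Python. The function `group_events` is a hypothetical illustration, not Vector code, and treating a missing field as `None` is an assumption made to keep the example short.

```python
from collections import defaultdict


def group_events(events, group_by):
    """Bucket events by the exact values of every group_by field."""
    groups = defaultdict(list)
    for event in events:
        # The key is the tuple of field values, in group_by order.
        # An empty group_by yields the key () for every event,
        # so everything lands in one group.
        key = tuple(event.get(field) for field in group_by)
        groups[key].append(event)
    return dict(groups)


events = [
    {"request_id": "a", "host": "h1", "message": "start"},
    {"request_id": "a", "host": "h2", "message": "start"},
    {"request_id": "a", "host": "h1", "message": "done"},
]
by_request_and_host = group_events(events, ["request_id", "host"])
single_group = group_events(events, [])
```

Here `by_request_and_host` holds two groups, because the events must match on both fields, while `single_group` holds one group containing all three events.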
Flush Triggers
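Reduced events are flushed when any of these conditions are met:
1. Time Expiration: expire_after_ms has passed since the last event in the group.
2. Event Count Limit: the group reaches max_events.
3. End Condition: ends_when matches; the group is flushed with the matching event.
4. Start Condition: starts_when matches; the previous group is flushed without the matching event.
5. Periodic Flush: the end_every_period_ms interval elapses.
6. Shutdown: all buffered groups are flushed when Vector shuts down.
Performance Considerations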
Memory Usage
Memory usage scales with:
- Number of active groups (cardinality of group_by fields)
- Number of events per group
- Size of events
- Merge strategies used (arrays use more memory)
Cardinality Management
High cardinality in group_by fields increases memory usage. To limit it, use:
- Shorter expire_after_ms
- Lower max_events
- More specific starts_when/ends_when conditions
Flush Period Tuning
flush_period_ms controls how often expiration checks run:
- Lower values: More responsive, higher CPU usage
- Higher values: Less responsive, lower CPU usage
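The responsiveness trade-off can be made concrete with a small calculation. The sketch below assumes, purely for illustration, that expiration checks run at exact multiples of flush_period_ms; the function name `flush_time_ms` is hypothetical and Vector's actual scheduling may differ.

```python
def flush_time_ms(last_event_ms, expire_after_ms, flush_period_ms):
    """Time of the first check at or after the group's expiry deadline."""
    deadline = last_event_ms + expire_after_ms
    # Round the deadline up to the next multiple of flush_period_ms
    # using integer ceiling division.
    checks_needed = -(-deadline // flush_period_ms)
    return checks_needed * flush_period_ms


# Last event at t=1200 ms with expire_after_ms=5000: deadline is 6200 ms.
coarse = flush_time_ms(1200, 5000, 1000)  # checked once per second
fine = flush_time_ms(1200, 5000, 100)     # checked ten times per second
```

Under this model the coarse setting flushes at 7000 ms, 800 ms after the deadline, while the fine setting flushes at 6200 ms, so the extra delay is always less than one flush_period_ms.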
Common Patterns
Multi-line Exceptions (Java/Python)
HTTP Request Lifecycle
Batch Processing
Troubleshooting
Events Not Being Combined
- Check group_by fields exist and have matching values
- Verify flush conditions are not too aggressive
- Check if starts_when/ends_when are triggering prematurely
- Increase expire_after_ms or max_events
Memory Issues
- Reduce expire_after_ms to flush more frequently
- Lower max_events limit
- Reduce group_by cardinality
- Add more specific starts_when/ends_when conditions
- Use discard strategy for unnecessary fields
Missing Data
- Check if events are expiring too quickly
- Verify starts_when isn’t flushing prematurely
- Ensure group_by fields are consistent across related events
Metrics
The reduce transform doesn’t emit specific internal metrics, but standard component metrics are available:
- component_received_events_total: Events received
- component_sent_events_total: Reduced events emitted
The reduction ratio can be computed from these two counters as 1 - (sent / received).
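As a worked example of the expression 1 - (sent / received), the following sketch computes the ratio from two counter readings. The function name `reduction_ratio` and the sample numbers are hypothetical; returning 0.0 when nothing has been received is an assumption made to avoid dividing by zero.

```python
def reduction_ratio(received, sent):
    """Fraction of incoming events removed by reduction."""
    if received == 0:
        return 0.0  # nothing received yet, so nothing reduced
    return 1 - (sent / received)


# 10,000 events in and 250 reduced events out: about 97.5% fewer events.
ratio = reduction_ratio(received=10_000, sent=250)
```

A ratio near 0 suggests events are rarely being combined, which may point to the grouping or flush settings covered under Troubleshooting.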
See Also
- Aggregate Transform - Aggregate metrics
- Remap Transform - VRL-based transformation
- Filter Transform - Conditional filtering
- VRL Reference - VRL conditions for starts_when/ends_when