Transforms are Vector components that process events as they flow through your pipeline. They can parse, filter, enrich, aggregate, and route data between sources and sinks. Transforms are the core of Vector’s data processing capabilities.

How Transforms Work

Transforms sit between sources and sinks in Vector's topology. Each transform:
  1. Receives events from one or more inputs (sources or other transforms)
  2. Processes events according to its configuration
  3. Emits results to one or more outputs (other transforms or sinks)
  4. Handles backpressure from downstream components
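A minimal pipeline illustrating this flow (component names and paths are hypothetical):

```yaml
sources:
  app_logs:
    type: file
    include:
      - /var/log/app/*.log

transforms:
  drop_debug:
    type: filter
    inputs:
      - app_logs                      # 1. receive events from a source
    condition: '.level != "debug"'    # 2. process according to configuration

sinks:
  console_out:
    type: console
    inputs:
      - drop_debug                    # 3. emit results to a sink
    encoding:
      codec: json
```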

Transform Types

Parsing

Extract structured data from raw text

Filtering

Select or discard events based on conditions

Routing

Send events to different destinations

Enrichment

Add context and metadata

Aggregation

Combine multiple events

Conversion

Change event types (logs ↔ metrics)
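These categories compose freely. A sketch chaining filtering, parsing, and conversion (component and field names are hypothetical):

```yaml
transforms:
  keep_requests:            # Filtering
    type: filter
    inputs:
      - raw_logs
    condition: '.message != null'

  parse_requests:           # Parsing
    type: remap
    inputs:
      - keep_requests
    source: |
      . = parse_json!(.message)

  request_counts:           # Conversion (logs → metrics)
    type: log_to_metric
    inputs:
      - parse_requests
    metrics:
      - type: counter
        field: status
        name: requests_total
```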

Transform Categories

Parsing and Structuring

remap is the most powerful and commonly used transform. It uses Vector Remap Language (VRL) for complex event manipulation.
transforms:
  parse_logs:
    type: remap
    inputs:
      - raw_logs
    source: |
      # Parse JSON log message
      . = parse_json!(.message)
      
      # Extract timestamp
      .timestamp = parse_timestamp!(.time, "%Y-%m-%d %H:%M:%S")
      
      # Add environment tag
      .environment = "production"
      
      # Parse user agent
      .user_agent = parse_user_agent!(.user_agent_string)
      
      # Remove sensitive data
      del(.password)
      del(.api_key)
Use cases: JSON parsing, log parsing, field extraction, data transformation, enrichment
The regex_parser transform extracts structured fields using regular expressions. It is less powerful than remap but simpler for basic parsing; newer Vector releases fold this functionality into remap's parse_regex function.
transforms:
  parse_apache:
    type: regex_parser
    inputs:
      - apache_logs
    regex: '^(?P<ip>\S+) \S+ (?P<user>\S+) \[(?P<timestamp>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d+) (?P<bytes>\d+)$'
    field: message
The grok_parser transform applies Grok patterns (Logstash-compatible) to parse common log formats; newer Vector releases fold this into remap's parse_grok function.
transforms:
  grok_parse:
    type: grok_parser
    inputs:
      - logs
    pattern: '%{COMMONAPACHELOG}'

Filtering and Sampling

The filter transform keeps or discards events based on conditions.
transforms:
  # Keep only errors
  errors_only:
    type: filter
    inputs:
      - parsed_logs
    condition: '.level == "error" || .status >= 400'
  
  # Drop health check logs
  no_healthchecks:
    type: filter
    inputs:
      - parsed_logs
    condition: '.path != "/health" && .path != "/ready"'
Use cases: Reducing data volume, removing noise, isolating specific events
The sample transform keeps only a fraction of events, useful for high-volume data.
transforms:
  sample_debug_logs:
    type: sample
    inputs:
      - debug_logs
    rate: 10              # Keep 1 in every 10 events
    key_field: request_id # Sample by request (optional)
Use cases: Cost reduction, load testing, debugging high-traffic endpoints
The dedupe transform removes duplicate events based on field values.
transforms:
  remove_dupes:
    type: dedupe
    inputs:
      - logs
    fields:
      match:
        - request_id
        - timestamp
    cache:
      num_events: 10000   # Remember last 10k events

Routing and Distribution

The route transform sends events to named outputs based on conditions.
transforms:
  route_by_severity:
    type: route
    inputs:
      - logs
    route:
      critical: '.level == "critical" || .level == "fatal"'
      errors: '.level == "error"'
      warnings: '.level == "warning"'
      info: '.level == "info"'
      # _unmatched: Everything else

sinks:
  pagerduty_alerts:
    type: http
    inputs:
      - route_by_severity.critical
    uri: https://events.pagerduty.com/v2/enqueue
  
  elasticsearch_errors:
    type: elasticsearch
    inputs:
      - route_by_severity.errors
      - route_by_severity.warnings
  
  s3_archive:
    type: aws_s3
    inputs:
      - route_by_severity._unmatched  # Everything else
The legacy swimlanes transform (since renamed route) creates parallel processing paths for different event types; prefer route in new configurations.
transforms:
  split_by_type:
    type: swimlanes
    inputs:
      - logs
    lanes:
      application:
        type: check_fields
        "source_type.eq": "application"
      system:
        type: check_fields
        "source_type.eq": "system"

Enrichment and Context

Enrichment tables add external data (CSV files, GeoIP databases) to events through VRL's enrichment functions.
enrichment_tables:
  geoip:
    type: geoip
    path: /usr/share/GeoIP/GeoLite2-City.mmdb
  
  user_data:
    type: file
    file:
      path: /etc/vector/users.csv
      encoding:
        type: csv
    schema:
      user_id: integer
      name: string
      department: string

transforms:
  enrich_logs:
    type: remap
    inputs:
      - logs
    source: |
      # Add GeoIP data
      .geo = get_enrichment_table_record!("geoip", {
        "ip": .ip_address
      })
      
      # Add user information
      .user = get_enrichment_table_record!("user_data", {
        "user_id": .user_id
      })
The lua transform lets you write custom transformation logic in Lua.
transforms:
  custom_logic:
    type: lua
    inputs:
      - logs
    version: "2"
    hooks:
      process: |
        function process(event, emit)
          -- Custom Lua logic
          event.log.processed = true
          event.log.custom_field = calculate_something(event.log.value)
          emit(event)
        end
Note: VRL (via remap) is preferred over Lua for better performance and type safety.

Aggregation and Reduction

The reduce transform combines multiple events into a single aggregate event.
transforms:
  merge_by_request:
    type: reduce
    inputs:
      - logs
    group_by:
      - request_id
    merge_strategies:
      timestamp: min       # Keep earliest timestamp
      status: max          # Keep highest status code
      duration: sum        # Sum all durations
      messages: array      # Collect all messages
    ends_when: '.status != null && .final == true'
    expire_after_ms: 30000  # Flush after 30s
Use cases: Combining fragmented logs, request/response pairing, transaction assembly
The aggregate transform combines multiple metric events over a time window, reducing metric volume. (To turn logs into metrics first, use log_to_metric, below.)
transforms:
  aggregate_metrics:
    type: aggregate
    inputs:
      - app_metrics
    interval_ms: 60000    # Flush aggregated metrics every minute

Type Conversion

The log_to_metric transform converts log events into metric events.
transforms:
  extract_metrics:
    type: log_to_metric
    inputs:
      - access_logs
    metrics:
      - type: counter
        field: request_count
        name: http_requests_total
        namespace: app
        tags:
          method: "{{ method }}"
          status: "{{ status }}"
      
      - type: histogram
        field: duration_ms
        name: http_request_duration_milliseconds
        namespace: app
The metric_to_log transform converts metric events into log events.
transforms:
  metrics_as_logs:
    type: metric_to_log
    inputs:
      - host_metrics
    host_tag: host
    timezone: local

Specialized Transforms

throttle

Rate limit events to prevent overwhelming downstream systems

tag_cardinality_limit

Prevent cardinality explosion in metric tags
The throttle transform controls event throughput to avoid overwhelming downstream systems.
transforms:
  rate_limit:
    type: throttle
    inputs:
      - high_volume_logs
    threshold: 1000       # Max events per window
    window_secs: 1        # Per second
    key_field: client_id  # Rate limit per client
The tag_cardinality_limit transform prevents cardinality explosion by limiting the number of unique values per tag.
transforms:
  protect_metrics:
    type: tag_cardinality_limit
    inputs:
      - app_metrics
    limit_exceeded_action: drop_tag
    mode: probabilistic
    value_limit: 1000     # Max unique values per tag

Transform Behavior

Synchronous vs. Asynchronous

Synchronous transforms (e.g., filter, remap):
  • Process events immediately
  • Maintain event order
  • Support concurrent processing for performance
  • Most common type
Asynchronous transforms (e.g., reduce, aggregate):
  • Process events over time windows
  • May reorder events
  • Require internal state management
  • Used for aggregation and stateful operations

Multiple Outputs

Some transforms support multiple named outputs:
transforms:
  route_logs:
    type: route
    inputs:
      - logs
    route:
      errors: '.level == "error"'
      warnings: '.level == "warning"'
      # _unmatched: implicit output for unmatched events

# Reference specific outputs
sinks:
  error_sink:
    inputs:
      - route_logs.errors
  
  warning_sink:
    inputs:
      - route_logs.warnings
  
  other_sink:
    inputs:
      - route_logs._unmatched

Vector Remap Language (VRL)

VRL is Vector’s purpose-built language for event transformation. It’s the recommended way to process events.

VRL Features

  • Type-safe: Compile-time type checking prevents runtime errors
  • Fast: Compiled to efficient bytecode
  • Ergonomic: Designed specifically for event processing
  • Fail-safe: fallible operations must be handled explicitly, either by aborting with ! or by capturing the error

Common VRL Patterns

transforms:
  parse:
    type: remap
    source: |
      . = parse_json!(.message)
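A few more everyday patterns, sketched in a single remap (field names are hypothetical):

```yaml
transforms:
  normalize:
    type: remap
    source: |
      # Coerce a type, with a fallback if coercion fails
      .status = to_int(.status) ?? 0

      # Normalize casing
      .level = downcase(string!(.level))

      # Default a missing field
      .region = .region ?? "unknown"

      # Drop a temporary field
      del(.tmp)
```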

VRL Error Handling

transforms:
  safe_parsing:
    type: remap
    source: |
      # Capture the error instead of aborting with !
      parsed, err = parse_json(.message)
      if err != null {
        .parse_error = err
        .parsed = false
      } else {
        . = parsed
        .parsed = true
      }
Test VRL expressions interactively with the vector vrl REPL:
vector vrl

Performance Optimization

Transform Ordering

Order transforms to minimize processing:
# Good: Filter early, process less data
transforms:
  1_filter:
    type: filter
    inputs: [logs]
    condition: '.level != "debug"'
  
  2_parse:
    type: remap
    inputs: [1_filter]
    source: |
      . = parse_json!(.message)  # Only parse filtered events

# Bad: Parse everything, then filter
transforms:
  1_parse:
    type: remap
    inputs: [logs]
    source: |
      . = parse_json!(.message)  # Parse all events
  
  2_filter:
    type: filter
    inputs: [1_parse]
    condition: '.level != "debug"'  # Filter after expensive parsing

Concurrent Processing

Vector automatically enables concurrency for eligible transforms. To maximize performance:
  • Use remap over lua (VRL is faster and supports better concurrency)
  • Avoid stateful operations when possible
  • Use route to split traffic before expensive operations
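For example, route can split traffic so only one branch pays for expensive parsing (names are hypothetical):

```yaml
transforms:
  split:
    type: route
    inputs:
      - logs
    route:
      is_json: 'starts_with(string!(.message), "{")'

  parse_json_only:
    type: remap
    inputs:
      - split.is_json       # expensive parsing runs on this branch only
    source: |
      . = parse_json!(.message)
```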

Memory Management

transforms:
  # Reduce memory in aggregating transforms
  aggregate:
    type: reduce
    expire_after_ms: 5000  # Flush state frequently
  
  # Limit cache sizes
  dedupe:
    type: dedupe
    cache:
      num_events: 5000     # Smaller cache = less memory

Best Practices

  • Parse and structure data as early as possible
  • Filter out unnecessary data before expensive operations
  • Route to different destinations at the end of processing
Prefer remap over lua where possible. VRL is faster, safer, and better integrated:
  • Type safety prevents runtime errors
  • Better performance through compilation
  • First-class support for Vector data types
  • Interactive REPL for testing
Handle VRL errors gracefully rather than silently dropping events:
transforms:
  safe_transform:
    type: remap
    drop_on_error: false  # Keep events even if VRL fails
    source: |
      parsed, err = parse_json(.message)
      if err == null {
        . = parsed
      }
Use unit tests for transform logic:
tests:
  - name: parse_apache_logs
    inputs:
      - insert_at: parse_apache
        type: raw
        value: '127.0.0.1 - frank [01/Jan/2024:00:00:00 +0000] "GET /api HTTP/1.1" 200 1234'
    outputs:
      - extract_from: parse_apache
        conditions:
          - type: vrl
            source: '.ip == "127.0.0.1" && .status == "200"'
Watch internal metrics for bottlenecks:
  • component_received_events_total
  • component_sent_events_total
  • component_errors_total
  • component_execution_time_seconds
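These counters come from Vector's internal_metrics source; one way to watch them is to export them to Prometheus (the address shown is an assumption):

```yaml
sources:
  vector_metrics:
    type: internal_metrics

sinks:
  prometheus:
    type: prometheus_exporter
    inputs:
      - vector_metrics
    address: "0.0.0.0:9598"
```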

Troubleshooting

Events Not Flowing

  1. Check transform condition logic
  2. Verify input references are correct
  3. Look for errors in VRL compilation
  4. Enable debug logging: VECTOR_LOG=debug

High Memory Usage

  • Reduce cache sizes in dedupe
  • Decrease expiration times in reduce and aggregate
  • Add sample transforms for high-volume data
  • Filter earlier in the pipeline

VRL Errors

Use the VRL REPL to debug:
echo '{"message": "{\"level\": \"info\"}"}' | vector vrl '. = parse_json!(.message)'
