Sinks are Vector components that deliver events to external systems. They are the final stage in your Vector pipeline, sending logs, metrics, and traces to databases, object storage, SaaS platforms, and other destinations. Vector includes 40+ built-in sinks for common integrations.

How Sinks Work

Sinks receive events from sources and transforms, then:
  1. Buffer events in memory or on disk
  2. Batch events for efficient delivery
  3. Encode events in the destination’s required format
  4. Compress data to reduce network usage (optional)
  5. Deliver events to the destination via network protocols
  6. Retry failed deliveries with exponential backoff
  7. Acknowledge successful delivery (when acknowledgements are enabled)
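These stages correspond directly to sink configuration sections. A minimal sketch, with illustrative names and values:

```yaml
sinks:
  example_sink:
    type: http                 # Delivery protocol (stage 5)
    inputs:
      - my_transform
    uri: https://logs.example.com/ingest
    buffer:                    # Stage 1: buffering
      type: memory
      max_events: 500
    batch:                     # Stage 2: batching
      max_events: 100
      timeout_secs: 5
    encoding:                  # Stage 3: encoding
      codec: json
    compression: gzip          # Stage 4: compression
    request:                   # Stage 6: retries with backoff
      retry_attempts: 5
    acknowledgements:          # Stage 7: delivery confirmation
      enabled: true
```

Each of these options is covered in detail in the sections below.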

Sink Categories

Cloud Platform Sinks

AWS

S3, CloudWatch Logs/Metrics, Kinesis, SQS, SNS

Google Cloud

Cloud Storage, Cloud Logging, Pub/Sub, BigQuery

Azure

Blob Storage, Monitor Logs, Event Hubs
Example: AWS S3 sink
sinks:
  s3_archive:
    type: aws_s3
    inputs:
      - parsed_logs
    region: us-east-1
    bucket: my-log-archive
    key_prefix: "logs/date=%Y-%m-%d/"
    compression: gzip
    encoding:
      codec: json
    batch:
      max_bytes: 10485760    # 10MB per file
      timeout_secs: 300       # Or 5 minutes, whichever comes first
Example: AWS CloudWatch Logs
sinks:
  cloudwatch:
    type: aws_cloudwatch_logs
    inputs:
      - application_logs
    region: us-east-1
    group_name: /aws/application/prod
    stream_name: "{{ host }}-{{ application }}"  # Dynamic stream names
    encoding:
      codec: json

Observability Platforms

Datadog

Unified observability platform for logs, metrics, and traces

New Relic

Full-stack observability and APM

Splunk

Enterprise log management and SIEM

Elastic

Elasticsearch for search and analytics
Example: Datadog logs and metrics
sinks:
  datadog_logs:
    type: datadog_logs
    inputs:
      - application_logs
    default_api_key: "${DD_API_KEY}"
    endpoint: https://http-intake.logs.datadoghq.com
    compression: gzip
  
  datadog_metrics:
    type: datadog_metrics
    inputs:
      - host_metrics
    default_api_key: "${DD_API_KEY}"
    endpoint: https://api.datadoghq.com
Example: Elasticsearch
sinks:
  elasticsearch:
    type: elasticsearch
    inputs:
      - structured_logs
    endpoint: https://elasticsearch.example.com:9200
    auth:
      strategy: basic
      user: elastic
      password: "${ES_PASSWORD}"
    bulk:
      index: "logs-%Y.%m.%d"  # Daily indices
      action: create
    encoding:
      codec: json
    buffer:
      type: disk
      max_size: 268435488     # 256MB disk buffer

Metrics Systems

Prometheus

Prometheus remote write and exporter

InfluxDB

Time-series database for metrics

Graphite

Classic metrics storage system

StatsD

StatsD protocol for metrics aggregation
Example: Prometheus Remote Write
sinks:
  prometheus:
    type: prometheus_remote_write
    inputs:
      - application_metrics
    endpoint: https://prometheus.example.com/api/v1/write
    default_namespace: app
    auth:
      strategy: bearer
      token: "${PROM_TOKEN}"
Example: Prometheus Exporter
sinks:
  prometheus_exporter:
    type: prometheus_exporter
    inputs:
      - vector_metrics
    address: 0.0.0.0:9598
    default_namespace: vector
    # Metrics available at http://localhost:9598/metrics

Databases

ClickHouse

Columnar database for analytics

PostgreSQL

Relational database (via COPY protocol)
Example: ClickHouse
sinks:
  clickhouse:
    type: clickhouse
    inputs:
      - structured_logs
    endpoint: http://clickhouse:8123
    database: logs
    table: application_logs
    skip_unknown_fields: false
    encoding:
      codec: json
    batch:
      max_events: 10000
      timeout_secs: 10

Message Queues

Kafka

Distributed streaming platform

NATS

Lightweight messaging system

AMQP

RabbitMQ and AMQP protocol

Redis

In-memory data store with pub/sub
Example: Kafka
sinks:
  kafka:
    type: kafka
    inputs:
      - events
    bootstrap_servers: kafka1:9092,kafka2:9092,kafka3:9092
    topic: "logs-{{ environment }}"
    key_field: request_id
    encoding:
      codec: json
    compression: snappy
    batch:
      timeout_secs: 1
    acknowledgements:
      enabled: true

HTTP and Webhooks

Example: Generic HTTP sink
sinks:
  http_endpoint:
    type: http
    inputs:
      - logs
    uri: https://api.example.com/logs
    method: post
    encoding:
      codec: json
    headers:
      Authorization: "Bearer ${API_TOKEN}"
      Content-Type: "application/json"
    batch:
      max_events: 100
    request:
      retry_attempts: 5
      retry_initial_backoff_secs: 1
      retry_max_duration_secs: 300

Development and Testing

console

Print events to stdout (debugging)

blackhole

Discard events (testing throughput)

file

Write to local files
Example: Console output
sinks:
  debug:
    type: console
    inputs:
      - logs
    encoding:
      codec: json
      json:
        pretty: true

Sink Configuration

Common Options

All sinks support these configuration options:
sinks:
  my_sink:
    type: <sink_type>
    inputs:
      - source_or_transform
    
    # Buffering configuration
    buffer:
      type: memory              # memory or disk
      max_events: 500           # For memory buffers
      max_size: 268435488       # For disk buffers (256MB)
      when_full: block          # block or drop_newest
    
    # Batching configuration
    batch:
      max_events: 100           # Max events per batch
      max_bytes: 1048576        # Max bytes per batch (1MB)
      timeout_secs: 5           # Max time to wait for batch
    
    # Encoding
    encoding:
      codec: json               # json, text, protobuf, etc.
      except_fields:
        - sensitive_field       # Exclude these fields
      only_fields:
        - field1                # Only include these fields
        - field2
    
    # Compression
    compression: gzip           # gzip, zstd, snappy, none
    
    # Request/delivery settings
    request:
      concurrency: 5            # Parallel requests
      retry_attempts: 5
      retry_initial_backoff_secs: 1
      retry_max_duration_secs: 300
      timeout_secs: 60
    
    # Acknowledgements
    acknowledgements:
      enabled: false            # Wait for delivery confirmation
    
    # Health checks
    healthcheck:
      enabled: true             # Check connectivity at startup

Buffering Strategies

Memory Buffers

Memory buffers hold events in RAM for fast, low-latency delivery. Suitable for most use cases.
sinks:
  my_sink:
    buffer:
      type: memory
      max_events: 500       # Buffer up to 500 events
      when_full: block      # Backpressure when full
Pros: Fast, low overhead
Cons: Data loss on crash, limited capacity
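Disk Buffers

Disk buffers persist queued events to disk so they survive crashes and restarts, at the cost of throughput. A sketch (sink name illustrative; the size matches the 256MB value used elsewhere in this document):

```yaml
sinks:
  my_sink:
    buffer:
      type: disk
      max_size: 268435488   # ~256MB on disk
      when_full: block      # Backpressure when full
```

Pros: Survives restarts and crashes, larger capacity
Cons: Slower than memory, consumes disk I/O and space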

Batching

Batching improves throughput by sending multiple events together:
sinks:
  elasticsearch:
    batch:
      max_events: 100           # Send when 100 events collected
      max_bytes: 1048576        # Or when 1MB of data
      timeout_secs: 5           # Or after 5 seconds
    # First condition met triggers the batch
Tuning guidelines:
  • Higher max_events: Better throughput, higher latency
  • Lower timeout_secs: Lower latency, more network requests
  • Larger max_bytes: More compression efficiency

Encoding Formats

Sinks support multiple encoding formats:
sinks:
  json_sink:
    encoding:
      codec: json
      json:
        pretty: false       # Compact JSON
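Other codecs follow the same pattern. For example, the text codec emits each event's message as a raw line, which suits line-oriented destinations (sink and input names illustrative):

```yaml
sinks:
  text_sink:
    type: file
    inputs:
      - logs
    path: /var/log/vector/output.log
    encoding:
      codec: text         # Raw message field, one event per line
```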

Compression

Compress data before transmission to reduce bandwidth:
sinks:
  compressed_sink:
    compression: gzip           # gzip (most compatible)
    # compression: zstd         # zstd (better compression)
    # compression: snappy       # snappy (fastest)
    # compression: none         # no compression
Compression comparison:
  • gzip: Best compatibility, good compression
  • zstd: Best compression ratio, fast decompression
  • snappy: Fastest, moderate compression
  • none: No CPU overhead, more bandwidth

Health Checks

Sinks run health checks at startup to verify connectivity:
sinks:
  checked_sink:
    healthcheck:
      enabled: true             # Default: true
    # If healthcheck fails:
    # - Vector logs an error
    # - Vector may refuse to start (configurable globally)
Disable globally:
# vector.yaml
healthchecks:
  enabled: false                # Skip all health checks
  require_healthy: false        # Don't block startup on failures

Reliability and Delivery Guarantees

Acknowledgements

Acknowledgements provide end-to-end delivery confirmation:
sources:
  kafka_source:
    type: kafka
    acknowledgements:
      enabled: true             # Don't commit offsets until ack

transforms:
  process:
    type: remap
    inputs: [kafka_source]
    source: |
      . = parse_json!(.message)

sinks:
  elasticsearch:
    type: elasticsearch
    inputs: [process]
    acknowledgements:
      enabled: true             # Ack only after ES confirms write
When to use:
  • Financial data
  • Audit logs
  • Compliance-required data
Trade-offs:
  • Higher latency (wait for confirmation)
  • More memory (track in-flight events)
  • Better reliability (no data loss on sink failure)
Both the source and sink must have acknowledgements enabled for end-to-end delivery guarantees.

Retry Logic

Sinks automatically retry failed deliveries:
sinks:
  resilient_sink:
    request:
      retry_attempts: 5                   # Try up to 5 times
      retry_initial_backoff_secs: 1       # Start with 1s delay
      retry_max_duration_secs: 300        # Give up after 5 minutes
    # Exponential backoff: 1s, 2s, 4s, 8s, 16s
Retry triggers:
  • Network errors (connection refused, timeouts)
  • HTTP 429 (rate limiting)
  • HTTP 5xx (server errors)
Not retried:
  • HTTP 4xx (except 429) - client errors
  • Authentication failures
  • Malformed requests

Error Handling

When all retries are exhausted:
  1. Event is dropped
  2. Error is logged
  3. component_errors_total metric is incremented
  4. Downstream acknowledgements fail (if enabled)
To prevent data loss:
  • Use disk buffers for critical sinks
  • Enable acknowledgements
  • Configure dead letter queues (DLQ):
sinks:
  primary:
    type: elasticsearch
    # ... config ...
  
  # Send failures to S3 for later reprocessing
  dlq:
    type: aws_s3
    inputs:
      - primary._undeliverable  # Special output (future feature)

Performance Optimization

Concurrency

Increase parallel requests for better throughput:
sinks:
  high_throughput:
    type: http
    request:
      concurrency: 20           # 20 parallel requests
    batch:
      max_events: 100
      timeout_secs: 1
Guidelines:
  • Start with low concurrency (5-10)
  • Increase until destination is saturated
  • Monitor destination metrics (CPU, response time)
  • Consider destination rate limits
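Rather than tuning a fixed number by hand, Vector's Adaptive Request Concurrency can discover an appropriate level automatically. A sketch:

```yaml
sinks:
  adaptive_sink:
    type: http
    inputs:
      - logs
    uri: https://api.example.com/logs
    encoding:
      codec: json
    request:
      concurrency: adaptive   # Adjust parallelism from observed latency and errors
```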

Batching

Larger batches improve throughput:
sinks:
  optimized:
    batch:
      max_events: 1000          # Large batches for high throughput
      max_bytes: 10485760       # 10MB
      timeout_secs: 10          # Accept higher latency

Compression

For remote sinks, compression usually helps:
sinks:
  remote:
    compression: gzip
    # Typically reduces network usage by 70-90% for text data
    # Adds a small CPU cost per batch

Network Tuning

sinks:
  tuned:
    request:
      timeout_secs: 60          # Match destination latency
      rate_limit_num: 100       # Max 100 requests...
      rate_limit_duration_secs: 1  # ...per second

Best Practices

Choose the right buffer type:
  • Memory: Fast path for non-critical data
  • Disk: Critical data, large buffers, unstable networks
sinks:
  audit_logs:
    buffer:
      type: disk            # Critical data
      max_size: 1073741824
  
  debug_logs:
    buffer:
      type: memory          # Best effort
      max_events: 500
Monitor delivery health. Watch these metrics:
  • component_sent_events_total: Events delivered
  • component_errors_total: Delivery failures
  • buffer_events: Buffer utilization
  • request_duration_seconds: Delivery latency
Expose them with the internal_metrics source and a prometheus_exporter sink:
sources:
  metrics:
    type: internal_metrics

sinks:
  prometheus:
    type: prometheus_exporter
    inputs: [metrics]
    address: 0.0.0.0:9598
Match timeouts to destination characteristics:
  • Fast APIs: 10-30 seconds
  • Slow APIs: 60-120 seconds
  • Batch uploads: 300+ seconds
Partition object storage output with dynamic key prefixes:
sinks:
  s3_dynamic:
    type: aws_s3
    bucket: logs
    key_prefix: "{{ environment }}/{{ application }}/date=%Y-%m-%d/"
Test sinks before production rollout:
  • Run health checks: vector validate --config vector.yaml
  • Test with console sink first
  • Use blackhole sink for performance testing
  • Monitor metrics during initial rollout
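For throughput testing, the blackhole sink discards everything while reporting counts (input name illustrative):

```yaml
sinks:
  throughput_test:
    type: blackhole
    inputs:
      - parsed_logs
    print_interval_secs: 1   # Log the number of events received each second
```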

Troubleshooting

Events Not Reaching Destination

  1. Check sink health:
    curl http://localhost:8686/health
    
  2. Review logs:
    VECTOR_LOG=debug vector --config vector.yaml
    
  3. Verify network connectivity:
    telnet api.example.com 443
    
  4. Check metrics (from the prometheus_exporter sink, if configured):
    curl http://localhost:9598/metrics | grep component_errors
    

High Latency

  • Reduce batch sizes
  • Increase concurrency
  • Check destination performance
  • Consider regional endpoints (reduce network distance)

Buffer Full / Backpressure

  • Increase buffer size
  • Add more sink instances (horizontal scaling)
  • Sample high-volume data
  • Optimize destination performance
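One way to sample high-volume data is Vector's sample transform placed ahead of the sink (names and rate illustrative):

```yaml
transforms:
  sample_noisy:
    type: sample
    inputs:
      - verbose_logs
    rate: 10                # Keep roughly 1 in every 10 events

sinks:
  sampled_out:
    type: http
    inputs:
      - sample_noisy
    uri: https://api.example.com/logs
    encoding:
      codec: json
```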

Authentication Errors

  • Verify credentials in environment variables
  • Check IAM roles/permissions (for cloud sinks)
  • Ensure tokens haven’t expired
  • Review destination access logs
