The AWS S3 sink stores observability events in Amazon S3 buckets. It supports automatic partitioning, multiple compression algorithms, and flexible file naming strategies.

Configuration

[sinks.s3]
type = "aws_s3"
inputs = ["my_source"]

# S3 bucket configuration
bucket = "my-logs-bucket"
key_prefix = "logs/date=%F/"

# AWS region
region = "us-east-1"

# Authentication
auth.access_key_id = "${AWS_ACCESS_KEY_ID}"
auth.secret_access_key = "${AWS_SECRET_ACCESS_KEY}"

# Encoding and compression
encoding.codec = "json"
compression = "gzip"

# Batching
batch.max_bytes = 10485760  # 10MB
batch.timeout_secs = 300

Core Parameters

bucket
string
required
The S3 bucket name. Must not include a leading s3:// or trailing /.
bucket = "my-logs-bucket"
key_prefix
string
default:"date=%F"
Prefix to apply to all object keys. Supports template syntax and strftime date formatting. Use a trailing / to create a directory-like structure.
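For example, combining strftime date partitioning with event-field templating (the service field here is illustrative and assumed to exist on your events):

```toml
# Date-partitioned prefix using strftime
key_prefix = "logs/date=%F/"

# Template syntax: partition by an event field plus date
key_prefix = "service={{ service }}/date=%F/"
```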
region
string
required
AWS region where the S3 bucket is located.
region = "us-east-1"
region = "eu-west-1"
endpoint
string
Custom endpoint for S3-compatible storage such as MinIO or Ceph.
endpoint = "https://s3.custom.com"

Authentication

The S3 sink supports multiple AWS authentication methods:
auth.access_key_id
string
AWS access key ID for authentication.
auth.secret_access_key
string
AWS secret access key for authentication.
auth.assume_role
string
ARN of an IAM role to assume for authentication.

Static Credentials

[sinks.s3.auth]
access_key_id = "${AWS_ACCESS_KEY_ID}"
secret_access_key = "${AWS_SECRET_ACCESS_KEY}"

IAM Role

When running on EC2, ECS, or EKS, Vector can automatically use IAM role credentials:
# No auth configuration needed - uses instance profile
[sinks.s3]
bucket = "my-bucket"
region = "us-east-1"

Assume Role

[sinks.s3.auth]
assume_role = "arn:aws:iam::123456789012:role/VectorS3WriteRole"

External ID

[sinks.s3.auth]
assume_role = "arn:aws:iam::123456789012:role/VectorS3WriteRole"
external_id = "external-id-12345"

File Naming

filename_time_format
string
default:"%s"
Timestamp format for the time component of object keys using strftime specifiers. Set to an empty string to disable the timestamp in the filename.
filename_time_format = "%s"          # Unix timestamp: 1658176486
filename_time_format = "%Y%m%d%H%M%S" # 20220718203446
filename_time_format = ""            # No timestamp
filename_append_uuid
boolean
default:"true"
Append a UUID v4 token to the end of object keys to ensure uniqueness. Useful in high-volume workloads to prevent name collisions.
filename_append_uuid = true
# Results in: date=2022-07-18/1658176486-30f6652c-71da-4f9f-800d-a1189c47c547.log.gz
filename_extension
string
Override the file extension. By default, the extension is determined by the compression setting.
filename_extension = "json"
filename_extension = "log"

Encoding

encoding.codec
string
required
How events are encoded before writing to S3. Options:
  • json: JSON encoding (one object per line)
  • text: Plain text (one line per event)
  • ndjson: Newline-delimited JSON
  • csv: CSV format
  • logfmt: Logfmt encoding
  • avro: Apache Avro binary format
  • parquet: Apache Parquet columnar format
[sinks.s3.encoding]
codec = "json"
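Some codecs take additional options. For example, the csv codec needs an explicit column list; a sketch, assuming the standard encoding.csv.fields option:

```toml
[sinks.s3.encoding]
codec = "csv"

[sinks.s3.encoding.csv]
# Columns are written in the order listed
fields = ["timestamp", "level", "message"]
```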
encoding.only_fields
array
Include only specified fields in the output.
[sinks.s3.encoding]
codec = "json"
only_fields = ["timestamp", "message", "level"]
encoding.except_fields
array
Exclude specified fields from the output.
[sinks.s3.encoding]
codec = "json"
except_fields = ["_metadata", "secret_token"]
encoding.timestamp_format
string
default:"rfc3339"
Format for timestamp fields. Options: rfc3339, unix, unix_ms, unix_ns.
[sinks.s3.encoding]
codec = "json"
timestamp_format = "unix"

Compression

compression
string
default:"gzip"
Compression algorithm. Options: none, gzip, zstd, snappy. Compression reduces storage costs and network bandwidth.
compression = "gzip"  # Good balance of speed and ratio
compression = "zstd"  # Better compression, slightly slower
compression = "none"  # No compression

Batching

Configure batching to control file size and flush frequency:
batch.max_bytes
integer
default:"10485760"
Maximum size of a batch in bytes before creating a new file (10MB default).
[sinks.s3.batch]
max_bytes = 52428800  # 50MB
batch.timeout_secs
float
default:"300"
Maximum time to wait before flushing a partial batch (5 minutes default).
[sinks.s3.batch]
timeout_secs = 60  # Flush every minute
[sinks.s3.batch]
max_bytes = 10485760    # 10MB files
timeout_secs = 300      # Flush every 5 minutes

S3 Options

Advanced S3-specific options:
options.acl
string
Canned ACL to apply to created objects. Options: private, public-read, public-read-write, authenticated-read, bucket-owner-read, bucket-owner-full-control.
[sinks.s3.options]
acl = "bucket-owner-full-control"
options.storage_class
string
default:"STANDARD"
S3 storage class. Options:
  • STANDARD: Standard storage
  • REDUCED_REDUNDANCY: Reduced redundancy
  • INTELLIGENT_TIERING: Automatic cost optimization
  • STANDARD_IA: Infrequent access
  • ONEZONE_IA: One zone infrequent access
  • GLACIER: Glacier storage
  • GLACIER_IR: Glacier instant retrieval
  • DEEP_ARCHIVE: Glacier deep archive
[sinks.s3.options]
storage_class = "INTELLIGENT_TIERING"
options.server_side_encryption
string
Server-side encryption algorithm. Options: AES256, aws:kms.
[sinks.s3.options]
server_side_encryption = "aws:kms"
options.ssekms_key_id
string
KMS key ID for server-side encryption with KMS. Required when server_side_encryption = "aws:kms". Supports template syntax for dynamic key selection.
[sinks.s3.options]
server_side_encryption = "aws:kms"
ssekms_key_id = "arn:aws:kms:us-east-1:123456789012:key/abcd1234-..."
options.tags
object
Tags to apply to created objects.
[sinks.s3.options.tags]
Environment = "production"
Application = "vector"
CostCenter = "engineering"
options.content_encoding
string
Override the Content-Encoding header.
[sinks.s3.options]
content_encoding = "gzip"
options.content_type
string
Override the Content-Type header.
[sinks.s3.options]
content_type = "application/json"

TLS Configuration

tls.ca_file
string
Path to CA certificate for custom endpoints.
[sinks.s3.tls]
ca_file = "/path/to/ca.pem"

Request Configuration

request.timeout_secs
integer
default:"60"
Request timeout in seconds.
request.retry_attempts
integer
default:"5"
Number of retry attempts for failed requests.
[sinks.s3.request]
timeout_secs = 30
retry_attempts = 3

Advanced Options

force_path_style
boolean
default:"false"
Use path-style addressing (bucket in path) instead of virtual-hosted style. Required for some S3-compatible services.
force_path_style = true
# https://s3.custom.com/bucket/key instead of https://bucket.s3.custom.com/key

Complete Examples

Basic Configuration

[sinks.s3_logs]
type = "aws_s3"
inputs = ["processed_logs"]

bucket = "my-logs-bucket"
key_prefix = "logs/date=%F/"
region = "us-east-1"

encoding.codec = "json"
compression = "gzip"

[sinks.s3_logs.batch]
max_bytes = 10485760
timeout_secs = 300

Partitioned by Service and Date

[sinks.s3_partitioned]
type = "aws_s3"
inputs = ["logs"]

bucket = "prod-logs"
key_prefix = "service={{ service }}/date=%Y/%m/%d/"
region = "us-west-2"

auth.assume_role = "arn:aws:iam::123456789012:role/VectorRole"

encoding.codec = "json"
compression = "zstd"

[sinks.s3_partitioned.batch]
max_bytes = 52428800  # 50MB
timeout_secs = 600

With KMS Encryption

[sinks.s3_encrypted]
type = "aws_s3"
inputs = ["sensitive_logs"]

bucket = "secure-logs-bucket"
key_prefix = "encrypted/date=%F/"
region = "us-east-1"

encoding.codec = "json"
compression = "gzip"

[sinks.s3_encrypted.options]
server_side_encryption = "aws:kms"
ssekms_key_id = "arn:aws:kms:us-east-1:123456789012:key/12345678-..."
storage_class = "STANDARD_IA"

[sinks.s3_encrypted.options.tags]
Classification = "confidential"
Retention = "7years"

High-Volume Configuration

[sinks.s3_high_volume]
type = "aws_s3"
inputs = ["metrics"]

bucket = "metrics-bucket"
key_prefix = "metrics/year=%Y/month=%m/day=%d/hour=%H/"
region = "us-east-1"

encoding.codec = "json"
compression = "zstd"

filename_time_format = "%Y%m%d%H%M%S"
filename_append_uuid = true

[sinks.s3_high_volume.batch]
max_bytes = 104857600  # 100MB
timeout_secs = 120

[sinks.s3_high_volume.request]
retry_attempts = 10
timeout_secs = 120

[sinks.s3_high_volume.options]
storage_class = "INTELLIGENT_TIERING"

S3-Compatible Storage (MinIO)

[sinks.minio]
type = "aws_s3"
inputs = ["logs"]

bucket = "vector-logs"
key_prefix = "logs/"
endpoint = "https://minio.example.com"
region = "us-east-1"  # Still required but can be any value

force_path_style = true

auth.access_key_id = "minioadmin"
auth.secret_access_key = "minioadmin"

encoding.codec = "json"
compression = "gzip"

Troubleshooting

Authentication Issues

If you encounter authentication errors:
  1. Verify AWS credentials are correct
  2. Check IAM permissions include s3:PutObject on the bucket
  3. Ensure the bucket exists and region is correct
  4. For assume role, verify trust relationships
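A minimal IAM policy covering step 2 might look like the following (the bucket name is a placeholder; depending on your options, e.g. acl or tags, additional actions such as s3:PutObjectAcl or s3:PutObjectTagging may also be required):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:PutObject"],
      "Resource": "arn:aws:s3:::my-logs-bucket/*"
    }
  ]
}
```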

Object Not Created

If objects aren’t appearing in S3:
  1. Check batch timeout - may need to wait for flush
  2. Verify bucket name and region are correct
  3. Review Vector logs for errors
  4. Ensure sufficient data to trigger batch
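To rule out batching as the cause (steps 1 and 4), temporarily lower the flush interval so partial batches are written quickly; a diagnostic sketch, not a production setting:

```toml
[sinks.s3.batch]
timeout_secs = 5  # flush almost immediately while debugging; revert afterwards
```

If objects appear with this setting, the original configuration was simply waiting for the batch timeout or size threshold.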

Performance Issues

  1. Increase batch size: Larger files reduce API calls
  2. Enable compression: Reduces upload time
  3. Adjust timeout: Balance between latency and file size
  4. Use multiple sinks: Partition across buckets/prefixes
  5. Choose appropriate storage class: Consider access patterns

Best Practices

  1. Use date-based partitioning for easier querying and lifecycle management
  2. Enable compression to reduce storage costs (30-50% savings)
  3. Set appropriate batch sizes to balance cost and latency
  4. Use IAM roles instead of static credentials when possible
  5. Enable KMS encryption for sensitive data
  6. Add meaningful tags for cost tracking and organization
  7. Use Intelligent-Tiering storage class for unknown access patterns
  8. Configure S3 Lifecycle policies to archive or delete old data
  9. Enable S3 versioning for important data
  10. Monitor CloudWatch metrics for S3 API usage

Cost Optimization

  1. Compression: Use gzip or zstd to reduce storage costs
  2. Batch size: Larger batches reduce PUT request costs
  3. Storage class: Use STANDARD_IA for infrequent access
  4. Lifecycle policies: Automatically transition to cheaper storage
  5. Partitioning: Makes selective deletion easier
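Lifecycle policies (point 4) are configured on the bucket itself, not in Vector. A rule sketch with illustrative transition days, in the JSON shape accepted by aws s3api put-bucket-lifecycle-configuration:

```json
{
  "Rules": [
    {
      "ID": "archive-old-logs",
      "Status": "Enabled",
      "Filter": { "Prefix": "logs/" },
      "Transitions": [
        { "Days": 30, "StorageClass": "STANDARD_IA" },
        { "Days": 90, "StorageClass": "GLACIER" }
      ],
      "Expiration": { "Days": 365 }
    }
  ]
}
```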
