Overview
Tiered storage provides:
- Cost reduction - Store historical data in cheaper cloud storage
- Infinite retention - Keep data indefinitely without BookKeeper capacity limits
- Automatic offloading - Offload based on time or size thresholds
- Transparent access - Consumers can read offloaded data seamlessly
- Multiple backends - Support for S3, GCS, Azure, filesystem, and custom implementations
How It Works
- Messages are written to BookKeeper as usual
- When configured thresholds are met, older ledgers are offloaded
- Ledger data is copied to tiered storage
- After a configurable delay, data is deleted from BookKeeper
- Consumers can read from both BookKeeper and tiered storage
Supported Storage Backends
- AWS S3 - Amazon Simple Storage Service
- Google Cloud Storage - Google’s object storage
- Azure Blob Storage - Microsoft Azure blob storage
- Alibaba Cloud OSS - Alibaba Cloud Object Storage Service
- Filesystem - Local or network-mounted filesystems
- Custom - Implement custom offloaders
Configuration
Global Broker Configuration
Configure tiered storage in broker.conf:
- Offload driver - Driver to use for offloading data to long-term storage. Options: aws-s3 | google-cloud-storage | azureblob | aliyun-oss | filesystem | S3
- Offload threads - Maximum number of thread pool threads for ledger offloading
- Read threads - Maximum number of read thread pool threads for ledger offloading
- Prefetch rounds - Maximum prefetch rounds for ledger reading during offloading
- Deletion lag - Delay between successfully offloading a ledger and deleting it from BookKeeper. Default is 4 hours (14400000 ms)
- Size threshold - Number of bytes before triggering automatic offload to long-term storage; -1 disables automatic offloading
- Time threshold - Number of seconds before triggering automatic offload to long-term storage; -1 disables time-based offloading
- Offloaders directory - Directory containing offloader implementations (NAR files)
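A minimal broker.conf sketch covering the settings above. The property names follow Pulsar's standard broker configuration, but verify them against your Pulsar version; the values are examples only:

```properties
# Driver used to offload ledgers to long-term storage
managedLedgerOffloadDriver=aws-s3
# Thread pool size for ledger offloading
managedLedgerOffloadMaxThreads=2
# Prefetch rounds for ledger reading during offload
managedLedgerOffloadPrefetchRounds=1
# Delay between offloading a ledger and deleting it from BookKeeper (4 hours)
managedLedgerOffloadDeletionLagMs=14400000
# Size threshold for automatic offload; -1 disables automatic offloading
managedLedgerOffloadAutoTriggerSizeThresholdBytes=-1
# Directory containing offloader NAR files
offloadersDirectory=./offloaders
```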
AWS S3 Configuration
Configure S3 as the offload target:
- Driver - Set to aws-s3 for Amazon S3 offloading
- Region - AWS region where the S3 bucket is located (e.g., us-west-2)
- Bucket - S3 bucket name for storing offloaded ledgers
- Endpoint - Alternative S3 endpoint to connect to (useful for S3-compatible storage or testing)
- Max block size - Maximum block size in bytes (64 MiB default, 5 MiB minimum)
- Read buffer size - Read buffer size in bytes (1 MiB default)
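A broker.conf sketch for the settings above; the s3ManagedLedgerOffload* property names follow Pulsar's documented S3 offloader configuration, and the bucket name is a placeholder:

```properties
managedLedgerOffloadDriver=aws-s3
s3ManagedLedgerOffloadRegion=us-west-2
# Placeholder bucket name
s3ManagedLedgerOffloadBucket=pulsar-offload
# Optional: alternative endpoint for S3-compatible storage or testing
# s3ManagedLedgerOffloadServiceEndpoint=http://localhost:9000
# 64 MiB max block size (default)
s3ManagedLedgerOffloadMaxBlockSizeInBytes=67108864
# 1 MiB read buffer (default)
s3ManagedLedgerOffloadReadBufferSizeInBytes=1048576
```

Credentials are typically supplied through the standard AWS mechanisms (environment variables, credential files, or IAM instance roles) rather than in broker.conf.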
Google Cloud Storage Configuration
Configure GCS as the offload target:
- Driver - Set to google-cloud-storage for GCS offloading
- Region - GCS region where the bucket is located (e.g., us-central1)
- Bucket - GCS bucket name for storing offloaded ledgers
- Max block size - Maximum block size in bytes (128 MiB default, 5 MiB minimum). Maximum ledger size is 32 times the block size due to jclouds limitations
- Read buffer size - Read buffer size in bytes (1 MiB default)
- Service account key - Path to JSON file containing service account credentials
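A broker.conf sketch for the settings above; the gcsManagedLedgerOffload* property names follow Pulsar's documented GCS offloader configuration, and the bucket name and key file path are placeholders:

```properties
managedLedgerOffloadDriver=google-cloud-storage
gcsManagedLedgerOffloadRegion=us-central1
# Placeholder bucket name
gcsManagedLedgerOffloadBucket=pulsar-offload
# 128 MiB max block size (default)
gcsManagedLedgerOffloadMaxBlockSizeInBytes=134217728
# 1 MiB read buffer (default)
gcsManagedLedgerOffloadReadBufferSizeInBytes=1048576
# Service account credentials for the offloader
gcsManagedLedgerOffloadServiceAccountKeyFile=/path/to/service-account.json
```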
Azure Blob Storage Configuration
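The details of this section were not preserved above. As a hedged sketch: the Azure offloader is selected with the azureblob driver and, in common setups, takes the container name via the generic bucket property, with credentials supplied through environment variables. Property and variable names here are assumptions to verify against your Pulsar version:

```properties
managedLedgerOffloadDriver=azureblob
# Container for offloaded ledgers (name is a placeholder)
managedLedgerOffloadBucket=pulsar-offload
# Credentials are commonly passed via environment variables, e.g.
# AZURE_STORAGE_ACCOUNT and AZURE_STORAGE_ACCESS_KEY, rather than broker.conf
```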
Filesystem Configuration
For local or network-mounted filesystems:
- Driver - Set to filesystem for local/network filesystem offloading
- URI - Filesystem URI (e.g., file:///mnt/offload or hdfs://namenode:8020/pulsar)
- Profile path - Path to Hadoop configuration file for HDFS filesystems
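A broker.conf sketch for the settings above; property names follow Pulsar's documented filesystem offloader configuration, and the paths are placeholders:

```properties
managedLedgerOffloadDriver=filesystem
# Local or network-mounted target
fileSystemURI=file:///mnt/offload
# For HDFS, use an hdfs:// URI and point at a Hadoop configuration profile:
# fileSystemURI=hdfs://namenode:8020/pulsar
# fileSystemProfilePath=conf/filesystem_offload_core_site.xml
```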
Namespace-Level Configuration
Override global settings for specific namespaces:
Manual Offloading
Trigger offload manually for a topic:
Reading Offloaded Data
Consumers automatically read from tiered storage when necessary. Configure the read priority used when ledgers exist in both BookKeeper and tiered storage. Options:
- tiered-storage-first - Prefer reading from tiered storage
- bookkeeper-first - Prefer reading from BookKeeper
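The namespace overrides, manual trigger, and read priority described above are typically managed with pulsar-admin; a sketch assuming standard tooling, with tenant, namespace, and topic names as placeholders:

```shell
# Namespace-level offload threshold and deletion lag (override broker.conf)
pulsar-admin namespaces set-offload-threshold --size 10G my-tenant/my-ns
pulsar-admin namespaces set-offload-deletion-lag --lag 2h my-tenant/my-ns

# Manually trigger offload, keeping roughly 10 MiB in BookKeeper
pulsar-admin topics offload --size-threshold 10M \
  persistent://my-tenant/my-ns/my-topic

# Read priority can be set globally in broker.conf (assumed property name):
# managedLedgerDataReadPriority=tiered-storage-first
```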
Monitoring
Offload Metrics
Check Offload Status
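To check whether a manually triggered offload has completed, a sketch using the standard pulsar-admin tooling (topic name is a placeholder):

```shell
# Report the current offload state for the topic
pulsar-admin topics offload-status persistent://my-tenant/my-ns/my-topic

# Add -w to block until the offload run finishes
pulsar-admin topics offload-status -w persistent://my-tenant/my-ns/my-topic
```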
Storage Structure
Offloaded data is organized in the storage backend:
Cost Optimization
Storage Class Selection
Use appropriate storage classes for cost savings:
- AWS S3 - Use S3 Standard-IA or S3 Glacier for infrequent access
- GCS - Use Nearline or Coldline storage classes
- Azure - Use Cool or Archive tiers
Lifecycle Policies
Configure storage lifecycle policies to transition data:
Security
AWS IAM Permissions
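The offloader needs object read, write, and delete access plus bucket listing; multipart-upload actions are included because the S3 offloader uploads blocks in parts. A hedged IAM policy sketch, with the bucket name as a placeholder:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:AbortMultipartUpload",
        "s3:ListMultipartUploadParts"
      ],
      "Resource": "arn:aws:s3:::pulsar-offload/*"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": "arn:aws:s3:::pulsar-offload"
    }
  ]
}
```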
The broker's credentials must carry the required S3 permissions on the offload bucket.
Encryption
Enable server-side encryption:
- AWS S3 - Use SSE-S3, SSE-KMS, or SSE-C
- GCS - Use Google-managed or customer-managed encryption keys
- Azure - Use Azure Storage Service Encryption
Troubleshooting
Offload Failures
Check broker logs for errors. Common causes include:
- Insufficient permissions on the storage bucket
- Network connectivity to cloud storage
- Invalid credentials or configuration
- Insufficient disk space for temporary files
Performance Issues
- Increase offload threads for higher offload parallelism
- Adjust block sizes for better throughput
- Monitor read latency from tiered storage
- Monitor read latency from tiered storage
Best Practices
- Set appropriate thresholds - Balance BookKeeper capacity with offload frequency
- Use size-based thresholds - More predictable than time-based for cost control
- Configure deletion lag - Allow time for data verification before deletion
- Monitor costs - Track storage costs in cloud provider billing
- Test recovery - Verify consumers can read offloaded data
- Plan for latency - Cloud storage has higher latency than BookKeeper
- Use lifecycle policies - Automatically transition to cheaper storage classes
- Secure credentials - Use IAM roles or service accounts instead of static credentials
- Regional storage - Co-locate storage with Pulsar clusters for lower latency
- Backup strategy - Combine tiered storage with separate backup procedures