Backup Strategies
Pulsar supports several backup approaches:
- Tiered Storage - Automatic offloading to cloud storage (recommended for retention)
- Topic Snapshots - Export topic data to external storage
- Metadata Backup - Back up cluster metadata from ZooKeeper or another metadata store
- BookKeeper Backup - Back up raw BookKeeper ledger data
- Geo-Replication - Use remote clusters as live backups
Backing Up Topic Data
Using Tiered Storage
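Offloading requires an offload driver on the brokers (for example `managedLedgerOffloadDriver` in `broker.conf`); after that it can be driven by a namespace threshold or triggered per topic. The Python sketch below only builds `pulsar-admin` command lines for review or for `subprocess.run`. It assumes `pulsar-admin` is on the `PATH`, and the tenant, namespace, topic, and threshold values are hypothetical.

```python
import shlex


def offload_commands(namespace: str, topic: str,
                     namespace_threshold: str, topic_threshold: str) -> list[list[str]]:
    """Build pulsar-admin commands for namespace-level and manual topic offload."""
    return [
        # Offload automatically once a topic in the namespace holds more than
        # namespace_threshold of data in BookKeeper.
        ["pulsar-admin", "namespaces", "set-offload-threshold",
         namespace, "--size", namespace_threshold],
        # Trigger an offload now, keeping at most topic_threshold in BookKeeper.
        ["pulsar-admin", "topics", "offload",
         topic, "--size-threshold", topic_threshold],
        # Report the progress of the manual offload.
        ["pulsar-admin", "topics", "offload-status", topic],
    ]


if __name__ == "__main__":
    # Hypothetical namespace and topic; this only prints the commands.
    for cmd in offload_commands("my-tenant/my-ns",
                                "persistent://my-tenant/my-ns/orders", "10G", "1G"):
        print(shlex.join(cmd))
```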
Tiered storage automatically and continuously offloads closed ledger segments from BookKeeper to cloud storage, so older topic data is kept in durable object storage.
Manual Topic Export
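A minimal file-export sketch in Python, assuming the messages come from the Python client (`pulsar-client`), whose `Message` objects provide `data()`, `properties()`, `partition_key()`, `publish_timestamp()`, `event_timestamp()`, and `message_id()`. Each message becomes one JSON line with a base64-encoded payload; this JSON Lines layout and its field names are choices made for the sketch, not a Pulsar format.

```python
import base64
import json
from typing import Any, Iterable, TextIO


def message_to_record(msg: Any) -> dict:
    """Turn a pulsar.Message (or a look-alike) into a JSON-serializable dict."""
    return {
        # Payloads are raw bytes, so store them base64-encoded.
        "data": base64.b64encode(msg.data()).decode("ascii"),
        "properties": dict(msg.properties()),
        "key": msg.partition_key(),
        "publish_time": msg.publish_timestamp(),  # milliseconds since epoch
        "event_time": msg.event_timestamp(),      # milliseconds, 0 if never set
        # Serialized MessageId, kept for reference only.
        "message_id": base64.b64encode(msg.message_id().serialize()).decode("ascii"),
    }


def export_messages(messages: Iterable[Any], out: TextIO) -> int:
    """Write one JSON record per line (JSON Lines) and return how many were written."""
    count = 0
    for msg in messages:
        out.write(json.dumps(message_to_record(msg)) + "\n")
        count += 1
    return count
```

Any iterable of `Message` objects works as input, for example messages read by a `pulsar.Reader` opened at `pulsar.MessageId.earliest`.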
Export messages from a topic to a file.
Backup with Message Reader
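A sketch of the read loop, assuming the Python client's `Reader` API (`has_message_available()` and `read_next(timeout_millis=...)`). It stops once the reader reports no more messages, so it captures the topic roughly as of when the loop catches up, not at an exact point in time.

```python
from typing import Any, Iterator


def drain_reader(reader: Any, timeout_ms: int = 5000) -> Iterator[Any]:
    """Yield messages from a pulsar.Reader until it reports none are left.

    A reader has no durable subscription, so reading here neither acknowledges
    messages nor moves any existing subscription's cursor.
    """
    while reader.has_message_available():
        yield reader.read_next(timeout_millis=timeout_ms)


# Usage against a live cluster (requires `pip install pulsar-client`):
#
#   import pulsar
#   client = pulsar.Client("pulsar://localhost:6650")
#   reader = client.create_reader("persistent://my-tenant/my-ns/orders",
#                                 pulsar.MessageId.earliest)
#   try:
#       for msg in drain_reader(reader):
#           ...  # write msg to backup storage
#   finally:
#       reader.close()
#       client.close()
```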
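Use a reader for non-destructive backup. A reader does not create a durable subscription, so it neither acknowledges messages nor moves the cursors of existing subscriptions.
Backing Up Metadata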
Tenant and Namespace Configuration
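A sketch that walks every tenant and namespace and collects the JSON printed by `pulsar-admin tenants get` and `pulsar-admin namespaces policies`. It assumes `pulsar-admin` is on the `PATH`; the command runner is a parameter so the logic can be exercised without a cluster, and stripping quotes from list output is a defensive assumption about formatting.

```python
import json
import subprocess
from typing import Callable


def run_admin(*args: str) -> str:
    """Run a pulsar-admin subcommand and return stdout (pulsar-admin assumed on PATH)."""
    result = subprocess.run(["pulsar-admin", *args],
                            check=True, capture_output=True, text=True)
    return result.stdout


def names(output: str) -> list[str]:
    """Split list output into names, tolerating surrounding quotes and blank lines."""
    return [line.strip().strip('"') for line in output.splitlines() if line.strip()]


def export_tenants(run: Callable[..., str] = run_admin) -> dict:
    """Return {tenant: {"info": ..., "namespaces": {namespace: policies}}}."""
    backup = {}
    for tenant in names(run("tenants", "list")):
        backup[tenant] = {
            "info": json.loads(run("tenants", "get", tenant)),
            "namespaces": {
                ns: json.loads(run("namespaces", "policies", ns))
                for ns in names(run("namespaces", "list", tenant))
            },
        }
    return backup
```

Writing the result with `json.dump(export_tenants(), f, indent=2)` keeps one file per backup run.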
Export tenant and namespace policies.
Topic Configuration
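This sketch records partition counts, which are needed to recreate partitioned topics with the same layout, using `pulsar-admin topics list-partitioned-topics` and `topics get-partitioned-topic-metadata`. Topic-level policies such as retention or TTL can be exported the same way with the matching get commands for your Pulsar version. The injectable runner and the namespace names are assumptions of the sketch.

```python
import json
import subprocess
from typing import Callable


def run_admin(*args: str) -> str:
    """Run a pulsar-admin subcommand and return stdout (pulsar-admin assumed on PATH)."""
    return subprocess.run(["pulsar-admin", *args],
                          check=True, capture_output=True, text=True).stdout


def export_partition_counts(namespace: str,
                            run: Callable[..., str] = run_admin) -> dict[str, int]:
    """Map each partitioned topic in a namespace to its partition count."""
    counts: dict[str, int] = {}
    for line in run("topics", "list-partitioned-topics", namespace).splitlines():
        topic = line.strip().strip('"')
        if not topic:
            continue
        # Metadata is printed as JSON, e.g. {"partitions": 4}
        metadata = json.loads(run("topics", "get-partitioned-topic-metadata", topic))
        counts[topic] = metadata["partitions"]
    return counts
```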
Export topic-level settings.
Schema Backup
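A sketch that saves the output of `pulsar-admin schemas get <topic>` (the latest schema version) to one file per topic. The file-naming scheme and the injectable runner are assumptions of the sketch, and older schema versions are not captured.

```python
import re
import subprocess
from pathlib import Path
from typing import Callable


def run_admin(*args: str) -> str:
    """Run a pulsar-admin subcommand and return stdout (pulsar-admin assumed on PATH)."""
    return subprocess.run(["pulsar-admin", *args],
                          check=True, capture_output=True, text=True).stdout


def schema_backup_path(backup_dir: Path, topic: str) -> Path:
    """Map a topic name to a safe file name, e.g.
    persistent://acme/orders/events -> persistent_acme_orders_events.schema.json"""
    safe = re.sub(r"[^A-Za-z0-9._-]+", "_", topic)
    return backup_dir / f"{safe}.schema.json"


def backup_schema(topic: str, backup_dir: Path,
                  run: Callable[..., str] = run_admin) -> Path:
    """Save the output of `pulsar-admin schemas get <topic>` to a file."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    path = schema_backup_path(backup_dir, topic)
    path.write_text(run("schemas", "get", topic))
    return path
```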
Export schemas.
Subscription State
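Subscription positions can be read from `pulsar-admin topics stats-internal <topic>`, whose `cursors` section reports a `markDeletePosition` (`ledgerId:entryId`) for each subscription; every message up to and including that position is acknowledged. The sketch extracts those positions from the parsed JSON. The sample document is hypothetical and trimmed to the fields used; for a partitioned topic, run it for each `<topic>-partition-N`.

```python
import json


def subscription_positions(stats_internal: dict) -> dict[str, str]:
    """Map subscription name -> markDeletePosition ("ledgerId:entryId").

    stats_internal is the parsed JSON printed by
    `pulsar-admin topics stats-internal <topic>`.
    """
    return {name: cursor["markDeletePosition"]
            for name, cursor in stats_internal.get("cursors", {}).items()}


# Abbreviated, hypothetical stats-internal output for one topic.
sample = json.loads("""
{"cursors": {"billing": {"markDeletePosition": "42:17", "readPosition": "42:18"},
             "audit":   {"markDeletePosition": "41:99", "readPosition": "42:0"}}}
""")
print(subscription_positions(sample))
```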
Back up subscription positions.
Backing Up Metadata Store
ZooKeeper Backup
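ZooKeeper writes snapshots (`snapshot.<zxid>`) and transaction logs (`log.<zxid>`) under `dataDir/version-2`, or under `dataLogDir/version-2` for logs when that is set; Pulsar's bundled ZooKeeper takes `dataDir` from `conf/zookeeper.conf`. A simple file-level approach, sketched below under the assumption that those directories are readable from the backup host, copies the newest snapshot together with the transaction logs.

```python
import shutil
from pathlib import Path
from typing import Optional


def snapshot_zxid(path: Path) -> int:
    """snapshot.<hex zxid>[.<compression suffix>] -> zxid as an int."""
    return int(path.name.split(".")[1], 16)


def latest_snapshot(version_dir: Path) -> Optional[Path]:
    """Pick the snapshot with the highest zxid (names are hex, so sort numerically)."""
    snaps = list(version_dir.glob("snapshot.*"))
    return max(snaps, key=snapshot_zxid) if snaps else None


def copy_zookeeper_state(version_dir: Path, dest: Path,
                         log_dir: Optional[Path] = None) -> list[Path]:
    """Copy the newest snapshot plus all transaction logs into dest."""
    snap = latest_snapshot(version_dir)
    if snap is None:
        raise FileNotFoundError(f"no snapshot.* files in {version_dir}")
    dest.mkdir(parents=True, exist_ok=True)
    copied = [Path(shutil.copy2(snap, dest / snap.name))]
    # Snapshots are fuzzy: changes made while one was written are only in the
    # transaction logs, so copy the logs too (from dataLogDir if it is separate).
    for log in sorted((log_dir or version_dir).glob("log.*")):
        copied.append(Path(shutil.copy2(log, dest / log.name)))
    return copied
```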
For clusters that use ZooKeeper, back up its snapshots and transaction logs.
Metadata Store Backup
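For etcd, `etcdctl snapshot save <file>` writes a point-in-time snapshot taken from a single endpoint. The sketch builds that command with a timestamped file name. The endpoint and backup directory are hypothetical, and `ETCDCTL_API=3` is set because `etcdctl` releases before 3.4 default to the v2 API.

```python
import os
from datetime import datetime


def etcd_snapshot_command(endpoint: str, backup_dir: str,
                          now: datetime) -> tuple[list[str], dict[str, str]]:
    """Build an `etcdctl snapshot save` invocation with a timestamped file name."""
    path = os.path.join(backup_dir, f"etcd-{now:%Y%m%dT%H%M%SZ}.db")
    cmd = ["etcdctl", f"--endpoints={endpoint}", "snapshot", "save", path]
    # etcdctl releases before 3.4 default to the v2 API; snapshots need v3.
    env = {**os.environ, "ETCDCTL_API": "3"}
    return cmd, env


# Usage (runs etcdctl, so not executed here):
#   import subprocess
#   from datetime import timezone
#   cmd, env = etcd_snapshot_command("http://127.0.0.1:2379", "/backups",
#                                    datetime.now(timezone.utc))
#   subprocess.run(cmd, env=env, check=True)
```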
For other metadata stores, such as etcd, use the store's own snapshot tooling.
BookKeeper Backup
Ledger Backup
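A bookie keeps its journal in `journalDirectory` (or `journalDirectories`) and its entry logs and indexes in `ledgerDirectories` (plus `indexDirectories` when set), all configured in Pulsar's `conf/bookkeeper.conf`. The sketch lists the directories a file-level backup would need to copy; the key names are taken from recent `bookkeeper.conf` files and should be checked against your version, and copying while the bookie is stopped is the simplest way to get a consistent copy.

```python
# Config keys that point at bookie data; values may be comma-separated lists.
DATA_DIR_KEYS = ("journalDirectory", "journalDirectories",
                 "ledgerDirectories", "indexDirectories")


def bookie_data_dirs(conf_text: str) -> list[str]:
    """Return the directories named by a bookie config file, without duplicates."""
    dirs: list[str] = []
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key not in DATA_DIR_KEYS:
            continue
        for directory in value.split(","):
            directory = directory.strip()
            if directory and directory not in dirs:
                dirs.append(directory)
    return dirs


# Usage (the config path is an assumption about the install layout):
#   from pathlib import Path
#   for d in bookie_data_dirs(Path("conf/bookkeeper.conf").read_text()):
#       print("back up:", d)
```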
Back up BookKeeper ledger data.
Metadata Backup
BookKeeper metadata is stored in ZooKeeper under /ledgers by default. Include it in ZooKeeper backups.
Restoring Topic Data
Restore from Tiered Storage
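To check that a rebuilt broker matches the original offload configuration, a sketch can compare every `broker.conf` key whose name contains "offload" (for example `managedLedgerOffloadDriver` or `s3ManagedLedgerOffloadBucket`) between the saved and restored files. Treating the files as simple `key=value` lines is an assumption of the sketch.

```python
def parse_properties(text: str) -> dict[str, str]:
    """Parse key=value lines, skipping blank lines and # comments."""
    props: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line and not line.startswith("#") and "=" in line:
            key, value = line.split("=", 1)
            props[key.strip()] = value.strip()
    return props


def offload_settings(props: dict[str, str]) -> dict[str, str]:
    """Keep only settings whose key mentions offloading."""
    return {k: v for k, v in props.items() if "offload" in k.lower()}


def offload_mismatches(original_conf: str, restored_conf: str) -> dict[str, tuple]:
    """Return {key: (original, restored)} for offload settings that differ."""
    before = offload_settings(parse_properties(original_conf))
    after = offload_settings(parse_properties(restored_conf))
    return {key: (before.get(key), after.get(key))
            for key in sorted(before.keys() | after.keys())
            if before.get(key) != after.get(key)}
```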
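Offloaded data stays readable through the normal topic APIs, because brokers locate offloaded segments through each topic's managed ledger metadata in the metadata store. Configure the restored brokers with the same offload settings (driver, bucket, region, and so on) that were in use when the data was offloaded.
Restore from Export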
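A sketch that replays a JSON Lines export through any `send`-like callable, assuming each record has a base64 `data` field plus optional `properties` and `key` fields (a layout chosen for the sketch). With the Python client, passing `producer.send` works because it accepts `properties` and `partition_key` keyword arguments. Republished messages receive new message IDs and publish times.

```python
import base64
import json
from typing import Callable, Iterable


def republish(lines: Iterable[str], send: Callable[..., object]) -> int:
    """Replay JSON Lines records (base64 "data", "properties", "key") through send."""
    count = 0
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines in the backup file
        record = json.loads(line)
        send(base64.b64decode(record["data"]),
             properties=record.get("properties") or None,
             partition_key=record.get("key") or None)
        count += 1
    return count


# Usage against a live cluster (requires pulsar-client):
#   import pulsar
#   client = pulsar.Client("pulsar://localhost:6650")
#   producer = client.create_producer("persistent://my-tenant/my-ns/orders")
#   with open("orders-backup.jsonl") as f:
#       republish(f, producer.send)
#   producer.flush()
#   client.close()
```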
Republish messages from the backup.
Restore with Event Time
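With the Python client, `Producer.send()` accepts an `event_timestamp` in milliseconds, while publish time is always assigned when the message is sent. The sketch builds `send()` keyword arguments from a backup record with `publish_time`, `event_time`, `properties`, and `key` fields (the record layout is an assumption). It falls back to the original publish time when no event time was set and copies that value into a property whose name, `backup.original-publish-time`, is hypothetical.

```python
def send_kwargs(record: dict) -> dict:
    """Keyword arguments for producer.send() that carry original timestamps over."""
    publish_ms = record["publish_time"]
    # event_timestamp() is 0 when the original producer never set it; fall back
    # to the original publish time so consumers still see when data was produced.
    event_ms = record.get("event_time") or publish_ms
    properties = dict(record.get("properties") or {})
    # Hypothetical property name preserving the original publish time.
    properties["backup.original-publish-time"] = str(publish_ms)
    kwargs = {"properties": properties, "event_timestamp": event_ms}
    if record.get("key"):
        kwargs["partition_key"] = record["key"]
    return kwargs
```

Each record can then be sent with `producer.send(base64.b64decode(record["data"]), **send_kwargs(record))`.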
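Preserve the original timestamps by setting each message's event time when republishing. Publish time is assigned again when a message is republished, so only event time can carry the original value.
Restoring Metadata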
Tenant and Namespace
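A sketch that turns a tenant and namespace export into `pulsar-admin tenants create` and `namespaces create` commands. It assumes the export maps each tenant to its `tenants get` JSON (`adminRoles`, `allowedClusters`) and to per-namespace policies that may carry `replication_clusters`. Other policies, such as retention or permissions, need their own set commands and are not covered.

```python
def restore_commands(backup: dict) -> list[list[str]]:
    """Build pulsar-admin create commands from an export shaped like
    {tenant: {"info": {"adminRoles": [...], "allowedClusters": [...]},
              "namespaces": {namespace: policies}}}."""
    cmds: list[list[str]] = []
    for tenant, entry in sorted(backup.items()):
        info = entry.get("info", {})
        cmd = ["pulsar-admin", "tenants", "create", tenant]
        if info.get("adminRoles"):
            cmd += ["--admin-roles", ",".join(info["adminRoles"])]
        if info.get("allowedClusters"):
            cmd += ["--allowed-clusters", ",".join(info["allowedClusters"])]
        cmds.append(cmd)
        # Namespaces are created after their tenant exists.
        for ns, policies in sorted(entry.get("namespaces", {}).items()):
            ns_cmd = ["pulsar-admin", "namespaces", "create", ns]
            clusters = policies.get("replication_clusters")
            if clusters:
                ns_cmd += ["--clusters", ",".join(clusters)]
            cmds.append(ns_cmd)
    return cmds
```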
Recreate tenant and namespace configuration.
Schema Restore
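`pulsar-admin schemas upload <topic> --filename <file>` expects a file with `type`, `schema` (the definition as a string), and `properties`, which is not the same layout that `schemas get` prints. The sketch converts saved `schemas get` output into the upload layout, assuming the saved JSON is either the schema info itself or wraps it in a `schemaInfo` field; verify both layouts against your Pulsar version.

```python
import json
from pathlib import Path


def to_upload_payload(schema_get_output: str) -> dict:
    """Convert saved `schemas get` output into the file layout for `schemas upload`."""
    data = json.loads(schema_get_output)
    info = data.get("schemaInfo", data)  # assumed wrapper field
    schema = info.get("schema", "")
    if not isinstance(schema, str):
        # AVRO/JSON definitions may be printed as objects; upload wants a string.
        schema = json.dumps(schema)
    return {"type": info["type"], "schema": schema,
            "properties": info.get("properties") or {}}


def upload_command(topic: str, payload: dict, path: Path) -> list[str]:
    """Write the payload to path and return the matching upload command."""
    path.write_text(json.dumps(payload))
    return ["pulsar-admin", "schemas", "upload", topic, "--filename", str(path)]
```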
Upload schemas.
Subscription Position
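Saved `ledgerId:entryId` positions are only meaningful when the original ledgers were restored (from BookKeeper or tiered storage). Republished messages get new message IDs, so for those, create subscriptions at `earliest` or `latest`, or reset by time with `reset-cursor --time`. The sketch builds `reset-cursor` commands for subscriptions that still exist and `create-subscription` commands for the rest; the topic and subscription names are hypothetical.

```python
def cursor_restore_commands(topic: str, positions: dict[str, str],
                            existing: set[str]) -> list[list[str]]:
    """Build commands that put each subscription back at its saved position.

    positions maps subscription -> "ledgerId:entryId" (or "earliest"/"latest"
    when creating a subscription). Existing subscriptions are reset; missing
    ones are created at the saved position.
    """
    commands: list[list[str]] = []
    for sub, position in sorted(positions.items()):
        action = "reset-cursor" if sub in existing else "create-subscription"
        commands.append(["pulsar-admin", "topics", action, topic,
                         "--subscription", sub, "--messageId", position])
    return commands
```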
Reset each subscription to its backed-up position.
Disaster Recovery Procedures
Complete Cluster Recovery
- Restore metadata store (ZooKeeper/etcd)
- Restore BookKeeper data (if not using tiered storage)
- Start Pulsar brokers with original configuration
- Verify cluster health
- Restore topic configurations and schemas
- Verify data accessibility
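For the cluster health step, a sketch can run read-only `pulsar-admin` checks (`clusters list` and `brokers healthcheck`) and report which ones fail. The check list is an assumption to extend for your deployment, and the runner is a parameter so the logic can be tested without a cluster.

```python
import subprocess
from typing import Callable

# Read-only checks to run after brokers restart; extend for your deployment.
HEALTH_CHECKS = [
    ["pulsar-admin", "clusters", "list"],
    ["pulsar-admin", "brokers", "healthcheck"],
]


def failed_checks(run: Callable[..., object] = subprocess.run) -> list[str]:
    """Run each check and return the command lines that did not succeed."""
    failures: list[str] = []
    for cmd in HEALTH_CHECKS:
        try:
            result = run(cmd, capture_output=True, text=True)
        except OSError:
            # pulsar-admin missing or not executable counts as a failure.
            failures.append(" ".join(cmd))
            continue
        if result.returncode != 0:
            failures.append(" ".join(cmd))
    return failures
```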
Topic-Level Recovery
- Create topic with original configuration
- Upload schema if applicable
- Restore messages from backup
- Create subscriptions at appropriate positions
- Verify consumers can read data
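The topic-level steps can be planned as two groups of `pulsar-admin` commands around the client-side message restore: create the topic (`topics create` or `topics create-partitioned-topic`) and upload its schema before, then create subscriptions after. If messages are republished before any subscription exists, the namespace needs a retention policy (for example via `pulsar-admin namespaces set-retention`), because messages on a topic with no subscriptions are otherwise not kept. Names and file paths in the sketch are hypothetical.

```python
from typing import Optional


def topic_recovery_plan(topic: str, partitions: int, schema_file: Optional[str],
                        subscriptions: dict[str, str]) -> tuple[list[list[str]], list[list[str]]]:
    """Return (commands before the message restore, commands after it).

    partitions == 0 means a non-partitioned topic; subscriptions maps
    subscription name -> "earliest", "latest", or "ledgerId:entryId".
    """
    admin = ["pulsar-admin", "topics"]
    if partitions > 0:
        before = [admin + ["create-partitioned-topic", topic, "--partitions", str(partitions)]]
    else:
        before = [admin + ["create", topic]]
    if schema_file:
        before.append(["pulsar-admin", "schemas", "upload", topic, "--filename", schema_file])
    after = [admin + ["create-subscription", topic, "--subscription", sub, "--messageId", pos]
             for sub, pos in sorted(subscriptions.items())]
    return before, after
```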
Automated Backup Scripts
Daily Backup Script
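A minimal daily job, sketched in Python under the assumption of one directory per day (`YYYY-MM-DD`) under a backup root. The step list is a hypothetical starting point that maps output files to `pulsar-admin` arguments; a failing step is recorded in `manifest.json` instead of stopping the run, so one error does not hide the others.

```python
import json
import subprocess
from datetime import date
from pathlib import Path
from typing import Callable

# Hypothetical step list: output file name -> pulsar-admin arguments.
STEPS = {
    "clusters.txt": ["clusters", "list"],
    "tenants.txt": ["tenants", "list"],
}


def run_admin(*args: str) -> str:
    """Run a pulsar-admin subcommand and return stdout (pulsar-admin assumed on PATH)."""
    return subprocess.run(["pulsar-admin", *args],
                          check=True, capture_output=True, text=True).stdout


def daily_backup(root: Path, today: date,
                 run: Callable[..., str] = run_admin) -> dict[str, str]:
    """Write each step's output under root/YYYY-MM-DD and record a manifest."""
    target = root / today.isoformat()
    target.mkdir(parents=True, exist_ok=True)
    results: dict[str, str] = {}
    for filename, args in STEPS.items():
        try:
            (target / filename).write_text(run(*args))
            results[filename] = "ok"
        except (OSError, subprocess.CalledProcessError) as exc:
            # Keep going so one failing step does not hide the others.
            results[filename] = f"failed: {exc}"
    (target / "manifest.json").write_text(json.dumps(results, indent=2))
    return results


# Schedule with cron, e.g. every day at 02:00 (script path is hypothetical):
#   0 2 * * * /usr/bin/python3 /opt/pulsar-backup/daily_backup.py
```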
Backup Retention
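For file-based backups stored as one directory per day (`YYYY-MM-DD`, an assumed layout), retention can be enforced by deleting directories older than a cutoff. Data already offloaded to tiered storage is governed separately by namespace retention policies (`pulsar-admin namespaces set-retention`). Directories whose names are not dates are left alone.

```python
import shutil
from datetime import date, timedelta
from pathlib import Path


def prune_backups(root: Path, today: date, keep_days: int) -> list[Path]:
    """Delete dated backup directories (YYYY-MM-DD) older than keep_days."""
    cutoff = today - timedelta(days=keep_days)
    removed: list[Path] = []
    for entry in sorted(root.iterdir()):
        if not entry.is_dir():
            continue
        try:
            day = date.fromisoformat(entry.name)
        except ValueError:
            continue  # not a dated backup directory; leave it alone
        if day < cutoff:
            shutil.rmtree(entry)
            removed.append(entry)
    return removed
```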
Geo-Replication as Backup
Use geo-replication for live backup. If the primary cluster fails:
- Point clients to the backup cluster
- Verify data integrity
- Rebuild the primary cluster
- Re-enable replication when the primary is restored
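Replication is enabled per namespace with `pulsar-admin namespaces set-clusters <namespace> --clusters <list>`. The clusters must already be registered, and the tenant's allowed clusters must include them. The sketch builds that command and rejects lists with fewer than two distinct clusters; the namespace and cluster names are hypothetical.

```python
def set_clusters_command(namespace: str, clusters: list[str]) -> list[str]:
    """Build the command that replicates a namespace to the given clusters."""
    unique = list(dict.fromkeys(clusters))  # drop duplicates, keep order
    if len(unique) < 2:
        raise ValueError("geo-replication needs at least two distinct clusters")
    return ["pulsar-admin", "namespaces", "set-clusters", namespace,
            "--clusters", ",".join(unique)]
```

After the primary is rebuilt, running the same command with both clusters listed is one way to re-enable replication to it.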
Monitoring Backups
Backup Validation
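Checksums catch silent corruption between backup time and restore time, although only a test restore proves the data is usable. The sketch records a SHA-256 digest for every file in a backup directory and later reports files that are missing or changed; the manifest file name is hypothetical.

```python
import hashlib
import json
from pathlib import Path

MANIFEST = "SHA256SUMS.json"  # hypothetical manifest file name


def sha256_file(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large backups need little memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_checksums(backup_dir: Path) -> dict[str, str]:
    """Record a SHA-256 digest for every file under backup_dir."""
    sums = {str(p.relative_to(backup_dir)): sha256_file(p)
            for p in sorted(backup_dir.rglob("*"))
            if p.is_file() and p.name != MANIFEST}
    (backup_dir / MANIFEST).write_text(json.dumps(sums, indent=2))
    return sums


def verify_checksums(backup_dir: Path) -> list[str]:
    """Return files that are missing or whose contents no longer match."""
    expected = json.loads((backup_dir / MANIFEST).read_text())
    return [rel for rel, want in expected.items()
            if not (backup_dir / rel).is_file() or sha256_file(backup_dir / rel) != want]
```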
Regularly verify backups.
Backup Metrics
Track backup operations:
- Last successful backup timestamp
- Backup size and duration
- Failed backup attempts
- Tiered storage offload metrics
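The first three metrics in the list above can be derived from a record of backup runs. The sketch summarizes runs into gauge-style values and renders them in the Prometheus text exposition format, for example for the node_exporter textfile collector. The metric names and the `BackupRun` record are hypothetical; tiered storage offload metrics come from the brokers themselves.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class BackupRun:
    started: float      # Unix timestamp, seconds
    finished: float     # Unix timestamp, seconds
    ok: bool
    size_bytes: int


def backup_metrics(runs: Iterable[BackupRun]) -> dict[str, float]:
    """Summarize backup runs (metric names are hypothetical)."""
    runs = list(runs)
    successes = [r for r in runs if r.ok]
    last = max(successes, key=lambda r: r.finished) if successes else None
    return {
        "backup_last_success_timestamp_seconds": last.finished if last else 0,
        "backup_last_success_size_bytes": last.size_bytes if last else 0,
        "backup_last_success_duration_seconds": (last.finished - last.started) if last else 0,
        "backup_failed_runs_total": sum(1 for r in runs if not r.ok),
    }


def to_prometheus_text(metrics: dict[str, float]) -> str:
    """Render metrics as Prometheus text exposition lines (no HELP/TYPE lines)."""
    return "".join(f"{name} {value}\n" for name, value in metrics.items())
```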
Best Practices
- Multiple backup methods - Combine tiered storage, geo-replication, and snapshots
- Regular testing - Test restore procedures quarterly
- Offsite backups - Store backups in different geographic regions
- Retention policies - Define how long to keep backups
- Incremental backups - Use tiered storage for continuous backup
- Document procedures - Maintain runbooks for recovery scenarios
- Monitor backup health - Alert on backup failures
- Secure backups - Encrypt backup data at rest and in transit
- Version control - Keep configuration and scripts in version control
- Automate backups - Use cron jobs or orchestration tools
- Test failure scenarios - Practice recovery from various failure modes
- Backup metadata - Don’t forget schemas, subscriptions, and configurations
Recovery Time Objectives
Plan for different recovery scenarios:
- Single topic corruption - Minutes to hours (restore from tiered storage)
- Namespace deletion - Hours (restore metadata and data)
- Complete cluster failure - Hours to days (rebuild cluster and restore data)
- Data center disaster - Minutes to hours (switch to geo-replicated cluster)
Compliance and Audit
For regulated industries:
- Backup logs - Maintain audit trails of backup operations
- Access controls - Restrict backup access to authorized personnel
- Encryption - Encrypt backups to meet compliance requirements
- Retention compliance - Align backup retention with data regulations
- Recovery testing - Document and track recovery test results