Branching, tagging, and snapshot lifecycle management in Apache Iceberg
Iceberg supports branches and tags as named references to snapshots, which extends snapshot lifecycle management beyond basic time travel. These features support data quality workflows, auditing, and experimental data engineering.
Tags are named references to snapshots with their own retention policies:
    -- Create a tag for end-of-month snapshot
    ALTER TABLE prod.db.table
    CREATE TAG `EOM-2024-02` AS OF VERSION 12345 RETAIN 180 DAYS;

    -- Create a tag for compliance audit (retain forever)
    ALTER TABLE prod.db.table
    CREATE TAG `AUDIT-Q1-2024` AS OF VERSION 23456;

    -- Query using a tag
    SELECT * FROM prod.db.table VERSION AS OF 'EOM-2024-02';
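A tag's retention period controls how long the reference is kept. Once the tag expires, the snapshot it points to is no longer protected by it and can be deleted if nothing else references it:
    -- Tag retained for 7 days, then expired
    ALTER TABLE db.table
    CREATE TAG `weekly-backup` RETAIN 7 DAYS;

    -- Tag retained forever (default)
    ALTER TABLE db.table
    CREATE TAG `production-release-v2.0`;

    -- Update tag retention
    ALTER TABLE db.table
    REPLACE TAG `weekly-backup` RETAIN 14 DAYS;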
When expire_snapshots runs, expired tags are removed, and snapshots referenced only by expired tags can be deleted.
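The expiration rule above can be sketched as a small Python model. This is a toy illustration, not Iceberg's implementation: the `Ref` class, its `age_days` field, and the `expire` function are hypothetical names, and the model ignores snapshot-age settings, minimum-snapshot counts, and the history of the main branch, all of which also affect what a real expire_snapshots run removes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Ref:
    name: str
    snapshot_id: int
    age_days: int                      # hypothetical: how old the ref is considered to be
    retain_days: Optional[int] = None  # None models "retain forever"


def expire(refs, snapshot_ids):
    """Return (kept_refs, deletable_snapshot_ids) for one toy expiration pass."""
    # A ref survives if it has no retention limit or is still within it.
    kept = [r for r in refs if r.retain_days is None or r.age_days <= r.retain_days]
    # A snapshot is protected as long as at least one surviving ref points to it.
    protected = {r.snapshot_id for r in kept}
    deletable = sorted(s for s in snapshot_ids if s not in protected)
    return kept, deletable


refs = [
    Ref("weekly-backup", snapshot_id=100, age_days=10, retain_days=7),
    Ref("production-release-v2.0", snapshot_id=200, age_days=400),
    Ref("EOM-2024-02", snapshot_id=200, age_days=10, retain_days=180),
]
kept, deletable = expire(refs, snapshot_ids=[100, 200])
print([r.name for r in kept])  # ['production-release-v2.0', 'EOM-2024-02']
print(deletable)               # [100]
```

Snapshot 100 becomes deletable only because its single tag has expired; snapshot 200 stays protected while any tag still references it.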
    -- Retain end-of-month snapshots for 7 years
    ALTER TABLE financial_data
    CREATE TAG `EOM-2024-01` AS OF VERSION 1000 RETAIN 2555 DAYS;

    ALTER TABLE financial_data
    CREATE TAG `EOM-2024-02` AS OF VERSION 2000 RETAIN 2555 DAYS;
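The 2555-day value above is 7 × 365, which does not account for leap days. If a retention policy is defined in calendar years, the day count can be computed from the dates instead. The helper below is a minimal sketch; `retention_days` is a hypothetical name, and the start date is only an example.

```python
from datetime import date


def retention_days(start: date, years: int) -> int:
    """Days between `start` and the same calendar date `years` later."""
    try:
        end = start.replace(year=start.year + years)
    except ValueError:
        # start is Feb 29 and the target year is not a leap year: fall back to Feb 28
        end = start.replace(year=start.year + years, day=28)
    return (end - start).days


print(7 * 365)                               # 2555
print(retention_days(date(2024, 1, 31), 7))  # 2557
```

For a tag created on 2024-01-31, seven calendar years span two leap days (2024 and 2028), so `RETAIN 2557 DAYS` would be needed to cover the full period.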
Release Milestones
Mark production releases:
    -- Tag production deployments (retain forever)
    ALTER TABLE product_catalog
    CREATE TAG `prod-release-2024-03-01` AS OF VERSION 5432;

    -- Reproduce exactly what customers saw
    SELECT * FROM product_catalog VERSION AS OF 'prod-release-2024-03-01';
Backup Points
Create recovery points before risky operations:
    -- Before major data migration
    ALTER TABLE user_data
    CREATE TAG `pre-migration-backup` RETAIN 30 DAYS;

    -- Perform migration...

    -- Roll back if needed by pointing the table's current snapshot at the tag
    CALL catalog_name.system.set_current_snapshot(
      table => 'db.user_data',
      ref => 'pre-migration-backup');
Branches are mutable named references that can have new snapshots committed to them:
    -- Create a branch from current snapshot
    ALTER TABLE db.table CREATE BRANCH test_branch;

    -- Create branch from specific snapshot
    ALTER TABLE db.table
    CREATE BRANCH experiment AS OF VERSION 12345;

    -- Write to a branch (Spark)
    -- (requires 'write.wap.enabled'='true' on the table)
    SET spark.wap.branch = test_branch;
    INSERT INTO db.table VALUES (1, 'test');

    -- Query branch data
    SELECT * FROM db.table.branch_test_branch;
    -- Enable WAP
    ALTER TABLE prod.db.table SET TBLPROPERTIES (
      'write.wap.enabled'='true');

    -- Create audit branch
    ALTER TABLE prod.db.table
    CREATE BRANCH audit_branch RETAIN 7 DAYS;

    -- Write to audit branch (Spark)
    SET spark.wap.branch = audit_branch;
    INSERT INTO prod.db.table SELECT * FROM staging.new_data;

    -- Validate data quality
    SELECT count(*) AS total, count(DISTINCT user_id) AS unique_users
    FROM prod.db.table.branch_audit_branch;

    -- Publish if validation passes
    CALL catalog_name.system.fast_forward(
      'prod.db.table', 'main', 'audit_branch');
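The write, audit, publish flow above can be modeled in a few lines of Python to show why main stays untouched until validation passes. This is an in-memory sketch, not Iceberg code: the `table` dictionary, the `audit` function, and the rule that every `user_id` must be unique are assumptions chosen to mirror the count comparison in the validation query.

```python
def audit(rows):
    """Hypothetical quality check: at least one row and no duplicate user_id values."""
    ids = [r["user_id"] for r in rows]
    return len(ids) > 0 and len(ids) == len(set(ids))


def write_audit_publish(table, new_rows, branch="audit_branch"):
    """Stage rows on a branch, validate them, and publish to main only on success."""
    # Write: the branch sees main's rows plus the new ones; main is not modified.
    table[branch] = table["main"] + new_rows
    # Audit: validate the staged state.
    if not audit(table[branch]):
        return False
    # Publish: point main at the validated state (the fast_forward step).
    table["main"] = table[branch]
    return True


table = {"main": [{"user_id": 1}]}
print(write_audit_publish(table, [{"user_id": 2}]))  # True
print(len(table["main"]))                            # 2
print(write_audit_publish(table, [{"user_id": 2}]))  # False (duplicate user_id)
print(len(table["main"]))                            # 2
```

The failed second load leaves main at two rows, and the rejected data remains visible on the branch for inspection.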
Experimental Features
Test changes without affecting production:
    -- Create experiment branch
    ALTER TABLE analytics.events
    CREATE BRANCH new_metric_experiment RETAIN 14 DAYS;

    -- Write experimental data
    -- (new_metric must already be a column of analytics.events;
    --  the table schema is shared by all branches)
    SET spark.wap.branch = new_metric_experiment;
    INSERT INTO analytics.events
    SELECT *, compute_new_metric(data) AS new_metric
    FROM source;

    -- Analyze results
    SELECT avg(new_metric)
    FROM analytics.events.branch_new_metric_experiment;

    -- Publish if successful (for example with fast_forward), or let the branch expire
Staging Environments
Separate staging from production data:
    -- Create staging branch
    ALTER TABLE db.table CREATE BRANCH staging;

    -- Load staging data
    -- (assumes Parquet files; open-source Spark SQL has no COPY INTO statement)
    SET spark.wap.branch = staging;
    INSERT INTO db.table
    SELECT * FROM parquet.`s3://bucket/staging/`;

    -- Test queries against staging
    SELECT * FROM db.table.branch_staging WHERE ...;

    -- Promote to main after testing
    CALL catalog_name.system.fast_forward('db.table', 'main', 'staging');
Parallel Data Processing
Isolate concurrent data pipelines:
    -- Pipeline A writes to branch A
    ALTER TABLE db.table CREATE BRANCH pipeline_a RETAIN 1 DAYS;

    -- Pipeline B writes to branch B
    ALTER TABLE db.table CREATE BRANCH pipeline_b RETAIN 1 DAYS;

    -- Publish both when complete
    -- (fast_forward only works while main has not diverged;
    --  overlapping data requires conflict resolution)
    -- Create tag
    ALTER TABLE db.table CREATE TAG tag_name;

    -- From specific snapshot
    ALTER TABLE db.table CREATE TAG tag_name AS OF VERSION 12345;

    -- With retention
    ALTER TABLE db.table CREATE TAG tag_name RETAIN 30 DAYS;
    -- Query branch
    SELECT * FROM db.table.branch_branch_name;

    -- Or using VERSION AS OF
    SELECT * FROM db.table VERSION AS OF 'branch_name';

    -- Query tag
    SELECT * FROM db.table VERSION AS OF 'tag_name';

    -- List all references
    SELECT * FROM db.table.refs;
    -- Set branch for writes
    SET spark.wap.branch = branch_name;
    INSERT INTO db.table VALUES (...);

    -- Or write directly to branch table
    INSERT INTO db.table.branch_branch_name VALUES (...);
    -- Fast-forward main to branch tip
    -- (only if main hasn't diverged)
    CALL catalog_name.system.fast_forward(
      table => 'db.table',
      branch => 'main',
      to => 'staging_branch');
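The "only if main hasn't diverged" condition amounts to an ancestry check on the snapshot lineage: main can move to the other branch's tip only if main's current snapshot is an ancestor of that tip. The sketch below models this with a plain parent map. The function names and data structures are hypothetical and are not Iceberg's API.

```python
def is_ancestor(parents, ancestor, descendant):
    """True if `ancestor` is reachable from `descendant` by following parent links."""
    current = descendant
    while current is not None:
        if current == ancestor:
            return True
        current = parents.get(current)
    return False


def fast_forward(refs, parents, branch, to):
    """Move `branch` to the tip of `to`, refusing if `branch` has diverged."""
    if not is_ancestor(parents, refs[branch], refs[to]):
        raise ValueError(f"cannot fast-forward: {branch} is not an ancestor of {to}")
    refs[branch] = refs[to]


# Snapshot lineage as snapshot id -> parent id: 1 <- 2 <- 3, and 1 <- 4
parents = {1: None, 2: 1, 3: 2, 4: 1}
refs = {"main": 1, "staging_branch": 3, "other_branch": 4}

fast_forward(refs, parents, "main", "staging_branch")
print(refs["main"])  # 3

try:
    # main is now at snapshot 3, which is not in the history of snapshot 4
    fast_forward(refs, parents, "main", "other_branch")
except ValueError as err:
    print(err)  # cannot fast-forward: main is not an ancestor of other_branch
```

The second call fails because both branches started from snapshot 1 but main has since moved on, which is the diverged case the comment in the procedure call warns about.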
    -- Drop a tag
    ALTER TABLE db.table DROP TAG tag_name;

    -- Drop a branch (and its snapshots if no longer referenced)
    ALTER TABLE db.table DROP BRANCH branch_name;