Overview
Iceberg provides stored procedures for table maintenance and management. Procedures are available when using the Iceberg SQL extensions. In Spark 4.0+, procedures are supported natively but are case-sensitive.
Using Procedures
Call procedures from any configured catalog using the `CALL` statement.
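A minimal sketch of the general form, assuming a catalog named `my_catalog` and a table `db.sample` (both placeholder names); procedures live in the catalog's `system` namespace:

```sql
-- CALL <catalog>.system.<procedure>(<args>)
CALL my_catalog.system.ancestors_of('db.sample');
```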
Argument Passing
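Arguments can be passed positionally or by name using `=>`. Named arguments may appear in any order and make it easier to skip optional parameters. A sketch with placeholder names:

```sql
-- Positional arguments
CALL my_catalog.system.rollback_to_snapshot('db.sample', 1);

-- Named arguments
CALL my_catalog.system.rollback_to_snapshot(table => 'db.sample', snapshot_id => 1);
```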
Snapshot Management
rollback_to_snapshot
Roll back a table to a specific snapshot:

- `table` (required) - Table name
- `snapshot_id` (required) - Target snapshot ID
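A hedged example, with placeholder catalog, table, and snapshot ID:

```sql
CALL my_catalog.system.rollback_to_snapshot('db.sample', 5781947118336215154);
```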
rollback_to_timestamp
Roll back to a snapshot at a specific time:

- `table` (required) - Table name
- `timestamp` (required) - Target timestamp
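A hedged example; the table rolls back to the snapshot that was current at the given time (names and timestamp are placeholders):

```sql
CALL my_catalog.system.rollback_to_timestamp('db.sample', TIMESTAMP '2024-06-30 00:00:00.000');
```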
set_current_snapshot
Set the current snapshot (not limited to ancestors).

cherrypick_snapshot

Apply changes from a snapshot without removing the original. Only append and dynamic overwrite snapshots can be cherry-picked.
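Hedged examples of both procedures above, with placeholder names and IDs:

```sql
-- Set the table's current snapshot directly (need not be an ancestor)
CALL my_catalog.system.set_current_snapshot('db.sample', 5781947118336215154);

-- Apply a snapshot's changes on top of the current state
CALL my_catalog.system.cherrypick_snapshot('db.sample', 5781947118336215154);
```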
fast_forward
Fast-forward a branch to another branch's head.

Metadata Management
expire_snapshots
Remove old snapshots and unreferenced data files:

- `table` (required) - Table name
- `older_than` - Expiration timestamp (default: 5 days ago)
- `retain_last` - Minimum snapshots to keep (default: 1)
- `max_concurrent_deletes` - Thread pool size for deletions
- `stream_results` - Stream results to prevent driver OOM
- `snapshot_ids` - Specific snapshot IDs to expire
Output columns:

- `deleted_data_files_count`
- `deleted_position_delete_files_count`
- `deleted_equality_delete_files_count`
- `deleted_manifest_files_count`
- `deleted_manifest_lists_count`
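A hedged example that expires snapshots older than a given timestamp while keeping at least 10 (names and values are placeholders):

```sql
CALL my_catalog.system.expire_snapshots(
  table => 'db.sample',
  older_than => TIMESTAMP '2024-06-30 00:00:00.000',
  retain_last => 10
);
```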
remove_orphan_files
Remove files not referenced in table metadata:

- `table` (required) - Table name
- `older_than` - Remove files older than this (default: 3 days ago)
- `location` - Specific directory to scan
- `dry_run` - Preview without deleting (default: false)
- `max_concurrent_deletes` - Thread pool size
- `stream_results` - Stream results to prevent OOM
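A hedged example that removes orphans older than a given timestamp (names and timestamp are placeholders):

```sql
CALL my_catalog.system.remove_orphan_files(
  table => 'db.sample',
  older_than => TIMESTAMP '2024-06-01 00:00:00.000'
);
```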
rewrite_data_files
Compact small files and optimize data layout:

- `target-file-size-bytes` - Target output file size (default: 512 MB)
- `min-file-size-bytes` - Files below this are rewritten (default: 75% of target)
- `max-file-size-bytes` - Files above this are rewritten (default: 180% of target)
- `min-input-files` - Minimum files to trigger rewrite (default: 5)
- `rewrite-all` - Force rewrite all files (default: false)
- `remove-dangling-deletes` - Remove orphaned delete files (default: false)
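A hedged example using the sort strategy; the catalog, table, and sort column are placeholders, and the options above are passed via the `options` map:

```sql
CALL my_catalog.system.rewrite_data_files(
  table => 'db.sample',
  strategy => 'sort',
  sort_order => 'id DESC NULLS LAST',
  options => map('min-input-files', '2')
);
```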
rewrite_manifests
Optimize manifest files for better scan planning.

rewrite_position_delete_files
Compact position delete files and remove dangling deletes.

Table Migration
snapshot
Create a lightweight copy for testing. Snapshot tables share data files with the source table. Use DROP TABLE to clean up when done testing.

migrate
Replace a Hive/Spark table with an Iceberg table:

- `table` (required) - Table to migrate
- `properties` - Properties for the new Iceberg table
- `drop_backup` - Don't retain the original table (default: false)
- `backup_table_name` - Custom backup name (default: original name with `_BACKUP_` suffix)
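Hedged examples of both migration procedures above (catalog and table names are placeholders):

```sql
-- Create a temporary Iceberg snapshot of a Spark table for testing
CALL my_catalog.system.snapshot('spark_catalog.db.sample', 'my_catalog.db.sample_snap');

-- Migrate the table in place, keeping a backup of the original
CALL my_catalog.system.migrate('spark_catalog.db.sample');
```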
add_files
Add files from external sources.

register_table
Register an existing metadata file in a catalog.

Change Data Capture
create_changelog_view
Create a view showing table changes. The view includes these metadata columns:

- `_change_type` - INSERT, DELETE, UPDATE_BEFORE, or UPDATE_AFTER
- `_change_ordinal` - Order of changes
- `_commit_snapshot_id` - Snapshot where the change occurred
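A hedged example (table and view names are placeholders); the procedure creates a view that can then be queried like any other:

```sql
CALL my_catalog.system.create_changelog_view(
  table => 'db.sample',
  changelog_view => 'sample_changes'
);

SELECT * FROM sample_changes ORDER BY _change_ordinal;
```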
Table Statistics
compute_table_stats
Calculate NDV statistics for columns.

compute_partition_stats
Compute partition statistics incrementally.

Metadata Information
ancestors_of
Report snapshot ancestry.

Best Practices
Regular Maintenance
Run maintenance procedures on a schedule:
- Daily: `expire_snapshots` for active tables
- Weekly: `rewrite_data_files` for frequently updated tables
- Monthly: `remove_orphan_files` for all tables
Streaming Tables
For tables with streaming writes:
- Use longer trigger intervals (1+ minutes)
- Regularly run `rewrite_data_files` to compact small files
- Run `rewrite_manifests` to optimize metadata
Safe Orphan Removal
Always use dry run first:
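A sketch of the two-step flow, with placeholder names:

```sql
-- 1. Preview which files would be removed
CALL my_catalog.system.remove_orphan_files(table => 'db.sample', dry_run => true);

-- 2. After reviewing the output, delete for real
CALL my_catalog.system.remove_orphan_files(table => 'db.sample');
```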
Next Steps
- Writes - Learn about write operations and distribution
- Configuration - Configure Spark for optimal performance
- Queries - Query tables and inspect metadata
- Structured Streaming - Maintain streaming tables