Migration Approaches
There are two methods for executing table migration: full data migration and in-place metadata migration.
Full Data Migration
Copies all data files from the source table to the new Iceberg table. The new table is fully isolated from the source, but the migration is slower and doubles the storage used.
In-Place Metadata Migration
Preserves the existing data files and adds Iceberg metadata on top of them. This is faster and avoids duplicating data, but the new table is not fully isolated from the source because both reference the same data files.
Full Data Migration
Full data migration involves copying all data files from the source table to the new Iceberg table. This method makes the new table fully isolated from the source table, but it is slower and doubles the storage used. In practice, users can use operations like:
- Create-Table-As-Select (CTAS)
- INSERT statements
- Change-Data-Capture pipelines
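As an illustration of the CTAS option above, the following minimal sketch builds the Spark SQL statement that copies every row of a source table into a new Iceberg table. The helper name `ctas_statement` and the table identifiers are hypothetical. The sketch assumes the statement is later run through a Spark session that has an Iceberg catalog configured, and that the identifiers are trusted, since they are interpolated without escaping.

```python
def ctas_statement(source: str, target: str) -> str:
    """Build a Spark SQL CTAS statement that rewrites all rows of
    `source` into a new Iceberg table named `target`.

    The data is physically copied, so the new table shares no files
    with the source (full isolation, at the cost of extra storage).
    """
    return f"CREATE TABLE {target} USING iceberg AS SELECT * FROM {source}"


# Hypothetical identifiers, for illustration only.
stmt = ctas_statement("spark_catalog.db.events", "iceberg_catalog.db.events")
print(stmt)
```

With a configured session, the statement could be executed with `spark.sql(stmt)`. Later changes to the source table would still need to be carried over with INSERT statements or a change-data-capture pipeline.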
In-Place Metadata Migration
In-place metadata migration preserves the existing data files while incorporating Iceberg metadata on top of them. This method is not only faster but also eliminates the need for data duplication. Apache Iceberg supports the in-place metadata migration approach, which includes three important actions: Snapshot Table, Migrate Table, and Add Files.
Snapshot Table
The Snapshot Table action creates a new Iceberg table with a different name and with the same schema and partitioning as the source table, leaving the source table unchanged during and after the action.
Create new Iceberg table
Create a new Iceberg table with the same metadata (schema, partition spec, etc.) as the source table and a different name. Readers and writers on the source table can continue to work.
Commit data files
Commit all data files across all partitions to the new Iceberg table. The source table remains unchanged. Readers can be switched to the new Iceberg table.
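The Snapshot Table steps above can be sketched with Iceberg's Spark `snapshot` procedure. The helper below only builds the `CALL` statement. The function name, catalog name, and table identifiers are hypothetical, and the sketch assumes a Spark session with Iceberg's SQL extensions enabled and trusted identifiers, since nothing is escaped.

```python
def snapshot_call(catalog: str, source_table: str, new_table: str) -> str:
    """Build a CALL statement for Iceberg's Spark `snapshot` procedure.

    `source_table` is left untouched; `new_table` is the differently
    named Iceberg table that references the source's data files.
    """
    return (
        f"CALL {catalog}.system.snapshot("
        f"source_table => '{source_table}', table => '{new_table}')"
    )


# Hypothetical identifiers, for illustration only.
stmt = snapshot_call("iceberg_catalog", "db.events", "db.events_snapshot")
print(stmt)
```

Because the source table keeps serving readers and writers, a snapshot is a reasonable way to try out Iceberg reads before committing to a full migration.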
Migrate Table
The Migrate Table action also creates a new Iceberg table with the same schema and partitioning as the source table. However, during the action execution, it locks and drops the source table from the catalog.
Stop all writers
Stop all writers interacting with the source table. Readers that also support Iceberg may continue reading.
Create new table and rename source
Create a new Iceberg table with the same identifier and metadata (schema, partition spec, etc.) as the source table. Rename the source table for a backup in case of failure and rollback.
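The Migrate Table steps above correspond to Iceberg's Spark `migrate` procedure. This sketch only builds the `CALL` statement; the helper name, catalog, and table identifier are hypothetical. It assumes writers have already been stopped and that the statement will be run in a Spark session with Iceberg's SQL extensions enabled.

```python
def migrate_call(catalog: str, table: str) -> str:
    """Build a CALL statement for Iceberg's Spark `migrate` procedure.

    Unlike a snapshot, the resulting Iceberg table keeps the same
    identifier as the source table, so the source is replaced in place.
    """
    return f"CALL {catalog}.system.migrate(table => '{table}')"


# Hypothetical identifiers, for illustration only.
stmt = migrate_call("spark_catalog", "db.events")
print(stmt)
```

Since the procedure replaces the table under its existing name, it is worth confirming that no writer is still active before running it.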
Add Files
After the initial step (either Snapshot Table or Migrate Table), it is common to find some data files that have not been migrated. These files often originate from concurrent writers who continue writing to the source table during or after the migration process. In practice, these files can be:
- New data files in Hive tables
- New snapshots (versions) of Delta Lake tables
The Add Files action is essential for incorporating these files into the Iceberg table.
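For the Hive-style case, the Add Files step can be sketched with Iceberg's Spark `add_files` procedure. The helper below builds the `CALL` statement and optionally restricts it to one partition through `partition_filter`. The function name, catalog, table names, and partition values are hypothetical, and identifiers are interpolated without escaping. The sketch does not cover new Delta Lake snapshots.

```python
from typing import Mapping, Optional


def add_files_call(
    catalog: str,
    table: str,
    source_table: str,
    partition_filter: Optional[Mapping[str, str]] = None,
) -> str:
    """Build a CALL statement for Iceberg's Spark `add_files` procedure.

    `table` is the target Iceberg table and `source_table` is the table
    whose not-yet-migrated files should be registered. When given,
    `partition_filter` limits the call to matching partitions.
    """
    args = [f"table => '{table}'", f"source_table => '{source_table}'"]
    if partition_filter:
        # Spark SQL map literal: map('key1', 'value1', 'key2', 'value2')
        pairs = ", ".join(f"'{k}', '{v}'" for k, v in partition_filter.items())
        args.append(f"partition_filter => map({pairs})")
    return f"CALL {catalog}.system.add_files({', '.join(args)})"


# Hypothetical identifiers: add only the files written for one day.
stmt = add_files_call(
    "iceberg_catalog", "db.events", "db.events_hive", {"dt": "2024-01-01"}
)
print(stmt)
```

Filtering by partition keeps the call limited to the files that arrived after the initial step, instead of reprocessing the whole source table.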
Migration by Table Format
Iceberg supports migration from different table formats:
Hive Migration
Migrate from Apache Hive tables to Iceberg format
Delta Lake Migration
Migrate from Delta Lake tables to Iceberg format