Why Evolution Matters
Traditional table formats make schema and partition changes expensive. Iceberg eliminates these costs through metadata-only evolution operations.
Schema Evolution
Iceberg supports comprehensive schema changes as metadata operations:
Supported Operations
Add
Add new columns to the table or nested structs
Drop
Remove existing columns from the table or nested structs
Rename
Rename existing columns or fields in nested structs
Update
Widen column types using safe type promotions
Reorder
Change the order of columns or struct fields
Default Values
Set initial and write defaults for fields (v3+)
Adding Columns
Add new columns anywhere in the schema. When a column is added:
- It gets a new, unique field ID
- Existing data files don’t contain the column
- Reads return null (or the default value) for old files
- New writes include the column
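The read behavior above can be modeled in a few lines. This is a simplified sketch, not Iceberg's reader: the `Field` class, the `project` helper, and the dict-based "data file" keyed by field ID are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class Field:
    field_id: int                          # stable identity of the column
    name: str                              # current column name
    initial_default: Optional[Any] = None  # used when a file predates the column


def project(file_row: dict, schema: list) -> dict:
    """Resolve each schema field by ID against a row stored as {field_id: value}."""
    return {f.name: file_row.get(f.field_id, f.initial_default) for f in schema}


# A row from a data file written before 'country' and 'tier' were added.
old_row = {1: 42, 2: "alice"}

schema = [
    Field(1, "id"),
    Field(2, "name"),
    Field(3, "country"),                       # added without a default
    Field(4, "tier", initial_default="free"),  # added with a default
]

result = project(old_row, schema)
print(result)
# {'id': 42, 'name': 'alice', 'country': None, 'tier': 'free'}
```

The old file is never rewritten; the missing columns are filled in at read time.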
Dropping Columns
Remove columns from the current schema. When a column is dropped:
- It’s removed from the current schema
- The field ID is never reused
- Old data files still contain the column (not rewritten)
- Reads don’t return the column
- The column can be added back with a new field ID
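A small model helps show why re-adding a dropped column does not resurrect its old data. The `Schema` class below is a hypothetical stand-in, not an Iceberg API; it only assumes that field IDs come from a counter that never goes backwards.

```python
class Schema:
    """Toy schema that allocates field IDs monotonically and never reuses them."""

    def __init__(self):
        self.columns = {}        # name -> field ID for the current schema
        self.last_column_id = 0  # highest ID ever assigned, never decremented

    def add_column(self, name: str) -> int:
        if name in self.columns:
            raise ValueError(f"column already exists: {name}")
        self.last_column_id += 1
        self.columns[name] = self.last_column_id
        return self.last_column_id

    def drop_column(self, name: str) -> None:
        # Only the schema entry is removed; data files are not touched.
        del self.columns[name]

    def read(self, file_row: dict) -> dict:
        # Values are looked up by ID, so IDs unknown to the schema are ignored.
        return {name: file_row.get(fid) for name, fid in self.columns.items()}


schema = Schema()
schema.add_column("id")                # field ID 1
old_id = schema.add_column("email")    # field ID 2
old_row = {1: 7, 2: "a@example.com"}   # written while 'email' had ID 2

schema.drop_column("email")
after_drop = schema.read(old_row)      # {'id': 7}

new_id = schema.add_column("email")    # same name, fresh field ID 3
after_readd = schema.read(old_row)     # {'id': 7, 'email': None}
```

Because the re-added `email` has ID 3, the value stored under ID 2 stays invisible.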
Renaming Columns
Change column names without affecting data. When a column is renamed:
- The field ID stays the same
- Data files are unchanged
- Queries that use the new name work immediately against both old and new data files
Type Promotion
Widen column types using safe promotions, such as int to long or float to double.
Promotion from timestamp to timestamptz is not allowed as it changes semantic meaning.
Reordering Columns
Change the column order in query results. Reorder operations:
- Affect the order in SELECT * queries
- Don’t require data file rewrites
- Don’t affect column identification (still by field ID)
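As a rough illustration of these three points, the sketch below reads the same stored row under two column orders. The `read_row` helper and the `(field_id, name)` schema tuples are assumptions made for this example, not Iceberg structures.

```python
def read_row(file_row: dict, schema: list) -> list:
    """Return (name, value) pairs in schema order, matching values by field ID."""
    return [(name, file_row.get(field_id)) for field_id, name in schema]


# Stored once as {field_id: value}; never rewritten.
file_row = {1: 7, 2: "Ada", 3: "ada@example.com"}

original = [(1, "id"), (2, "name"), (3, "email")]
reordered = [(1, "id"), (3, "email"), (2, "name")]  # 'email' moved before 'name'

before = read_row(file_row, original)
after = read_row(file_row, reordered)

print(before)  # [('id', 7), ('name', 'Ada'), ('email', 'ada@example.com')]
print(after)   # [('id', 7), ('email', 'ada@example.com'), ('name', 'Ada')]
```

Only the output order differs; each name still maps to the same value.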
Using Spark SQL
Correctness Guarantees
Iceberg guarantees that schema evolution changes are independent and free of side-effects:
1. Added columns never read from other columns
Each column has a unique field ID. Adding column_b will never accidentally read data that was written as column_a.
Why this matters: Formats that track columns by name can reuse a deleted column’s name, causing data corruption.
2. Dropping a column doesn't affect other columns
Removing column_a doesn’t change the values in column_b or any other column.
Why this matters: Formats that track columns by position must shift all subsequent columns when one is deleted.
3. Updates don't affect other columns
Promoting count from int to long doesn’t change values in any other column.
Why this matters: Column updates are isolated and predictable.
4. Reordering doesn't change values
Moving email before name doesn’t change which data belongs to which column.
Why this matters: Field IDs, not positions, identify columns.
Partition Evolution
Iceberg allows changing partition layout without rewriting data.
How It Works
When you evolve a partition spec:
- Old data keeps its partition layout: files written with the old spec are unchanged
- New data uses the new layout: new writes use the updated partition spec
- Metadata tracks both: each partition spec has a unique ID
- Split planning: queries plan old and new layouts separately
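The write-side half of this can be sketched with a toy table that tags every file with the spec ID it was written under. The `Table` and `DataFile` classes are hypothetical. The two transforms follow the usual definitions of Iceberg's month and day transforms (months and days since the Unix epoch), simplified to UTC datetimes.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def month_transform(ts: datetime) -> int:
    """Months since 1970-01."""
    return (ts.year - 1970) * 12 + (ts.month - 1)


def day_transform(ts: datetime) -> int:
    """Days since 1970-01-01."""
    return (ts - EPOCH).days


@dataclass
class DataFile:
    path: str
    spec_id: int    # which partition spec produced this file
    partition: int  # partition value under that spec


class Table:
    def __init__(self):
        self.specs = {0: month_transform}  # spec ID -> transform applied to ts
        self.default_spec_id = 0
        self.files = []

    def evolve_spec(self, transform) -> int:
        # Metadata-only change: register a new spec; existing files are untouched.
        new_id = max(self.specs) + 1
        self.specs[new_id] = transform
        self.default_spec_id = new_id
        return new_id

    def append(self, path: str, ts: datetime) -> None:
        transform = self.specs[self.default_spec_id]
        self.files.append(DataFile(path, self.default_spec_id, transform(ts)))


table = Table()
table.append("f1.parquet", datetime(2024, 1, 15, tzinfo=timezone.utc))
table.evolve_spec(day_transform)
table.append("f2.parquet", datetime(2024, 2, 3, tzinfo=timezone.utc))

for f in table.files:
    print(f.path, f.spec_id, f.partition)
```

After the spec change, `f1.parquet` still carries spec ID 0 and its monthly partition value, while `f2.parquet` is written under spec ID 1.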
Partition Evolution Example
A logs table starts with monthly partitions, then switches to daily:
Why Partition Evolution Works
Iceberg’s hidden partitioning makes evolution possible:
- Queries filter on source columns (ts), not partition values
- Iceberg derives appropriate partition filters for each spec
- Users don’t need to know about partition layout changes
- Old and new data coexist seamlessly
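To make the read side concrete, here is a minimal sketch of how a single filter on `ts` might be translated into a separate partition predicate for each spec. The file list, the `plan` function, and the `utc` helper are invented for illustration; real planning works on manifests and is considerably more involved.

```python
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def utc(year: int, month: int, day: int) -> datetime:
    return datetime(year, month, day, tzinfo=timezone.utc)


def month_transform(ts: datetime) -> int:
    return (ts.year - 1970) * 12 + (ts.month - 1)  # months since 1970-01


def day_transform(ts: datetime) -> int:
    return (ts - EPOCH).days                       # days since 1970-01-01


SPECS = {0: month_transform, 1: day_transform}     # spec ID -> transform on ts

# (path, spec ID, partition value under that spec)
FILES = [
    ("2023-12.parquet", 0, month_transform(utc(2023, 12, 1))),
    ("2024-01.parquet", 0, month_transform(utc(2024, 1, 1))),
    ("2024-02-03.parquet", 1, day_transform(utc(2024, 2, 3))),
    ("2024-02-20.parquet", 1, day_transform(utc(2024, 2, 20))),
]


def plan(files, ts_from: datetime, ts_to: datetime) -> list:
    """Keep files whose partition may hold rows with ts_from <= ts <= ts_to."""
    selected = []
    for path, spec_id, partition in files:
        transform = SPECS[spec_id]
        # Both transforms are monotonic in ts, so the ts range maps to a
        # partition-value range that is specific to each spec.
        low, high = transform(ts_from), transform(ts_to)
        if low <= partition <= high:
            selected.append(path)
    return selected


matches = plan(FILES, utc(2024, 1, 20), utc(2024, 2, 10))
print(matches)  # ['2024-01.parquet', '2024-02-03.parquet']
```

The caller only supplies a `ts` range. The monthly file for January and the daily file for February 3 are selected, and the rest are pruned.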
Using Spark SQL
Using Java API
Sort Order Evolution
Iceberg also supports evolving the sort order. When the sort order changes:
- Old data keeps its original sort order
- New data is written with the new sort order
- Engines can choose whether to sort (or write unsorted if sorting is too expensive)
Evolution Best Practices
Test Schema Changes on Copies First
Create a table branch or copy to test evolution operations before applying them to production.
Don't Reuse Deleted Column Names
While technically allowed (with a new field ID), reusing names can confuse users and queries. Prefer a new name, such as customer_email_v2, instead of reusing customer_email.
Plan Partition Changes Based on Data Growth
Monitor partition file sizes:
- Too large (> 1GB) → Consider finer granularity (daily → hourly)
- Too small (< 100MB) → Consider coarser granularity (hourly → daily)
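These thresholds could be turned into a simple monitoring check. The function below is a hypothetical helper, not an Iceberg feature, and it assumes binary units (1 GB = 1024³ bytes).

```python
GB = 1024 ** 3
MB = 1024 ** 2


def suggest_granularity(size_bytes: int) -> str:
    """Apply the rule of thumb above to an observed partition file size."""
    if size_bytes > 1 * GB:
        return "finer"    # e.g. daily -> hourly
    if size_bytes < 100 * MB:
        return "coarser"  # e.g. hourly -> daily
    return "keep"


print(suggest_granularity(3 * GB))    # finer
print(suggest_granularity(20 * MB))   # coarser
print(suggest_granularity(400 * MB))  # keep
```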
Use Default Values for Required Fields
When adding required fields in v3, always set a default value:
Document Major Schema Changes
Use table properties to track significant schema evolution:
Limitations and Constraints
Migration Scenarios
Migrating from Hive
Evolve Hive partition columns into hidden partitions:
Changing Partition Granularity
Gracefully transition from coarse to fine granularity:
Learn More
Schemas
Understand Iceberg’s schema structure and field IDs
Partitioning
Learn about partition transforms and hidden partitioning
Branching
Use branches to test schema changes safely