Evolution - Apache Iceberg Documentation

Iceberg supports in-place table evolution - you can evolve a table schema or change partition layout without costly data rewrites or table migrations. This is a fundamental advantage over traditional table formats.

Why Evolution Matters

Traditional table formats make schema and partition changes expensive:

Hive Example: Changing from daily to hourly partitions requires:

Creating a new table with the new partition scheme
Rewriting all historical data to the new table
Updating all queries to use the new table name
Managing the migration cutover period

Iceberg eliminates these costs through metadata-only evolution operations.

Schema Evolution

Iceberg supports comprehensive schema changes as metadata operations:

Supported Operations

Add: Add new columns to the table or nested structs
Drop: Remove existing columns from the table or nested structs
Rename: Rename existing columns or fields in nested structs
Update: Widen column types using safe type promotions
Reorder: Change the order of columns or struct fields
Default Values: Set initial and write defaults for fields (v3+)

Adding Columns

Add new columns anywhere in the schema:

// Add a top-level column
table.updateSchema()
  .addColumn("email", Types.StringType.get())
  .commit();

// Add a nested column in a struct
table.updateSchema()
  .addColumn("customer.phone", Types.StringType.get())
  .commit();

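// Add with default value (v3+)
// Assumes a release whose UpdateSchema accepts a default on addColumn
// (Iceberg 1.8.0 or later); verify the overload against the Javadoc for your version.
// Literal is org.apache.iceberg.expressions.Literal.
table.updateSchema()
  .addColumn("status", Types.StringType.get(), null /* doc */, Literal.of("pending"))
  .commit();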

When a column is added:

It gets a new, unique field ID
Existing data files don’t contain the column
Reads return null (or the default value) for old files
New writes include the column
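The read behavior in the list above can be sketched in plain Java, without the Iceberg library. This is a simplified model under stated assumptions, not Iceberg's reader: a stored row is a map from field ID to value, and the names FieldIdProjection, Column, and project are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified model, not the Iceberg API: each stored row maps field ID -> value.
final class FieldIdProjection {

  // Hypothetical column descriptor: field ID, current name, optional initial default.
  static final class Column {
    final int id;
    final String name;
    final Object initialDefault; // null means "no default"

    Column(int id, String name, Object initialDefault) {
      this.id = id;
      this.name = name;
      this.initialDefault = initialDefault;
    }
  }

  // Resolve every column of the current schema against a stored row by field ID.
  // A column the file never wrote yields its initial default, or null if it has none.
  static Map<String, Object> project(List<Column> currentSchema, Map<Integer, Object> storedRow) {
    Map<String, Object> result = new LinkedHashMap<>();
    for (Column column : currentSchema) {
      Object value = storedRow.containsKey(column.id)
          ? storedRow.get(column.id)
          : column.initialDefault;
      result.put(column.name, value);
    }
    return result;
  }

  public static void main(String[] args) {
    List<Column> schema = List.of(
        new Column(1, "id", null),
        new Column(2, "email", null),        // added later, no default
        new Column(3, "status", "pending")); // added later, with a default

    // A row written before email and status existed: only field ID 1 is present.
    Map<Integer, Object> oldRow = Map.of(1, 42L);

    System.out.println(project(schema, oldRow)); // prints {id=42, email=null, status=pending}
  }
}
```

Because the lookup key is the field ID, the old file needs no rewrite: the missing columns are filled in at read time.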

Dropping Columns

Remove columns from the current schema:

// Drop a column
table.updateSchema()
  .deleteColumn("deprecated_field")
  .commit();

// Drop a nested field
table.updateSchema()
  .deleteColumn("metadata.internal_id") 
  .commit();

When a column is dropped:

It’s removed from the current schema
The field ID is never reused
Old data files still contain the column (not rewritten)
Reads don’t return the column
The column can be added back with a new field ID
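The rule that a dropped field ID is never reused can be illustrated with a small counter-based model. This is a sketch under assumptions, not Iceberg's implementation: FieldIdAllocator and its methods are hypothetical names, and only top-level columns are modeled.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified model, not the Iceberg API: tracks column name -> field ID for one schema.
final class FieldIdAllocator {
  private final Map<String, Integer> idsByName = new LinkedHashMap<>();
  private int lastAssignedId = 0; // only ever increases, so dropped IDs are never handed out again

  // Add a column and return its freshly assigned field ID.
  int addColumn(String name) {
    if (idsByName.containsKey(name)) {
      throw new IllegalArgumentException("Column already exists: " + name);
    }
    lastAssignedId += 1;
    idsByName.put(name, lastAssignedId);
    return lastAssignedId;
  }

  // Remove a column from the current schema; its ID stays retired.
  void dropColumn(String name) {
    if (idsByName.remove(name) == null) {
      throw new IllegalArgumentException("No such column: " + name);
    }
  }

  public static void main(String[] args) {
    FieldIdAllocator schema = new FieldIdAllocator();
    int id = schema.addColumn("id");                   // assigned 1
    int first = schema.addColumn("deprecated_field");  // assigned 2
    schema.dropColumn("deprecated_field");
    int second = schema.addColumn("deprecated_field"); // assigned 3, not 2
    System.out.println(id + " " + first + " " + second); // prints 1 2 3
  }
}
```

Old data files that still hold values under the retired ID are simply never matched by the re-added column.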

Renaming Columns

Change column names without affecting data:

// Rename a column  
table.updateSchema()
  .renameColumn("customer_name", "full_name")
  .commit();

// Rename a nested field (the new name is the field name only, not a dotted path)
table.updateSchema()
  .renameColumn("address.zipcode", "postal_code")
  .commit();

When a column is renamed:

The field ID stays the same
Data files are unchanged
Queries must reference the new name; existing data files remain readable because columns are resolved by field ID, not by name

Type Promotion

Widen column types using safe promotions:

// Promote int to long
table.updateSchema()
  .updateColumn("count", Types.LongType.get())
  .commit();

// Promote float to double  
table.updateSchema()
  .updateColumn("value", Types.DoubleType.get())
  .commit();

// Widen decimal precision
table.updateSchema()
  .updateColumn("price", Types.DecimalType.of(12, 2)) // was decimal(10,2)
  .commit();

Valid type promotions:

| From | To | Notes |
|------|----|-------|
| int | long | Safe - no data loss |
| float | double | Safe - no precision loss |
| decimal(P,S) | decimal(P’,S) | Only widen precision (P’ > P) |
| date | timestamp, timestamp_ns | v3+ only |

Promotion from timestamp to timestamptz is not allowed as it changes semantic meaning.
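The promotion table can be expressed as a small validity check. The sketch below is plain Java and does not call Iceberg; TypePromotion, Kind, ColumnType, and isAllowed are hypothetical names, and the format version is passed in explicitly as an assumption about how a caller would know it.

```java
// Simplified check, not the Iceberg API: mirrors the promotion table above.
final class TypePromotion {

  enum Kind { INT, LONG, FLOAT, DOUBLE, DECIMAL, DATE, TIMESTAMP, TIMESTAMP_NS, TIMESTAMPTZ }

  // Hypothetical type descriptor; precision and scale are used only for DECIMAL.
  static final class ColumnType {
    final Kind kind;
    final int precision;
    final int scale;

    private ColumnType(Kind kind, int precision, int scale) {
      this.kind = kind;
      this.precision = precision;
      this.scale = scale;
    }

    static ColumnType of(Kind kind) {
      return new ColumnType(kind, 0, 0);
    }

    static ColumnType decimal(int precision, int scale) {
      return new ColumnType(Kind.DECIMAL, precision, scale);
    }
  }

  // Returns true only for the promotions listed in the table.
  static boolean isAllowed(ColumnType from, ColumnType to, int formatVersion) {
    switch (from.kind) {
      case INT:
        return to.kind == Kind.LONG;
      case FLOAT:
        return to.kind == Kind.DOUBLE;
      case DECIMAL:
        // Same scale, strictly wider precision.
        return to.kind == Kind.DECIMAL && to.scale == from.scale && to.precision > from.precision;
      case DATE:
        // Allowed only from format version 3 onward.
        return formatVersion >= 3 && (to.kind == Kind.TIMESTAMP || to.kind == Kind.TIMESTAMP_NS);
      default:
        return false; // everything else, including timestamp -> timestamptz
    }
  }

  public static void main(String[] args) {
    System.out.println(isAllowed(ColumnType.of(Kind.INT), ColumnType.of(Kind.LONG), 2));              // true
    System.out.println(isAllowed(ColumnType.decimal(10, 2), ColumnType.decimal(12, 3), 2));           // false
    System.out.println(isAllowed(ColumnType.of(Kind.TIMESTAMP), ColumnType.of(Kind.TIMESTAMPTZ), 3)); // false
  }
}
```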

Reordering Columns

Change the column order in query results:

// Move column to first position
table.updateSchema()
  .moveFirst("id")
  .commit();

// Move column after another  
table.updateSchema()
  .moveAfter("email", "name")
  .commit();

// Move column before another
table.updateSchema()
  .moveBefore("created_at", "updated_at")
  .commit();

Column order changes:

Affect the order in SELECT * queries
Don’t require data file rewrites
Don’t affect column identification (still by field ID)

Using Spark SQL

-- Add column
ALTER TABLE db.table ADD COLUMN email string;

-- Add column with comment
ALTER TABLE db.table ADD COLUMN phone string COMMENT 'Contact phone';

-- Drop column  
ALTER TABLE db.table DROP COLUMN deprecated_field;

-- Rename column
ALTER TABLE db.table RENAME COLUMN old_name TO new_name;

-- Change column type (promotion only)
ALTER TABLE db.table ALTER COLUMN count TYPE bigint;

-- Reorder columns
ALTER TABLE db.table ALTER COLUMN email AFTER name;
ALTER TABLE db.table ALTER COLUMN id FIRST;

Correctness Guarantees

Iceberg guarantees that schema evolution changes are independent and free of side-effects:

1. Added columns never read from other columns

Each column has a unique field ID. Adding column_b will never accidentally read data that was written as column_a.

Why this matters: Formats that track columns by name can reuse a deleted column’s name, causing data corruption.

2. Dropping a column doesn't affect other columns

Removing column_a doesn’t change the values in column_b or any other column.

Why this matters: Formats that track columns by position must shift all subsequent columns when one is deleted.

3. Updates don't affect other columns

Promoting count from int to long doesn’t change values in any other column.

Why this matters: Column updates are isolated and predictable.

4. Reordering doesn't change values

Moving email before name doesn’t change which data belongs to which column.

Why this matters: Field IDs, not positions, identify columns.

These guarantees are possible because Iceberg uses unique field IDs to track columns, not names or positions.
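To make the contrast concrete, the sketch below compares name-based and ID-based lookup against the same old data file after a column is dropped and re-added under the same name. It is an illustrative model only: ColumnResolution, StoredColumn, readByName, and readById are hypothetical names, and the sample values are made up.

```java
import java.util.List;

// Simplified model, not the Iceberg API: a data file is a list of stored column values.
final class ColumnResolution {

  // A value together with the field ID and column name it was written under.
  static final class StoredColumn {
    final int fieldId;
    final String writtenName;
    final Object value;

    StoredColumn(int fieldId, String writtenName, Object value) {
      this.fieldId = fieldId;
      this.writtenName = writtenName;
      this.value = value;
    }
  }

  // Name-based lookup: matches whatever was written under this name, even for a dropped column.
  static Object readByName(List<StoredColumn> file, String name) {
    for (StoredColumn column : file) {
      if (column.writtenName.equals(name)) {
        return column.value;
      }
    }
    return null;
  }

  // ID-based lookup: matches only the exact field, so a re-added column never sees old values.
  static Object readById(List<StoredColumn> file, int fieldId) {
    for (StoredColumn column : file) {
      if (column.fieldId == fieldId) {
        return column.value;
      }
    }
    return null;
  }

  public static void main(String[] args) {
    // File written when the schema was {1: id, 2: email}.
    List<StoredColumn> oldFile = List.of(
        new StoredColumn(1, "id", 7L),
        new StoredColumn(2, "email", "old@example.com"));

    // Later: email (ID 2) is dropped, then a new email column is added with ID 3.
    System.out.println(readByName(oldFile, "email")); // prints old@example.com (stale value resurfaces)
    System.out.println(readById(oldFile, 3));         // prints null (the new column has no old data)
  }
}
```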

Partition Evolution

Iceberg allows changing partition layout without rewriting data:

// Update partition spec
// bucket(...) is a static import from org.apache.iceberg.expressions.Expressions
table.updateSpec()
  .addField(bucket("id", 8))      // Add bucketing on id
  .removeField("category")         // Remove category partition
  .commit();

How It Works

When you evolve a partition spec:

Old data keeps its partition layout - Files written with the old spec are unchanged
New data uses the new layout - New writes use the updated partition spec
Metadata tracks both - Each partition spec has a unique ID
Split planning - Queries plan old and new layouts separately

Before Evolution:                After Evolution:

Partition Spec v1:               Partition Spec v1:
  months(timestamp)                months(timestamp)
                                   ↓
  2008-01/ (unchanged)             2008-01/ (unchanged)
  2008-02/ (unchanged)             2008-02/ (unchanged)
  ...
  2008-12/ (unchanged)             2008-12/ (unchanged)
                                   
                                 Partition Spec v2:  
                                   days(timestamp)
                                   ↓
                                   2009-01-01/ (new)
                                   2009-01-02/ (new)
                                   ...

Partition Evolution Example

A logs table starts with monthly partitions, then switches to daily:

-- Create table with monthly partitions  
CREATE TABLE logs (
  level string,
  message string,
  ts timestamp
) USING iceberg
PARTITIONED BY (months(ts));

-- Write data for 2023 (monthly partitions)
INSERT INTO logs VALUES ...;

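-- Evolve to daily partitions for 2024 data
-- (partition DDL requires the Iceberg Spark SQL extensions)
ALTER TABLE logs 
ADD PARTITION FIELD days(ts);

-- months(ts) was created with the default partition field name ts_month
ALTER TABLE logs
DROP PARTITION FIELD ts_month;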

-- New data uses daily partitions
INSERT INTO logs VALUES ...;

-- Queries work across both layouts!
SELECT * FROM logs
WHERE ts BETWEEN '2023-12-15' AND '2024-01-15';
-- Prunes monthly partitions for 2023, daily partitions for 2024

Why Partition Evolution Works

Iceberg’s hidden partitioning makes evolution possible:

Queries filter on source columns (ts), not partition values
Iceberg derives appropriate partition filters for each spec
Users don’t need to know about partition layout changes
Old and new data coexist seamlessly
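The idea of deriving a partition filter per spec from one predicate on the source column can be sketched as follows. This is a simplified model, not Iceberg's scan planning: it works on dates rather than timestamps, knows only month and day transforms, and MixedSpecPlanner, Transform, DataFile, and plan are hypothetical names.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// Simplified model, not the Iceberg API: prunes files written under different partition specs.
final class MixedSpecPlanner {

  enum Transform { MONTH, DAY }

  // A data file tagged with the transform of the spec it was written under
  // and the first date covered by its partition value.
  static final class DataFile {
    final String path;
    final Transform transform;
    final LocalDate partitionStart;

    DataFile(String path, Transform transform, LocalDate partitionStart) {
      this.path = path;
      this.transform = transform;
      this.partitionStart = partitionStart;
    }
  }

  // End (exclusive) of the date range a partition value covers under its transform.
  static LocalDate partitionEnd(DataFile file) {
    return file.transform == Transform.MONTH
        ? file.partitionStart.plusMonths(1)
        : file.partitionStart.plusDays(1);
  }

  // Keep files whose partition range overlaps the inclusive [from, to] filter on the source column.
  static List<String> plan(List<DataFile> files, LocalDate from, LocalDate to) {
    List<String> selected = new ArrayList<>();
    for (DataFile file : files) {
      boolean overlaps = !file.partitionStart.isAfter(to) && partitionEnd(file).isAfter(from);
      if (overlaps) {
        selected.add(file.path);
      }
    }
    return selected;
  }

  public static void main(String[] args) {
    List<DataFile> files = List.of(
        new DataFile("m-2023-11", Transform.MONTH, LocalDate.of(2023, 11, 1)),
        new DataFile("m-2023-12", Transform.MONTH, LocalDate.of(2023, 12, 1)),
        new DataFile("d-2024-01-10", Transform.DAY, LocalDate.of(2024, 1, 10)),
        new DataFile("d-2024-01-20", Transform.DAY, LocalDate.of(2024, 1, 20)));

    // Same filter as the example query: ts BETWEEN '2023-12-15' AND '2024-01-15'.
    System.out.println(plan(files, LocalDate.of(2023, 12, 15), LocalDate.of(2024, 1, 15)));
    // prints [m-2023-12, d-2024-01-10]
  }
}
```

The caller supplies only the range on the source column; the per-file transform decides how that range maps onto partition values.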

Using Spark SQL

-- Add a partition field
ALTER TABLE db.table 
ADD PARTITION FIELD bucket(16, user_id);

-- Add with custom name
ALTER TABLE db.table
ADD PARTITION FIELD bucket(8, id) AS id_bucket;

-- Remove a partition field (by name)
ALTER TABLE db.table  
DROP PARTITION FIELD category;

-- Replace one partition field with another (other partition fields are unchanged)
ALTER TABLE db.table
REPLACE PARTITION FIELD days(ts) WITH hours(ts);

Using Java API

// Add partition fields
// bucket(...) is a static import from org.apache.iceberg.expressions.Expressions
table.updateSpec()
  .addField(bucket("user_id", 16))
  .addField("id_bucket", bucket("id", 8)) // with custom name
  .commit();

// Remove partition field by name  
table.updateSpec()
  .removeField("category")
  .commit();

// Remove by transform
table.updateSpec()
  .removeField(bucket("id", 8))
  .commit();

Sort Order Evolution

Iceberg also supports evolving the sort order:

// Replace sort order
table.replaceSortOrder()
  .asc("id", NullOrder.NULLS_LAST)
  .desc("category", NullOrder.NULLS_FIRST)  
  .commit();

When sort order changes:

Old data keeps its original sort order
New data is written with the new sort order
Engines can choose whether to sort (or write unsorted if expensive)

-- Using Spark SQL (same order as the Java example above)
ALTER TABLE db.table 
WRITE ORDERED BY id ASC NULLS LAST, category DESC NULLS FIRST;

Evolution Best Practices

Test Schema Changes on Copies First

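Schema and partition spec changes apply to the whole table rather than to a single branch, so rehearse an evolution operation on a copy of the table before applying it to production. A branch is still useful for validating writes and reads after the change without touching the main branch:

long currentSnapshotId = table.currentSnapshot().snapshotId();

table.manageSnapshots()
  .createBranch("test-schema", currentSnapshotId)
  .commit();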

Don't Reuse Deleted Column Names

While technically allowed (with a new field ID), reusing names can confuse users and queries. Prefer a new name: customer_email_v2 instead of reusing customer_email.

Plan Partition Changes Based on Data Growth

Monitor partition file sizes:

Too large (> 1GB) → Consider finer granularity (daily → hourly)
Too small (< 100MB) → Consider coarser granularity (hourly → daily)
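The two thresholds above can be turned into a simple check for a monitoring job. This is a minimal sketch: PartitionSizing, Advice, and advise are hypothetical names, the sizes use binary units (1 GB = 1024 MB) as an assumption, and how the monitored size is measured is left to the caller.

```java
// Illustrative helper, not part of Iceberg: applies the size thresholds listed above.
final class PartitionSizing {
  static final long MB = 1024L * 1024L;
  static final long GB = 1024L * MB;

  enum Advice { FINER, COARSER, KEEP }

  // Suggest a granularity change from the monitored size in bytes.
  static Advice advise(long sizeBytes) {
    if (sizeBytes > GB) {
      return Advice.FINER;   // e.g. daily -> hourly
    }
    if (sizeBytes < 100 * MB) {
      return Advice.COARSER; // e.g. hourly -> daily
    }
    return Advice.KEEP;
  }

  public static void main(String[] args) {
    System.out.println(advise(2 * GB));   // prints FINER
    System.out.println(advise(50 * MB));  // prints COARSER
    System.out.println(advise(500 * MB)); // prints KEEP
  }
}
```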

Use Default Values for Required Fields

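When adding a required field in v3, always supply a default value so that rows in existing data files have a value to return:

// Assumes a release whose UpdateSchema accepts a default on addRequiredColumn
// (Iceberg 1.8.0 or later); verify the overload against the Javadoc for your version.
// Literal is org.apache.iceberg.expressions.Literal.
table.updateSchema()
  .addRequiredColumn("version", Types.IntegerType.get(), null /* doc */, Literal.of(1))
  .commit();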

Document Major Schema Changes

Use table properties to track significant schema evolution:

table.updateProperties()
  .set("schema.change.2024-03-01", "Added user_id field for tracking")
  .commit();

Limitations and Constraints

Type Promotion Restrictions:

Cannot promote if a partition field uses the source column with an incompatible transform
Example: Cannot promote date to timestamp if bucket(date) is a partition field (hash would change)

Struct Evolution Restrictions:

Cannot convert a primitive to a struct or vice versa
Cannot move fields in/out of nested structs
Cannot change struct field IDs

Map Key Evolution:

Cannot add or drop struct fields in map keys (would change equality semantics)

Migration Scenarios

Migrating from Hive

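Evolve Hive partition columns into hidden partitions:

-- Hive table with explicit partition column
-- event_date is both a partition and a column

-- After migration to Iceberg, partition by a transform of event_time instead:
ALTER TABLE events ADD PARTITION FIELD days(event_time);
ALTER TABLE events DROP PARTITION FIELD event_date;
-- Partition values for new data are now derived from event_time
-- Queries on event_time automatically filter those partitions

-- Once no query or older partition spec depends on event_date, remove the column:
ALTER TABLE events DROP COLUMN event_date;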

Changing Partition Granularity

Gracefully transition from coarse to fine granularity:

-- Start: Monthly partitions (2023 data)
-- Evolve: Add daily partitions
ALTER TABLE metrics ADD PARTITION FIELD days(ts);
ALTER TABLE metrics DROP PARTITION FIELD ts_month;  -- default name for months(ts)

-- 2023 data: Monthly partitions (large files, coarse pruning)
-- 2024+ data: Daily partitions (right-sized files, fine pruning)  
-- Queries work seamlessly across both!

Learn More

Schemas: Understand Iceberg’s schema structure and field IDs
Partitioning: Learn about partition transforms and hidden partitioning
Branching: Use branches to test schema changes safely
