Overview
To query Iceberg tables in Spark, first configure Spark catalogs. Iceberg uses Apache Spark’s DataSourceV2 API for its data source implementations.
Querying with SQL
In Spark, tables use identifiers that include a catalog name:
SELECT * FROM prod.db.table;
-- catalog: prod, namespace: db, table: table
Metadata tables use the Iceberg table name as a namespace:
SELECT * FROM prod.db.table.files;
SELECT * FROM prod.db.table.snapshots;
SELECT * FROM prod.db.table.history;
Time Travel
Spark supports time travel queries using TIMESTAMP AS OF or VERSION AS OF clauses:
Timestamp Travel

-- Time travel to a specific timestamp
SELECT * FROM prod.db.table
TIMESTAMP AS OF '1986-10-26 01:21:00';

-- Equivalent, using FOR SYSTEM_TIME AS OF
SELECT * FROM prod.db.table
FOR SYSTEM_TIME AS OF '1986-10-26 01:21:00';

-- Using a Unix timestamp (in seconds)
SELECT * FROM prod.db.table
TIMESTAMP AS OF 499162860;

Snapshot Travel

-- Time travel to a specific snapshot ID
SELECT * FROM prod.db.table
VERSION AS OF 10963874102873;

Branch/Tag Travel

-- Read from the head snapshot of a branch
SELECT * FROM prod.db.table
VERSION AS OF 'audit-branch';

-- Read the snapshot referenced by a tag
SELECT * FROM prod.db.table
VERSION AS OF 'historical-tag';
If a branch/tag name matches a snapshot ID, the snapshot ID takes precedence. Rename branches/tags with a prefix like snapshot-1 to avoid conflicts.
Schema Selection in Time Travel
Different time travel queries use different schemas:
+-------------+------------------------+---------------------------------------+
| Query Type  | Schema Used            | Example                               |
+-------------+------------------------+---------------------------------------+
| Timestamp   | Snapshot’s schema      | TIMESTAMP AS OF '2025-01-01 10:00:00' |
| Snapshot ID | Snapshot’s schema      | VERSION AS OF 101                     |
| Branch      | Table’s current schema | VERSION AS OF 'audit-branch'          |
| Tag         | Snapshot’s schema      | VERSION AS OF 'historical-tag'        |
+-------------+------------------------+---------------------------------------+
Branches use the table’s current schema, while tags use the snapshot’s schema at the time the tag was created.
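To illustrate the difference, a hypothetical sketch: suppose a column category (both id and category are made-up columns here) was added to the table after the tag historical-tag was created.

```sql
-- Branch reads use the table's current schema, so the new column is visible
SELECT id, category FROM prod.db.table VERSION AS OF 'audit-branch';

-- Tag reads use the tagged snapshot's schema; `category` did not exist then,
-- so selecting it in this query would fail
SELECT id FROM prod.db.table VERSION AS OF 'historical-tag';
```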
Querying with DataFrames
Load tables as DataFrames using spark.table:
val df = spark.table("prod.db.table")
Loading with Paths
You can load tables using different identifier formats:
// File path (HadoopTable)
spark.read.format("iceberg").load("file:///path/to/table")

// Table name in the current catalog
spark.read.format("iceberg").load("tablename")

// Catalog and table
spark.read.format("iceberg").load("catalog.tablename")

// Full identifier
spark.read.format("iceberg").load("catalog.namespace.tablename")
Time Travel with DataFrames
Use read options for time travel:
Timestamp

spark.read
    .option("as-of-timestamp", "499162860000")
    .format("iceberg")
    .load("path/to/table")

Snapshot ID

spark.read
    .option("snapshot-id", 10963874102873L)
    .format("iceberg")
    .load("path/to/table")

Branch

spark.read
    .option("branch", "audit-branch")
    .format("iceberg")
    .load("path/to/table")

Tag

spark.read
    .option("tag", "historical-tag")
    .format("iceberg")
    .load("path/to/table")
Incremental Read
Read changes between snapshots:
// Get data appended after the start snapshot (exclusive)
// up to the end snapshot (inclusive)
spark.read
    .format("iceberg")
    .option("start-snapshot-id", "10963874102873")
    .option("end-snapshot-id", "63874143573109")
    .load("path/to/table")
Incremental reads only work with append operations. They cannot handle replace, overwrite, or delete operations. Incremental read is not supported in SQL syntax.
Inspecting Tables
Iceberg provides metadata tables to inspect table state.
History
View table history with snapshot lineage:
SELECT * FROM prod.db.table.history;
+-------------------------+---------------------+-----------+---------------------+
| made_current_at         | snapshot_id         | parent_id | is_current_ancestor |
+-------------------------+---------------------+-----------+---------------------+
| 2019-02-08 03:29:51.215 | 5781947118336215154 | NULL      | true                |
| 2019-02-08 03:47:55.948 | 5179299526185056830 | ...       | true                |
| 2019-02-09 16:24:30.13  | 296410040247533544  | ...       | false               |
+-------------------------+---------------------+-----------+---------------------+
Rows with is_current_ancestor = false indicate rolled-back commits.
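For example, rolled-back commits can be listed by filtering on that flag (a sketch against the history table shown above):

```sql
-- Snapshots that are no longer ancestors of the current table state
SELECT snapshot_id, made_current_at
FROM prod.db.table.history
WHERE is_current_ancestor = false;
```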
Snapshots
View all snapshots in the table:
SELECT * FROM prod.db.table.snapshots;
Join history with snapshot details, for example to see which Spark application wrote each snapshot:

SELECT
    h.made_current_at,
    s.operation,
    h.snapshot_id,
    h.is_current_ancestor,
    s.summary['spark.app.id']
FROM prod.db.table.history h
JOIN prod.db.table.snapshots s
    ON h.snapshot_id = s.snapshot_id
ORDER BY made_current_at;
Files
View current data and delete files:
Current Files

SELECT * FROM prod.db.table.files;

Data Files Only

SELECT * FROM prod.db.table.data_files;

Delete Files Only

SELECT * FROM prod.db.table.delete_files;

All Files (All Snapshots)

SELECT * FROM prod.db.table.all_files;
Content Types:
0 - Data files
1 - Position delete files
2 - Equality delete files
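The content column of the files table can be used to break file counts down by type, for example (a sketch):

```sql
-- Count files per content type:
-- 0 = data, 1 = position deletes, 2 = equality deletes
SELECT content, count(*) AS file_count
FROM prod.db.table.files
GROUP BY content
ORDER BY content;
```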
Manifests
View manifest files:
SELECT * FROM prod.db.table.manifests;
Partitions
View current partitions with statistics:
SELECT * FROM prod.db.table.partitions;
+----------------+---------+--------------+------------+-------------------------------+
| partition      | spec_id | record_count | file_count | total_data_file_size_in_bytes |
+----------------+---------+--------------+------------+-------------------------------+
| {20211001, 11} | 0       | 1            | 1          | 100                           |
| {20211002, 11} | 0       | 4            | 3          | 500                           |
+----------------+---------+--------------+------------+-------------------------------+
The partitions table shows partitions with data or delete files in the current snapshot. Delete files are not applied, so some partitions may appear even if all rows are deleted.
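The statistics columns make it easy to spot skewed or small-file-heavy partitions, for example (a sketch using the columns shown above):

```sql
-- Ten largest partitions by total data file size
SELECT partition, record_count, file_count, total_data_file_size_in_bytes
FROM prod.db.table.partitions
ORDER BY total_data_file_size_in_bytes DESC
LIMIT 10;
```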
Entries
View manifest entries with file metadata:
SELECT * FROM prod.db.table.entries;
Position Delete Files
View position delete records:
SELECT * FROM prod.db.table.position_deletes;
References
View branches and tags:
SELECT * FROM prod.db.table.refs;
+------+--------+---------------------+-------------------------+
| name | type   | snapshot_id         | max_reference_age_in_ms |
+------+--------+---------------------+-------------------------+
| main | BRANCH | 4686954189838128572 | 10                      |
| tag1 | TAG    | 4686954189838128572 | 10                      |
+------+--------+---------------------+-------------------------+
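The refs table can be filtered like any other, for example to list only branches (a sketch):

```sql
-- List branch references only
SELECT name, snapshot_id
FROM prod.db.table.refs
WHERE type = 'BRANCH';
```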
View metadata file history:
SELECT * FROM prod.db.table.metadata_log_entries;
Load metadata tables using the DataFrame API:
spark.read.format("iceberg").load("db.table.files")
Inspect metadata at specific points in time:
Timestamp

SELECT * FROM prod.db.table.manifests
TIMESTAMP AS OF '2021-09-20 08:00:00';

Snapshot ID

SELECT * FROM prod.db.table.manifests
VERSION AS OF 10963874102873;

DataFrame API

spark.read
    .format("iceberg")
    .option("snapshot-id", 10963874102873L)
    .load("db.table.manifests")
Metadata tables prefixed with all_ show data across all snapshots:

The “all” metadata tables may produce more than one row per file because files can be part of multiple snapshots.

all_data_files - All data files across all snapshots
all_delete_files - All delete files across all snapshots
all_entries - All manifest entries across all snapshots
all_manifests - All manifest files across all snapshots
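Since the same file can appear once per snapshot that contains it, deduplicate by path when counting across all snapshots (a sketch; file_path is a column of the data files metadata):

```sql
-- Number of distinct data files ever committed to the table
SELECT count(DISTINCT file_path) AS distinct_data_files
FROM prod.db.table.all_data_files;
```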
Next Steps
Write Data Learn about INSERT, MERGE, and UPDATE operations
Procedures Maintain tables with stored procedures
Configuration Configure read options and performance tuning
Structured Streaming Stream data with Spark Structured Streaming