Overview

Spark is currently the most feature-rich compute engine for Iceberg operations. Apache Iceberg uses Spark’s DataSourceV2 API for data source and catalog implementations, providing comprehensive support for table management, queries, and writes.

Key Features

Full DDL Support

Create, alter, and manage Iceberg tables with complete SQL DDL operations

Advanced Queries

Time travel, metadata tables, and efficient scan planning

Row-Level Operations

MERGE INTO, UPDATE, and DELETE operations for data modification

Streaming Support

Structured Streaming reads and writes with incremental processing

Compatibility

Iceberg integrates with Apache Spark through the DataSourceV2 API, with different levels of support across Spark versions:
| Feature | Availability | Notes |
| --- | --- | --- |
| SQL INSERT INTO | ✔️ All versions | Requires ANSI assignment policy (default since Spark 3.0) |
| SQL MERGE INTO | ✔️ All versions | Requires Iceberg Spark extensions |
| SQL DELETE FROM | ✔️ All versions | Row-level deletes require extensions |
| SQL UPDATE | ✔️ All versions | Requires Iceberg Spark extensions |
| DataFrame writes | ✔️ All versions | DataFrameWriterV2 API recommended |
| Structured Streaming | ✔️ All versions | Append and complete modes |
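The row-level operations above require the Iceberg SQL extensions to be enabled. As a sketch of the syntax, assuming a configured catalog named `my_catalog` with hypothetical `db.target` and `db.updates` tables:

```sql
-- Illustrative tables; assumes the Iceberg SQL extensions are enabled.
MERGE INTO my_catalog.db.target t
USING my_catalog.db.updates u
ON t.id = u.id
WHEN MATCHED THEN UPDATE SET t.value = u.value
WHEN NOT MATCHED THEN INSERT *;

UPDATE my_catalog.db.target SET value = 0 WHERE id = 1;

DELETE FROM my_catalog.db.target WHERE value IS NULL;
```

Without the extensions, the MERGE INTO, UPDATE, and DELETE statements above fail to parse; plain INSERT INTO works in any configuration.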

Type Compatibility

Iceberg automatically converts between Spark and Iceberg types:

Spark to Iceberg Type Mapping

| Spark Type | Iceberg Type | Notes |
| --- | --- | --- |
| boolean | boolean | |
| byte, short, integer | integer | Promotion supported |
| long | long | |
| float | float | |
| double | double | |
| decimal | decimal | |
| date | date | |
| timestamp | timestamp with timezone | |
| timestamp_ntz | timestamp without timezone | |
| string, char, varchar | string | |
| binary | binary | Can write to fixed type with length assertion |
| struct | struct | |
| array | list | |
| map | map | |
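The mapping above determines the Iceberg schema produced when a table is created from Spark. A sketch, again assuming a catalog named `my_catalog`; the table and column names are illustrative:

```sql
CREATE TABLE my_catalog.db.events (
  id       BIGINT,                -- long -> long
  amount   DECIMAL(10, 2),        -- decimal -> decimal
  created  TIMESTAMP,             -- timestamp -> timestamp with timezone
  local_ts TIMESTAMP_NTZ,         -- timestamp_ntz -> timestamp without timezone
  tags     ARRAY<STRING>,         -- array -> list
  props    MAP<STRING, STRING>    -- map -> map
) USING iceberg;
```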

Iceberg to Spark Type Mapping

| Iceberg Type | Spark Type | Supported |
| --- | --- | --- |
| boolean | boolean | ✔️ |
| integer | integer | ✔️ |
| long | long | ✔️ |
| float | float | ✔️ |
| double | double | ✔️ |
| decimal | decimal | ✔️ |
| date | date | ✔️ |
| time | - | ❌ Not supported |
| timestamp with timezone | timestamp | ✔️ |
| timestamp without timezone | timestamp_ntz | ✔️ |
| string | string | ✔️ |
| uuid | string | ✔️ |
| fixed | binary | ✔️ |
| binary | binary | ✔️ |
| struct | struct | ✔️ |
| list | array | ✔️ |
| map | map | ✔️ |
| variant | variant | ✔️ (Spark 4.0+) |
| unknown | null | ✔️ (Spark 4.0+) |

Getting Started

1. Add Iceberg Runtime

Include the Iceberg Spark runtime in your Spark environment:
spark-shell --packages org.apache.iceberg:iceberg-spark-runtime-3.5:{{ icebergVersion }}
2. Configure Catalogs

Set up Iceberg catalogs in your Spark configuration:
spark.sql.catalog.my_catalog=org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.my_catalog.type=hive
3. Enable SQL Extensions

Add Iceberg SQL extensions for advanced features:
spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
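The three steps above can also be combined into a single launch command. A sketch, assuming Spark 3.5 and a Hive Metastore; the catalog name `my_catalog` is arbitrary:

```
spark-shell \
  --packages org.apache.iceberg:iceberg-spark-runtime-3.5:{{ icebergVersion }} \
  --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
  --conf spark.sql.catalog.my_catalog=org.apache.iceberg.spark.SparkCatalog \
  --conf spark.sql.catalog.my_catalog.type=hive
```

The same `--conf` pairs can instead be placed in `spark-defaults.conf` so every session picks them up.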

Next Steps

Getting Started

Set up your first Iceberg table with Spark

DDL Operations

Learn about CREATE, ALTER, and DROP commands

Query Data

Execute queries and explore metadata tables

Write Data

Insert, update, and merge data into tables