Apache Iceberg

The open table format for analytic datasets.

Quick Start

Get up and running with Apache Iceberg in minutes

1. Add Iceberg to your project

Add the Iceberg dependency to your build configuration:
Maven
<dependency>
  <groupId>org.apache.iceberg</groupId>
  <artifactId>iceberg-core</artifactId>
  <version>1.7.1</version>
</dependency>
Gradle
dependencies {
  implementation 'org.apache.iceberg:iceberg-core:1.7.1'
}
2. Create a catalog

Initialize a catalog to manage your tables:
import org.apache.iceberg.catalog.Catalog;
import org.apache.iceberg.hadoop.HadoopCatalog;
import org.apache.hadoop.conf.Configuration;

Configuration conf = new Configuration();
Catalog catalog = new HadoopCatalog(conf, "warehouse/path");
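With a catalog handle you can list, load, and drop tables as well as create them. A minimal sketch of loading an existing table, assuming a table `my_db.my_table` has already been created in this catalog:

```java
import java.util.List;

import org.apache.iceberg.Table;
import org.apache.iceberg.catalog.Namespace;
import org.apache.iceberg.catalog.TableIdentifier;

// Enumerate tables in a namespace, then load one by identifier.
// Assumes the `catalog` from the snippet above and an existing table.
List<TableIdentifier> tables = catalog.listTables(Namespace.of("my_db"));
Table table = catalog.loadTable(TableIdentifier.of("my_db", "my_table"));
```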
3. Create your first table

Define a schema and create an Iceberg table:
import org.apache.iceberg.*;
import org.apache.iceberg.catalog.TableIdentifier;
import org.apache.iceberg.types.Types;

Schema schema = new Schema(
  Types.NestedField.required(1, "id", Types.LongType.get()),
  Types.NestedField.required(2, "data", Types.StringType.get()),
  Types.NestedField.required(3, "timestamp", Types.TimestampType.withZone())
);

PartitionSpec spec = PartitionSpec.builderFor(schema)
  .day("timestamp")
  .build();

Table table = catalog.createTable(
  TableIdentifier.of("my_db", "my_table"),
  schema,
  spec
);
Iceberg tables support schema evolution, partitioning evolution, and hidden partitioning to prevent user errors.
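Partition evolution means the layout chosen above is not permanent. As one hedged sketch, assuming the `table` handle created in the previous step, the spec could later be switched from daily to hourly partitioning without rewriting existing data files:

```java
import org.apache.iceberg.Table;
import org.apache.iceberg.expressions.Expressions;

// Evolve the partition spec in place: drop the daily transform and
// add an hourly one. Old data keeps its old layout; new writes use
// the new spec, and queries plan across both transparently.
table.updateSpec()
    .removeField(Expressions.day("timestamp"))
    .addField(Expressions.hour("timestamp"))
    .commit();
```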
4. Query with your favorite engine

Use Iceberg tables with Spark, Flink, Trino, or other query engines:
Spark SQL
-- Read from Iceberg table
SELECT * FROM my_db.my_table
WHERE timestamp >= current_date();

-- Time travel to a previous snapshot by snapshot ID
SELECT * FROM my_db.my_table
VERSION AS OF 1234567890123456789;
See the query engines guide for detailed integration instructions.

Why Iceberg?

Built for reliability, performance, and correctness at massive scale

ACID Transactions

Serializable isolation ensures readers never see partial or uncommitted changes. Multiple concurrent writers use optimistic concurrency.

Time Travel

Query historical table snapshots for reproducible analysis and easy version rollback when issues occur.
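The same capability is available from the Java API. A minimal sketch, assuming a loaded `table` handle and a snapshot ID taken from the table's history:

```java
import org.apache.iceberg.Table;
import org.apache.iceberg.TableScan;

// Pick a snapshot to travel to; IDs come from table.snapshots()
// or the snapshots metadata table. Here we just use the current one.
long snapshotId = table.currentSnapshot().snapshotId();

// Plan a read against the table as of that snapshot.
TableScan historicalScan = table.newScan().useSnapshot(snapshotId);

// Or roll the table back to that snapshot to undo a bad write.
table.manageSnapshots().rollbackTo(snapshotId).commit();
```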

Schema Evolution

Add, drop, rename, or update columns without side effects. No accidental data deletion or corruption.
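Schema changes are metadata-only commits. A hedged sketch of the evolution API, assuming the `table` handle from the quick start (the `category` column name is illustrative):

```java
import org.apache.iceberg.Table;
import org.apache.iceberg.types.Types;

// Add and rename columns in a single atomic commit.
// Columns are tracked by ID, so renames never orphan or delete data.
table.updateSchema()
    .addColumn("category", Types.StringType.get())
    .renameColumn("data", "payload")
    .commit();
```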

Hidden Partitioning

Users don’t need to know about partitioning to get fast queries. Partition layout can evolve as data patterns change.

Advanced Filtering

Data files are pruned using partition and column-level statistics from table metadata. Scan planning is fast.
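In the Java API, filters are attached at scan-planning time. A minimal sketch, assuming the `table` handle from earlier (the filter values are illustrative):

```java
import org.apache.iceberg.Table;
import org.apache.iceberg.TableScan;
import org.apache.iceberg.expressions.Expressions;

// Plan a scan with a row filter and column projection. Iceberg uses
// partition values and per-column min/max stats from metadata to skip
// whole data files before any of them are opened.
TableScan scan = table.newScan()
    .filter(Expressions.greaterThanOrEqual("id", 1000L))
    .select("id", "data");
```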

Multi-Engine Support

Works seamlessly with Spark, Flink, Trino, Presto, Hive, and Impala. All engines can safely work with the same tables.

Explore by Topic

Dive deep into Iceberg’s capabilities

Core Concepts

Understand the Iceberg table format, schemas, partitioning, and reliability guarantees.

Table Operations

Learn how to read, write, and maintain Iceberg tables efficiently.

Query Engines

Integrate Iceberg with Spark, Flink, Hive, and other processing engines.

Catalogs

Configure and customize catalogs for table metadata management.

REST API

Use the REST Catalog API for programmatic table management.

Migration

Migrate existing tables from Hive or Delta Lake to Iceberg.

Ready to Get Started?

Follow our quickstart guide to create your first Iceberg table, or explore the API reference to learn about all available features.