
Quick Start Guide

This guide will help you get started with Apache Iceberg quickly. You’ll learn how to add Iceberg to your project, create your first table, and perform basic operations.
The latest version of Iceberg can be found on the releases page; the examples in this guide use Iceberg 1.7.1.

Installation

1. Add Dependencies

Add Iceberg to your project using Maven or Gradle.
<dependencies>
  <!-- Core Iceberg API -->
  <dependency>
    <groupId>org.apache.iceberg</groupId>
    <artifactId>iceberg-core</artifactId>
    <version>1.7.1</version>
  </dependency>
  
  <!-- For Parquet file format -->
  <dependency>
    <groupId>org.apache.iceberg</groupId>
    <artifactId>iceberg-parquet</artifactId>
    <version>1.7.1</version>
  </dependency>
  
  <!-- For Hive Metastore catalog -->
  <dependency>
    <groupId>org.apache.iceberg</groupId>
    <artifactId>iceberg-hive-metastore</artifactId>
    <version>1.7.1</version>
  </dependency>
</dependencies>
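The text above mentions Gradle as well; the equivalent declarations for the same Maven coordinates would be:

```
dependencies {
    // Core Iceberg API (required)
    implementation 'org.apache.iceberg:iceberg-core:1.7.1'
    // Parquet file format support
    implementation 'org.apache.iceberg:iceberg-parquet:1.7.1'
    // Hive Metastore catalog
    implementation 'org.apache.iceberg:iceberg-hive-metastore:1.7.1'
}
```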
Module guide:
  • iceberg-core - The core API and implementations (required)
  • iceberg-parquet - For Parquet file format support
  • iceberg-orc - For ORC file format support
  • iceberg-hive-metastore - For Hive Metastore catalog
  • iceberg-data - For direct JVM data access
2. Choose a Catalog

Iceberg uses catalogs to manage tables. Choose the catalog that fits your environment:
  • Hadoop Catalog - File-based catalog for HDFS or S3
  • Hive Metastore - Uses existing Hive Metastore
  • AWS Glue - For AWS environments
  • Nessie - Git-like data catalog
  • REST Catalog - HTTP-based catalog service
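As a sketch of how a catalog is wired in, a Hive Metastore catalog can be registered with Spark through configuration properties like the following (the catalog name `hive_cat`, the URI, and the warehouse path are placeholders for your environment):

```
spark.sql.catalog.hive_cat = org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.hive_cat.type = hive
spark.sql.catalog.hive_cat.uri = thrift://localhost:9083
spark.sql.catalog.hive_cat.warehouse = hdfs://host:8020/warehouse
```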
3. Create Your First Table

The examples below use Spark, the most feature-rich engine for Iceberg and the easiest way to get started.
1. Start Spark Shell with Iceberg

Launch Spark with the Iceberg runtime package:
spark-shell --packages org.apache.iceberg:iceberg-spark-runtime-3.5:1.7.1
Or for spark-sql:
spark-sql --packages org.apache.iceberg:iceberg-spark-runtime-3.5:1.7.1 \
    --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
    --conf spark.sql.catalog.local=org.apache.iceberg.spark.SparkCatalog \
    --conf spark.sql.catalog.local.type=hadoop \
    --conf spark.sql.catalog.local.warehouse=$PWD/warehouse
Replace 3.5 with your Spark version (e.g., 3.3, 3.4, 3.5).
2. Create a Table

Create your first Iceberg table using SQL:
CREATE TABLE local.db.users (
    id bigint,
    name string,
    email string,
    created_at timestamp
) USING iceberg;
Or create a partitioned table:
CREATE TABLE local.db.events (
    event_id bigint,
    user_id bigint,
    event_type string,
    event_time timestamp,
    payload string
) USING iceberg
PARTITIONED BY (days(event_time), event_type);
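Note that queries never need to reference the derived partition columns; Iceberg's hidden partitioning maps filters on the source columns to partition pruning. For example, a plain filter on `event_time` is enough for Iceberg to skip day partitions:

```sql
SELECT count(*) FROM local.db.events
WHERE event_time >= TIMESTAMP '2024-06-01 00:00:00'
  AND event_type = 'click';
```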
3. Insert Data

Insert data using standard SQL:
INSERT INTO local.db.users 
VALUES 
    (1, 'Alice', '[email protected]', current_timestamp()),
    (2, 'Bob', '[email protected]', current_timestamp()),
    (3, 'Charlie', '[email protected]', current_timestamp());
Or insert from another table:
INSERT INTO local.db.users 
SELECT id, name, email, created_at 
FROM source_table 
WHERE active = true;
4. Query Data

Query your Iceberg table:
SELECT * FROM local.db.users WHERE name LIKE 'A%';
Use time travel to query historical data:
-- Query as of a timestamp
SELECT * FROM local.db.users 
TIMESTAMP AS OF '2024-01-01 10:00:00';

-- Query a specific snapshot
SELECT * FROM local.db.users 
VERSION AS OF 5678901234;
View table history:
SELECT * FROM local.db.users.snapshots;
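Besides `snapshots`, Iceberg exposes other metadata tables that can be queried the same way, for example `history` and `files`:

```sql
-- Snapshot lineage, including rollbacks
SELECT * FROM local.db.users.history;

-- Data files and their per-file stats
SELECT file_path, record_count, file_size_in_bytes
FROM local.db.users.files;
```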
5. Update and Merge Data

Iceberg supports row-level updates and merges:
-- Update rows
UPDATE local.db.users 
SET email = '[email protected]' 
WHERE id = 1;

-- Delete rows
DELETE FROM local.db.users 
WHERE created_at < '2023-01-01';

-- Merge (upsert) data
MERGE INTO local.db.users t
USING updates u ON t.id = u.id
WHEN MATCHED THEN 
  UPDATE SET t.email = u.email, t.name = u.name
WHEN NOT MATCHED THEN 
  INSERT *;

Using the Java API

For programmatic access, use the Iceberg Java API.
1. Initialize a Catalog

Choose and initialize a catalog:
import org.apache.hadoop.conf.Configuration;
import org.apache.iceberg.hadoop.HadoopCatalog;

Configuration conf = new Configuration();
String warehousePath = "hdfs://host:8020/warehouse";
HadoopCatalog catalog = new HadoopCatalog(conf, warehousePath);
2. Define a Schema

Create a schema for your table:
import org.apache.iceberg.Schema;
import org.apache.iceberg.types.Types;

Schema schema = new Schema(
    Types.NestedField.required(1, "id", Types.LongType.get()),
    Types.NestedField.required(2, "name", Types.StringType.get()),
    Types.NestedField.optional(3, "email", Types.StringType.get()),
    Types.NestedField.required(4, "created_at", 
        Types.TimestampType.withZone())
);
Type IDs must be unique within the schema. Iceberg automatically reassigns IDs when creating tables to ensure uniqueness.
3. Define Partitioning

Create a partition spec:
import org.apache.iceberg.PartitionSpec;

// Unpartitioned table
PartitionSpec spec = PartitionSpec.unpartitioned();

// Or partition by day and identity
PartitionSpec spec = PartitionSpec.builderFor(schema)
    .day("created_at")
    .identity("email")
    .build();
Partition transforms include: identity, bucket[N], truncate[L], year, month, day, and hour.
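Conceptually, a transform is a function from a column value to a partition value. The standalone sketch below mimics two of them, day and integer truncate[W], outside of Iceberg; the real implementations live in org.apache.iceberg.transforms, and bucket[N] additionally applies a Murmur3 hash before taking the modulus:

```java
import java.time.LocalDate;

public class TransformSketch {
    // day transform: days since the Unix epoch (1970-01-01)
    static long day(LocalDate d) {
        return d.toEpochDay();
    }

    // truncate[W] on integers: round down to the nearest multiple of W
    // (floorMod makes negative values truncate toward negative infinity)
    static int truncate(int value, int width) {
        return value - Math.floorMod(value, width);
    }

    public static void main(String[] args) {
        System.out.println(day(LocalDate.of(2024, 1, 1)));  // 19723
        System.out.println(truncate(1234, 100));            // 1200
        System.out.println(truncate(-7, 10));               // -10
    }
}
```

Because the transform is a pure function of the column value, two writers always place the same row in the same partition, and readers can prune partitions from ordinary column filters.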
4. Create the Table

Create the table using the catalog:
import org.apache.iceberg.Table;
import org.apache.iceberg.catalog.TableIdentifier;

TableIdentifier name = TableIdentifier.of("db", "users");
Table table = catalog.createTable(name, schema, spec);

System.out.println("Created table: " + table.location());
5. Write Data

Append data files to the table:
import org.apache.iceberg.DataFile;
import org.apache.iceberg.DataFiles;

// Create a data file (simplified example)
DataFile dataFile = DataFiles.builder(spec)
    .withPath("/path/to/data-file.parquet")
    .withFileSizeInBytes(1024)
    .withRecordCount(100)
    .build();

// Append to table
table.newAppend()
    .appendFile(dataFile)
    .commit();
This is a simplified example. In practice, you would write data using a file writer or compute engine like Spark.
6. Read Data

Scan and read table data:
import org.apache.iceberg.TableScan;
import org.apache.iceberg.io.CloseableIterable;
import org.apache.iceberg.expressions.Expressions;
import org.apache.iceberg.FileScanTask;

// Create a scan
TableScan scan = table.newScan()
    .filter(Expressions.greaterThan("id", 100))
    .select("id", "name", "email");

// Get the files to read
try (CloseableIterable<FileScanTask> tasks = scan.planFiles()) {
    for (FileScanTask task : tasks) {
        System.out.println("File: " + task.file().path());
        System.out.println("Records: " + task.file().recordCount());
    }
}

Common Operations

Schema Evolution

Modify your table schema without rewriting data:
// Add a new column
table.updateSchema()
    .addColumn("phone", Types.StringType.get())
    .commit();

// Rename a column
table.updateSchema()
    .renameColumn("email", "email_address")
    .commit();

// Widen a column type (only safe promotions, e.g. int to long)
table.updateSchema()
    .updateColumn("id", Types.LongType.get())
    .commit();
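With the Iceberg SQL extensions enabled, the same evolutions are available from Spark SQL as ALTER TABLE statements:

```sql
ALTER TABLE local.db.users ADD COLUMN phone string;
ALTER TABLE local.db.users RENAME COLUMN email TO email_address;
ALTER TABLE local.db.users ALTER COLUMN id TYPE bigint;
```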

Time Travel

Access historical versions of your table:
// Read from a specific snapshot
TableScan scan = table.newScan()
    .useSnapshot(snapshotId);

// Read as of a timestamp
TableScan scan = table.newScan()
    .asOfTime(System.currentTimeMillis() - 3600000); // 1 hour ago

Table Maintenance

Keep your tables healthy:
-- Expire old snapshots (remove history)
CALL local.system.expire_snapshots(
    table => 'db.users',
    older_than => TIMESTAMP '2024-01-01 00:00:00'
);

-- Remove orphan files
CALL local.system.remove_orphan_files(
    table => 'db.users'
);

-- Compact small files
CALL local.system.rewrite_data_files(
    table => 'db.users'
);
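These procedures also accept tuning parameters. As a sketch, `rewrite_data_files` can sort while compacting and target a specific output file size (parameter and option names as documented for the Spark procedures):

```sql
CALL local.system.rewrite_data_files(
    table => 'db.users',
    strategy => 'sort',
    sort_order => 'id',
    options => map('target-file-size-bytes', '134217728')  -- 128 MB
);
```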

Next Steps

Now that you’ve created your first Iceberg table, explore more advanced features:

Java API Deep Dive

Learn advanced Java API usage

Partitioning

Master hidden partitioning

Schema Evolution

Safely evolve table schemas

Performance

Optimize table performance

Troubleshooting

If classes are missing at runtime, make sure you have all required dependencies:
  • iceberg-core for the core API
  • iceberg-parquet or iceberg-orc for file formats
  • iceberg-hive-metastore for Hive catalog
  • Hadoop dependencies for HDFS access
For Spark, use the runtime JAR which includes all dependencies:
--packages org.apache.iceberg:iceberg-spark-runtime-3.5:1.7.1
If you cannot connect to the Hive Metastore, check that:
  1. Hive Metastore is running: netstat -an | grep 9083
  2. The URI is correct: thrift://localhost:9083
  3. Network firewall allows connections
  4. Hive configuration is in the classpath
If a table cannot be found, verify:
  1. The catalog name is correct
  2. The database/namespace exists
  3. You have permissions to access the table
  4. The table was created successfully
List tables to debug:
import java.util.List;
import org.apache.iceberg.catalog.Namespace;
import org.apache.iceberg.catalog.TableIdentifier;

List<TableIdentifier> tables = catalog.listTables(Namespace.of("db"));
tables.forEach(System.out::println);
For more help, check the Apache Iceberg documentation or join the community Slack.
