
Quick Start Guide

This guide will help you get started with Apache Iceberg quickly. You’ll learn how to add Iceberg to your project, create your first table, and perform basic operations.
The latest version of Iceberg can be found on the releases page; the examples in this guide use Iceberg 1.7.1.

Installation

1. Add Dependencies

Add Iceberg to your project using Maven or Gradle.
<dependencies>
  <!-- Core Iceberg API -->
  <dependency>
    <groupId>org.apache.iceberg</groupId>
    <artifactId>iceberg-core</artifactId>
    <version>1.7.1</version>
  </dependency>
  
  <!-- For Parquet file format -->
  <dependency>
    <groupId>org.apache.iceberg</groupId>
    <artifactId>iceberg-parquet</artifactId>
    <version>1.7.1</version>
  </dependency>
  
  <!-- For Hive Metastore catalog -->
  <dependency>
    <groupId>org.apache.iceberg</groupId>
    <artifactId>iceberg-hive-metastore</artifactId>
    <version>1.7.1</version>
  </dependency>
</dependencies>
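The text above mentions Gradle as well; the equivalent declarations for the same Maven coordinates would be:

```
dependencies {
    // Core Iceberg API (required)
    implementation 'org.apache.iceberg:iceberg-core:1.7.1'
    // Parquet file format support
    implementation 'org.apache.iceberg:iceberg-parquet:1.7.1'
    // Hive Metastore catalog
    implementation 'org.apache.iceberg:iceberg-hive-metastore:1.7.1'
}
```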
Module guide:
  • iceberg-core - The core API and implementations (required)
  • iceberg-parquet - For Parquet file format support
  • iceberg-orc - For ORC file format support
  • iceberg-hive-metastore - For Hive Metastore catalog
  • iceberg-data - For direct JVM data access
2. Choose a Catalog

Iceberg uses catalogs to manage tables. Choose the catalog that fits your environment:
  • Hadoop Catalog - File-based catalog for HDFS or S3
  • Hive Metastore - Uses existing Hive Metastore
  • AWS Glue - For AWS environments
  • Nessie - Git-like data catalog
  • REST Catalog - HTTP-based catalog service
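As a sketch of how a catalog is wired in, a Hive Metastore catalog can be registered with Spark through configuration properties like the following (the catalog name `hive_cat`, the URI, and the warehouse path are placeholders for your environment):

```
spark.sql.catalog.hive_cat = org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.hive_cat.type = hive
spark.sql.catalog.hive_cat.uri = thrift://localhost:9083
spark.sql.catalog.hive_cat.warehouse = hdfs://host:8020/warehouse
```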
3. Create Your First Table

The examples below use Spark, the most feature-rich engine for Iceberg and the easiest way to get started.
1. Start Spark Shell with Iceberg

Launch Spark with the Iceberg runtime package:
spark-shell --packages org.apache.iceberg:iceberg-spark-runtime-3.5:1.7.1
Or for spark-sql:
spark-sql --packages org.apache.iceberg:iceberg-spark-runtime-3.5:1.7.1 \
    --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
    --conf spark.sql.catalog.local=org.apache.iceberg.spark.SparkCatalog \
    --conf spark.sql.catalog.local.type=hadoop \
    --conf spark.sql.catalog.local.warehouse=$PWD/warehouse
Replace 3.5 with your Spark version (e.g., 3.3, 3.4, 3.5).
2. Create a Table

Create your first Iceberg table using SQL:
CREATE TABLE local.db.users (
    id bigint,
    name string,
    email string,
    created_at timestamp
) USING iceberg;
Or create a partitioned table:
CREATE TABLE local.db.events (
    event_id bigint,
    user_id bigint,
    event_type string,
    event_time timestamp,
    payload string
) USING iceberg
PARTITIONED BY (days(event_time), event_type);
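Note that queries never need to reference the derived partition columns; Iceberg's hidden partitioning maps filters on the source columns to partition pruning. For example, a plain filter on `event_time` is enough for Iceberg to skip day partitions:

```sql
SELECT count(*) FROM local.db.events
WHERE event_time >= TIMESTAMP '2024-06-01 00:00:00'
  AND event_type = 'click';
```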
3. Insert Data

Insert data using standard SQL:
INSERT INTO local.db.users 
VALUES 
    (1, 'Alice', '[email protected]', current_timestamp()),
    (2, 'Bob', '[email protected]', current_timestamp()),
    (3, 'Charlie', '[email protected]', current_timestamp());
Or insert from another table:
INSERT INTO local.db.users 
SELECT id, name, email, created_at 
FROM source_table 
WHERE active = true;
4. Query Data

Query your Iceberg table:
SELECT * FROM local.db.users WHERE name LIKE 'A%';
Use time travel to query historical data:
-- Query as of a timestamp
SELECT * FROM local.db.users 
TIMESTAMP AS OF '2024-01-01 10:00:00';

-- Query a specific snapshot
SELECT * FROM local.db.users 
VERSION AS OF 5678901234;
View table history:
SELECT * FROM local.db.users.snapshots;
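Besides `snapshots`, Iceberg exposes other metadata tables that can be queried the same way, for example `history` and `files`:

```sql
-- Snapshot lineage, including rollbacks
SELECT * FROM local.db.users.history;

-- Data files and their per-file stats
SELECT file_path, record_count, file_size_in_bytes
FROM local.db.users.files;
```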
5. Update and Merge Data

Iceberg supports row-level updates and merges:
-- Update rows
UPDATE local.db.users 
SET email = '[email protected]' 
WHERE id = 1;

-- Delete rows
DELETE FROM local.db.users 
WHERE created_at < '2023-01-01';

-- Merge (upsert) data
MERGE INTO local.db.users t
USING updates u ON t.id = u.id
WHEN MATCHED THEN 
  UPDATE SET t.email = u.email, t.name = u.name
WHEN NOT MATCHED THEN 
  INSERT *;

Using the Java API

For programmatic access, use the Iceberg Java API.
1. Initialize a Catalog

Choose and initialize a catalog:
import org.apache.hadoop.conf.Configuration;
import org.apache.iceberg.hadoop.HadoopCatalog;

Configuration conf = new Configuration();
String warehousePath = "hdfs://host:8020/warehouse";
HadoopCatalog catalog = new HadoopCatalog(conf, warehousePath);
2. Define a Schema

Create a schema for your table:
import org.apache.iceberg.Schema;
import org.apache.iceberg.types.Types;

Schema schema = new Schema(
    Types.NestedField.required(1, "id", Types.LongType.get()),
    Types.NestedField.required(2, "name", Types.StringType.get()),
    Types.NestedField.optional(3, "email", Types.StringType.get()),
    Types.NestedField.required(4, "created_at", 
        Types.TimestampType.withZone())
);
Type IDs must be unique within the schema. Iceberg automatically reassigns IDs when creating tables to ensure uniqueness.
3. Define Partitioning

Create a partition spec:
import org.apache.iceberg.PartitionSpec;

// Unpartitioned table
PartitionSpec spec = PartitionSpec.unpartitioned();

// Or partition by day and identity
PartitionSpec spec = PartitionSpec.builderFor(schema)
    .day("created_at")
    .identity("email")
    .build();
Partition transforms include: identity, bucket[N], truncate[L], year, month, day, and hour.
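Conceptually, a transform is a function from a column value to a partition value. The standalone sketch below mimics two of them, day and integer truncate[W], outside of Iceberg; the real implementations live in org.apache.iceberg.transforms, and bucket[N] additionally applies a Murmur3 hash before taking the modulus:

```java
import java.time.LocalDate;

public class TransformSketch {
    // day transform: days since the Unix epoch (1970-01-01)
    static long day(LocalDate d) {
        return d.toEpochDay();
    }

    // truncate[W] on integers: round down to the nearest multiple of W
    // (floorMod makes negative values truncate toward negative infinity)
    static int truncate(int value, int width) {
        return value - Math.floorMod(value, width);
    }

    public static void main(String[] args) {
        System.out.println(day(LocalDate.of(2024, 1, 1)));  // 19723
        System.out.println(truncate(1234, 100));            // 1200
        System.out.println(truncate(-7, 10));               // -10
    }
}
```

Because the transform is a pure function of the column value, two writers always place the same row in the same partition, and readers can prune partitions from ordinary column filters.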
4. Create the Table

Create the table using the catalog:
import org.apache.iceberg.Table;
import org.apache.iceberg.catalog.TableIdentifier;

TableIdentifier name = TableIdentifier.of("db", "users");
Table table = catalog.createTable(name, schema, spec);

System.out.println("Created table: " + table.location());
5. Write Data

Append data files to the table:
import org.apache.iceberg.DataFile;
import org.apache.iceberg.DataFiles;

// Create a data file (simplified example)
DataFile dataFile = DataFiles.builder(spec)
    .withPath("/path/to/data-file.parquet")
    .withFileSizeInBytes(1024)
    .withRecordCount(100)
    .build();

// Append to table
table.newAppend()
    .appendFile(dataFile)
    .commit();
This is a simplified example. In practice, you would write data using a file writer or compute engine like Spark.
6. Read Data

Scan and read table data:
import org.apache.iceberg.TableScan;
import org.apache.iceberg.io.CloseableIterable;
import org.apache.iceberg.expressions.Expressions;
import org.apache.iceberg.FileScanTask;

// Create a scan
TableScan scan = table.newScan()
    .filter(Expressions.greaterThan("id", 100))
    .select("id", "name", "email");

// Get the files to read
try (CloseableIterable<FileScanTask> tasks = scan.planFiles()) {
    for (FileScanTask task : tasks) {
        System.out.println("File: " + task.file().path());
        System.out.println("Records: " + task.file().recordCount());
    }
}

Common Operations

Schema Evolution

Modify your table schema without rewriting data:
// Add a new column
table.updateSchema()
    .addColumn("phone", Types.StringType.get())
    .commit();

// Rename a column
table.updateSchema()
    .renameColumn("email", "email_address")
    .commit();

// Widen a column type (only safe promotions, e.g. int to long)
table.updateSchema()
    .updateColumn("id", Types.LongType.get())
    .commit();
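With the Iceberg SQL extensions enabled, the same evolutions are available from Spark SQL as ALTER TABLE statements:

```sql
ALTER TABLE local.db.users ADD COLUMN phone string;
ALTER TABLE local.db.users RENAME COLUMN email TO email_address;
ALTER TABLE local.db.users ALTER COLUMN id TYPE bigint;
```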

Time Travel

Access historical versions of your table:
// Read from a specific snapshot
TableScan scan = table.newScan()
    .useSnapshot(snapshotId);

// Read as of a timestamp
TableScan scan = table.newScan()
    .asOfTime(System.currentTimeMillis() - 3600000); // 1 hour ago

Table Maintenance

Keep your tables healthy:
-- Expire old snapshots (remove history)
CALL local.system.expire_snapshots(
    table => 'db.users',
    older_than => TIMESTAMP '2024-01-01 00:00:00'
);

-- Remove orphan files
CALL local.system.remove_orphan_files(
    table => 'db.users'
);

-- Compact small files
CALL local.system.rewrite_data_files(
    table => 'db.users'
);
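These procedures also accept tuning parameters. As a sketch, `rewrite_data_files` can sort while compacting and target a specific output file size (parameter and option names as documented for the Spark procedures):

```sql
CALL local.system.rewrite_data_files(
    table => 'db.users',
    strategy => 'sort',
    sort_order => 'id',
    options => map('target-file-size-bytes', '134217728')  -- 128 MB
);
```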

Next Steps

Now that you’ve created your first Iceberg table, explore more advanced features:

Java API Deep Dive

Learn advanced Java API usage

Partitioning

Master hidden partitioning

Schema Evolution

Safely evolve table schemas

Performance

Optimize table performance

Troubleshooting

If classes are missing at runtime, make sure you have all required dependencies:
  • iceberg-core for the core API
  • iceberg-parquet or iceberg-orc for file formats
  • iceberg-hive-metastore for Hive catalog
  • Hadoop dependencies for HDFS access
For Spark, use the runtime JAR which includes all dependencies:
--packages org.apache.iceberg:iceberg-spark-runtime-3.5:1.7.1
If you cannot connect to the Hive Metastore, check that:
  1. Hive Metastore is running: netstat -an | grep 9083
  2. The URI is correct: thrift://localhost:9083
  3. Network firewall allows connections
  4. Hive configuration is in the classpath
If a table cannot be found, verify:
  1. The catalog name is correct
  2. The database/namespace exists
  3. You have permissions to access the table
  4. The table was created successfully
List tables to debug:
import java.util.List;
import org.apache.iceberg.catalog.Namespace;
import org.apache.iceberg.catalog.TableIdentifier;

List<TableIdentifier> tables = catalog.listTables(Namespace.of("db"));
tables.forEach(System.out::println);
For more help, check the Apache Iceberg documentation or join the community Slack.
