Quick Start Guide
This guide will help you get started with Apache Iceberg quickly. You’ll learn how to add Iceberg to your project, create your first table, and perform basic operations.

The latest version of Iceberg can be found on the releases page. This guide uses examples compatible with Iceberg 1.0+.
Installation
Add Dependencies
Add Iceberg to your project using Maven or Gradle.
Module guide:
- iceberg-core - The core API and implementations (required)
- iceberg-parquet - For Parquet file format support
- iceberg-orc - For ORC file format support
- iceberg-hive-metastore - For Hive Metastore catalog
- iceberg-data - For direct JVM data access
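With Maven, for example, the core and Parquet modules can be declared roughly like this (the version shown is illustrative; check the releases page for the latest):

```xml
<dependency>
  <groupId>org.apache.iceberg</groupId>
  <artifactId>iceberg-core</artifactId>
  <version>1.5.0</version>
</dependency>
<dependency>
  <groupId>org.apache.iceberg</groupId>
  <artifactId>iceberg-parquet</artifactId>
  <version>1.5.0</version>
</dependency>
```

Gradle users can declare the same coordinates in the dependencies block.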
Choose a Catalog
Iceberg uses catalogs to manage tables. Choose the catalog that fits your environment:
- Hadoop Catalog - File-based catalog for HDFS or S3
- Hive Metastore - Uses existing Hive Metastore
- AWS Glue - For AWS environments
- Nessie - Git-like data catalog
- REST Catalog - HTTP-based catalog service
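As an example, a Hadoop catalog can be wired into Spark with configuration properties like the following (the catalog name local and the warehouse path are placeholders):

```properties
# Register an Iceberg catalog named "local" backed by the filesystem
spark.sql.catalog.local=org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.local.type=hadoop
spark.sql.catalog.local.warehouse=/tmp/warehouse
```

Tables created through this catalog are then addressed as local.db.table in Spark SQL.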
Using Spark (Recommended for Getting Started)
Spark is the most feature-rich engine for Iceberg and the easiest way to get started.

Start Spark Shell with Iceberg
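A typical launch command looks like the following (the runtime version 1.5.0 is illustrative; pick the artifact that matches your Spark and Scala versions):

```shell
# Spark shell with the Iceberg runtime (Spark 3.5, Scala 2.12)
spark-shell --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.5.0

# The same option works for spark-sql
spark-sql --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.5.0
```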
Launch Spark with the Iceberg runtime package; for spark-sql, pass the same --packages option. Replace 3.5 in the package name with your Spark version (e.g., 3.3, 3.4, 3.5).

Using the Java API
For programmatic access, use the Iceberg Java API.

Define a Schema
Create a schema for your table:
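A minimal sketch using the Java API (the field names here are illustrative):

```java
import org.apache.iceberg.Schema;
import org.apache.iceberg.types.Types;

public class SchemaExample {
    public static void main(String[] args) {
        // Each field has an ID, a name, a type, and a required/optional flag
        Schema schema = new Schema(
            Types.NestedField.required(1, "id", Types.LongType.get()),
            Types.NestedField.optional(2, "data", Types.StringType.get()),
            Types.NestedField.required(3, "event_ts", Types.TimestampType.withZone())
        );
        System.out.println(schema);
    }
}
```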
Field IDs must be unique within the schema. Iceberg automatically reassigns IDs when creating tables to ensure uniqueness.
Common Operations
Schema Evolution
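As a sketch, assuming table is a Table handle already loaded from a catalog (the column names are illustrative):

```java
import org.apache.iceberg.Table;
import org.apache.iceberg.types.Types;

public class EvolveSchema {
    // Assumes `table` was loaded from a catalog, e.g. catalog.loadTable(identifier)
    static void evolve(Table table) {
        table.updateSchema()
            .addColumn("category", Types.StringType.get()) // new optional column
            .renameColumn("data", "payload")               // rename keeps the field ID
            .commit();                                     // atomic metadata commit
    }
}
```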
Iceberg lets you modify a table's schema without rewriting existing data files.

Time Travel
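A sketch of reading a previous snapshot with the Java API (assumes table is already loaded; the snapshot ID comes from the table's history):

```java
import org.apache.iceberg.Snapshot;
import org.apache.iceberg.Table;
import org.apache.iceberg.TableScan;

public class TimeTravel {
    // Scan the table as of a specific snapshot
    static TableScan scanAt(Table table, long snapshotId) {
        return table.newScan().useSnapshot(snapshotId);
    }

    // List available snapshots to find an ID
    static void printHistory(Table table) {
        for (Snapshot snap : table.snapshots()) {
            System.out.println(snap.snapshotId() + " @ " + snap.timestampMillis());
        }
    }
}
```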
Every commit creates a snapshot, so you can access historical versions of your table.

Table Maintenance
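One common maintenance task is expiring old snapshots; a sketch with the Java API (assumes table is already loaded, and the retention values are illustrative):

```java
import org.apache.iceberg.Table;

public class Maintenance {
    // Remove snapshots older than `olderThanMillis`, keeping recent history
    static void expireOldSnapshots(Table table, long olderThanMillis) {
        table.expireSnapshots()
            .expireOlderThan(olderThanMillis) // cutoff timestamp in milliseconds
            .retainLast(10)                   // always keep the 10 newest snapshots
            .commit();
    }
}
```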
Regular maintenance keeps your tables healthy.

Next Steps
Now that you’ve created your first Iceberg table, explore more advanced features:

- Java API Deep Dive - Learn advanced Java API usage
- Partitioning - Master hidden partitioning
- Schema Evolution - Safely evolve table schemas
- Performance - Optimize table performance
Troubleshooting
ClassNotFoundException or NoClassDefFoundError
Make sure you have all required dependencies:
- iceberg-core for the core API
- iceberg-parquet or iceberg-orc for file formats
- iceberg-hive-metastore for the Hive catalog
- Hadoop dependencies for HDFS access
Connection refused to Hive Metastore
Check that:
- Hive Metastore is running: netstat -an | grep 9083
- The URI is correct: thrift://localhost:9083
- Network firewall allows connections
- Hive configuration is in the classpath
Table not found
Verify:
- The catalog name is correct
- The database/namespace exists
- You have permissions to access the table
- The table was created successfully