
What is a Catalog?

A catalog in Apache Iceberg is a fundamental abstraction that provides:
  • Table Discovery: Listing and locating tables within namespaces
  • Metadata Management: Tracking table metadata and schema evolution
  • Atomic Operations: Committing each table change as an atomic swap of the table's current metadata, which underpins Iceberg's ACID guarantees
  • Namespace Organization: Hierarchical organization of tables

How Catalogs Work

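Iceberg catalogs implement the Catalog interface. The excerpt below is abridged and shows only its core operations:
public interface Catalog {
  // Table operations
  // Creates a table with the given schema and partition spec
  Table createTable(TableIdentifier identifier, Schema schema, PartitionSpec spec);
  // Loads an existing table
  Table loadTable(TableIdentifier identifier);
  // Drops a table; when purge is true, its data and metadata files are deleted too
  boolean dropTable(TableIdentifier identifier, boolean purge);
  // Renames a table
  void renameTable(TableIdentifier from, TableIdentifier to);
  
  // Discovery
  // Lists the tables in a namespace
  List<TableIdentifier> listTables(Namespace namespace);
}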
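To make the semantics of these operations concrete, here is a minimal in-memory sketch. It is a toy model, not Iceberg code: the class name CatalogSketch is hypothetical, identifiers are plain strings, a "table" is reduced to the location of its metadata file, and the s3:// path is a placeholder.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: these are NOT Iceberg classes.
class CatalogSketch {
  // namespace -> (table name -> current metadata location)
  private final Map<String, Map<String, String>> tables = new HashMap<>();

  // Registers a table; fails if the identifier is already taken.
  synchronized void createTable(String namespace, String name, String metadataLocation) {
    Map<String, String> inNamespace = tables.computeIfAbsent(namespace, ns -> new TreeMap<>());
    if (inNamespace.putIfAbsent(name, metadataLocation) != null) {
      throw new IllegalStateException("Table already exists: " + namespace + "." + name);
    }
  }

  // Returns the metadata location of an existing table; fails if it is missing.
  synchronized String loadTable(String namespace, String name) {
    String location = tables.getOrDefault(namespace, Map.of()).get(name);
    if (location == null) {
      throw new IllegalArgumentException("No such table: " + namespace + "." + name);
    }
    return location;
  }

  // Returns true if a table was removed, false if it did not exist.
  synchronized boolean dropTable(String namespace, String name) {
    Map<String, String> inNamespace = tables.get(namespace);
    return inNamespace != null && inNamespace.remove(name) != null;
  }

  // Moves the entry to a new identifier while holding the lock.
  synchronized void renameTable(String fromNs, String fromName, String toNs, String toName) {
    String location = loadTable(fromNs, fromName); // throws if the source is missing
    createTable(toNs, toName, location);           // throws if the target exists
    dropTable(fromNs, fromName);
  }

  // Lists table names in one namespace, sorted by name.
  synchronized List<String> listTables(String namespace) {
    return new ArrayList<>(tables.getOrDefault(namespace, Map.of()).keySet());
  }

  public static void main(String[] args) {
    CatalogSketch catalog = new CatalogSketch();
    catalog.createTable("analytics", "orders", "s3://example-bucket/orders/metadata/v1.json");
    catalog.renameTable("analytics", "orders", "analytics", "orders_v2");
    System.out.println(catalog.listTables("analytics"));
    System.out.println(catalog.dropTable("analytics", "orders_v2"));
  }
}
```

The point of the sketch is that a catalog is essentially a mapping from table identifiers to the current metadata of each table. A real implementation keeps that mapping in a durable store such as a metastore or database.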

Available Catalog Implementations

Iceberg provides several catalog implementations for different use cases:

Hive Metastore

Use existing Hive metastore infrastructure

AWS Glue

Native AWS Glue catalog integration

JDBC

Store metadata in any relational database

Nessie

Git-like operations with multi-table transactions

REST

Language-agnostic REST catalog service

Hadoop

File system-based catalog for development

Catalog vs TableOperations

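While catalogs handle table discovery and high-level operations, a TableOperations instance handles the low-level reading and committing of a single table's metadata. The excerpt below is abridged:
public interface TableOperations {
  // Returns the currently loaded metadata without checking for updates
  TableMetadata current();
  // Checks for updates and returns the latest metadata
  TableMetadata refresh();
  // Atomically replaces base with metadata; fails if base is no longer current
  void commit(TableMetadata base, TableMetadata metadata);
  // Returns the FileIO used to read and write the table's files
  FileIO io();
}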
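The commit contract is easiest to see as an optimistic compare-and-swap. The sketch below is a toy model under stated assumptions, not Iceberg's implementation: CommitSketch and commitWithRetry are hypothetical names, metadata is reduced to a string, and commit returns a boolean for brevity, whereas the interface above declares void and reports a conflict by throwing.

```java
import java.util.function.UnaryOperator;

// Toy model only, NOT Iceberg's TableOperations.
class CommitSketch {
  private String current;

  CommitSketch(String initialMetadata) {
    this.current = initialMetadata;
  }

  synchronized String current() {
    return current;
  }

  // Succeeds only if nobody else has committed since `base` was read.
  synchronized boolean commit(String base, String updated) {
    if (!current.equals(base)) {
      return false; // base is stale: another writer committed first
    }
    current = updated;
    return true;
  }

  // Retry loop a writer might run: re-read, re-apply the change, try again.
  static String commitWithRetry(CommitSketch ops, UnaryOperator<String> change, int maxAttempts) {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      String base = ops.current();
      String updated = change.apply(base);
      if (ops.commit(base, updated)) {
        return updated;
      }
    }
    throw new IllegalStateException("Commit failed after " + maxAttempts + " attempts");
  }

  public static void main(String[] args) {
    CommitSketch ops = new CommitSketch("metadata-v1.json");
    String staleBase = ops.current();              // writer B reads v1
    ops.commit(ops.current(), "metadata-v2.json"); // writer A commits first
    System.out.println(ops.commit(staleBase, "metadata-v2b.json")); // false: base is stale
    System.out.println(commitWithRetry(ops, base -> base + "+append", 3));
  }
}
```

Because a commit with a stale base is rejected rather than applied, two concurrent writers cannot silently overwrite each other. The losing writer refreshes, reapplies its change on top of the new metadata, and retries.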

Choosing a Catalog

Consider these factors when selecting a catalog:
Factor                        Recommended Catalog
Existing Hive infrastructure  Hive Metastore Catalog
AWS ecosystem integration     Glue Catalog
High write throughput         Glue, DynamoDB, or Nessie
Multi-table transactions      Nessie Catalog
Branch/tag support            Nessie Catalog
Relational database           JDBC Catalog
Cloud-agnostic                REST Catalog
Development/testing           Hadoop Catalog

Namespace Management

Catalogs that implement SupportsNamespaces provide hierarchical organization:
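import java.util.List;
import java.util.Map;
import org.apache.iceberg.catalog.Namespace;
import org.apache.iceberg.catalog.SupportsNamespaces;
import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap;

// Assumes `catalog` is an initialized catalog typed as SupportsNamespaces

// Create a namespace with an "owner" property
catalog.createNamespace(
  Namespace.of("analytics", "sales"),
  ImmutableMap.of("owner", "data-team")
);

// List top-level namespaces
List<Namespace> namespaces = catalog.listNamespaces();

// Load the properties of the namespace created above
Map<String, String> metadata = catalog.loadNamespaceMetadata(
  Namespace.of("analytics", "sales")
);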
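The hierarchy can be pictured as namespaces identified by a list of levels, where listing under a parent returns only its direct children. The following is a toy model, not Iceberg's SupportsNamespaces: NamespaceSketch is a hypothetical class and levels are plain string lists.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

// Toy model only: NOT Iceberg's Namespace or SupportsNamespaces.
class NamespaceSketch {
  // namespace levels, e.g. ["analytics", "sales"] -> namespace properties
  private final Map<List<String>, Map<String, String>> namespaces = new LinkedHashMap<>();

  void createNamespace(List<String> levels, Map<String, String> properties) {
    if (namespaces.putIfAbsent(List.copyOf(levels), Map.copyOf(properties)) != null) {
      throw new IllegalStateException("Namespace already exists: " + levels);
    }
  }

  // Returns the direct children of `parent`; an empty list means the root.
  List<List<String>> listNamespaces(List<String> parent) {
    List<List<String>> children = new ArrayList<>();
    for (List<String> levels : namespaces.keySet()) {
      boolean oneLevelDeeper = levels.size() == parent.size() + 1;
      if (oneLevelDeeper && levels.subList(0, parent.size()).equals(parent)) {
        children.add(levels);
      }
    }
    return children;
  }

  Map<String, String> loadNamespaceMetadata(List<String> levels) {
    Map<String, String> properties = namespaces.get(levels);
    if (properties == null) {
      throw new NoSuchElementException("No such namespace: " + levels);
    }
    return properties;
  }

  public static void main(String[] args) {
    NamespaceSketch catalog = new NamespaceSketch();
    catalog.createNamespace(List.of("analytics"), Map.of());
    catalog.createNamespace(List.of("analytics", "sales"), Map.of("owner", "data-team"));
    System.out.println(catalog.listNamespaces(List.of()));            // root level
    System.out.println(catalog.listNamespaces(List.of("analytics"))); // children of analytics
    System.out.println(catalog.loadNamespaceMetadata(List.of("analytics", "sales")));
  }
}
```

Whether a parent namespace must exist before a child is created, and how deep nesting may go, differs between catalog implementations. The sketch does not enforce either.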

Catalog Properties

Common catalog properties across implementations:
Property       Description
warehouse      Root path for table data and metadata
uri            Connection URI for the catalog service
io-impl        Custom FileIO implementation class
catalog-impl   Custom catalog implementation class
cache-enabled  Enable metadata caching (default: true)
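These properties are plain string key/value pairs. The sketch below assembles such a map and reads cache-enabled with its documented default. CatalogPropertiesSketch and its helper methods are hypothetical, and the warehouse path and URI are placeholder values rather than real endpoints.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative helper only; the property keys come from the table above.
class CatalogPropertiesSketch {
  static final String WAREHOUSE = "warehouse";
  static final String URI = "uri";
  static final String CACHE_ENABLED = "cache-enabled";

  // Reads cache-enabled, falling back to the documented default of true.
  static boolean cacheEnabled(Map<String, String> properties) {
    return Boolean.parseBoolean(properties.getOrDefault(CACHE_ENABLED, "true"));
  }

  // Builds a property map with placeholder values.
  static Map<String, String> exampleProperties() {
    Map<String, String> properties = new HashMap<>();
    properties.put(WAREHOUSE, "s3://example-bucket/warehouse");     // placeholder path
    properties.put(URI, "thrift://metastore.example.com:9083");     // placeholder URI
    return properties;
  }

  public static void main(String[] args) {
    Map<String, String> properties = exampleProperties();
    System.out.println(cacheEnabled(properties)); // true: key absent, default applies
    properties.put(CACHE_ENABLED, "false");
    System.out.println(cacheEnabled(properties)); // false: explicitly disabled
  }
}
```

In Iceberg itself, a map like this is typically passed to a catalog when it is initialized, for example through the catalog's initialize(name, properties) method.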

Security Considerations

When working with encrypted tables, catalogs must:
  • Store encryption key metadata securely
  • Ensure key rotation is properly tracked
  • Maintain audit logs for key access
  • Implement proper access controls
See the Encryption documentation for detailed requirements.

Next Steps

Build Custom Catalog

Learn how to implement your own catalog

JDBC Catalog

Use relational databases for metadata

Nessie Integration

Git-like versioning for your data lake

Storage Configuration

Configure storage backends
