What is a Catalog?
A catalog in Apache Iceberg is a fundamental abstraction that provides:
- Table Discovery: Listing and locating tables within namespaces
- Metadata Management: Tracking table metadata and schema evolution
- Atomic Operations: Ensuring ACID guarantees for table commits
- Namespace Organization: Hierarchical organization of tables
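
The atomic-commit guarantee in the list above is commonly achieved by an atomic swap of the table's current metadata pointer. The sketch below is a minimal, in-memory illustration of that idea, not Iceberg's implementation: `PointerStore`, `CommitFailed`, and the metadata file names are hypothetical, and a lock stands in for whatever atomic primitive a real catalog backend offers.

```python
import threading


class CommitFailed(Exception):
    """Raised when the metadata pointer moved since the writer last read it."""


class PointerStore:
    """Hypothetical stand-in for a catalog's table -> metadata-location map."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pointers = {}

    def current(self, table):
        with self._lock:
            return self._pointers.get(table)

    def swap(self, table, expected, new):
        """Atomically replace the pointer only if it still equals `expected`."""
        with self._lock:
            if self._pointers.get(table) != expected:
                raise CommitFailed(f"{table}: pointer is no longer {expected!r}")
            self._pointers[table] = new


store = PointerStore()
store.swap("db.events", None, "v1.metadata.json")    # table creation

base = store.current("db.events")                     # two writers both read v1
store.swap("db.events", base, "v2.metadata.json")     # first writer commits

try:
    store.swap("db.events", base, "v3.metadata.json") # second writer is stale
    conflict = False
except CommitFailed:
    conflict = True  # the losing writer would refresh its base and retry
```

Because the swap either fully succeeds or fails, readers never observe a half-applied commit, and a writer that loses the race can detect the conflict and retry against the new base.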
How Catalogs Work
Iceberg catalogs implement the Catalog interface, which provides core operations such as creating, loading, renaming, dropping, and listing tables.
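
To make the shape of such an interface concrete, here is a toy Python sketch with an in-memory implementation. It is an illustration only: the class names, the snake_case method names, and the tuple-based identifiers are assumptions chosen for this example and are not Iceberg's actual API.

```python
from abc import ABC, abstractmethod


class Catalog(ABC):
    """Toy interface covering create, load, rename, drop, and list."""

    @abstractmethod
    def list_tables(self, namespace): ...

    @abstractmethod
    def create_table(self, identifier, schema): ...

    @abstractmethod
    def load_table(self, identifier): ...

    @abstractmethod
    def rename_table(self, source, target): ...

    @abstractmethod
    def drop_table(self, identifier): ...


class InMemoryCatalog(Catalog):
    """Hypothetical implementation that keeps everything in a dict."""

    def __init__(self):
        self._tables = {}  # (namespace, name) -> table metadata dict

    def list_tables(self, namespace):
        return sorted(ident for ident in self._tables if ident[0] == namespace)

    def create_table(self, identifier, schema):
        if identifier in self._tables:
            raise ValueError(f"table already exists: {identifier}")
        self._tables[identifier] = {"schema": dict(schema)}
        return self._tables[identifier]

    def load_table(self, identifier):
        try:
            return self._tables[identifier]
        except KeyError:
            raise LookupError(f"no such table: {identifier}") from None

    def rename_table(self, source, target):
        if target in self._tables:
            raise ValueError(f"table already exists: {target}")
        # Raises KeyError if the source table does not exist.
        self._tables[target] = self._tables.pop(source)

    def drop_table(self, identifier):
        # Returns True only if a table was actually removed.
        return self._tables.pop(identifier, None) is not None


catalog = InMemoryCatalog()
catalog.create_table(("db", "events"), {"id": "long", "ts": "timestamp"})
catalog.rename_table(("db", "events"), ("db", "clicks"))
tables = catalog.list_tables("db")
dropped = catalog.drop_table(("db", "clicks"))
```

Keeping the interface small means a storage backend only has to answer a handful of questions (which tables exist, and where each one's metadata lives) to plug in.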
Available Catalog Implementations
Iceberg provides several catalog implementations for different use cases:
- Hive Metastore: Use existing Hive metastore infrastructure
- AWS Glue: Native AWS Glue catalog integration
- JDBC: Store metadata in any relational database
- Nessie: Git-like operations with multi-table transactions
- REST: Language-agnostic REST catalog service
- Hadoop: File system-based catalog for development
Catalog vs TableOperations
While catalogs handle table discovery and high-level operations, TableOperations handles the low-level reading and writing of table metadata.
Choosing a Catalog
Consider these factors when selecting a catalog:

| Factor | Recommended Catalog |
|---|---|
| Existing Hive infrastructure | Hive Metastore Catalog |
| AWS ecosystem integration | Glue Catalog |
| High write throughput | Glue, DynamoDB, or Nessie |
| Multi-table transactions | Nessie Catalog |
| Branch/tag support | Nessie Catalog |
| Relational database | JDBC Catalog |
| Cloud-agnostic | REST Catalog |
| Development/testing | Hadoop Catalog |
Namespace Management
Catalogs that implement SupportsNamespaces provide hierarchical organization of tables, with operations to create, list, and drop namespaces.
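
A hierarchical namespace can be modeled as a tuple of levels, such as `("analytics", "web")`. The sketch below illustrates that model with a hypothetical in-memory `NamespaceStore`; the class and its method names are assumptions for this example and do not mirror Iceberg's actual signatures.

```python
class NamespaceStore:
    """Hypothetical in-memory registry; a namespace is a tuple of levels."""

    def __init__(self):
        self._namespaces = {}  # namespace tuple -> properties dict

    def create_namespace(self, namespace, properties=None):
        namespace = tuple(namespace)
        if namespace in self._namespaces:
            raise ValueError(f"namespace already exists: {namespace}")
        self._namespaces[namespace] = dict(properties or {})

    def list_namespaces(self, parent=()):
        """Return the direct children of `parent`, sorted."""
        parent = tuple(parent)
        depth = len(parent) + 1
        return sorted(
            ns for ns in self._namespaces
            if len(ns) == depth and ns[:len(parent)] == parent
        )

    def drop_namespace(self, namespace):
        """Drop an empty namespace; refuse if it still has children."""
        namespace = tuple(namespace)
        has_children = any(
            len(ns) > len(namespace) and ns[:len(namespace)] == namespace
            for ns in self._namespaces
        )
        if has_children:
            raise ValueError(f"namespace is not empty: {namespace}")
        return self._namespaces.pop(namespace, None) is not None


namespaces = NamespaceStore()
namespaces.create_namespace(("analytics",))
namespaces.create_namespace(("analytics", "web"), {"owner": "data-eng"})
namespaces.create_namespace(("analytics", "mobile"))

top_level = namespaces.list_namespaces()
children = namespaces.list_namespaces(("analytics",))
```

Refusing to drop a namespace that still has children is a design choice made here so that nested namespaces are never orphaned by accident.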
Catalog Properties
Common catalog properties across implementations:

| Property | Description |
|---|---|
| warehouse | Root path for table data and metadata |
| uri | Connection URI for the catalog service |
| io-impl | Custom FileIO implementation class |
| catalog-impl | Custom catalog implementation class |
| cache-enabled | Enable metadata caching (default: true) |
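
As one example of how these properties are supplied in practice, Spark reads them from configuration keys of the form `spark.sql.catalog.<catalog-name>.<property>`. The helper below is a small sketch that builds such keys from a properties dictionary. The function name `spark_catalog_conf`, the catalog name `demo`, and the endpoint and bucket values are placeholders invented for this example.

```python
def spark_catalog_conf(catalog_name, properties):
    """Prefix each catalog property with spark.sql.catalog.<catalog_name>."""
    prefix = f"spark.sql.catalog.{catalog_name}"
    # The bare prefix key registers the catalog with Spark.
    conf = {prefix: "org.apache.iceberg.spark.SparkCatalog"}
    for key, value in properties.items():
        conf[f"{prefix}.{key}"] = str(value)
    return conf


properties = {
    "catalog-impl": "org.apache.iceberg.rest.RESTCatalog",
    "uri": "https://catalog.example.com",          # placeholder endpoint
    "warehouse": "s3://example-bucket/warehouse",  # placeholder path
    "cache-enabled": "false",                      # override the default of true
}

conf = spark_catalog_conf("demo", properties)
for key in sorted(conf):
    print(f"{key}={conf[key]}")
```

The resulting key/value pairs can then be passed to a Spark session builder or placed in a Spark defaults file.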
Security Considerations
When working with encrypted tables, catalogs must:
- Store encryption key metadata securely
- Ensure key rotation is properly tracked
- Maintain audit logs for key access
- Implement proper access controls
Next Steps
- Build Custom Catalog: Learn how to implement your own catalog
- JDBC Catalog: Use relational databases for metadata
- Nessie Integration: Git-like versioning for your data lake
- Storage Configuration: Configure storage backends