Overview
The JDBC catalog lets you manage Iceberg tables using any relational database that supports atomic transactions and serializable isolation. It provides a lightweight, portable catalog without requiring Hive or cloud-specific services.
Supported Databases
The JDBC catalog works with any database that supports:
Atomic transactions for safe concurrent commits
Serializable isolation for consistent reads
Common databases:
PostgreSQL
MySQL
Oracle
SQL Server
SQLite (for development only)
SQLite does not support concurrent writes well. Use it only for local development and testing.
Configuration Properties
| Property | Required | Description |
|---|---|---|
| uri | Yes | JDBC connection string (e.g., jdbc:mysql://host:3306/db) |
| jdbc.user | No | Database username |
| jdbc.password | No | Database password |
| jdbc.&lt;property&gt; | No | Any JDBC driver property, passed through to the driver |
| warehouse | Yes | Root path for table data and metadata |
| io-impl | No | Custom FileIO implementation (default: HadoopFileIO) |
Usage Examples
Spark Configuration
Start Spark with JDBC catalog using MySQL:
spark-sql \
--packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:{icebergVersion} \
--conf spark.sql.catalog.my_catalog=org.apache.iceberg.spark.SparkCatalog \
--conf spark.sql.catalog.my_catalog.type=jdbc \
--conf spark.sql.catalog.my_catalog.uri=jdbc:mysql://test.rds.us-west-2.amazonaws.com:3306/iceberg \
--conf spark.sql.catalog.my_catalog.warehouse=s3://my-bucket/warehouse \
--conf spark.sql.catalog.my_catalog.jdbc.user=admin \
--conf spark.sql.catalog.my_catalog.jdbc.password=secret \
--conf spark.sql.catalog.my_catalog.jdbc.verifyServerCertificate=true \
--conf spark.sql.catalog.my_catalog.jdbc.useSSL=true
PostgreSQL Example
spark-sql \
--packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:{icebergVersion} \
--conf spark.sql.catalog.postgres_catalog=org.apache.iceberg.spark.SparkCatalog \
--conf spark.sql.catalog.postgres_catalog.type=jdbc \
--conf spark.sql.catalog.postgres_catalog.uri=jdbc:postgresql://localhost:5432/iceberg \
--conf spark.sql.catalog.postgres_catalog.warehouse=s3://warehouse/path \
--conf spark.sql.catalog.postgres_catalog.jdbc.user=iceberg_user \
--conf spark.sql.catalog.postgres_catalog.jdbc.password=iceberg_pass
Java API
Programmatically create a JDBC catalog:
import org.apache.iceberg.CatalogProperties;
import org.apache.iceberg.CatalogUtil;
import org.apache.iceberg.Table;
import org.apache.iceberg.catalog.Catalog;
import org.apache.iceberg.catalog.TableIdentifier;
import org.apache.iceberg.jdbc.JdbcCatalog;
import org.apache.hadoop.conf.Configuration;
import java.util.HashMap;
import java.util.Map;

// Ensure the JDBC driver is loaded
Class.forName("com.mysql.cj.jdbc.Driver");

// Configure catalog properties
Map<String, String> properties = new HashMap<>();
properties.put(CatalogProperties.CATALOG_IMPL, JdbcCatalog.class.getName());
properties.put(CatalogProperties.URI, "jdbc:mysql://localhost:3306/iceberg");
properties.put(JdbcCatalog.PROPERTY_PREFIX + "user", "admin");
properties.put(JdbcCatalog.PROPERTY_PREFIX + "password", "secret");
properties.put(CatalogProperties.WAREHOUSE_LOCATION, "s3://warehouse/path");

// Create catalog
Configuration hadoopConf = new Configuration();
Catalog catalog = CatalogUtil.buildIcebergCatalog("jdbc_catalog", properties, hadoopConf);

// Use the catalog
Table table = catalog.loadTable(TableIdentifier.of("db", "table"));
Flink Configuration
CREATE CATALOG jdbc_catalog WITH (
  'type' = 'iceberg',
  'catalog-type' = 'jdbc',
  'uri' = 'jdbc:postgresql://localhost:5432/iceberg',
  'jdbc.user' = 'iceberg_user',
  'jdbc.password' = 'iceberg_pass',
  'warehouse' = 'hdfs://namenode:8020/warehouse'
);
USE CATALOG jdbc_catalog;
SHOW DATABASES;
PySpark Example
from pyspark.sql import SparkSession
spark = SparkSession.builder \
    .appName("Iceberg JDBC Catalog") \
    .config("spark.sql.catalog.jdbc_catalog", "org.apache.iceberg.spark.SparkCatalog") \
    .config("spark.sql.catalog.jdbc_catalog.type", "jdbc") \
    .config("spark.sql.catalog.jdbc_catalog.uri", "jdbc:postgresql://localhost:5432/iceberg") \
    .config("spark.sql.catalog.jdbc_catalog.warehouse", "s3://warehouse/path") \
    .config("spark.sql.catalog.jdbc_catalog.jdbc.user", "iceberg_user") \
    .config("spark.sql.catalog.jdbc_catalog.jdbc.password", "iceberg_pass") \
    .getOrCreate()
# Create a table
spark.sql("""
CREATE TABLE jdbc_catalog.db.sample (
  id bigint,
  data string,
  category string
)
USING iceberg
PARTITIONED BY (category)
""")

# Query the table
df = spark.sql("SELECT * FROM jdbc_catalog.db.sample")
df.show()
Database Schema
The JDBC catalog creates these tables in your database:
iceberg_tables
Stores table metadata:
| Column | Type | Description |
|---|---|---|
| catalog_name | VARCHAR | Catalog identifier |
| table_namespace | VARCHAR | Namespace (database) name |
| table_name | VARCHAR | Table name |
| metadata_location | VARCHAR | Current metadata file path |
| previous_metadata_location | VARCHAR | Previous metadata file path |
iceberg_namespace_properties
Stores namespace metadata:
| Column | Type | Description |
|---|---|---|
| catalog_name | VARCHAR | Catalog identifier |
| namespace | VARCHAR | Namespace name |
| property_key | VARCHAR | Property key |
| property_value | VARCHAR | Property value |
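The tables above also show why the backing database must support atomic transactions: a commit is an atomic compare-and-swap on metadata_location. The following is an illustrative sketch using Python's built-in sqlite3 (fine for demonstration, per the SQLite note above); the catalog, namespace, table names, and metadata paths are hypothetical, and the real catalog implements this logic in Java.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Minimal version of the iceberg_tables schema described above
conn.execute("""
    CREATE TABLE iceberg_tables (
        catalog_name VARCHAR NOT NULL,
        table_namespace VARCHAR NOT NULL,
        table_name VARCHAR NOT NULL,
        metadata_location VARCHAR,
        previous_metadata_location VARCHAR,
        PRIMARY KEY (catalog_name, table_namespace, table_name)
    )
""")
conn.execute(
    "INSERT INTO iceberg_tables VALUES (?, ?, ?, ?, ?)",
    ("jdbc_catalog", "db", "sample",
     "s3://warehouse/path/metadata/v1.metadata.json", None),
)

def commit(conn, expected, new):
    """Atomically swap the metadata pointer; returns False if another writer won."""
    cur = conn.execute(
        """UPDATE iceberg_tables
           SET metadata_location = ?, previous_metadata_location = ?
           WHERE catalog_name = ? AND table_namespace = ? AND table_name = ?
             AND metadata_location = ?""",
        (new, expected, "jdbc_catalog", "db", "sample", expected),
    )
    conn.commit()
    # rowcount 0 means the WHERE clause no longer matched: a concurrent commit won
    return cur.rowcount == 1

v1 = "s3://warehouse/path/metadata/v1.metadata.json"
v2 = "s3://warehouse/path/metadata/v2.metadata.json"
print(commit(conn, v1, v2))  # True: pointer advanced from v1 to v2
print(commit(conn, v1, v2))  # False: v1 is no longer the current pointer
```

The conditional UPDATE is the whole trick: two writers racing to commit both read v1, but only the first UPDATE matches the WHERE clause; the loser sees rowcount 0 and must retry against the new metadata.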
Connection Pooling
For production deployments, configure connection pooling:
spark-sql \
--conf spark.sql.catalog.my_catalog.jdbc.maximumPoolSize=10 \
--conf spark.sql.catalog.my_catalog.jdbc.minimumIdle=2 \
--conf spark.sql.catalog.my_catalog.jdbc.connectionTimeout=30000
SSL/TLS Configuration
MySQL with SSL
spark-sql \
--conf spark.sql.catalog.my_catalog.uri=jdbc:mysql://host:3306/db \
--conf spark.sql.catalog.my_catalog.jdbc.useSSL=true \
--conf spark.sql.catalog.my_catalog.jdbc.requireSSL=true \
--conf spark.sql.catalog.my_catalog.jdbc.verifyServerCertificate=true \
--conf spark.sql.catalog.my_catalog.jdbc.trustCertificateKeyStoreUrl=file:///path/to/truststore \
--conf spark.sql.catalog.my_catalog.jdbc.trustCertificateKeyStorePassword=secret
PostgreSQL with SSL
spark-sql \
--conf spark.sql.catalog.my_catalog.uri=jdbc:postgresql://host:5432/db?ssl=true \
--conf spark.sql.catalog.my_catalog.jdbc.sslmode=verify-full \
--conf spark.sql.catalog.my_catalog.jdbc.sslrootcert=/path/to/root.crt
AWS RDS with IAM Authentication
Use IAM authentication instead of passwords:
import com.amazonaws.auth.DefaultAWSCredentialsProviderChain;
import com.amazonaws.services.rds.auth.GetIamAuthTokenRequest;
import com.amazonaws.services.rds.auth.RdsIamAuthTokenGenerator;

// Generate auth token
RdsIamAuthTokenGenerator generator = RdsIamAuthTokenGenerator.builder()
    .credentials(new DefaultAWSCredentialsProviderChain())
    .region("us-west-2")
    .build();

String authToken = generator.getAuthToken(
    GetIamAuthTokenRequest.builder()
        .hostname("mydb.rds.amazonaws.com")
        .port(3306)
        .userName("iceberg_user")
        .build()
);

// Use token as password
properties.put("jdbc.password", authToken);
See AWS RDS IAM Authentication for complete setup.
For high-throughput workloads, pool sizing and timeouts can be tuned further:
--conf spark.sql.catalog.my_catalog.jdbc.maximumPoolSize=20 \
--conf spark.sql.catalog.my_catalog.jdbc.minimumIdle=5 \
--conf spark.sql.catalog.my_catalog.jdbc.idleTimeout=600000 \
--conf spark.sql.catalog.my_catalog.jdbc.connectionTimeout=5000 \
--conf spark.sql.catalog.my_catalog.jdbc.validationTimeout=3000
Migration from Hive
Migrate existing Hive tables to the JDBC catalog:
-- Copy a Hive table into the JDBC catalog as a new Iceberg table
CREATE TABLE jdbc_catalog.db.my_table
USING iceberg
AS SELECT * FROM hive_catalog.db.my_table;
-- Or register an existing Iceberg table by its metadata file
CALL jdbc_catalog.system.register_table(
  table => 'db.my_table',
  metadata_file => 's3://bucket/path/to/metadata/v1.metadata.json'
);
Troubleshooting
If the catalog cannot connect, verify:
Database is running and accessible
Firewall rules allow connections
JDBC driver is in classpath
Connection string is correct
If authentication fails, check:
Username and password are correct
User has CREATE, SELECT, INSERT, UPDATE, DELETE permissions
SSL/TLS configuration if required
If tables or namespaces are not found, ensure:
Catalog schema exists in database
Tables were created with correct catalog
Namespace exists
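As a quick first step on the connectivity checklist above, a small TCP probe tells you whether the database is reachable at all before you dig into JDBC driver settings. This is a generic sketch, not part of Iceberg; the host and port are placeholders for your catalog database.

```python
import socket

def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures
        return False

# Example: probe the catalog database before debugging catalog config
# can_reach("localhost", 5432)
```

If the probe fails, the problem is the network, firewall, or database process, not the catalog configuration; if it succeeds, move on to credentials and SSL settings.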
Best Practices
Use connection pooling for production workloads
Enable SSL/TLS for secure connections
Back up the catalog database regularly
Monitor database performance for slow queries
Use IAM authentication when available (AWS RDS)
Keep warehouse storage separate from the catalog database
Next Steps
Nessie Catalog: Git-like versioning for data lakes
Custom Catalog: Build your own catalog implementation