The data, delete, manifest and manifest list files are encrypted and tamper-proofed before being sent to the storage backend.
The metadata.json file does not contain data or stats, and is therefore not encrypted.
Currently, encryption is supported in the Hive and REST catalogs for tables with Parquet and Avro data formats.
Configuration Requirements
Two parameters are required to activate encryption of a table:
- Catalog property - Specifies the KMS (“key management service”):
  - encryption.kms-type for pre-defined KMS clients (aws, azure or gcp)
  - encryption.kms-impl with the client class path for custom KMS clients
- Table property - encryption.key-id specifies the ID of a master key used to encrypt and decrypt the table. Master keys are stored and managed in the KMS.
Example
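The two properties above could be assembled for a Spark session roughly as follows. This is a minimal sketch in Python, not the project's own example: the helper names, the catalog name `local`, the table `db.events` and the key ID are hypothetical, the `spark.sql.catalog.<name>.` prefix assumes the usual Spark convention for passing Iceberg catalog properties, and rejecting a configuration that sets both KMS properties is a choice made for this sketch.

```python
PREDEFINED_KMS_TYPES = ("aws", "azure", "gcp")


def catalog_encryption_conf(catalog, kms_type=None, kms_impl=None):
    """Build the Spark conf entry that selects the KMS client for a catalog."""
    # Assumption of this sketch: exactly one of the two properties is set.
    if (kms_type is None) == (kms_impl is None):
        raise ValueError("set exactly one of kms_type or kms_impl")
    prefix = f"spark.sql.catalog.{catalog}."
    if kms_type is not None:
        if kms_type not in PREDEFINED_KMS_TYPES:
            raise ValueError(f"unknown pre-defined KMS type: {kms_type}")
        return {prefix + "encryption.kms-type": kms_type}
    return {prefix + "encryption.kms-impl": kms_impl}


def create_encrypted_table_sql(table, key_id):
    """Build a CREATE TABLE statement that sets the master key ID."""
    # The column list is illustrative only.
    return (
        f"CREATE TABLE {table} (id bigint, data string) USING iceberg "
        f"TBLPROPERTIES ('encryption.key-id'='{key_id}')"
    )


conf = catalog_encryption_conf("local", kms_type="aws")
sql = create_encrypted_table_sql("local.db.events", "my-master-key-id")
```

The resulting `conf` entries would be passed when the session is created, and `sql` would be run once against that session.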
Verify encryption
Verify encryption by dumping file contents:
- Parquet files must start with the “PARE” magic string (PARquet Encrypted footer mode)
- Manifest/list files must start with “AGS1” magic string (Aes Gcm Stream version 1)
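The magic-string check above can be scripted instead of inspecting a hex dump by eye. A minimal sketch, assuming the file has already been copied to a local path; the function names are illustrative and not part of Iceberg.

```python
# Magic strings listed in the section above, mapped to a readable label.
ENCRYPTED_MAGIC = {
    b"PARE": "Parquet, encrypted footer mode",
    b"AGS1": "AES GCM Stream version 1",
}


def read_magic(path):
    """Return the first four bytes of a file."""
    with open(path, "rb") as f:
        return f.read(4)


def encryption_format(path):
    """Return a label if the file starts with a known encrypted magic
    string, or None otherwise (for example, plain Parquet starts with PAR1)."""
    return ENCRYPTED_MAGIC.get(read_magic(path))
```

A `None` result for a data, manifest or manifest list file of an encrypted table suggests that the file was written without encryption.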
Catalog Security Requirements
To function properly, Iceberg table encryption requires that catalog implementations not retrieve the metadata directly from metadata.json files if these files are kept unprotected in storage vulnerable to tampering:
- Catalogs may keep the metadata in a trusted independent object store
- Catalogs may work with metadata.json files in a tamper-proof storage
- Catalogs may use checksum techniques to verify integrity of metadata.json files in a storage vulnerable to tampering (the checksums must be kept in a separate trusted storage)
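The checksum option in the list above might look like the following sketch. SHA-256 is an assumption (the text does not prescribe an algorithm), the function names are hypothetical, and how the trusted digest is stored and fetched is left to the catalog.

```python
import hashlib
import hmac


def metadata_digest(metadata_bytes):
    """Compute a hex SHA-256 digest of the raw metadata.json bytes."""
    return hashlib.sha256(metadata_bytes).hexdigest()


def verify_metadata(metadata_bytes, trusted_digest_hex):
    """Compare the digest of the bytes read from untrusted storage with
    the digest kept in separate trusted storage."""
    # compare_digest avoids leaking the mismatch position through timing.
    return hmac.compare_digest(metadata_digest(metadata_bytes), trusted_digest_hex)
```

The catalog would record `metadata_digest(...)` in trusted storage on every commit and refuse to load a metadata.json file for which `verify_metadata(...)` returns `False`.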
Key Management Clients
Currently, Iceberg has clients for the AWS, GCP and Azure KMS systems. A custom client can be built for other key management systems by implementing the org.apache.iceberg.encryption.KeyManagementClient interface.
Interface Methods
This interface has the following main methods:
Appendix: Internals Overview
The standard Iceberg encryption manager generates an encryption key and a unique file ID (“AAD prefix”) for each data and delete file. The generation is performed in the worker nodes, by using a secure random number generator.
Data File Encryption
- Parquet data files: Parameters are passed to the native Parquet Modular Encryption mechanism
- Avro data files: Parameters are passed to the AES GCM Stream encryption mechanism
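The per-file parameters handed to these mechanisms, a fresh encryption key and an AAD prefix drawn from a secure random number generator, can be sketched as below. The class and function names are hypothetical, and the default lengths of 16 bytes are assumptions made for illustration; this is not the Iceberg implementation.

```python
import secrets
from dataclasses import dataclass

VALID_AES_KEY_LENGTHS = (16, 24, 32)  # AES-128, AES-192, AES-256


@dataclass(frozen=True)
class FileKeyMaterial:
    key: bytes         # encryption key for one data or delete file
    aad_prefix: bytes  # unique file ID, used as the AAD prefix


def new_file_key_material(key_length=16, aad_prefix_length=16):
    """Generate fresh key material for a single file."""
    if key_length not in VALID_AES_KEY_LENGTHS:
        raise ValueError(f"invalid AES key length: {key_length}")
    # secrets draws from the operating system's secure random source.
    return FileKeyMaterial(
        key=secrets.token_bytes(key_length),
        aad_prefix=secrets.token_bytes(aad_prefix_length),
    )
```

Each file gets its own call, so no two files share a key or an AAD prefix.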
Manifest File Encryption
The parent manifest file stores the encryption key and AAD prefix for each data and delete file in the key_metadata field. For Avro data tables, the data file length is also added to the key_metadata.
The manifest file is encrypted by the AES GCM Stream encryption mechanism, using an encryption key and an AAD prefix generated by the standard encryption manager in the driver nodes.
Manifest List Encryption
The parent manifest list file stores the encryption key, AAD prefix and file length for each manifest file in the key_metadata field. The manifest list file is encrypted by the AES GCM Stream encryption mechanism, using an encryption key and an AAD prefix generated by the standard encryption manager.
Key Encryption Keys (KEK)
The manifest list encryption key, AAD prefix and file length are packed in a key metadata object. This object is serialized and encrypted with a “key encryption key” (KEK), using the KEK creation timestamp as the AES GCM AAD. A KEK and its unique KEK_ID are generated by using a secure random number generator. For each snapshot:
- The KEK_ID is kept in the key-id field in the table metadata snapshot structure
- The encrypted manifest list key metadata is kept in the encryption-keys list in the table metadata structure
- The KEK is encrypted by the table master key via the KMS client
KEK Rotation
The KEK is re-used for a period allowed by the NIST SP 800-57 specification. Then, it is rotated:
- A new KEK and KEK_ID are generated for encryption of new manifest list key metadata objects
- The new KEK is encrypted by the table master key and stored in the encryption-keys list
- Previous KEKs are retained for the existing table snapshots
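The rotation rule above can be illustrated with a small in-memory model. This is a sketch under stated assumptions: the `Kek` and `KekStore` names, the ID format and the lifespan value passed in are hypothetical, and the step that wraps the new KEK with the table master key via the KMS client is omitted.

```python
import secrets
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Kek:
    kek_id: str
    key: bytes
    created_ms: int  # creation timestamp in milliseconds


@dataclass
class KekStore:
    lifespan_ms: int  # allowed re-use period (value chosen by the caller)
    keks: Dict[str, Kek] = field(default_factory=dict)
    current_id: Optional[str] = None

    def current_kek(self, now_ms):
        """Return the KEK to use now, rotating if the current one expired."""
        current = self.keks.get(self.current_id)
        if current is None or now_ms - current.created_ms >= self.lifespan_ms:
            current = Kek(secrets.token_hex(16), secrets.token_bytes(16), now_ms)
            # Earlier KEKs stay in the store so that existing snapshots
            # can still be read.
            self.keks[current.kek_id] = current
            self.current_id = current.kek_id
        return current
```

Calling `current_kek` with a timestamp inside the lifespan returns the same KEK; the first call at or past the lifespan creates a new one while keeping the old entry.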