The Transforms class provides factory methods for creating partition transform functions in Apache Iceberg.
Overview
Transforms are used to:
- Partition data efficiently
- Create hidden partitions from column values
- Enable partition pruning during queries
Transforms are typically applied through PartitionSpec.builderFor(Schema) rather than directly.
Identity Transform
identity()
Returns an identity transform that passes values through unchanged.
Bucket Transform
bucket()
Returns a bucket transform that hashes values into a fixed number of buckets.
numBuckets - The number of buckets to distribute values into. Typical choices:
- 4, 8, 16 - For small to medium tables
- 32, 64 - For larger tables
- 128, 256 - For very large tables
Truncate Transform
truncate()
Returns a truncate transform that truncates values to a specified width.
width - The width to truncate to:
- For strings: keeps at most the first width characters
- For integers/longs: rounds the value down to the nearest multiple of width
- For decimals: rounds the unscaled value down to the nearest multiple of width
Temporal Transforms
year()
Extracts the year from dates or timestamps.
month()
Extracts the month from dates or timestamps (as months since epoch).
day()
Extracts the day from dates or timestamps (as days since epoch).
hour()
Extracts the hour from timestamps (as hours since epoch).
Void Transform
alwaysNull()
Returns a transform that always produces null (void transform).
Parsing Transforms
fromString()
Parses a transform from a string representation. Supported formats:
- "identity"
- "year", "month", "day", "hour"
- "bucket[N]" - e.g., "bucket[16]"
- "truncate[N]" - e.g., "truncate[10]"
- "void"
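To make the string grammar concrete, here is a minimal sketch of a parser for the forms listed for fromString(). It uses only the JDK and does not call the Iceberg API; the class name TransformStringSketch and the String[] return shape are hypothetical, chosen only for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TransformStringSketch {
    // Matches the parameterized forms "bucket[N]" and "truncate[N]".
    private static final Pattern PARAMETERIZED =
        Pattern.compile("(bucket|truncate)\\[(\\d+)\\]");

    /** Returns {name} for plain transforms, or {name, N} for parameterized ones. */
    static String[] parse(String text) {
        switch (text) {
            case "identity":
            case "year":
            case "month":
            case "day":
            case "hour":
            case "void":
                return new String[] {text};
            default:
                Matcher m = PARAMETERIZED.matcher(text);
                if (m.matches()) {
                    return new String[] {m.group(1), m.group(2)};
                }
                throw new IllegalArgumentException("Unknown transform: " + text);
        }
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ", parse("bucket[16]"))); // prints "bucket 16"
        System.out.println(String.join(" ", parse("day")));        // prints "day"
    }
}
```

The sketch rejects anything outside the listed forms, which is one reasonable choice for a strict parser; the real library may be more lenient.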
Examples
Basic Partition Specs
Time-Based Partitioning
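The temporal transforms produce ordinals counted from the Unix epoch rather than calendar fields. The following is a minimal sketch of that arithmetic using only java.time. It does not call the Iceberg API; the class and method names are hypothetical, timestamps are assumed to be in UTC, and the year ordinal is assumed to be counted from 1970 in the same way as the other units.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.ZoneOffset;

final class TemporalOrdinals {
    // Years since 1970 (negative before the epoch).
    static int years(LocalDate d) {
        return d.getYear() - 1970;
    }

    // Whole months since 1970-01.
    static int months(LocalDate d) {
        return (d.getYear() - 1970) * 12 + (d.getMonthValue() - 1);
    }

    // Whole days since 1970-01-01.
    static long days(LocalDate d) {
        return d.toEpochDay();
    }

    // Whole hours since 1970-01-01T00:00 UTC; floorDiv keeps pre-epoch values consistent.
    static long hours(LocalDateTime ts) {
        return Math.floorDiv(ts.toEpochSecond(ZoneOffset.UTC), 3600L);
    }

    public static void main(String[] args) {
        LocalDateTime ts = LocalDateTime.of(2024, 3, 15, 10, 30);
        System.out.println(years(ts.toLocalDate()));  // 54
        System.out.println(months(ts.toLocalDate())); // 650
        System.out.println(days(ts.toLocalDate()));   // 19797
        System.out.println(hours(ts));                // 475138
    }
}
```

Each coarser unit groups many finer ones into one partition value, which is why the choice between year(), month(), day(), and hour() follows from how narrow typical query time ranges are.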
Multi-Level Partitioning
String Truncation
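As an illustration of what truncate() does to string values, the sketch below keeps at most the first width characters of a string. It is JDK-only and does not call the Iceberg API; StringTruncate is a hypothetical name, and the sketch assumes that "characters" means Unicode code points, so a surrogate pair is never split.

```java
final class StringTruncate {
    // Keeps at most `width` Unicode code points from the start of `value`.
    static String truncate(int width, String value) {
        if (width <= 0) {
            throw new IllegalArgumentException("width must be positive");
        }
        int codePoints = value.codePointCount(0, value.length());
        if (codePoints <= width) {
            return value; // already short enough
        }
        return value.substring(0, value.offsetByCodePoints(0, width));
    }

    public static void main(String[] args) {
        System.out.println(truncate(3, "iceberg")); // prints "ice"
        System.out.println(truncate(10, "abc"));    // prints "abc"
    }
}
```

All values sharing the same prefix map to the same partition value, which is what makes this transform useful for prefix-based partitioning.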
Numeric Truncation
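For numeric columns, truncation rounds each value down to a multiple of the width. The sketch below models that arithmetic with the JDK only, without calling the Iceberg API; the class and method names are hypothetical. It assumes the remainder is always taken as non-negative, so negative values round toward negative infinity, and that decimals are truncated on their unscaled value.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

final class NumericTruncate {
    // floorMod returns a non-negative remainder for a positive width.
    static int truncateInt(int width, int value) {
        return value - Math.floorMod(value, width);
    }

    static long truncateLong(int width, long value) {
        return value - Math.floorMod(value, (long) width);
    }

    // Applies the same rule to the unscaled value and keeps the original scale.
    static BigDecimal truncateDecimal(int width, BigDecimal value) {
        BigInteger unscaled = value.unscaledValue();
        BigInteger remainder = unscaled.mod(BigInteger.valueOf(width)); // always >= 0
        return new BigDecimal(unscaled.subtract(remainder), value.scale());
    }

    public static void main(String[] args) {
        System.out.println(truncateInt(10, 17));                            // 10
        System.out.println(truncateInt(10, -1));                            // -10
        System.out.println(truncateDecimal(50, new BigDecimal("10.65")));   // 10.50
    }
}
```

With width 10, the values 10 through 19 share one partition value and -10 through -1 share another, so every partition covers a range of equal size.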
Hash-Based Distribution
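The bucket transform can be pictured as "hash the value, then take the result modulo the bucket count". The sketch below is a JDK-only model of that idea and does not call the Iceberg API. It assumes a 32-bit Murmur3 hash with seed 0, integers hashed as 8-byte little-endian longs, and strings hashed as UTF-8 bytes; BucketSketch and its method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

final class BucketSketch {
    // 32-bit Murmur3 (x86 variant) with seed 0.
    static int murmur3_32(byte[] data) {
        final int c1 = 0xcc9e2d51;
        final int c2 = 0x1b873593;
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        int h = 0;
        while (buf.remaining() >= 4) { // body: one 4-byte block at a time
            int k = buf.getInt();
            k *= c1;
            k = Integer.rotateLeft(k, 15);
            k *= c2;
            h ^= k;
            h = Integer.rotateLeft(h, 13);
            h = h * 5 + 0xe6546b64;
        }
        int tail = buf.remaining(); // 0 to 3 leftover bytes
        if (tail > 0) {
            int k = 0;
            for (int i = tail - 1; i >= 0; i--) {
                k ^= (buf.get(buf.position() + i) & 0xff) << (8 * i);
            }
            k *= c1;
            k = Integer.rotateLeft(k, 15);
            k *= c2;
            h ^= k;
        }
        h ^= data.length; // finalization mix
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    // Clears the sign bit before the modulo so the result is in [0, numBuckets).
    static int bucket(int numBuckets, long value) {
        byte[] bytes = ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN).putLong(value).array();
        return (murmur3_32(bytes) & Integer.MAX_VALUE) % numBuckets;
    }

    static int bucket(int numBuckets, String value) {
        return (murmur3_32(value.getBytes(StandardCharsets.UTF_8)) & Integer.MAX_VALUE) % numBuckets;
    }

    public static void main(String[] args) {
        System.out.println(bucket(16, 34L));
        System.out.println(bucket(16, "iceberg"));
    }
}
```

Because the bucket depends only on the hash, the number of partitions stays fixed at numBuckets regardless of how many distinct values the column holds.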
Evolving Partition Specs
Custom Partition Values
Transform String Representation
Best Practices
Choosing Partition Transforms
- Time-based data: Use year(), month(), day(), or hour() based on query patterns
- High cardinality columns: Use bucket() to limit the number of partitions
- String prefixes: Use truncate() for prefix-based partitioning
- Low cardinality: Use identity() for direct partitioning
Partition Granularity
Bucket Count Selection
See Also
- PartitionSpec - Partition specification
- Expressions - Expression API for filters
- Types - Type system