ComputePartitionStats
The ComputePartitionStats action computes and writes partition statistics for an Iceberg table. This action helps optimize query planning by providing metadata about partition-level data characteristics.
Interface
Overview
Partition statistics provide valuable metadata that query engines can use to optimize execution plans. The action:
- Computes statistics for partitions in a table snapshot
- Writes statistics to a dedicated statistics file
- Uses the current snapshot by default
- Can target a specific snapshot if needed
Methods
snapshot
Choose a specific table snapshot for which to compute partition statistics.
Parameters: snapshotId - The ID of the snapshot for which stats need to be computed
Returns: this for method chaining
Example:
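The original snippet for this example is not available here, so the following is only a minimal sketch of the chaining and default-snapshot behavior this section describes. Every type in it (SnapshotSelectionSketch, FakeAction, and the simplified execute() that returns a snapshot ID) is a hypothetical stand-in written for illustration, not the Iceberg classes themselves. How the real action is obtained depends on the engine integration and is not shown.

```java
// Stand-in types for illustration only; they mirror the shape described in
// this section and are NOT the Iceberg classes themselves.
final class SnapshotSelectionSketch {

  /** Hypothetical stand-in for the action interface documented here. */
  interface ComputePartitionStats {
    ComputePartitionStats snapshot(long snapshotId);

    // Simplified for the sketch: returns the ID of the snapshot that would be processed.
    long execute();
  }

  /** In-memory stand-in that only records which snapshot would be processed. */
  static final class FakeAction implements ComputePartitionStats {
    private final long currentSnapshotId;
    private Long requestedSnapshotId; // stays null until snapshot(...) is called

    FakeAction(long currentSnapshotId) {
      this.currentSnapshotId = currentSnapshotId;
    }

    @Override
    public ComputePartitionStats snapshot(long snapshotId) {
      this.requestedSnapshotId = snapshotId;
      return this; // returning this is what makes method chaining possible
    }

    @Override
    public long execute() {
      // Fall back to the current snapshot when none was chosen explicitly.
      return requestedSnapshotId != null ? requestedSnapshotId : currentSnapshotId;
    }
  }

  /** No snapshot(...) call: the current snapshot is the target. */
  static long defaultTarget(long currentSnapshotId) {
    return new FakeAction(currentSnapshotId).execute();
  }

  /** snapshot(...) chained before execute(): the given snapshot is the target. */
  static long explicitTarget(long currentSnapshotId, long snapshotId) {
    return new FakeAction(currentSnapshotId).snapshot(snapshotId).execute();
  }
}
```

In this sketch, defaultTarget(42L) resolves to the current snapshot, while explicitTarget(42L, 7L) resolves to the snapshot passed to snapshot(...).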
If not specified, the action uses the current snapshot of the table.
Result
The Result interface provides information about the computed statistics.
Methods
statisticsFile
Returns the statistics file that was written, or null if no statistics were collected.
Returns: PartitionStatisticsFile or null
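Because statisticsFile may return null, callers need to check the result before using it. The sketch below shows one way to do that. The Result and PartitionStatisticsFile interfaces in it are trimmed-down stand-ins declared only for this example, and the path() accessor is assumed to match the real interface; describe is a hypothetical helper.

```java
// Stand-in types for illustration only; not the Iceberg interfaces themselves.
final class StatsResultSketch {

  /** Hypothetical, trimmed-down stand-in for the statistics file type. */
  interface PartitionStatisticsFile {
    String path(); // assumed accessor for the file location
  }

  /** Hypothetical stand-in for the Result interface described above. */
  interface Result {
    PartitionStatisticsFile statisticsFile(); // may return null
  }

  /** Turns a result into a short message, handling the null case explicitly. */
  static String describe(Result result) {
    PartitionStatisticsFile file = result.statisticsFile();
    if (file == null) {
      // No statistics were collected, so there is no file to report.
      return "no partition statistics were written";
    }
    return "partition statistics written to " + file.path();
  }
}
```

Checking for null at the call site keeps the "nothing collected" case from surfacing later as a NullPointerException.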
Usage Examples
Compute Stats for Current Snapshot
Compute Stats for Specific Snapshot
Check Statistics File Details
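The original example for this heading is not available here. As a rough sketch, a statistics file can be summarized from its snapshot ID, path, and size. The accessor names snapshotId(), path(), and fileSizeInBytes() are assumed to match the real PartitionStatisticsFile interface and should be confirmed against the Iceberg version in use. SimpleStatsFile and summarize are hypothetical helpers written only for this illustration.

```java
// Stand-in types for illustration only; not the Iceberg interfaces themselves.
final class StatsFileDetailsSketch {

  /** Hypothetical stand-in; accessor names are assumptions to verify. */
  interface PartitionStatisticsFile {
    long snapshotId();

    String path();

    long fileSizeInBytes();
  }

  /** Plain value holder used to exercise the sketch. */
  static final class SimpleStatsFile implements PartitionStatisticsFile {
    private final long snapshotId;
    private final String path;
    private final long fileSizeInBytes;

    SimpleStatsFile(long snapshotId, String path, long fileSizeInBytes) {
      this.snapshotId = snapshotId;
      this.path = path;
      this.fileSizeInBytes = fileSizeInBytes;
    }

    @Override
    public long snapshotId() {
      return snapshotId;
    }

    @Override
    public String path() {
      return path;
    }

    @Override
    public long fileSizeInBytes() {
      return fileSizeInBytes;
    }
  }

  /** Builds a one-line summary suitable for logging. */
  static String summarize(PartitionStatisticsFile file) {
    return "snapshot=" + file.snapshotId()
        + ", path=" + file.path()
        + ", sizeBytes=" + file.fileSizeInBytes();
  }
}
```

Logging a summary like this after each run is one simple way to track which snapshot the latest statistics file belongs to.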
Best Practices
- Compute after significant data changes: Run this action after major writes or rewrites to keep statistics current
- Use with query optimization: Ensure your query engine is configured to use partition statistics
- Monitor statistics freshness: Outdated statistics may lead to suboptimal query plans
- Consider snapshot selection: For historical analysis, compute stats on specific snapshots
Partition statistics are most beneficial for partitioned tables with selective query patterns.
Related
- ComputeTableStats - Compute table-level statistics
- ExpireSnapshots - Manage old snapshots
- RewriteDataFiles - Optimize data file layout