RewriteManifests
The RewriteManifests action rewrites a table's manifest files to speed up scan planning and reduce metadata overhead.
Interface
Overview
Manifest files contain metadata about data files in a table. Over time, as tables evolve through many write operations, manifest files can become:
- Fragmented across many small files
- Poorly organized for common query patterns
- Associated with outdated partition specifications
Rewriting manifests can:
- Reduce the number of manifest files
- Organize manifests by partition values for better pruning
- Update manifests to use current partition specifications
- Improve query planning performance
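To get a feel for how much a rewrite can reduce the manifest count, the sketch below estimates the number of output manifests by dividing the total manifest size by a target size. This is a simplified, JDK-only model and not the action's actual planning logic; the class name, method name, and the 8 MiB target used in `main` are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class ConsolidationEstimate {
    // Estimates how many manifests remain if all input manifests are
    // repacked into files of roughly `targetBytes` each (ceiling division).
    static long estimateOutputManifests(List<Long> manifestSizes, long targetBytes) {
        if (targetBytes <= 0) {
            throw new IllegalArgumentException("targetBytes must be positive");
        }
        long total = 0;
        for (long size : manifestSizes) {
            total += size;
        }
        return (total + targetBytes - 1) / targetBytes;
    }

    public static void main(String[] args) {
        // Hypothetical table state: twenty 1 MiB manifests from frequent small appends.
        List<Long> sizes = new ArrayList<>();
        for (int i = 0; i < 20; i++) {
            sizes.add(1024L * 1024);
        }
        long target = 8L * 1024 * 1024; // assumed target size for this example
        System.out.println(sizes.size() + " manifests -> about "
            + estimateOutputManifests(sizes, target));
    }
}
```

The estimate ignores compression and per-file overhead, so treat it as an order-of-magnitude guide only.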
Methods
specId
Rewrite manifests for a specific partition specification ID.
Parameters: specId - The partition specification ID
Returns: this for method chaining
Example:
If not set, defaults to the table’s default partition specification ID.
rewriteIf
Rewrite only manifests that match a given predicate.
Parameters: predicate - A predicate to test manifest files
Returns: this for method chaining
Example:
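The following is a minimal, JDK-only sketch of the kind of predicate one might pass to rewriteIf. `ManifestInfo` is a hypothetical stand-in for the real manifest type (only a path, a byte length, and a partition spec ID are modeled), and the 8 MiB "small manifest" threshold is an assumption chosen for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical stand-in for a manifest file; not the library's type.
final class ManifestInfo {
    private final String path;
    private final long length;        // size in bytes
    private final int partitionSpecId;

    ManifestInfo(String path, long length, int partitionSpecId) {
        this.path = path;
        this.length = length;
        this.partitionSpecId = partitionSpecId;
    }

    String path() { return path; }
    long length() { return length; }
    int partitionSpecId() { return partitionSpecId; }
}

final class RewriteIfSketch {
    // Assumed threshold: manifests under 8 MiB count as "small".
    static final long SMALL_MANIFEST_BYTES = 8L * 1024 * 1024;

    // Selects small manifests that were written with the given spec ID.
    static Predicate<ManifestInfo> smallManifestsForSpec(int specId) {
        return m -> m.partitionSpecId() == specId
            && m.length() < SMALL_MANIFEST_BYTES;
    }

    // Returns the manifests the predicate would pick for rewriting.
    static List<ManifestInfo> select(List<ManifestInfo> manifests,
                                     Predicate<ManifestInfo> predicate) {
        List<ManifestInfo> selected = new ArrayList<>();
        for (ManifestInfo m : manifests) {
            if (predicate.test(m)) {
                selected.add(m);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<ManifestInfo> manifests = Arrays.asList(
            new ManifestInfo("m1.avro", 512L * 1024, 0),
            new ManifestInfo("m2.avro", 16L * 1024 * 1024, 0),
            new ManifestInfo("m3.avro", 256L * 1024, 1));
        for (ManifestInfo m : select(manifests, smallManifestsForSpec(0))) {
            System.out.println("would rewrite: " + m.path());
        }
    }
}
```

With the real action, a lambda of the same shape would be supplied to rewriteIf; manifests that fail the predicate are left untouched.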
sortBy
Sort rewritten manifests by specific partition field names.
Parameters: partitionFields - Exact transformed partition field names to sort by
Returns: this for method chaining
Example:
Use transformed column names (e.g., “data_bucket”) not raw column names (e.g., “data”) for bucketed partitions.
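To illustrate why sorting helps pruning, the sketch below models each manifest as a list of values for one partition field (for instance a bucket number) and counts how many manifests have a value range that could contain a queried bucket. It is a deliberately simplified, JDK-only model: the class and method names are hypothetical, and real manifests track more than a single min/max range.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class SortBySketch {
    // Splits partition values into manifests of at most `perManifest`
    // entries, preserving the incoming order.
    static List<List<Integer>> chunk(List<Integer> values, int perManifest) {
        List<List<Integer>> manifests = new ArrayList<>();
        for (int i = 0; i < values.size(); i += perManifest) {
            int end = Math.min(i + perManifest, values.size());
            manifests.add(new ArrayList<>(values.subList(i, end)));
        }
        return manifests;
    }

    // Counts manifests whose [min, max] range could contain `bucket`;
    // range-based pruning cannot skip any of these.
    static int manifestsToScan(List<List<Integer>> manifests, int bucket) {
        int count = 0;
        for (List<Integer> m : manifests) {
            if (Collections.min(m) <= bucket && bucket <= Collections.max(m)) {
                count++;
            }
        }
        return count;
    }

    // Returns a sorted copy, leaving the input untouched.
    static List<Integer> sorted(List<Integer> values) {
        List<Integer> copy = new ArrayList<>(values);
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        // Hypothetical bucket values in arrival order.
        List<Integer> dataBucket = Arrays.asList(3, 0, 2, 1, 0, 3, 1, 2);
        System.out.println("unsorted: "
            + manifestsToScan(chunk(dataBucket, 2), 1));
        System.out.println("sorted:   "
            + manifestsToScan(chunk(sorted(dataBucket), 2), 1));
    }
}
```

In this toy input, every unsorted manifest spans bucket 1, while after sorting only one manifest does, which is the effect sortBy aims for on the sorted field.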
stagingLocation
Specify a custom location for staging rewritten manifests.
Parameters: stagingLocation - Path where staged manifests should be written
Returns: this for method chaining
Example:
If not set, defaults to the table’s metadata location.
Result
The Result interface provides information about the rewrite operation.
Methods
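As a rough sketch of how a caller might summarize a result, the example below assumes the result exposes two iterables: the manifests that were replaced and the manifests that were newly written. That shape is an assumption of this sketch; plain strings stand in for manifest entries, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

final class RewriteResultSketch {
    // Counts the elements of any Iterable, since the result is assumed
    // to expose iterables rather than sized collections.
    static <T> int count(Iterable<T> items) {
        int n = 0;
        for (T ignored : items) {
            n++;
        }
        return n;
    }

    // Builds a one-line summary from the replaced and newly written manifests.
    static String summarize(Iterable<String> rewritten, Iterable<String> added) {
        return "rewrote " + count(rewritten) + " manifests into " + count(added);
    }

    public static void main(String[] args) {
        // Hypothetical manifest paths standing in for real result entries.
        List<String> rewritten = Arrays.asList("m1.avro", "m2.avro", "m3.avro");
        List<String> added = Arrays.asList("m4.avro");
        System.out.println(summarize(rewritten, added));
    }
}
```

Logging a summary like this after each run makes it easier to see whether a scheduled rewrite is still doing useful work.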
Usage Examples
Basic Manifest Rewrite
Consolidate Small Manifests
Optimize for Query Patterns
Rewrite Specific Partition Spec
With Custom Staging Location
Conditional Rewrite
When to Rewrite Manifests
Consider rewriting manifests when:
- After many small writes: Frequent small appends create many small manifest files
- Query planning is slow: Too many manifests increase planning overhead
- Changing partition specs: After evolving partition specifications
- Optimizing for new query patterns: When query patterns change significantly
- After major compaction: Following large data file rewrites
Best Practices
- Sort by query patterns: Use sortBy() to organize manifests for your most common queries
- Use predicates wisely: The rewriteIf() method can target specific problematic manifests
- Monitor manifest count: Track manifest file counts; excessive fragmentation hurts performance
- Combine with data optimization: Often beneficial after rewriting data files
- Schedule periodic rewrites: Run on a regular schedule for tables with frequent writes
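The monitoring and scheduling advice above can be turned into a simple check. The sketch below is one possible heuristic, not a recommendation from the library: it flags a table for rewriting when it has at least a minimum number of manifests and a large share of them are small. All names and thresholds are assumptions to be tuned per table.

```java
import java.util.Arrays;
import java.util.List;

final class ManifestHealthSketch {
    // Fraction of manifests smaller than `smallBytes`; 0.0 for an empty list.
    static double smallFraction(List<Long> manifestSizes, long smallBytes) {
        if (manifestSizes.isEmpty()) {
            return 0.0;
        }
        int small = 0;
        for (long size : manifestSizes) {
            if (size < smallBytes) {
                small++;
            }
        }
        return (double) small / manifestSizes.size();
    }

    // Hypothetical rule: rewrite when there are enough manifests to matter
    // and at least `minSmallFraction` of them are small.
    static boolean shouldRewrite(List<Long> manifestSizes, long smallBytes,
                                 int minCount, double minSmallFraction) {
        return manifestSizes.size() >= minCount
            && smallFraction(manifestSizes, smallBytes) >= minSmallFraction;
    }

    public static void main(String[] args) {
        long mib = 1024L * 1024;
        // Hypothetical manifest sizes collected from table metadata.
        List<Long> sizes = Arrays.asList(1 * mib, 1 * mib, 2 * mib, 64 * mib);
        System.out.println("small fraction: " + smallFraction(sizes, 8 * mib));
        System.out.println("rewrite? " + shouldRewrite(sizes, 8 * mib, 4, 0.5));
    }
}
```

A scheduled job could run a check like this first and skip the rewrite when the table is already in good shape, avoiding unnecessary snapshots.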
Performance Impact
Benefits
- Faster query planning
- Reduced metadata overhead
- Better partition pruning
- Fewer files to read during planning
Costs
- Creates a new snapshot
- Requires reading and writing manifest files
- May require temporary storage for staging
Related
- RewriteDataFiles - Optimize data file layout
- ExpireSnapshots - Clean up old snapshots
- Manifest Files - Understanding manifest structure