Overview
Bloom filters are probabilistic data structures that help RocksDB avoid disk reads for keys that don’t exist. They are stored in SST files and consulted before reading data blocks.
How Bloom Filters Work
From filter_policy.h:9-15:
A database can be configured with a custom FilterPolicy object. This object is responsible for creating a small filter from a set of keys. These filters are stored in rocksdb and are consulted automatically to decide whether or not to read some information from disk.
Bloom filters provide:
- No false negatives: If a key exists, the filter will always return “might exist”
- Possible false positives: Sometimes returns “might exist” for non-existent keys
- Space efficiency: Uses only a few bits per key
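The three properties above can be seen in a toy filter. The following is a minimal sketch for illustration only: ToyBloomFilter, its hashing scheme, and its parameters are hypothetical and are not RocksDB's implementation, which uses different hashing and a cache-friendly layout. It assumes num_bits is greater than zero.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Toy Bloom filter (hypothetical, for illustration; not RocksDB code).
class ToyBloomFilter {
 public:
  ToyBloomFilter(std::size_t num_bits, int num_probes)
      : bits_(num_bits, false), num_probes_(num_probes) {}

  // Sets one bit per probe for the key.
  void Add(const std::string& key) {
    for (int i = 0; i < num_probes_; ++i) bits_[Probe(key, i)] = true;
  }

  // false: the key was definitely never added (no false negatives).
  // true: the key might have been added (false positives are possible).
  bool MayContain(const std::string& key) const {
    for (int i = 0; i < num_probes_; ++i) {
      if (!bits_[Probe(key, i)]) return false;
    }
    return true;
  }

 private:
  // Double hashing: the i-th probe position is (h1 + i * h2) mod num_bits.
  std::size_t Probe(const std::string& key, int i) const {
    const std::uint64_t h1 = std::hash<std::string>{}(key);
    const std::uint64_t h2 = std::hash<std::string>{}(key + "#salt") | 1;
    return static_cast<std::size_t>(
        (h1 + static_cast<std::uint64_t>(i) * h2) % bits_.size());
  }

  std::vector<bool> bits_;
  int num_probes_;
};
```

Every key passed to Add is guaranteed to pass MayContain afterwards, while a key that was never added usually, but not always, fails it.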
Creating Bloom Filters
NewBloomFilterPolicy
From filter_policy.h:166-167:
bits_per_key: Average bits allocated per key in the filter
- Recommended: 9.9 → ~1% false positive rate
- Lower values = more false positives, less memory
- Higher values = fewer false positives, more memory
Values < 0.5 are rounded to 0.0 (no filter). Values between 0.5 and 1.0 are rounded to 1.0 (62% FP rate).
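To get a feel for how bits_per_key maps to a false positive rate, the sketch below applies the rounding rule stated above and then the textbook Bloom filter estimate (1 - e^(-k/b))^k with k = round(b ln 2) probes. The function names are hypothetical, and the formula is the classic approximation, not RocksDB's exact implementation, so the numbers will differ slightly from measured rates.

```cpp
#include <algorithm>
#include <cmath>

// Rounding rule from the text: below 0.5 disables the filter,
// values in [0.5, 1.0) are raised to 1.0.
double EffectiveBitsPerKey(double bits_per_key) {
  if (bits_per_key < 0.5) return 0.0;
  return std::max(bits_per_key, 1.0);
}

// Textbook estimate for a standard Bloom filter (an approximation,
// not RocksDB's implementation).
double TextbookFalsePositiveRate(double bits_per_key) {
  const double b = EffectiveBitsPerKey(bits_per_key);
  if (b == 0.0) return 1.0;  // no filter: every lookup has to read the file
  const double k = std::max(1.0, std::round(b * std::log(2.0)));
  return std::pow(1.0 - std::exp(-k / b), k);
}
```

Under this approximation, 9.9 bits per key gives a rate a little under 1%, and 1 bit per key gives a rate in the low 60% range, in line with the figures quoted above.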
Example Configuration
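A minimal configuration sketch, assuming the block-based table format and a RocksDB installation that provides the headers below. MakeBloomOptions is a hypothetical helper name; the 9.9 value follows the recommendation above.

```cpp
#include <rocksdb/filter_policy.h>
#include <rocksdb/options.h>
#include <rocksdb/table.h>

// Hypothetical helper: returns Options whose SST files carry a Bloom
// filter built with 9.9 bits per key.
rocksdb::Options MakeBloomOptions() {
  rocksdb::BlockBasedTableOptions table_options;
  table_options.filter_policy.reset(rocksdb::NewBloomFilterPolicy(9.9));

  rocksdb::Options options;
  options.create_if_missing = true;
  options.table_factory.reset(
      rocksdb::NewBlockBasedTableFactory(table_options));
  return options;
}
```

The returned Options can be passed to rocksdb::DB::Open. The filter only applies to SST files written after the option is set; existing files keep whatever filter they were built with.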
Ribbon Filters
From filter_policy.h:169-210, Ribbon filters are a newer alternative:
Advantages
- ~30% space savings compared to Bloom filters
- Same false positive rate with fewer bits per key
- Similar query times
Trade-offs
- 3-4x higher CPU during construction
- 3x temporary memory during construction
- Better for lower (larger, longer-lived) LSM levels
Hybrid Configuration
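Use Bloom for L0, Ribbon for deeper levels. The bloom_before_level argument of NewRibbonFilterPolicy controls where the switch happens:
- 0 (default): Bloom for flushes only
- 1: Bloom for L0, Ribbon for L1+
- -1: Always use Ribbon filters
- INT_MAX: Always use Bloom filters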
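The four settings listed above can be read as a single threshold rule. The sketch below is a simplified model, not RocksDB code: it assumes a flush is treated as level -1 and that a file gets a Bloom filter when its level is below the threshold. FilterKind and ChooseFilter are hypothetical names.

```cpp
#include <climits>

enum class FilterKind { kBloom, kRibbon };

// Simplified model of the threshold rule (an assumption, not RocksDB's
// actual code): flush output counts as level -1, and a file is built
// with Bloom when its level is below bloom_before_level.
FilterKind ChooseFilter(int level, bool is_flush, int bloom_before_level) {
  const int effective_level = is_flush ? -1 : level;
  return effective_level < bloom_before_level ? FilterKind::kBloom
                                              : FilterKind::kRibbon;
}
```

In an actual configuration the threshold is passed as the second argument, for example NewRibbonFilterPolicy(9.9, 1) to keep Bloom filters on L0.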
Filter Building Context
From filter_policy.h:48-82, filters are built with contextual information.
Use GetBuilderWithContext() for advanced filter customization.
Statistics
From statistics.h:111-127, RocksDB tracks filter effectiveness:
Tickers
Monitoring Filter Efficiency
Advanced Options
Partition Filters
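For large SST files, partition filters into smaller blocks (partition_filters).
Pin Filters in Cache
Keep filters in memory for frequently accessed files (cache_index_and_filter_blocks together with pin_l0_filter_and_index_blocks_in_cache).
Last Level Optimization
Disable filters on the last level, where lookups that find their key read the data anyway (optimize_filters_for_hits).
From advanced_options.h:801-815: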
For keys which are hits, the filters in the last level are not useful because we will search for the data anyway. This flag allows us to not store filters for the last level.
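The options in this section can be combined in one configuration. The sketch below assumes the block-based table format and a RocksDB installation providing these headers; MakeAdvancedFilterOptions is a hypothetical helper name, and whether each option helps depends on the workload.

```cpp
#include <rocksdb/filter_policy.h>
#include <rocksdb/options.h>
#include <rocksdb/table.h>

// Hypothetical helper combining the advanced filter options.
rocksdb::Options MakeAdvancedFilterOptions() {
  rocksdb::BlockBasedTableOptions table_options;
  table_options.filter_policy.reset(rocksdb::NewBloomFilterPolicy(9.9));

  // Partitioned filters are used together with a two-level index.
  table_options.partition_filters = true;
  table_options.index_type =
      rocksdb::BlockBasedTableOptions::kTwoLevelIndexSearch;

  // Store filter blocks in the block cache and keep L0 filters pinned.
  table_options.cache_index_and_filter_blocks = true;
  table_options.pin_l0_filter_and_index_blocks_in_cache = true;

  rocksdb::Options options;
  // Skip building filters for the last level.
  options.optimize_filters_for_hits = true;
  options.table_factory.reset(
      rocksdb::NewBlockBasedTableFactory(table_options));
  return options;
}
```

optimize_filters_for_hits suits workloads where most lookups find their key; with many lookups for missing keys, last-level filters still save reads.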
Memory Considerations
Filter Memory Usage
Estimate filter memory as roughly (number of keys × bits_per_key) / 8 bytes.
Cache Integration
Filters are cached separately from data blocks.
From cache.h:55-88, filter cache usage can be tracked.
Prefix Bloom Filters
Optimize for prefix scans by configuring a prefix extractor (Options::prefix_extractor).
Custom Filter Policies
Implement FilterPolicy for custom filtering logic.
Compatibility
From filter_policy.h:94-105:
The CompatibilityName is a shared family name for filters that can read each others’ filters. Bloom and Ribbon filters share compatibility.
Important: All built-in FilterPolicies can read other kinds of built-in filters. Ribbon filters require RocksDB >= 6.15.0. Earlier versions will ignore the filter (degraded performance).
Troubleshooting
High False Positive Rate
- Increase bits_per_key: Try 12-15 bits for ~0.1-0.5% FP rate
- Check statistics: Monitor BLOOM_FILTER_USEFUL vs BLOOM_FILTER_FULL_POSITIVE
- Verify filter size: Ensure filters aren’t truncated or corrupted
High Memory Usage
- Use Ribbon filters: 30% space savings
- Enable partitioned filters: Reduce peak memory
- Disable last-level filters: Use optimize_filters_for_hits
Slow Writes
- Use Bloom for L0: Ribbon’s CPU overhead affects flushes
- Reduce bits_per_key: Balance filter quality vs build time
- Monitor construction time: Check FILTER_OPERATION_TOTAL_TIME
Best Practices
- Start with defaults: Use NewBloomFilterPolicy(9.9) and tune based on metrics
- Monitor effectiveness: Track the BLOOM_FILTER_USEFUL ticker
- Profile workload: High miss rate benefits most from filters
- Consider Ribbon: Use for large databases with space constraints
- Pin critical filters: Keep L0/L1 filters in cache
- Optimize last level: Enable optimize_filters_for_hits if applicable