Overview
The block cache is RocksDB’s primary in-memory cache for uncompressed data blocks read from SST files. It’s critical for read performance, reducing disk I/O by caching frequently accessed blocks.
A properly sized and configured block cache can improve read throughput by orders of magnitude.
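For orientation, here is a minimal sketch of wiring a block cache into a database. The block cache only takes effect once it is attached to the table factory; sizes here are illustrative:

```cpp
#include "rocksdb/cache.h"
#include "rocksdb/options.h"
#include "rocksdb/table.h"

// Sketch: create a 512 MB block cache and attach it via the
// block-based table factory, which is how the cache reaches the DB.
rocksdb::BlockBasedTableOptions table_options;
table_options.block_cache = rocksdb::NewLRUCache(512 * 1024 * 1024);

rocksdb::Options options;
options.table_factory.reset(
    rocksdb::NewBlockBasedTableFactory(table_options));
// `options` is then passed to rocksdb::DB::Open(...)
```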
Cache Implementations
From cache.h:46-47, both cache roles use the same Cache interface; the two main implementations, LRUCache and HyperClockCache, are described below:
using BlockCache = Cache;
using RowCache = Cache; // For row-level caching
LRU Cache
From cache.h:207-275, the traditional LRU (Least Recently Used) cache:
struct LRUCacheOptions : public ShardedCacheOptions {
size_t capacity = 0;
int num_shard_bits = -1;
bool strict_capacity_limit = false;
double high_pri_pool_ratio = 0.5;
double low_pri_pool_ratio = 0.0;
bool use_adaptive_mutex = kDefaultToAdaptiveMutex;
CacheMetadataChargePolicy metadata_charge_policy =
kFullChargeCacheMetadata;
};
Creating an LRU cache:
#include "rocksdb/cache.h"
// Simple creation
auto cache = NewLRUCache(512 * 1024 * 1024); // 512 MB
// Or with options
LRUCacheOptions cache_opts;
cache_opts.capacity = 512 * 1024 * 1024;
cache_opts.num_shard_bits = 6; // 64 shards
cache_opts.strict_capacity_limit = false;
auto cache = cache_opts.MakeSharedCache();
BlockBasedTableOptions table_options;
table_options.block_cache = cache;
HyperClockCache (Recommended)
From cache.h:371-476, a lock-free cache with better CPU efficiency. HyperClockCache is now generally recommended over LRUCache for high-concurrency workloads:
struct HyperClockCacheOptions : public ShardedCacheOptions {
size_t estimated_entry_charge = 0;
size_t min_avg_entry_charge = 450;
int eviction_effort_cap = 30;
};
Creating HyperClockCache:
HyperClockCacheOptions hcc_opts;
hcc_opts.capacity = 512 * 1024 * 1024;
hcc_opts.estimated_entry_charge = 0; // Dynamic sizing (recommended)
auto cache = hcc_opts.MakeSharedCache();
Key advantages:
- Much improved CPU efficiency under parallel load
- Lock-free implementation reduces contention
- Larger cache shards (less risk of thrashing)
Caveats:
- Only for BlockBasedTableOptions::block_cache
- Not a general-purpose Cache (fixed-size keys expected)
- Requires anonymous mmap support (Linux, Windows)
Configuration Options
Capacity
From cache.h:129-132:
size_t capacity; // Total cache capacity in bytes
Sizing guidelines:
// Conservative: 1/3 of available RAM
size_t capacity = total_ram / 3;
// Aggressive: up to 60-70% for read-heavy workloads
size_t capacity = static_cast<size_t>(total_ram * 0.6);
// Minimum: at least enough for the active working set
size_t capacity = static_cast<size_t>(working_set_size * 1.5);
Sharding
From cache.h:134-138:
int num_shard_bits = -1; // Cache is sharded into 2^num_shard_bits shards
Default (-1) chooses a good value based on capacity. LRUCache benefits more from sharding than HyperClockCache.
Strict Capacity Limit
From cache.h:140-145:
bool strict_capacity_limit = false;
- true: Insert fails if the cache is full and no entries can be evicted
- false: Inserts always succeed; the cache may temporarily exceed capacity
LRUCacheOptions opts;
opts.capacity = 512 * 1024 * 1024;
opts.strict_capacity_limit = true; // Fail inserts when full
Status s = cache->Insert(key, value, helper, charge, &handle);
if (s.IsMemoryLimit()) {
// Handle cache full condition
}
Priority Levels
From advanced_cache.h:61-68 and cache.h:224-246:
enum class Priority { HIGH, LOW, BOTTOM };
double high_pri_pool_ratio = 0.5; // Ratio for high-priority entries
double low_pri_pool_ratio = 0.0; // Ratio for low-priority entries
Priority pools:
- HIGH: Index and filter blocks (if cache_index_and_filter_blocks_with_high_priority is set)
- LOW: Data blocks (default)
- BOTTOM: Blob values
BlockBasedTableOptions table_options;
table_options.cache_index_and_filter_blocks = true;
table_options.cache_index_and_filter_blocks_with_high_priority = true;
LRUCacheOptions cache_opts;
cache_opts.high_pri_pool_ratio = 0.5; // 50% reserved for high-pri
Entry Roles
From cache.h:55-88, cache entries are classified by role:
enum class CacheEntryRole {
kDataBlock, // Data blocks
kFilterBlock, // Filter blocks
kFilterMetaBlock, // Partitioned filter metadata
kIndexBlock, // Index blocks
kCompressionDictionaryBuildingBuffer,
kFilterConstruction, // Filter construction buffer
kBlockBasedTableReader, // Table reader metadata
kWriteBuffer, // Memtable charging
kBlobValue, // Blob cache entries
kMisc, // Miscellaneous
};
Monitoring by Role
From cache.h:101-110:
struct BlockCacheEntryStatsMapKeys {
static std::string EntryCount(CacheEntryRole role);
static std::string UsedBytes(CacheEntryRole role);
static std::string UsedPercent(CacheEntryRole role);
};
// Usage
std::map<std::string, uint64_t> values;
db->GetMapProperty(DB::Properties::kBlockCacheEntryStats, &values);
uint64_t filter_bytes = values[
BlockCacheEntryStatsMapKeys::UsedBytes(CacheEntryRole::kFilterBlock)
];
Advanced Features
Secondary Cache
From cache.h:159-161, add a compressed secondary tier:
std::shared_ptr<SecondaryCache> secondary_cache;
Compressed secondary cache:
#include "rocksdb/cache.h"
CompressedSecondaryCacheOptions secondary_opts;
secondary_opts.capacity = 2ULL * 1024 * 1024 * 1024; // 2 GB compressed
secondary_opts.compression_type = CompressionType::kLZ4Compression;
auto secondary_cache = secondary_opts.MakeSharedSecondaryCache();
LRUCacheOptions primary_opts;
primary_opts.capacity = 512 * 1024 * 1024; // 512 MB uncompressed
primary_opts.secondary_cache = secondary_cache;
auto cache = primary_opts.MakeSharedCache();
Tiered Cache
From cache.h:518-547, experimental multi-tier caching:
struct TieredCacheOptions {
ShardedCacheOptions* cache_opts; // Primary cache options
PrimaryCacheType cache_type; // kCacheTypeLRU or kCacheTypeHCC
TieredAdmissionPolicy adm_policy; // Admission policy
CompressedSecondaryCacheOptions comp_cache_opts;
size_t total_capacity; // Total budget across tiers
double compressed_secondary_ratio; // Ratio for compressed tier
std::shared_ptr<SecondaryCache> nvm_sec_cache; // NVM tier
};
auto cache = NewTieredCache(tiered_opts);
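A sketch of how the pieces fit together, using the fields of the struct above. Since this API is experimental, treat the enum values shown (kCacheTypeHCC, kAdmPolicyAuto) as assumptions to verify against your RocksDB version:

```cpp
#include "rocksdb/cache.h"

// Sketch: 3 GB total budget, split ~1 GB uncompressed primary
// + ~2 GB compressed secondary via compressed_secondary_ratio.
rocksdb::HyperClockCacheOptions primary_opts(
    /*capacity=*/0,  // per-tier capacity is derived from total_capacity
    /*estimated_entry_charge=*/0);

rocksdb::TieredCacheOptions tiered_opts;
tiered_opts.cache_opts = &primary_opts;
tiered_opts.cache_type = rocksdb::PrimaryCacheType::kCacheTypeHCC;
tiered_opts.adm_policy = rocksdb::TieredAdmissionPolicy::kAdmPolicyAuto;
tiered_opts.comp_cache_opts.compression_type = rocksdb::kLZ4Compression;
tiered_opts.total_capacity = 3ULL * 1024 * 1024 * 1024;
tiered_opts.compressed_secondary_ratio = 2.0 / 3.0;  // 2/3 to compressed tier
auto cache = rocksdb::NewTieredCache(tiered_opts);
```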
Metadata Charge Policy
From cache.h:114-124:
enum CacheMetadataChargePolicy {
kDontChargeCacheMetadata, // Only entry charge counts
kFullChargeCacheMetadata // Include metadata overhead
};
LRUCacheOptions opts;
opts.metadata_charge_policy = kFullChargeCacheMetadata; // Default
kFullChargeCacheMetadata counts the cache’s internal overhead against capacity for more accurate memory accounting.
Memory Allocator
From cache.h:147-153, use a custom allocator:
std::shared_ptr<MemoryAllocator> memory_allocator;
// Example: jemalloc-backed allocator that excludes cache memory from core dumps
// (requires RocksDB built with jemalloc)
JemallocAllocatorOptions jopts;
std::shared_ptr<MemoryAllocator> allocator;
Status s = NewJemallocNodumpAllocator(jopts, &allocator);
LRUCacheOptions opts;
if (s.ok()) {
  opts.memory_allocator = allocator;
}
Usage Patterns
Insert
From advanced_cache.h:225-266:
Status Insert(
const Slice& key,
ObjectPtr obj,
const CacheItemHelper* helper,
size_t charge,
Handle** handle = nullptr,
Priority priority = Priority::LOW,
const Slice& compressed = Slice(),
CompressionType type = kNoCompression);
Lookup
From advanced_cache.h:281-295:
Handle* Lookup(
const Slice& key,
const CacheItemHelper* helper = nullptr,
CreateContext* create_context = nullptr,
Priority priority = Priority::LOW,
Statistics* stats = nullptr);
// Always release handles when done
if (handle != nullptr) {
cache->Release(handle);
}
Async Lookup
From advanced_cache.h:456-543, for non-blocking reads:
Cache::AsyncLookupHandle async_handle(key, helper, create_context);
cache->StartAsyncLookup(async_handle);
// Do other work...
Handle* handle = cache->Wait(async_handle); // Wait for result
if (handle != nullptr) {
ObjectPtr value = cache->Value(handle);
cache->Release(handle);
}
Statistics
From statistics.h:32-95, monitor cache performance:
// Cache hits and misses
BLOCK_CACHE_HIT
BLOCK_CACHE_MISS
BLOCK_CACHE_ADD
BLOCK_CACHE_ADD_FAILURES
// By block type
BLOCK_CACHE_INDEX_HIT
BLOCK_CACHE_INDEX_MISS
BLOCK_CACHE_FILTER_HIT
BLOCK_CACHE_FILTER_MISS
BLOCK_CACHE_DATA_HIT
BLOCK_CACHE_DATA_MISS
// Bytes
BLOCK_CACHE_BYTES_READ
BLOCK_CACHE_BYTES_WRITE
// Redundant inserts
BLOCK_CACHE_ADD_REDUNDANT
// Secondary cache
SECONDARY_CACHE_HITS
COMPRESSED_SECONDARY_CACHE_HITS
Example monitoring:
auto stats = options.statistics;
uint64_t hits = stats->getTickerCount(BLOCK_CACHE_HIT);
uint64_t misses = stats->getTickerCount(BLOCK_CACHE_MISS);
uint64_t total = hits + misses;
if (total > 0) {
double hit_rate = 100.0 * hits / total;
printf("Cache hit rate: %.2f%%\n", hit_rate);
}
// Get detailed stats by role
std::map<std::string, uint64_t> cache_stats;
db->GetMapProperty(DB::Properties::kBlockCacheEntryStats, &cache_stats);
Cache Management
Runtime Control
From advanced_cache.h:338-356:
// Change capacity dynamically
cache->SetCapacity(new_capacity);
// Enable/disable strict limit
cache->SetStrictCapacityLimit(true);
// Query state
size_t capacity = cache->GetCapacity();
size_t usage = cache->GetUsage();
size_t pinned = cache->GetPinnedUsage();
printf("Cache: %zu / %zu (%.1f%% full, %zu pinned)\n",
usage, capacity, 100.0 * usage / capacity, pinned);
Eviction Callback
From advanced_cache.h:545-554:
using EvictionCallback =
std::function<bool(const Slice& key, Handle* h, bool was_hit)>;
cache->SetEvictionCallback(
[](const Slice& key, Handle* h, bool was_hit) {
// Custom eviction logic
// Return true to take ownership, false for normal destruction
return false;
}
);
Best Practices
Shared vs. Separate Caches: Use a shared cache across all column families to maximize hit rate and avoid fragmentation.
1. Size Appropriately
// Start with 1/3 of available RAM
size_t total_ram = GetSystemMemory();
size_t cache_size = total_ram / 3;
HyperClockCacheOptions opts;
opts.capacity = cache_size;
2. Use HyperClockCache
// Prefer HyperClockCache for better concurrency
auto cache = HyperClockCacheOptions(
    cache_size, /*estimated_entry_charge=*/0).MakeSharedCache();
3. Cache Index and Filters
BlockBasedTableOptions table_options;
table_options.cache_index_and_filter_blocks = true;
table_options.cache_index_and_filter_blocks_with_high_priority = true;
table_options.pin_l0_filter_and_index_blocks_in_cache = true;
4. Monitor Hit Rate
// Aim for >85% hit rate
auto stats = db->GetDBOptions().statistics;
uint64_t hits = stats->getTickerCount(BLOCK_CACHE_HIT);
uint64_t misses = stats->getTickerCount(BLOCK_CACHE_MISS);
uint64_t total = hits + misses;
double hit_rate = total > 0 ? 100.0 * hits / total : 100.0;
if (hit_rate < 85.0) {
// Consider increasing cache size
}
5. Use Secondary Cache for Large Datasets
// 512 MB primary + 2 GB compressed secondary
CompressedSecondaryCacheOptions sec_opts;
sec_opts.capacity = 2ULL * 1024 * 1024 * 1024;
HyperClockCacheOptions pri_opts;
pri_opts.capacity = 512 * 1024 * 1024;
pri_opts.secondary_cache = sec_opts.MakeSharedSecondaryCache();
Troubleshooting
Low Hit Rate
- Increase cache size: More capacity = better hit rate
- Check working set: Cache should hold active working set
- Pin critical blocks: Use high priority for index/filters
- Monitor entry stats: Identify which block types are missing
High Memory Usage
- Enable metadata charging: Use kFullChargeCacheMetadata
- Set strict limit: Prevent cache from exceeding capacity
- Reduce capacity: Trade hit rate for memory
- Use compressed secondary: Offload to compressed tier
Contention (LRU Cache)
- Switch to HyperClockCache: Better for high concurrency
- Increase sharding: Raise num_shard_bits
- Profile mutex wait: Check the DB_MUTEX_WAIT_MICROS statistic
See Also