Overview

The block cache is RocksDB’s primary in-memory cache for uncompressed data blocks read from SST files. It’s critical for read performance, reducing disk I/O by caching frequently accessed blocks.
A properly sized and configured block cache can improve read throughput by orders of magnitude.

Cache Implementations

From cache.h:46-47, RocksDB provides two main cache implementations:
using BlockCache = Cache;
using RowCache = Cache;  // For row-level caching

LRU Cache

From cache.h:207-275, the traditional LRU (Least Recently Used) cache:
struct LRUCacheOptions : public ShardedCacheOptions {
  size_t capacity = 0;
  int num_shard_bits = -1;
  bool strict_capacity_limit = false;
  double high_pri_pool_ratio = 0.5;
  double low_pri_pool_ratio = 0.0;
  bool use_adaptive_mutex = kDefaultToAdaptiveMutex;
  CacheMetadataChargePolicy metadata_charge_policy =
      kFullChargeCacheMetadata;
};
Creating an LRU cache:
#include "rocksdb/cache.h"

// Simple creation
auto cache = NewLRUCache(512 * 1024 * 1024);  // 512 MB

// Or with options
LRUCacheOptions cache_opts;
cache_opts.capacity = 512 * 1024 * 1024;
cache_opts.num_shard_bits = 6;  // 64 shards
cache_opts.strict_capacity_limit = false;
cache = cache_opts.MakeSharedCache();  // replaces the simple cache above

BlockBasedTableOptions table_options;
table_options.block_cache = cache;

HyperClockCache

From cache.h:371-476, a lock-free cache with better CPU efficiency. HyperClockCache is now generally recommended over LRUCache for better performance under high concurrency:
struct HyperClockCacheOptions : public ShardedCacheOptions {
  size_t estimated_entry_charge = 0;
  size_t min_avg_entry_charge = 450;
  int eviction_effort_cap = 30;
};
Creating HyperClockCache:
HyperClockCacheOptions hcc_opts;
hcc_opts.capacity = 512 * 1024 * 1024;
hcc_opts.estimated_entry_charge = 0;  // Dynamic sizing (recommended)
auto cache = hcc_opts.MakeSharedCache();
Key advantages:
  • Much improved CPU efficiency under parallel load
  • Lock-free implementation reduces contention
  • Larger cache shards (less risk of thrashing)
Caveats:
  • Only for BlockBasedTableOptions::block_cache
  • Not a general-purpose Cache (fixed-size keys expected)
  • Requires anonymous mmap support (Linux, Windows)
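
HyperClockCache's eviction is based on a variant of the classic CLOCK (second-chance) algorithm rather than an LRU list. A greatly simplified, single-threaded sketch of CLOCK eviction (illustrative only — not RocksDB's lock-free implementation) looks like:

```cpp
#include <cstddef>
#include <vector>

// Minimal second-chance (CLOCK) eviction sketch. Each slot has a
// reference bit that is set on access; the clock hand clears bits
// until it finds an unreferenced slot to evict.
struct ClockCache {
  struct Slot {
    int key = -1;
    bool referenced = false;
    bool occupied = false;
  };
  std::vector<Slot> slots;
  size_t hand = 0;

  explicit ClockCache(size_t capacity) : slots(capacity) {}

  // Returns true on a cache hit.
  bool Access(int key) {
    for (auto& s : slots) {
      if (s.occupied && s.key == key) {
        s.referenced = true;  // gets a second chance on the next sweep
        return true;
      }
    }
    Insert(key);
    return false;
  }

  void Insert(int key) {
    // Sweep the clock hand until a victim (referenced == false) is found.
    while (true) {
      Slot& s = slots[hand];
      hand = (hand + 1) % slots.size();
      if (!s.occupied) {
        s = {key, false, true};
        return;
      }
      if (s.referenced) {
        s.referenced = false;  // spend its second chance
      } else {
        s = {key, false, true};  // evict and replace
        return;
      }
    }
  }
};
```

Because eviction only advances a hand and flips bits, it needs no global list reordering on every access — that is the property the lock-free implementation exploits.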

Configuration Options

Capacity

From cache.h:129-132:
size_t capacity;  // Total cache capacity in bytes
Sizing guidelines:
// Conservative: 1/3 of available RAM
size_t capacity = total_ram / 3;

// Aggressive: Up to 60-70% for read-heavy workloads
size_t capacity = total_ram * 0.6;

// Minimum: At least enough for active working set
size_t capacity = working_set_size * 1.5;
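These guidelines can be combined into one helper that covers the working set with headroom while staying inside a RAM budget (the 1.5x headroom factor is the rule of thumb above, not a RocksDB constant):

```cpp
#include <algorithm>
#include <cstddef>

// Pick a block cache capacity: cover the working set with 1.5x
// headroom, but never exceed the chosen fraction of system RAM.
size_t PickCacheCapacity(size_t total_ram, size_t working_set,
                         double ram_fraction = 1.0 / 3.0) {
  size_t ram_budget = static_cast<size_t>(total_ram * ram_fraction);
  size_t want = working_set + working_set / 2;  // working set * 1.5
  return std::min(want, ram_budget);
}
```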

Sharding

From cache.h:134-138:
int num_shard_bits = -1;  // Cache is sharded into 2^num_shard_bits shards
Default (-1) chooses a good value based on capacity. LRUCache benefits more from sharding than HyperClockCache.
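The default heuristic is roughly: pick the largest shard count such that each shard still holds a minimum amount of data, capped at 6 bits (64 shards). This sketch mirrors the idea; the exact constants live in RocksDB's internals:

```cpp
#include <cstddef>

// Sketch of the default shard-bits heuristic: grow the shard count
// while every shard would still get at least min_shard_size bytes,
// capped at 6 bits (64 shards).
int DefaultShardBits(size_t capacity, size_t min_shard_size = 512 * 1024) {
  int bits = 0;
  size_t num_shards = capacity / min_shard_size;
  while (num_shards >>= 1) {
    if (++bits >= 6) break;
  }
  return bits;
}
```

A 512 MB cache lands on 6 bits (64 shards of 8 MB each); a tiny cache stays unsharded.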

Strict Capacity Limit

From cache.h:140-145:
bool strict_capacity_limit = false;
  • true: Insert fails if cache is full and no entries can be evicted
  • false: Insert always succeeds, cache may exceed capacity temporarily
LRUCacheOptions opts;
opts.capacity = 512 * 1024 * 1024;
opts.strict_capacity_limit = true;  // Fail inserts when full

Status s = cache->Insert(key, value, helper, charge, &handle);
if (s.IsMemoryLimit()) {
  // Handle cache full condition
}

Priority Levels

From advanced_cache.h:61-68 and cache.h:224-246:
enum class Priority { HIGH, LOW, BOTTOM };

double high_pri_pool_ratio = 0.5;  // Ratio for high-priority entries
double low_pri_pool_ratio = 0.0;   // Ratio for low-priority entries
Priority pools:
  • HIGH: Index and filter blocks (if cache_index_and_filter_blocks_with_high_priority)
  • LOW: Data blocks (default)
  • BOTTOM: Blob values
BlockBasedTableOptions table_options;
table_options.cache_index_and_filter_blocks = true;
table_options.cache_index_and_filter_blocks_with_high_priority = true;

LRUCacheOptions cache_opts;
cache_opts.high_pri_pool_ratio = 0.5;  // 50% reserved for high-pri

Entry Roles

From cache.h:55-88, cache entries are classified by role:
enum class CacheEntryRole {
  kDataBlock,                // Data blocks
  kFilterBlock,              // Filter blocks
  kFilterMetaBlock,          // Partitioned filter metadata
  kIndexBlock,               // Index blocks
  kCompressionDictionaryBuildingBuffer,
  kFilterConstruction,       // Filter construction buffer
  kBlockBasedTableReader,    // Table reader metadata
  kWriteBuffer,              // Memtable charging
  kBlobValue,                // Blob cache entries
  kMisc,                     // Miscellaneous
};

Monitoring by Role

From cache.h:101-110:
struct BlockCacheEntryStatsMapKeys {
  static std::string EntryCount(CacheEntryRole role);
  static std::string UsedBytes(CacheEntryRole role);
  static std::string UsedPercent(CacheEntryRole role);
};

// Usage
std::map<std::string, uint64_t> values;
db->GetMapProperty(DB::Properties::kBlockCacheEntryStats, &values);
uint64_t filter_bytes = values[
    BlockCacheEntryStatsMapKeys::UsedBytes(CacheEntryRole::kFilterBlock)
];
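Building on the map property, a small helper can rank roles by bytes used to find the biggest memory consumer. This sketch operates on a plain stat-key-to-value map (the keys shown in the test are illustrative placeholders, not the exact strings RocksDB generates):

```cpp
#include <cstdint>
#include <map>
#include <string>

// Return the stat key with the largest value, e.g. the entry role
// consuming the most cache memory. Returns "" for an empty map.
std::string TopConsumer(const std::map<std::string, uint64_t>& used_bytes) {
  std::string top;
  uint64_t best = 0;
  for (const auto& [key, bytes] : used_bytes) {
    if (bytes > best) {
      best = bytes;
      top = key;
    }
  }
  return top;
}
```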

Advanced Features

Secondary Cache

From cache.h:159-161, add a compressed secondary tier:
std::shared_ptr<SecondaryCache> secondary_cache;
Compressed secondary cache:
#include "rocksdb/cache.h"

CompressedSecondaryCacheOptions secondary_opts;
secondary_opts.capacity = 2ULL * 1024 * 1024 * 1024;  // 2 GB compressed
secondary_opts.compression_type = CompressionType::kLZ4Compression;
auto secondary_cache = secondary_opts.MakeSharedSecondaryCache();

LRUCacheOptions primary_opts;
primary_opts.capacity = 512 * 1024 * 1024;  // 512 MB uncompressed
primary_opts.secondary_cache = secondary_cache;
auto cache = primary_opts.MakeSharedCache();

Tiered Cache

From cache.h:518-547, experimental multi-tier caching:
struct TieredCacheOptions {
  ShardedCacheOptions* cache_opts;  // Primary cache options
  PrimaryCacheType cache_type;      // kCacheTypeLRU or kCacheTypeHCC
  TieredAdmissionPolicy adm_policy; // Admission policy
  CompressedSecondaryCacheOptions comp_cache_opts;
  size_t total_capacity;            // Total budget across tiers
  double compressed_secondary_ratio; // Ratio for compressed tier
  std::shared_ptr<SecondaryCache> nvm_sec_cache;  // NVM tier
};

auto cache = NewTieredCache(tiered_opts);
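The total_capacity / compressed_secondary_ratio pair means both tiers share one budget. The arithmetic below reflects my reading of the options — the ratio as the compressed tier's share of the total — so verify against your RocksDB version:

```cpp
#include <cstddef>
#include <utility>

// Split a total cache budget between the primary (uncompressed) and
// compressed secondary tiers, assuming compressed_ratio is the
// fraction of total_capacity given to the compressed tier.
std::pair<size_t, size_t> SplitTieredBudget(size_t total_capacity,
                                            double compressed_ratio) {
  size_t secondary = static_cast<size_t>(total_capacity * compressed_ratio);
  size_t primary = total_capacity - secondary;
  return {primary, secondary};
}
```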

Metadata Charging

From cache.h:114-124:
enum CacheMetadataChargePolicy {
  kDontChargeCacheMetadata,   // Only entry charge counts
  kFullChargeCacheMetadata    // Include metadata overhead
};
LRUCacheOptions opts;
opts.metadata_charge_policy = kFullChargeCacheMetadata;  // Default
kFullChargeCacheMetadata counts the cache’s internal overhead against capacity for more accurate memory accounting.
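The effect on accounting can be illustrated with a toy calculation: under kFullChargeCacheMetadata, each entry's charge against capacity includes a per-entry metadata estimate (the 80-byte figure here is an illustrative assumption, not RocksDB's exact overhead):

```cpp
#include <cstddef>
#include <vector>

// Total charged bytes for a set of entries under the two metadata
// charge policies. Full charging adds a per-entry overhead estimate,
// so the same entries consume more of the capacity budget.
size_t TotalCharge(const std::vector<size_t>& entry_charges,
                   bool full_charge_metadata,
                   size_t metadata_per_entry = 80 /* illustrative */) {
  size_t total = 0;
  for (size_t charge : entry_charges) {
    total += charge;
    if (full_charge_metadata) total += metadata_per_entry;
  }
  return total;
}
```

For small entries the metadata overhead is a meaningful fraction of the charge, which is why full charging gives more honest memory accounting.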

Memory Allocator

From cache.h:147-153, use a custom allocator:
std::shared_ptr<MemoryAllocator> memory_allocator;
// Example: jemalloc no-dump allocator (from memory_allocator.h)
JemallocAllocatorOptions jemalloc_opts;
LRUCacheOptions opts;
Status s = NewJemallocNodumpAllocator(jemalloc_opts, &opts.memory_allocator);

Usage Patterns

Insert

From advanced_cache.h:225-266:
Status Insert(
    const Slice& key,
    ObjectPtr obj,
    const CacheItemHelper* helper,
    size_t charge,
    Handle** handle = nullptr,
    Priority priority = Priority::LOW,
    const Slice& compressed = Slice(),
    CompressionType type = kNoCompression);

Lookup

From advanced_cache.h:281-295:
Handle* Lookup(
    const Slice& key,
    const CacheItemHelper* helper = nullptr,
    CreateContext* create_context = nullptr,
    Priority priority = Priority::LOW,
    Statistics* stats = nullptr);

// Always release handles when done
if (handle != nullptr) {
  cache->Release(handle);
}
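To make the release pattern exception-safe, a small RAII guard can own the handle. This is a hypothetical helper, not part of RocksDB's API; the sketch uses a minimal mock cache so it stands alone:

```cpp
// Minimal stand-in for a cache with Lookup/Release semantics.
struct MockCache {
  int live_handles = 0;
  int* Lookup() {
    ++live_handles;
    return &live_handles;  // non-null "handle"
  }
  void Release(int* /*handle*/) { --live_handles; }
};

// RAII guard: releases the handle when it leaves scope, even on
// early return or exception. Non-copyable so the handle is released
// exactly once.
class HandleGuard {
 public:
  HandleGuard(MockCache* cache, int* handle) : cache_(cache), handle_(handle) {}
  ~HandleGuard() {
    if (handle_ != nullptr) cache_->Release(handle_);
  }
  HandleGuard(const HandleGuard&) = delete;
  HandleGuard& operator=(const HandleGuard&) = delete;
  int* get() const { return handle_; }

 private:
  MockCache* cache_;
  int* handle_;
};
```

The same pattern applies to a real Cache*: wrap the Handle* returned by Lookup and call Release in the destructor.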

Async Lookup

From advanced_cache.h:456-543, for non-blocking reads:
Cache::AsyncLookupHandle async_handle(key, helper, create_context);
cache->StartAsyncLookup(async_handle);

// Do other work...

cache->Wait(async_handle);  // Wait for result
Handle* handle = async_handle.Result();
if (handle != nullptr) {
  ObjectPtr value = cache->Value(handle);
  cache->Release(handle);
}

Statistics

From statistics.h:32-95, monitor cache performance:
// Cache hits and misses
BLOCK_CACHE_HIT
BLOCK_CACHE_MISS
BLOCK_CACHE_ADD
BLOCK_CACHE_ADD_FAILURES

// By block type
BLOCK_CACHE_INDEX_HIT
BLOCK_CACHE_INDEX_MISS
BLOCK_CACHE_FILTER_HIT
BLOCK_CACHE_FILTER_MISS
BLOCK_CACHE_DATA_HIT
BLOCK_CACHE_DATA_MISS

// Bytes
BLOCK_CACHE_BYTES_READ
BLOCK_CACHE_BYTES_WRITE

// Redundant inserts
BLOCK_CACHE_ADD_REDUNDANT

// Secondary cache
SECONDARY_CACHE_HITS
COMPRESSED_SECONDARY_CACHE_HITS
Example monitoring:
auto stats = options.statistics;

uint64_t hits = stats->getTickerCount(BLOCK_CACHE_HIT);
uint64_t misses = stats->getTickerCount(BLOCK_CACHE_MISS);
uint64_t total = hits + misses;

if (total > 0) {
  double hit_rate = 100.0 * hits / total;
  printf("Cache hit rate: %.2f%%\n", hit_rate);
}

// Get detailed stats by role
std::map<std::string, uint64_t> cache_stats;
db->GetMapProperty(DB::Properties::kBlockCacheEntryStats, &cache_stats);
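When a secondary cache is attached, a block missed in the primary tier but served from the secondary tier still avoids disk I/O, so an "effective" hit rate can be computed. This formula is one reasonable interpretation of the tickers, not an official RocksDB metric:

```cpp
#include <cstdint>

// Effective hit rate: count secondary-cache hits as hits, since they
// avoid disk I/O. BLOCK_CACHE_MISS counts primary misses, a subset of
// which are then served by the secondary tier.
double EffectiveHitRate(uint64_t primary_hits, uint64_t primary_misses,
                        uint64_t secondary_hits) {
  uint64_t total = primary_hits + primary_misses;
  if (total == 0) return 0.0;
  return 100.0 * (primary_hits + secondary_hits) / total;
}
```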

Cache Management

Runtime Control

From advanced_cache.h:338-356:
// Change capacity dynamically
cache->SetCapacity(new_capacity);

// Enable/disable strict limit
cache->SetStrictCapacityLimit(true);

// Query state
size_t capacity = cache->GetCapacity();
size_t usage = cache->GetUsage();
size_t pinned = cache->GetPinnedUsage();

printf("Cache: %zu / %zu (%.1f%% full, %zu pinned)\n",
       usage, capacity, 100.0 * usage / capacity, pinned);

Eviction Callback

From advanced_cache.h:545-554:
using EvictionCallback =
    std::function<bool(const Slice& key, Handle* h, bool was_hit)>;

cache->SetEvictionCallback(
    [](const Slice& key, Handle* h, bool was_hit) {
      // Custom eviction logic
      // Return true to take ownership, false for normal destruction
      return false;
    }
);

Best Practices

Shared vs. Separate Caches: Use a shared cache across all column families to maximize hit rate and avoid fragmentation.

1. Size Appropriately

// Start with 1/3 of available RAM
size_t total_ram = GetSystemMemory();
size_t cache_size = total_ram / 3;

HyperClockCacheOptions opts;
opts.capacity = cache_size;

2. Use HyperClockCache

// Prefer HyperClockCache for better concurrency
auto cache = HyperClockCacheOptions(cache_size,
                                    /*estimated_entry_charge=*/0)
                 .MakeSharedCache();

3. Cache Index and Filters

BlockBasedTableOptions table_options;
table_options.cache_index_and_filter_blocks = true;
table_options.cache_index_and_filter_blocks_with_high_priority = true;
table_options.pin_l0_filter_and_index_blocks_in_cache = true;

4. Monitor Hit Rate

// Aim for >85% hit rate
auto stats = db->GetDBOptions().statistics;
uint64_t hits = stats->getTickerCount(BLOCK_CACHE_HIT);
uint64_t misses = stats->getTickerCount(BLOCK_CACHE_MISS);
double hit_rate = 100.0 * hits / (hits + misses);

if (hit_rate < 85.0) {
  // Consider increasing cache size
}

5. Use Secondary Cache for Large Datasets

// 512 MB primary + 2 GB compressed secondary
CompressedSecondaryCacheOptions sec_opts;
sec_opts.capacity = 2ULL * 1024 * 1024 * 1024;

HyperClockCacheOptions pri_opts;
pri_opts.capacity = 512 * 1024 * 1024;
pri_opts.secondary_cache = sec_opts.MakeSharedSecondaryCache();

Troubleshooting

Low Hit Rate

  1. Increase cache size: More capacity = better hit rate
  2. Check working set: Cache should hold active working set
  3. Pin critical blocks: Use high priority for index/filters
  4. Monitor entry stats: Identify which block types are missing

High Memory Usage

  1. Enable metadata charging: Use kFullChargeCacheMetadata
  2. Set strict limit: Prevent cache from exceeding capacity
  3. Reduce capacity: Trade hit rate for memory
  4. Use compressed secondary: Offload to compressed tier

Contention (LRU Cache)

  1. Switch to HyperClockCache: Better for high concurrency
  2. Increase sharding: Higher num_shard_bits
  3. Profile mutex wait: Check DB_MUTEX_WAIT_MICROS statistic
