Overview
The block cache is RocksDB’s primary in-memory cache for uncompressed data blocks read from SST files. It’s critical for read performance, reducing disk I/O by caching frequently accessed blocks.
A properly sized and configured block cache can improve read throughput by orders of magnitude.
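For orientation, here is a minimal sketch of wiring a block cache into a database. The block cache only takes effect once it is attached to the table factory; sizes here are illustrative:

```cpp
#include "rocksdb/cache.h"
#include "rocksdb/options.h"
#include "rocksdb/table.h"

// Sketch: create a 512 MB block cache and attach it via the
// block-based table factory, which is how the cache reaches the DB.
rocksdb::BlockBasedTableOptions table_options;
table_options.block_cache = rocksdb::NewLRUCache(512 * 1024 * 1024);

rocksdb::Options options;
options.table_factory.reset(
    rocksdb::NewBlockBasedTableFactory(table_options));
// `options` is then passed to rocksdb::DB::Open(...)
```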
Cache Implementations
From cache.h:46-47, both cache roles use the same Cache interface; the two main implementations, LRUCache and HyperClockCache, are described below:
using BlockCache = Cache;
using RowCache = Cache; // For row-level caching
LRU Cache
From cache.h:207-275, the traditional LRU (Least Recently Used) cache:
struct LRUCacheOptions : public ShardedCacheOptions {
size_t capacity = 0;
int num_shard_bits = -1;
bool strict_capacity_limit = false;
double high_pri_pool_ratio = 0.5;
double low_pri_pool_ratio = 0.0;
bool use_adaptive_mutex = kDefaultToAdaptiveMutex;
CacheMetadataChargePolicy metadata_charge_policy =
kFullChargeCacheMetadata;
};
Creating an LRU cache:
#include "rocksdb/cache.h"
// Simple creation
auto cache = NewLRUCache(512 * 1024 * 1024); // 512 MB
// Or with options
LRUCacheOptions cache_opts;
cache_opts.capacity = 512 * 1024 * 1024;
cache_opts.num_shard_bits = 6; // 64 shards
cache_opts.strict_capacity_limit = false;
auto cache = cache_opts.MakeSharedCache();
BlockBasedTableOptions table_options;
table_options.block_cache = cache;
HyperClockCache (Recommended)
From cache.h:371-476, a lock-free cache with better CPU efficiency. HyperClockCache is now generally recommended over LRUCache for high-concurrency workloads:
struct HyperClockCacheOptions : public ShardedCacheOptions {
size_t estimated_entry_charge = 0;
size_t min_avg_entry_charge = 450;
int eviction_effort_cap = 30;
};
Creating HyperClockCache:
HyperClockCacheOptions hcc_opts;
hcc_opts.capacity = 512 * 1024 * 1024;
hcc_opts.estimated_entry_charge = 0; // Dynamic sizing (recommended)
auto cache = hcc_opts.MakeSharedCache();
Key advantages:
- Much improved CPU efficiency under parallel load
- Lock-free implementation reduces contention
- Larger cache shards (less risk of thrashing)
Caveats:
- Only for BlockBasedTableOptions::block_cache
- Not a general-purpose Cache (fixed-size keys expected)
- Requires anonymous mmap support (Linux, Windows)
Configuration Options
Capacity
From cache.h:129-132:
size_t capacity; // Total cache capacity in bytes
Sizing guidelines:
// Conservative: 1/3 of available RAM
size_t capacity = total_ram / 3;
// Aggressive: up to 60-70% for read-heavy workloads
size_t capacity = static_cast<size_t>(total_ram * 0.6);
// Minimum: at least enough for the active working set
size_t capacity = static_cast<size_t>(working_set_size * 1.5);
Sharding
From cache.h:134-138:
int num_shard_bits = -1; // Cache is sharded into 2^num_shard_bits shards
Default (-1) chooses a good value based on capacity. LRUCache benefits more from sharding than HyperClockCache.
Strict Capacity Limit
From cache.h:140-145:
bool strict_capacity_limit = false;
- true: Insert fails if the cache is full and no entries can be evicted
- false: Inserts always succeed; the cache may temporarily exceed capacity
LRUCacheOptions opts;
opts.capacity = 512 * 1024 * 1024;
opts.strict_capacity_limit = true; // Fail inserts when full
Status s = cache->Insert(key, value, helper, charge, &handle);
if (s.IsMemoryLimit()) {
// Handle cache full condition
}
Priority Levels
From advanced_cache.h:61-68 and cache.h:224-246:
enum class Priority { HIGH, LOW, BOTTOM };
double high_pri_pool_ratio = 0.5; // Ratio for high-priority entries
double low_pri_pool_ratio = 0.0; // Ratio for low-priority entries
Priority pools:
- HIGH: Index and filter blocks (if cache_index_and_filter_blocks_with_high_priority is set)
- LOW: Data blocks (default)
- BOTTOM: Blob values
BlockBasedTableOptions table_options;
table_options.cache_index_and_filter_blocks = true;
table_options.cache_index_and_filter_blocks_with_high_priority = true;
LRUCacheOptions cache_opts;
cache_opts.high_pri_pool_ratio = 0.5; // 50% reserved for high-pri
Entry Roles
From cache.h:55-88, cache entries are classified by role:
enum class CacheEntryRole {
kDataBlock, // Data blocks
kFilterBlock, // Filter blocks
kFilterMetaBlock, // Partitioned filter metadata
kIndexBlock, // Index blocks
kCompressionDictionaryBuildingBuffer,
kFilterConstruction, // Filter construction buffer
kBlockBasedTableReader, // Table reader metadata
kWriteBuffer, // Memtable charging
kBlobValue, // Blob cache entries
kMisc, // Miscellaneous
};
Monitoring by Role
From cache.h:101-110:
struct BlockCacheEntryStatsMapKeys {
static std::string EntryCount(CacheEntryRole role);
static std::string UsedBytes(CacheEntryRole role);
static std::string UsedPercent(CacheEntryRole role);
};
// Usage
std::map<std::string, uint64_t> values;
db->GetMapProperty(DB::Properties::kBlockCacheEntryStats, &values);
uint64_t filter_bytes = values[
BlockCacheEntryStatsMapKeys::UsedBytes(CacheEntryRole::kFilterBlock)
];
Advanced Features
Secondary Cache
From cache.h:159-161, add a compressed secondary tier:
std::shared_ptr<SecondaryCache> secondary_cache;
Compressed secondary cache:
#include "rocksdb/cache.h"
CompressedSecondaryCacheOptions secondary_opts;
secondary_opts.capacity = 2ULL * 1024 * 1024 * 1024; // 2 GB compressed
secondary_opts.compression_type = CompressionType::kLZ4Compression;
auto secondary_cache = secondary_opts.MakeSharedSecondaryCache();
LRUCacheOptions primary_opts;
primary_opts.capacity = 512 * 1024 * 1024; // 512 MB uncompressed
primary_opts.secondary_cache = secondary_cache;
auto cache = primary_opts.MakeSharedCache();
Tiered Cache
From cache.h:518-547, experimental multi-tier caching:
struct TieredCacheOptions {
ShardedCacheOptions* cache_opts; // Primary cache options
PrimaryCacheType cache_type; // kCacheTypeLRU or kCacheTypeHCC
TieredAdmissionPolicy adm_policy; // Admission policy
CompressedSecondaryCacheOptions comp_cache_opts;
size_t total_capacity; // Total budget across tiers
double compressed_secondary_ratio; // Ratio for compressed tier
std::shared_ptr<SecondaryCache> nvm_sec_cache; // NVM tier
};
auto cache = NewTieredCache(tiered_opts);
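A sketch of how the pieces fit together, using the fields of the struct above. Since this API is experimental, treat the enum values shown (kCacheTypeHCC, kAdmPolicyAuto) as assumptions to verify against your RocksDB version:

```cpp
#include "rocksdb/cache.h"

// Sketch: 3 GB total budget, split ~1 GB uncompressed primary
// + ~2 GB compressed secondary via compressed_secondary_ratio.
rocksdb::HyperClockCacheOptions primary_opts(
    /*capacity=*/0,  // per-tier capacity is derived from total_capacity
    /*estimated_entry_charge=*/0);

rocksdb::TieredCacheOptions tiered_opts;
tiered_opts.cache_opts = &primary_opts;
tiered_opts.cache_type = rocksdb::PrimaryCacheType::kCacheTypeHCC;
tiered_opts.adm_policy = rocksdb::TieredAdmissionPolicy::kAdmPolicyAuto;
tiered_opts.comp_cache_opts.compression_type = rocksdb::kLZ4Compression;
tiered_opts.total_capacity = 3ULL * 1024 * 1024 * 1024;
tiered_opts.compressed_secondary_ratio = 2.0 / 3.0;  // 2/3 to compressed tier
auto cache = rocksdb::NewTieredCache(tiered_opts);
```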
Metadata Charge Policy
From cache.h:114-124:
enum CacheMetadataChargePolicy {
kDontChargeCacheMetadata, // Only entry charge counts
kFullChargeCacheMetadata // Include metadata overhead
};
LRUCacheOptions opts;
opts.metadata_charge_policy = kFullChargeCacheMetadata; // Default
kFullChargeCacheMetadata counts the cache’s internal overhead against capacity for more accurate memory accounting.
Memory Allocator
From cache.h:147-153, use a custom allocator:
std::shared_ptr<MemoryAllocator> memory_allocator;
// Example: jemalloc-backed allocator that excludes cache memory from core dumps
// (requires RocksDB built with jemalloc)
JemallocAllocatorOptions jopts;
std::shared_ptr<MemoryAllocator> allocator;
Status s = NewJemallocNodumpAllocator(jopts, &allocator);
LRUCacheOptions opts;
if (s.ok()) {
  opts.memory_allocator = allocator;
}
Usage Patterns
Insert
From advanced_cache.h:225-266:
Status Insert(
const Slice& key,
ObjectPtr obj,
const CacheItemHelper* helper,
size_t charge,
Handle** handle = nullptr,
Priority priority = Priority::LOW,
const Slice& compressed = Slice(),
CompressionType type = kNoCompression);
Lookup
From advanced_cache.h:281-295:
Handle* Lookup(
const Slice& key,
const CacheItemHelper* helper = nullptr,
CreateContext* create_context = nullptr,
Priority priority = Priority::LOW,
Statistics* stats = nullptr);
// Always release handles when done
if (handle != nullptr) {
cache->Release(handle);
}
Async Lookup
From advanced_cache.h:456-543, for non-blocking reads:
Cache::AsyncLookupHandle async_handle(key, helper, create_context);
cache->StartAsyncLookup(async_handle);
// Do other work...
Handle* handle = cache->Wait(async_handle); // Wait for result
if (handle != nullptr) {
ObjectPtr value = cache->Value(handle);
cache->Release(handle);
}
Statistics
From statistics.h:32-95, monitor cache performance:
// Cache hits and misses
BLOCK_CACHE_HIT
BLOCK_CACHE_MISS
BLOCK_CACHE_ADD
BLOCK_CACHE_ADD_FAILURES
// By block type
BLOCK_CACHE_INDEX_HIT
BLOCK_CACHE_INDEX_MISS
BLOCK_CACHE_FILTER_HIT
BLOCK_CACHE_FILTER_MISS
BLOCK_CACHE_DATA_HIT
BLOCK_CACHE_DATA_MISS
// Bytes
BLOCK_CACHE_BYTES_READ
BLOCK_CACHE_BYTES_WRITE
// Redundant inserts
BLOCK_CACHE_ADD_REDUNDANT
// Secondary cache
SECONDARY_CACHE_HITS
COMPRESSED_SECONDARY_CACHE_HITS
Example monitoring:
auto stats = options.statistics;
uint64_t hits = stats->getTickerCount(BLOCK_CACHE_HIT);
uint64_t misses = stats->getTickerCount(BLOCK_CACHE_MISS);
uint64_t total = hits + misses;
if (total > 0) {
double hit_rate = 100.0 * hits / total;
printf("Cache hit rate: %.2f%%\n", hit_rate);
}
// Get detailed stats by role
std::map<std::string, uint64_t> cache_stats;
db->GetMapProperty(DB::Properties::kBlockCacheEntryStats, &cache_stats);
Cache Management
Runtime Control
From advanced_cache.h:338-356:
// Change capacity dynamically
cache->SetCapacity(new_capacity);
// Enable/disable strict limit
cache->SetStrictCapacityLimit(true);
// Query state
size_t capacity = cache->GetCapacity();
size_t usage = cache->GetUsage();
size_t pinned = cache->GetPinnedUsage();
printf("Cache: %zu / %zu (%.1f%% full, %zu pinned)\n",
usage, capacity, 100.0 * usage / capacity, pinned);
Eviction Callback
From advanced_cache.h:545-554:
using EvictionCallback =
std::function<bool(const Slice& key, Handle* h, bool was_hit)>;
cache->SetEvictionCallback(
[](const Slice& key, Handle* h, bool was_hit) {
// Custom eviction logic
// Return true to take ownership, false for normal destruction
return false;
}
);
Best Practices
Shared vs. Separate Caches: Use a shared cache across all column families to maximize hit rate and avoid fragmentation.
1. Size Appropriately
// Start with 1/3 of available RAM
size_t total_ram = GetSystemMemory();
size_t cache_size = total_ram / 3;
HyperClockCacheOptions opts;
opts.capacity = cache_size;
2. Use HyperClockCache
// Prefer HyperClockCache for better concurrency
auto cache = HyperClockCacheOptions(
    cache_size, /*estimated_entry_charge=*/0).MakeSharedCache();
3. Cache Index and Filters
BlockBasedTableOptions table_options;
table_options.cache_index_and_filter_blocks = true;
table_options.cache_index_and_filter_blocks_with_high_priority = true;
table_options.pin_l0_filter_and_index_blocks_in_cache = true;
4. Monitor Hit Rate
// Aim for >85% hit rate
auto stats = db->GetDBOptions().statistics;
uint64_t hits = stats->getTickerCount(BLOCK_CACHE_HIT);
uint64_t misses = stats->getTickerCount(BLOCK_CACHE_MISS);
uint64_t total = hits + misses;
double hit_rate = total > 0 ? 100.0 * hits / total : 100.0;
if (hit_rate < 85.0) {
// Consider increasing cache size
}
5. Use Secondary Cache for Large Datasets
// 512 MB primary + 2 GB compressed secondary
CompressedSecondaryCacheOptions sec_opts;
sec_opts.capacity = 2ULL * 1024 * 1024 * 1024;
HyperClockCacheOptions pri_opts;
pri_opts.capacity = 512 * 1024 * 1024;
pri_opts.secondary_cache = sec_opts.MakeSharedSecondaryCache();
Troubleshooting
Low Hit Rate
- Increase cache size: More capacity = better hit rate
- Check working set: Cache should hold active working set
- Pin critical blocks: Use high priority for index/filters
- Monitor entry stats: Identify which block types are missing
High Memory Usage
- Enable metadata charging: Use kFullChargeCacheMetadata
- Set strict limit: Prevent cache from exceeding capacity
- Reduce capacity: Trade hit rate for memory
- Use compressed secondary: Offload to compressed tier
Contention (LRU Cache)
- Switch to HyperClockCache: Better for high concurrency
- Increase sharding: Raise num_shard_bits
- Profile mutex wait: Check the DB_MUTEX_WAIT_MICROS statistic
See Also