Caffeine is built on a sophisticated architecture that balances performance, memory efficiency, and correctness. This guide explores the internal design decisions that make Caffeine one of the fastest caching libraries available.

Core Design Principles

Caffeine’s architecture is based on several key principles:
  • Lock-free reads: Read operations never block, ensuring maximum throughput
  • Buffered writes: Updates are batched to minimize lock contention
  • Adaptive eviction: The cache adjusts its behavior based on workload patterns
  • Zero overhead: Only pay for features you use through code generation

High-Level Architecture

Caffeine is built on top of ConcurrentHashMap with added page-replacement algorithms for intelligent eviction.
The cache consists of three main layers:
  1. Hash table layer - Fast concurrent access using ConcurrentHashMap
  2. Policy layer - Eviction and expiration policies
  3. Buffer layer - Asynchronous processing of cache events

Concurrency Model

Caffeine uses a best-effort bounding approach to maintain eventual consistency:
// Read operations are lock-free
V value = cache.get(key);

// Writes are recorded in buffers
cache.put(key, value);
// The policy is updated asynchronously
The penalty of applying policy updates is amortized across threads, making the cost only slightly higher than a plain ConcurrentHashMap operation.
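The amortization idea can be shown with a minimal sketch (hypothetical class and field names, not Caffeine's actual implementation): every writer cheaply records its mutation in a queue, and whichever thread pushes the pending count past a threshold drains the whole batch on behalf of everyone.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: reads go straight to the map; writes also enqueue
// a policy event that some thread later applies in a batch.
class AmortizedCache<K, V> {
  final ConcurrentHashMap<K, V> map = new ConcurrentHashMap<>();
  final ConcurrentLinkedQueue<K> writeEvents = new ConcurrentLinkedQueue<>();
  final AtomicInteger pending = new AtomicInteger();
  static final int DRAIN_THRESHOLD = 16;
  int policyUpdates; // mutated only inside synchronized drain()

  V get(K key) {
    return map.get(key); // lock-free read, no inline policy bookkeeping
  }

  void put(K key, V value) {
    map.put(key, value);
    writeEvents.add(key); // record the event cheaply
    if (pending.incrementAndGet() >= DRAIN_THRESHOLD) {
      drain(); // amortized: one thread pays for the whole batch
    }
  }

  synchronized void drain() {
    K key;
    while ((key = writeEvents.poll()) != null) {
      pending.decrementAndGet();
      policyUpdates++; // stand-in for updating the eviction policy
    }
  }
}
```

Most `put` calls cost little more than the map update plus one enqueue; only the thread that crosses the threshold pays for the batched policy work.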

Buffer System

Reads and writes are recorded in lock-free buffers that are drained asynchronously:
Read buffer
  • Ring buffer that records cache accesses
  • Rejects additions if contended or full
  • Drained when full or during write operations
  • Ensures minimal impact on read latency

Write buffer
  • Bounded queue of cache mutations
  • Processed in batches to reduce lock contention
  • Uses a state machine to schedule draining
  • Maintains ordering guarantees for writes
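A lossy ring buffer of the kind described above can be sketched as follows (a simplified illustration, not Caffeine's actual buffer classes): `offer` drops the access event rather than block when the buffer is full or contended, so readers never wait on policy bookkeeping.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReferenceArray;
import java.util.function.Consumer;

// Hypothetical sketch of a lossy ring buffer: recording an access is
// best-effort, and drain() is assumed to be called by one thread at a time.
class ReadBuffer<E> {
  static final int SIZE = 16; // power of two
  static final int MASK = SIZE - 1;
  final AtomicReferenceArray<E> buffer = new AtomicReferenceArray<>(SIZE);
  final AtomicInteger writeIndex = new AtomicInteger();
  int readIndex; // advanced only by the single draining thread

  /** Returns false (drops the event) if the buffer is full or contended. */
  boolean offer(E e) {
    int tail = writeIndex.get();
    int slot = tail & MASK;
    if (buffer.get(slot) != null) {
      return false; // full: a lossy buffer simply rejects
    }
    if (!writeIndex.compareAndSet(tail, tail + 1)) {
      return false; // contended: another thread claimed the slot
    }
    buffer.set(slot, e);
    return true;
  }

  /** Drains buffered events into the consumer; returns how many. */
  int drain(Consumer<E> consumer) {
    int drained = 0;
    while (true) {
      int slot = readIndex & MASK;
      E e = buffer.get(slot);
      if (e == null) break;
      buffer.set(slot, null);
      readIndex++;
      consumer.accept(e);
      drained++;
    }
    return drained;
  }
}
```

Dropping events is acceptable here because the read buffer only feeds access-frequency hints to the eviction policy; losing a sample under contention costs a little accuracy, never correctness.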

Drain State Machine

The buffer draining process uses a state machine to coordinate work:
IDLE → REQUIRED → PROCESSING → IDLE
  • IDLE: No work pending
  • REQUIRED: Drain needed but not started
  • PROCESSING: Currently draining buffers
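These transitions can be sketched with a single atomic status word (an illustrative simplification; Caffeine's real scheduler has additional transitional states): threads flag pending work with one CAS, and at most one thread wins the transition into PROCESSING and performs the drain.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch of the drain scheduler: IDLE -> REQUIRED -> PROCESSING
// -> IDLE, driven by compare-and-set so only one thread drains at a time.
class DrainScheduler {
  enum Status { IDLE, REQUIRED, PROCESSING }
  final AtomicReference<Status> status = new AtomicReference<>(Status.IDLE);
  int drains;

  /** Called after a buffered read/write to note that a drain is needed. */
  void scheduleDrain() {
    status.compareAndSet(Status.IDLE, Status.REQUIRED);
  }

  /** Tries to become the draining thread; true if this call drained. */
  boolean tryDrain(Runnable drainWork) {
    if (!status.compareAndSet(Status.REQUIRED, Status.PROCESSING)) {
      return false; // no work pending, or another thread is draining
    }
    try {
      drainWork.run();
      drains++;
    } finally {
      status.set(Status.IDLE);
    }
    return true;
  }
}
```

Losing the CAS race is cheap: a thread that fails to enter PROCESSING simply carries on, knowing another thread is already doing the maintenance.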

Entry State Model

Each cache entry has a lifecycle encoded in its key field to avoid additional memory overhead. An entry is alive when it exists in both the hash table and the eviction policy; this is the normal state for active entries. On removal it first becomes retired (gone from the hash table but still referenced by the policy) and finally dead (gone from both).
// Entry is alive and accessible
node.key != null && map.containsKey(key)
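The state-in-the-key-field trick can be sketched like this (hypothetical names; Caffeine's real nodes use their own sentinel references and generated fields): sentinel objects stand in for the key once the entry leaves the alive state, so no extra state field is needed.

```java
// Hypothetical sketch: the entry's key slot doubles as a state marker.
class Node<K, V> {
  static final Object RETIRED = new Object(); // removed from the hash table
  static final Object DEAD = new Object();    // removed from the policy too
  volatile Object keyOrState;
  V value;

  Node(K key) { this.keyOrState = key; }

  boolean isAlive()   { return keyOrState != RETIRED && keyOrState != DEAD; }
  boolean isRetired() { return keyOrState == RETIRED; }
  boolean isDead()    { return keyOrState == DEAD; }

  void retire() { keyOrState = RETIRED; }
  void die()    { keyOrState = DEAD; }
}
```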

Code Generation

Caffeine uses code generation to eliminate the overhead of unused features:
The BoundedLocalCache class is abstract, and concrete implementations are generated at build time based on the feature combinations you configure.

Feature Matrix

The code generator creates specialized classes for combinations of:
  • Maximum size (yes/no)
  • Expiration after write (yes/no)
  • Expiration after access (yes/no)
  • Variable expiration (yes/no)
  • Reference types (strong/weak/soft)
  • Removal listener (yes/no)
This ensures that:
  • Only necessary fields are allocated
  • Only required code paths are executed
  • JIT compiler can aggressively optimize
// Example generated class name
class BoundedLocalCache$SS<K, V> extends BoundedLocalCache<K, V> {
  // SS = Strong keys, Strong values
  // Only includes fields for configured features
}
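The selection idea can be sketched as a factory that maps configured features to a specialized class name. The letter encoding below is purely illustrative, not Caffeine's exact naming scheme:

```java
import java.util.EnumSet;

// Hypothetical sketch: derive a specialized subclass name from the
// configured feature set, so unused features cost nothing at runtime.
class CacheFactory {
  enum Feature { WEAK_KEYS, WEAK_VALUES, MAXIMUM_SIZE, EXPIRE_AFTER_WRITE }

  static String classNameFor(EnumSet<Feature> features) {
    StringBuilder sb = new StringBuilder("BoundedLocalCache$");
    sb.append(features.contains(Feature.WEAK_KEYS) ? 'W' : 'S');   // key strength
    sb.append(features.contains(Feature.WEAK_VALUES) ? 'W' : 'S'); // value strength
    if (features.contains(Feature.MAXIMUM_SIZE)) sb.append("Ms");
    if (features.contains(Feature.EXPIRE_AFTER_WRITE)) sb.append("W");
    return sb.toString();
  }
}
```

A cache built with no optional features would resolve to the bare strong-key, strong-value variant, while each enabled feature selects a subclass that actually carries the corresponding fields and code paths.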

Memory Layout

Caffeine is optimized for modern CPU cache architectures:

Node Structure

public class Node<K, V> {
  K key;              // 8 bytes (reference)
  V value;            // 8 bytes (reference)
  int weight;         // 4 bytes
  int policyWeight;   // 4 bytes
  Node<K, V> prev;    // 8 bytes (reference)
  Node<K, V> next;    // 8 bytes (reference)
  // Additional fields only if needed
}
Nodes are sized to fit within cache lines (64 bytes) to minimize memory access costs.

Cache Line Optimization

The FrequencySketch constrains counters to 64-byte blocks (L1 cache line size):
  • Improves spatial locality
  • Reduces memory access latency
  • Leverages L2 spatial prefetcher
  • Typical cost is only one memory access
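The blocking idea can be illustrated with a small count-min-style sketch (a simplified stand-in, not Caffeine's actual FrequencySketch): 4-bit counters are packed into longs, and all four counters for an item live inside one 64-byte block of eight longs, so an estimate touches a single cache line.

```java
// Hypothetical sketch: 4-bit counters packed into longs, with each item's
// counters confined to one 64-byte block (8 longs = 128 counters).
class BlockedFrequencySketch {
  static final int LONGS_PER_BLOCK = 8; // 8 * 8 bytes = one cache line
  final long[] table;
  final int blockCount;

  BlockedFrequencySketch(int blocks) {
    this.blockCount = blocks;
    this.table = new long[blocks * LONGS_PER_BLOCK];
  }

  void increment(int hash) {
    int block = (hash >>> 24) % blockCount * LONGS_PER_BLOCK;
    for (int i = 0; i < 4; i++) {       // four counters per item
      int counter = (hash >>> (i * 6)) & 0x7F; // 0..127 within the block
      int slot = block + (counter >>> 4);      // which long in the block
      int shift = (counter & 15) << 2;         // which 4-bit nibble
      long nibble = (table[slot] >>> shift) & 0xFL;
      if (nibble < 15) {
        table[slot] += 1L << shift; // saturate at 15
      }
    }
  }

  int frequency(int hash) {
    int block = (hash >>> 24) % blockCount * LONGS_PER_BLOCK;
    int min = 15;
    for (int i = 0; i < 4; i++) {
      int counter = (hash >>> (i * 6)) & 0x7F;
      int slot = block + (counter >>> 4);
      int shift = (counter & 15) << 2;
      min = Math.min(min, (int) ((table[slot] >>> shift) & 0xFL));
    }
    return min; // count-min style estimate: minimum over the counters
  }
}
```

Because the block index is derived once from the hash, every read and write for an item stays within the same 64 bytes, which is what makes the spatial-prefetcher benefit possible.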

Maintenance Operations

The cache performs periodic maintenance asynchronously:
Size maintenance
  1. Drain read buffer to update access order
  2. Drain write buffer to process mutations
  3. Update eviction policy data structures
  4. Evict entries if over maximum size
  5. Notify removal listeners

Expiration maintenance
  1. Check oldest entries in expiration queues
  2. Remove expired entries
  3. Update timer wheel (for variable expiration)
  4. Schedule next expiration check

Reference maintenance
  1. Process reference queue for GC’d entries
  2. Remove collected keys/values from cache
  3. Update size and policy state
  4. Trigger eviction if needed
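The maintenance steps above can be tied together as one fixed-order cycle; this skeleton (hypothetical names, with a log standing in for the real work) shows the essential ordering: buffers are drained before any eviction decision is made.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the maintenance cycle: steps run in a fixed
// order, recorded in a log so the ordering can be verified.
class MaintenanceCycle {
  final List<String> log = new ArrayList<>();

  void run() {
    drainReadBuffer();        // refresh access-order information first
    drainWriteBuffer();       // apply pending mutations to the policy
    expireEntries();          // remove entries whose time is up
    processReferenceQueues(); // clean up GC'd weak/soft entries
    evictIfOverCapacity();    // only now is the size check meaningful
  }

  void drainReadBuffer()        { log.add("drain-reads"); }
  void drainWriteBuffer()       { log.add("drain-writes"); }
  void expireEntries()          { log.add("expire"); }
  void processReferenceQueues() { log.add("references"); }
  void evictIfOverCapacity()    { log.add("evict"); }
}
```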

Performance Characteristics

Operation     Time Complexity    Notes
get()         O(1)               Lock-free hash table lookup
put()         O(1) amortized     Includes buffered policy update
Eviction      O(1) amortized     Window TinyLFU with constant factors
Expiration    O(1)               Queue peek or timer wheel
Removal       O(1) amortized     Buffered processing
All operations have constant time complexity with small constant factors, making Caffeine extremely efficient even under high load.

Thread Safety

Caffeine provides strong thread-safety guarantees:
  • All cache operations are thread-safe
  • Reads never block other operations
  • Writes use fine-grained locking
  • Removal listeners are called once per entry
  • Iterators are weakly consistent
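Weak consistency can be seen directly on `ConcurrentHashMap`, the map these caches build on: structural changes made while iterating never throw `ConcurrentModificationException`, and the iterator may or may not observe them.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Demonstrates weakly consistent iteration: mutating the map while
// iterating is safe, but whether the additions are seen is unspecified.
class WeaklyConsistentDemo {
  static int iterateWhileMutating() {
    Map<String, Integer> map = new ConcurrentHashMap<>();
    map.put("a", 1);
    map.put("b", 2);
    int seen = 0;
    for (String key : map.keySet()) {
      seen++;
      if (!key.startsWith("added-")) {
        map.put("added-" + key, 0); // structural change mid-iteration: safe
      }
    }
    return seen; // both original keys are always visited; additions maybe
  }
}
```

A fail-fast iterator (as on `HashMap`) would throw here; the weakly consistent guarantee trades that strictness for uninterrupted concurrent access.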

Further Reading

  • Design Articles: in-depth architectural discussion by the creator
  • BP-Wrapper Paper: framework for lock contention-free algorithms
  • Source Code: explore the implementation on GitHub
  • Efficiency Details: learn about the W-TinyLFU eviction policy
