Core Design Principles
Caffeine’s architecture is based on several key principles:
- Lock-free reads: Read operations never block, ensuring maximum throughput
- Buffered writes: Updates are batched to minimize lock contention
- Adaptive eviction: The cache adjusts its behavior based on workload patterns
- Zero overhead: Only pay for features you use through code generation
High-Level Architecture
Caffeine is built on top of ConcurrentHashMap, with page-replacement algorithms added for intelligent eviction:
- Hash table layer - Fast concurrent access using ConcurrentHashMap
- Policy layer - Eviction and expiration policies
- Buffer layer - Asynchronous processing of cache events
Concurrency Model
Caffeine uses a best-effort bounding approach to maintain eventual consistency. The penalty of applying policy updates is amortized across threads, making the cost only slightly higher than a plain ConcurrentHashMap operation.
Buffer System
Reads and writes are recorded in lock-free buffers that are drained asynchronously.
Read Buffer
- Ring buffer that records cache accesses
- Rejects additions if contended or full
- Drained when full or during write operations
- Ensures minimal impact on read latency
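The read buffer's behavior can be sketched as a small, lossy ring buffer in plain Java. The class and field names below are illustrative, not Caffeine's internals; the key property it demonstrates is that a reader never blocks, because a full or contended buffer simply drops the event.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReferenceArray;

// A lossy ring buffer: records reads on a best-effort basis and
// rejects additions when full or contended, so readers never block.
final class ReadBuffer<E> {
    static final int CAPACITY = 16;             // must be a power of two
    static final int MASK = CAPACITY - 1;

    final AtomicReferenceArray<E> buffer = new AtomicReferenceArray<>(CAPACITY);
    final AtomicLong writeCounter = new AtomicLong();
    final AtomicLong readCounter = new AtomicLong();

    /** Returns true if recorded; false if the buffer was full or contended. */
    boolean offer(E e) {
        long head = readCounter.get();
        long tail = writeCounter.get();
        if (tail - head >= CAPACITY) {
            return false;                       // full: drop the event
        }
        // CAS instead of locking; losing the race means "contended", so drop
        if (!writeCounter.compareAndSet(tail, tail + 1)) {
            return false;
        }
        buffer.set((int) (tail & MASK), e);
        return true;
    }

    /** Drains buffered events into the consumer (one drainer at a time). */
    int drain(java.util.function.Consumer<E> consumer) {
        int drained = 0;
        long head = readCounter.get();
        long tail = writeCounter.get();
        for (; head < tail; head++) {
            int index = (int) (head & MASK);
            E e = buffer.get(index);
            if (e == null) {
                break;                          // writer has not finished publishing
            }
            buffer.set(index, null);
            consumer.accept(e);
            drained++;
        }
        readCounter.set(head);
        return drained;
    }
}
```

Dropping events is safe here because the buffer only feeds the eviction policy's access statistics: losing a sample slightly degrades eviction decisions but never affects correctness of the cached data.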
Write Buffer
- Bounded queue of cache mutations
- Processed in batches to reduce lock contention
- Uses a state machine to schedule draining
- Maintains ordering guarantees for writes
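The batching idea can be sketched with standard library types (the names below are illustrative, not Caffeine's internals): mutations are queued without blocking, then applied in FIFO order under a single lock acquisition.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.locks.ReentrantLock;

// Write mutations are queued and later applied in batches under one
// lock acquisition, instead of contending for the lock on every write.
final class WriteBuffer {
    final ArrayBlockingQueue<Runnable> tasks = new ArrayBlockingQueue<>(128);
    final ReentrantLock evictionLock = new ReentrantLock();

    /** Queues a pending mutation; FIFO order preserves write ordering. */
    boolean offer(Runnable task) {
        return tasks.offer(task);   // non-blocking; returns false when bounded queue is full
    }

    /** Applies all pending mutations in one batch while holding the lock. */
    int drain() {
        List<Runnable> batch = new ArrayList<>();
        tasks.drainTo(batch);
        evictionLock.lock();
        try {
            for (Runnable task : batch) {
                task.run();
            }
        } finally {
            evictionLock.unlock();
        }
        return batch.size();
    }
}
```

Because the queue is bounded, a burst of writes cannot grow the backlog without limit; in that case a writer must help drain rather than keep queueing.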
Drain State Machine
The buffer draining process uses a state machine to coordinate work:
- IDLE: No work pending
- REQUIRED: Drain needed but not started
- PROCESSING: Currently draining buffers
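The coordination can be sketched with an atomic state variable; compare-and-set transitions ensure exactly one thread claims the drain. This is a simplified illustration, not Caffeine's code, and Caffeine's actual state machine distinguishes additional processing sub-states.

```java
import java.util.concurrent.atomic.AtomicReference;

// The drain scheduler coordinates maintenance work with a small state machine.
final class DrainStatus {
    enum State { IDLE, REQUIRED, PROCESSING }

    final AtomicReference<State> state = new AtomicReference<>(State.IDLE);

    /** A buffered event arrived: mark that a drain is needed. */
    void scheduleDrain() {
        state.compareAndSet(State.IDLE, State.REQUIRED);
    }

    /** Tries to claim the drain; only one thread wins and does the work. */
    boolean tryDrain(Runnable maintenance) {
        if (!state.compareAndSet(State.REQUIRED, State.PROCESSING)) {
            return false;  // nothing pending, or another thread is already draining
        }
        try {
            maintenance.run();
        } finally {
            state.set(State.IDLE);
        }
        return true;
    }
}
```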
Entry State Model
Each cache entry has a lifecycle encoded in its key field to avoid additional memory overhead:
- Alive - Entry exists in both the hash table and the eviction policy. This is the normal state for active entries.
- Retired - Entry has been removed from the hash table but is still present in the eviction policy, awaiting cleanup.
- Dead - Entry has been removed from both the hash table and the eviction policy.
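The lifecycle above only moves forward, which a small sketch makes concrete (illustrative names; Caffeine encodes this state compactly rather than with an enum field):

```java
// Illustrative entry lifecycle: Alive -> Retired -> Dead as the entry
// leaves the hash table and then the eviction policy. Transitions are
// one-way; invalid transitions are ignored.
final class EntryLifecycle {
    enum State { ALIVE, RETIRED, DEAD }

    private State state = State.ALIVE;

    State state() { return state; }

    /** Removed from the hash table; still tracked by the policy. */
    void retire() {
        if (state == State.ALIVE) state = State.RETIRED;
    }

    /** Removed from the eviction policy as well. */
    void die() {
        if (state == State.RETIRED) state = State.DEAD;
    }
}
```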
Code Generation
Caffeine uses code generation to eliminate the overhead of unused features. The BoundedLocalCache class is abstract, and concrete implementations are generated at build time based on the feature combinations you configure.
Feature Matrix
The code generator creates specialized classes for combinations of:
- Maximum size (yes/no)
- Expiration after write (yes/no)
- Expiration after access (yes/no)
- Variable expiration (yes/no)
- Reference types (strong/weak/soft)
- Removal listener (yes/no)
As a result:
- Only necessary fields are allocated
- Only required code paths are executed
- JIT compiler can aggressively optimize
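Build-time generation can't be shown directly, so this run-time sketch illustrates the principle: pick the leanest node class supporting exactly the enabled features, so a disabled feature costs no fields. All names here are hypothetical, not Caffeine's generated classes.

```java
// Illustrative feature-based specialization: an extra field exists only
// in the node variant that actually needs it.
final class NodeFactory {
    interface Node {
        Object key();
        Object value();
    }

    // Minimal node: no extra fields when no features are enabled.
    record PlainNode(Object key, Object value) implements Node {}

    // Variant with a write timestamp, used only when expireAfterWrite is on.
    record WriteTimeNode(Object key, Object value, long writeTime) implements Node {}

    static Node create(Object key, Object value, boolean expiresAfterWrite, long now) {
        return expiresAfterWrite
            ? new WriteTimeNode(key, value, now)
            : new PlainNode(key, value);
    }
}
```

With monomorphic, field-minimal node classes, the JIT sees exactly one shape per cache configuration and can optimize accordingly.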
Memory Layout
Caffeine is optimized for modern CPU cache architectures.
Node Structure
Nodes are sized to fit within cache lines (64 bytes) to minimize memory access costs.
Cache Line Optimization
The FrequencySketch constrains counters to 64-byte blocks (the L1 cache line size):
- Improves spatial locality
- Reduces memory access latency
- Leverages L2 spatial prefetcher
- Typical cost is only one memory access
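A simplified sketch of the idea: 4-bit counters packed into 64-bit words, with each item's counters confined to one 8-long (64-byte) block so a lookup touches a single cache line. The hashing below is deliberately simple and is not Caffeine's actual FrequencySketch.

```java
// Simplified frequency sketch with cache-line-confined 4-bit counters.
final class FrequencySketch {
    final long[] table = new long[64];           // 8 blocks of 8 longs (64 bytes each)
    static final int BLOCK_MASK = (64 / 8) - 1;  // number of blocks - 1

    /** Bumps the item's four counters, each saturating at 15. */
    void increment(int item) {
        int block = (spread(item) & BLOCK_MASK) * 8;
        for (int i = 0; i < 4; i++) {
            int h = spread(item * 31 + i);
            int index = block + (h & 7);          // long within the block
            int offset = ((h >>> 3) & 15) * 4;    // 4-bit counter within the long
            long current = (table[index] >>> offset) & 0xF;
            if (current < 15) {
                table[index] += (1L << offset);
            }
        }
    }

    /** Estimates frequency as the minimum of the item's four counters. */
    int frequency(int item) {
        int block = (spread(item) & BLOCK_MASK) * 8;
        int min = 15;
        for (int i = 0; i < 4; i++) {
            int h = spread(item * 31 + i);
            int index = block + (h & 7);
            int offset = ((h >>> 3) & 15) * 4;
            min = Math.min(min, (int) ((table[index] >>> offset) & 0xF));
        }
        return min;
    }

    static int spread(int x) {                    // cheap avalanche mix
        x ^= x >>> 17; x *= 0xed5ad4bb;
        x ^= x >>> 11; x *= 0xac4c1b51;
        x ^= x >>> 15;
        return x;
    }
}
```

Because all four counters for an item live in the same 64-byte block, one cache miss fetches everything the lookup needs.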
Maintenance Operations
The cache performs periodic maintenance asynchronously.
Eviction Processing
- Drain read buffer to update access order
- Drain write buffer to process mutations
- Update eviction policy data structures
- Evict entries if over maximum size
- Notify removal listeners
Expiration Processing
- Check oldest entries in expiration queues
- Remove expired entries
- Update timer wheel (for variable expiration)
- Schedule next expiration check
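For a fixed TTL, write order doubles as expiration order, which is why checking for expired entries is just a peek at the head of a queue. A minimal sketch of that idea (names illustrative, not Caffeine's internals):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

// Fixed expiration (expireAfterWrite): entries are kept in write order,
// so the oldest entry is always at the head and expiry is an O(1) peek.
final class ExpirationQueue {
    record Timed(String key, long writeNanos) {}

    final ArrayDeque<Timed> queue = new ArrayDeque<>();
    final long ttlNanos;

    ExpirationQueue(long ttlNanos) {
        this.ttlNanos = ttlNanos;
    }

    void recordWrite(String key, long nowNanos) {
        queue.addLast(new Timed(key, nowNanos));
    }

    /** Removes and returns expired keys; stops at the first live entry. */
    List<String> expire(long nowNanos) {
        List<String> expired = new ArrayList<>();
        for (Timed head = queue.peekFirst();
             head != null && nowNanos - head.writeNanos() >= ttlNanos;
             head = queue.peekFirst()) {
            queue.pollFirst();
            expired.add(head.key());
        }
        return expired;
    }
}
```

Variable (per-entry) expiration breaks this single-ordering assumption, which is why it needs the hierarchical timer wheel instead of a simple queue.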
Reference Cleanup
- Process reference queue for GC’d entries
- Remove collected keys/values from cache
- Update size and policy state
- Trigger eviction if needed
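The mechanism behind reference cleanup can be sketched with the standard `java.lang.ref` types (the wrapper class here is illustrative): keys are held via WeakReference, and maintenance polls a ReferenceQueue to purge entries whose keys were collected.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

// With weak keys, the GC enqueues a cleared reference on the queue;
// maintenance drains the queue and removes the dead entries.
final class WeakKeyCleanup {
    final ReferenceQueue<Object> queue = new ReferenceQueue<>();
    final Map<Reference<?>, String> data = new HashMap<>();

    void put(Object key, String value) {
        data.put(new WeakReference<>(key, queue), value);
    }

    /** Drains the reference queue, removing entries for collected keys. */
    int cleanUp() {
        int removed = 0;
        for (Reference<?> ref; (ref = queue.poll()) != null; ) {
            if (data.remove(ref) != null) {
                removed++;
            }
        }
        return removed;
    }
}
```

Since `ReferenceQueue.poll()` is non-blocking, this cleanup is cheap to run opportunistically as part of the regular maintenance cycle.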
Performance Characteristics
| Operation | Time Complexity | Notes |
|---|---|---|
| get() | O(1) | Lock-free, hash table lookup |
| put() | O(1) amortized | Includes buffered policy update |
| Eviction | O(1) amortized | Window TinyLFU with constant factors |
| Expiration | O(1) | Queue peek or timer wheel |
| Removal | O(1) amortized | Buffered processing |
All operations run in constant or amortized-constant time with small constant factors, keeping Caffeine efficient even under high load.
Thread Safety
Caffeine provides strong thread-safety guarantees:
- All cache operations are thread-safe
- Reads never block other operations
- Writes use fine-grained locking
- Removal listeners are called once per entry
- Iterators are weakly consistent
Further Reading
- Design Articles - In-depth architectural discussion by the creator
- BP-Wrapper Paper - A framework for (nearly) lock contention-free replacement algorithms
- Source Code - Explore the implementation on GitHub
- Efficiency Details - Learn about the W-TinyLFU eviction policy