Caffeine is designed for high performance out of the box, but understanding configuration options and tuning strategies can help you achieve optimal performance for your specific use case.

Performance Fundamentals

Near-Optimal

Caffeine achieves near-optimal hit rates with W-TinyLFU

Concurrent

Lock-free design for high concurrency

Adaptive

Automatically adapts to workload patterns

Low Overhead

Minimal CPU and memory overhead

Sizing Your Cache

Understanding Cache Size

Proper sizing is the most important performance factor:
import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;

// Entry-based sizing (simple)
Cache<String, User> cache = Caffeine.newBuilder()
    .maximumSize(10_000) // Maximum number of entries
    .build();

// Weight-based sizing (advanced)
Cache<String, byte[]> dataCache = Caffeine.newBuilder()
    .maximumWeight(100_000_000) // 100MB
    .weigher((key, value) -> value.length)
    .build();
Start by monitoring your working set size (the number of unique items accessed in a time window), then set the cache size to 2-3x that value.
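As a rough sketch of that measurement (the class name, window handling, and 2.5x multiplier are illustrative assumptions to tune for your workload), the working set can be estimated by counting distinct keys seen during an observation window:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Estimates the working set by recording every distinct key accessed
// during an observation window, then recommends a multiple of that count.
public class WorkingSetEstimator {
    private final Set<String> uniqueKeys = ConcurrentHashMap.newKeySet();

    // Call this from your cache access path during the observation window
    public void recordAccess(String key) {
        uniqueKeys.add(key);
    }

    // Recommended maximumSize: working set times a safety multiplier
    public long recommendedCacheSize(double multiplier) {
        return (long) (uniqueKeys.size() * multiplier);
    }

    public static void main(String[] args) {
        WorkingSetEstimator estimator = new WorkingSetEstimator();
        // Simulate a window in which 500 distinct keys are accessed repeatedly
        for (int i = 0; i < 10_000; i++) {
            estimator.recordAccess("key-" + (i % 500));
        }
        System.out.println(estimator.recommendedCacheSize(2.5)); // 1250
    }
}
```

Reset the key set between windows so the estimate tracks the current workload rather than all keys ever seen.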

Calculating Optimal Size

public class CacheSizeCalculator {
    
    public static long calculateOptimalSize(
            long heapSize,
            double cachePercentage,
            long avgEntrySize) {
        long availableMemory = (long) (heapSize * cachePercentage);
        return availableMemory / avgEntrySize;
    }
    
    public static void main(String[] args) {
        long heapSize = Runtime.getRuntime().maxMemory();
        long avgEntrySize = 1024; // 1KB per entry
        double cachePercentage = 0.25; // Use 25% of heap
        
        long optimalSize = calculateOptimalSize(
            heapSize, 
            cachePercentage, 
            avgEntrySize
        );
        
        System.out.println("Recommended cache size: " + optimalSize);
    }
}

Dynamic Sizing

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;
import com.github.benmanes.caffeine.cache.stats.CacheStats;

public class DynamicCache<K, V> {
    private static final long MIN_SIZE = 1_000;
    private static final long MAX_SIZE = 1_000_000;

    private volatile Cache<K, V> cache;
    private final ScheduledExecutorService scheduler;
    
    public DynamicCache(long initialSize) {
        this.cache = createCache(initialSize);
        this.scheduler = Executors.newScheduledThreadPool(1);
        
        // Adjust size based on hit rate
        scheduler.scheduleAtFixedRate(
            this::adjustSize,
            1, 1, TimeUnit.HOURS
        );
    }
    
    private void adjustSize() {
        CacheStats stats = cache.stats();
        double hitRate = stats.hitRate();
        long currentSize = cache.estimatedSize();
        
        if (hitRate < 0.80 && currentSize < MAX_SIZE) {
            // Increase size if hit rate is low
            long newSize = (long) (currentSize * 1.2);
            rebuildCache(newSize);
        } else if (hitRate > 0.95 && currentSize > MIN_SIZE) {
            // Decrease size if hit rate is very high
            long newSize = (long) (currentSize * 0.8);
            rebuildCache(newSize);
        }
    }
    
    private void rebuildCache(long newSize) {
        Cache<K, V> oldCache = cache;
        Cache<K, V> newCache = createCache(newSize);
        
        // Carry over existing entries; eviction trims them to the new size
        newCache.putAll(oldCache.asMap());
        cache = newCache;
    }
    
    private Cache<K, V> createCache(long size) {
        return Caffeine.newBuilder()
            .maximumSize(size)
            .recordStats()
            .build();
    }
}

Optimizing Expiration

Choosing Expiration Strategy

// Best for: Time-sensitive data with fixed validity
Cache<String, Price> priceCache = Caffeine.newBuilder()
    .expireAfterWrite(Duration.ofMinutes(5))
    .build();

// Use case: Stock prices that update every 5 minutes

Refresh vs Expire

// EXPIRATION: Entry removed, next access loads fresh data
LoadingCache<String, Data> expiringCache = Caffeine.newBuilder()
    .expireAfterWrite(Duration.ofMinutes(5))
    .build(key -> loadData(key));

// REFRESH: Stale data returned, background refresh triggered
LoadingCache<String, Data> refreshingCache = Caffeine.newBuilder()
    .refreshAfterWrite(Duration.ofMinutes(5))
    .build(key -> loadData(key));

// BEST: Combine both for optimal performance
LoadingCache<String, Data> optimalCache = Caffeine.newBuilder()
    .expireAfterWrite(Duration.ofMinutes(10))  // Hard TTL
    .refreshAfterWrite(Duration.ofMinutes(5))  // Soft refresh
    .build(key -> loadData(key));
Refresh returns the stale value immediately and reloads in the background, so callers are never blocked. Expiration removes the entry, so the next access must wait for a fresh load. Refresh therefore gives better latency.

Optimizing Loading

Bulk Loading Performance

import java.util.concurrent.CompletableFuture;

// SLOW: Individual loads
LoadingCache<String, User> slowCache = Caffeine.newBuilder()
    .build(key -> database.loadUser(key)); // One query per key

Map<String, User> users = slowCache.getAll(userIds); // N queries!

// FAST: Bulk loading
LoadingCache<String, User> fastCache = Caffeine.newBuilder()
    .build(new CacheLoader<String, User>() {
        @Override
        public User load(String key) {
            return database.loadUser(key);
        }
        
        @Override
        public Map<String, User> loadAll(Set<? extends String> keys) {
            // Single query for all keys
            return database.loadUsers(keys);
        }
    });

Map<String, User> users = fastCache.getAll(userIds); // 1 query!

Async Loading for Better Throughput

// Synchronous: Blocks thread during load
LoadingCache<String, User> syncCache = Caffeine.newBuilder()
    .build(key -> database.loadUser(key)); // Blocks thread

// Asynchronous: Non-blocking, better throughput
AsyncLoadingCache<String, User> asyncCache = Caffeine.newBuilder()
    .buildAsync((key, executor) -> 
        CompletableFuture.supplyAsync(
            () -> database.loadUser(key),
            executor
        )
    );

Coalescing Duplicate Loads

Caffeine automatically coalesces concurrent loads for the same key:
LoadingCache<String, ExpensiveData> cache = Caffeine.newBuilder()
    .build(key -> {
        // Even if 100 threads request the same key simultaneously,
        // this function executes only once
        return expensiveComputation(key);
    });

// All threads wait for same result
CompletableFuture.allOf(
    CompletableFuture.runAsync(() -> cache.get("key1")),
    CompletableFuture.runAsync(() -> cache.get("key1")),
    CompletableFuture.runAsync(() -> cache.get("key1"))
).join(); // Only one load happens!
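A sketch that makes the coalescing observable by counting loader invocations (the class name is illustrative, and the sleep is an arbitrary stand-in for expensive work; requires Caffeine on the classpath):

```java
import com.github.benmanes.caffeine.cache.Caffeine;
import com.github.benmanes.caffeine.cache.LoadingCache;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;

// Counts how many times the loader actually runs when several
// threads request the same key concurrently.
public class CoalescingDemo {

    public static int concurrentLoadCount() {
        AtomicInteger loads = new AtomicInteger();

        LoadingCache<String, String> cache = Caffeine.newBuilder()
            .build(key -> {
                loads.incrementAndGet();
                Thread.sleep(100); // stand-in for expensive work
                return "value-" + key;
            });

        // Three concurrent requests for the same key
        CompletableFuture.allOf(
            CompletableFuture.runAsync(() -> cache.get("hot")),
            CompletableFuture.runAsync(() -> cache.get("hot")),
            CompletableFuture.runAsync(() -> cache.get("hot"))
        ).join();

        return loads.get();
    }

    public static void main(String[] args) {
        // Concurrent requests coalesce onto one load; later requests are hits
        System.out.println("Loads executed: " + concurrentLoadCount());
    }
}
```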

Minimizing Overhead

Disable Features You Don’t Need

// Minimal overhead configuration
Cache<String, String> minimalCache = Caffeine.newBuilder()
    .maximumSize(10_000)
    // Don't add: stats, weak keys, soft values, expiration
    // unless you need them
    .build();

// With all features (higher overhead)
Cache<String, String> fullCache = Caffeine.newBuilder()
    .maximumSize(10_000)
    .recordStats()              // Adds overhead
    .weakKeys()                 // Adds overhead
    .softValues()               // Adds overhead
    .expireAfterWrite(Duration.ofMinutes(5))  // Adds overhead
    .build();

Efficient Key and Value Types

// INEFFICIENT: Boxing overhead
Cache<Integer, Integer> boxedCache = Caffeine.newBuilder()
    .maximumSize(100_000)
    .build();

// BETTER: Pack values into primitive-backed chunks to avoid per-entry boxing
// (IntArray is a hypothetical holder backed by an int[])
public class EfficientIntCache {
    private final Cache<String, IntArray> cache = Caffeine.newBuilder()
        .maximumSize(100)
        .build();
    
    public int get(int id) {
        // 1000 ids share one cache entry, amortizing per-entry overhead
        IntArray array = cache.get(
            String.valueOf(id / 1000),
            key -> new IntArray(1000)
        );
        return array.get(id % 1000);
    }
}

// EFFICIENT: Immutable, well-designed keys
record CacheKey(String tenant, String userId) {
    // Good: implements hashCode/equals efficiently
    // Good: immutable
    // Good: no unnecessary fields
}

Optimal Initial Capacity

// Let Caffeine grow gradually (slower startup)
Cache<String, String> defaultCache = Caffeine.newBuilder()
    .maximumSize(100_000)
    .build();

// Pre-size for known capacity (faster)
Cache<String, String> presizedCache = Caffeine.newBuilder()
    .initialCapacity(100_000)  // Avoid resizing
    .maximumSize(100_000)
    .build();

Executor Configuration

Custom Executor for Async Operations

import java.util.concurrent.*;

// Default: Uses ForkJoinPool.commonPool()
AsyncLoadingCache<String, User> defaultAsync = 
    Caffeine.newBuilder().buildAsync(this::loadUser);

// Custom: Dedicated thread pool
Executor customExecutor = new ThreadPoolExecutor(
    10,              // core threads
    50,              // max threads
    60L,             // keepalive
    TimeUnit.SECONDS,
    new LinkedBlockingQueue<>(1000),
    new ThreadPoolExecutor.CallerRunsPolicy()
);

AsyncLoadingCache<String, User> customAsync = Caffeine.newBuilder()
    .executor(customExecutor)
    .buildAsync(this::loadUser);
Using Runnable::run as the executor runs cache maintenance and notifications on the calling thread. Use it only in tests or when you have a specific reason to avoid asynchronous work.
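For example, a same-thread executor makes removal notifications deterministic in tests (a sketch; the class name and listener body are illustrative, and Caffeine must be on the classpath):

```java
import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;
import com.github.benmanes.caffeine.cache.RemovalCause;
import java.util.ArrayList;
import java.util.List;

// With Runnable::run, removal notifications are delivered on the calling
// thread, so a test can assert on them right after the triggering call.
public class SameThreadExecutorExample {

    public static List<RemovalCause> removalCauses() {
        List<RemovalCause> causes = new ArrayList<>();

        Cache<String, String> cache = Caffeine.newBuilder()
            .maximumSize(100)
            .executor(Runnable::run) // run cache work on the calling thread
            .removalListener((String key, String value, RemovalCause cause) ->
                causes.add(cause))
            .build();

        cache.put("a", "1");
        cache.invalidate("a"); // listener fires before this call returns

        return causes;
    }

    public static void main(String[] args) {
        System.out.println(removalCauses()); // [EXPLICIT]
    }
}
```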

Scheduler for Proactive Expiration

import com.github.benmanes.caffeine.cache.Scheduler;

// Default: Opportunistic expiration during operations
Cache<String, String> lazyCache = Caffeine.newBuilder()
    .expireAfterWrite(Duration.ofMinutes(5))
    .build();

// With scheduler: Proactive background expiration
Cache<String, String> activeCache = Caffeine.newBuilder()
    .expireAfterWrite(Duration.ofMinutes(5))
    .scheduler(Scheduler.systemScheduler())
    .build();

Monitoring and Tuning

Essential Metrics

import com.github.benmanes.caffeine.cache.stats.CacheStats;

public class CacheMonitor {
    private final Cache<String, ?> cache;
    
    public CacheMonitor(Cache<String, ?> cache) {
        this.cache = cache; // must be built with recordStats()
    }
    
    public void logMetrics() {
        CacheStats stats = cache.stats();
        
        System.out.println("Cache Metrics:");
        System.out.println("  Hit Rate: " + 
            String.format("%.2f%%", stats.hitRate() * 100));
        System.out.println("  Miss Rate: " + 
            String.format("%.2f%%", stats.missRate() * 100));
        System.out.println("  Load Count: " + stats.loadCount());
        System.out.println("  Eviction Count: " + stats.evictionCount());
        System.out.println("  Average Load Time: " + 
            stats.averageLoadPenalty() / 1_000_000 + "ms");
        
        // Size metrics
        System.out.println("  Estimated Size: " + cache.estimatedSize());
    }
    
    public boolean needsTuning() {
        CacheStats stats = cache.stats();
        
        // Low hit rate indicates cache too small or bad access pattern
        if (stats.hitRate() < 0.80) {
            System.out.println("WARNING: Low hit rate. Consider increasing size.");
            return true;
        }
        
        // High average load time indicates slow loading
        if (stats.averageLoadPenalty() > 100_000_000) { // 100ms
            System.out.println("WARNING: Slow loads. Optimize loading logic.");
            return true;
        }
        
        return false;
    }
}

Integration with Monitoring Systems

// Micrometer integration
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.binder.cache.CaffeineCacheMetrics;

public class MonitoredCache<K, V> {
    private final Cache<K, V> cache;
    
    public MonitoredCache(MeterRegistry registry, String cacheName) {
        this.cache = Caffeine.newBuilder()
            .maximumSize(10_000)
            .recordStats()
            .build();
        
        // Register with Micrometer
        CaffeineCacheMetrics.monitor(registry, cache, cacheName);
    }
}

// Prometheus/Grafana (prometheusRegistry is a placeholder for your
// metrics client; adapt the calls below to its API)
public class PrometheusMetrics {
    public void exportMetrics(Cache<?, ?> cache) {
        CacheStats stats = cache.stats();
        
        // Export to Prometheus
        prometheusRegistry.gauge(
            "cache_hit_rate", 
            stats.hitRate()
        );
        prometheusRegistry.counter(
            "cache_evictions_total",
            stats.evictionCount()
        );
    }
}

Common Performance Issues

Symptoms: High miss rate, frequent loads

Solutions:
  • Increase cache size
  • Analyze access patterns - are keys truly reused?
  • Consider warming up cache at startup
  • Check if data changes too frequently
// Add cache warmup
public void warmUpCache(LoadingCache<String, User> cache) {
    Set<String> hotKeys = getHotKeys(); // Top accessed keys
    cache.getAll(hotKeys);
}
Symptoms: High average load penalty

Solutions:
  • Implement bulk loading
  • Use async loading
  • Optimize database queries
  • Add connection pooling
  • Use refresh instead of expire
// Use refresh for better latency
LoadingCache<String, User> cache = Caffeine.newBuilder()
    .refreshAfterWrite(Duration.ofMinutes(5))
    .build(key -> optimizedLoad(key));
Symptoms: OutOfMemoryError, frequent GC

Solutions:
  • Reduce maximum size
  • Use weigher for better size control
  • Consider soft/weak references
  • Profile memory to find large objects
// Better memory control with weigher
Cache<String, byte[]> cache = Caffeine.newBuilder()
    .maximumWeight(100_000_000) // 100MB
    .weigher((key, value) -> value.length + key.length())
    .build();
Symptoms: High eviction count, churn

Solutions:
  • Increase cache size
  • Review access patterns
  • Check if expiration is too aggressive
  • Monitor eviction listeners
// Track eviction causes
Cache<String, String> cache = Caffeine.newBuilder()
    .maximumSize(10_000)
    .evictionListener((key, value, cause) -> {
        metrics.recordEviction(cause);
    })
    .build();

Performance Benchmarking

import org.openjdk.jmh.annotations.*;

@State(Scope.Thread)
@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.SECONDS)
public class CacheBenchmark {
    
    private Cache<Integer, String> cache;
    private Random random;
    
    @Setup
    public void setup() {
        cache = Caffeine.newBuilder()
            .maximumSize(1000)
            .build();
        random = new Random();
        
        // Pre-populate
        for (int i = 0; i < 1000; i++) {
            cache.put(i, "value" + i);
        }
    }
    
    @Benchmark
    public String readHeavy() {
        int key = random.nextInt(1000);
        return cache.getIfPresent(key);
    }
    
    @Benchmark
    public void writeHeavy() {
        int key = random.nextInt(1000);
        cache.put(key, "value" + key);
    }
    
    @Benchmark
    public String mixed() {
        int key = random.nextInt(1000);
        if (random.nextInt(10) < 8) {
            return cache.getIfPresent(key);
        } else {
            cache.put(key, "value" + key);
            return null;
        }
    }
}

Best Practices Summary

1. Start with Monitoring

Enable statistics and monitor hit rate, load time, and evictions before optimizing.

2. Size Appropriately

Set cache size based on working set size, not total data size. Monitor and adjust.

3. Optimize Loading

Implement bulk loading and use async operations for better throughput.

4. Choose the Right Expiration

Use refresh for better latency, expire for hard TTLs. Combine both when appropriate.

5. Minimize Overhead

Only enable features you need. Every feature adds some overhead.

6. Test Under Load

Benchmark with realistic workloads before deploying to production.

Next Steps

Testing Caches

Learn how to test cache performance

Cache Types

Choose the right cache type for your needs
