Available state backends
Flink ships with two production-ready state backends:

HashMapStateBackend
Stores state as Java objects on the JVM heap. Fast random access; limited by available JVM heap memory.

EmbeddedRocksDBStateBackend
Stores state in a local RocksDB instance on the TaskManager’s disk. Supports state much larger than the JVM heap and supports incremental checkpoints.

If no backend is configured, Flink uses HashMapStateBackend by default.
HashMapStateBackend
The HashMapStateBackend holds state as Java objects in hash tables on the JVM heap. All reads and writes are direct in-memory operations, making it the fastest state backend for low-latency access.
Best for:
- Jobs with state that fits comfortably in JVM heap
- Low-latency requirements where serialization overhead must be avoided
- Jobs with many small states spread across a large key space

Limitations:
- State size is bounded by available JVM heap memory
- Large state increases GC pressure and pause times
- State objects must not be mutated after being handed to Flink (no object reuse)
When using HashMapStateBackend, set managed memory to zero so that the maximum heap is available for your state.

EmbeddedRocksDBStateBackend
The EmbeddedRocksDBStateBackend stores in-flight data in a RocksDB database in the TaskManager’s local data directories. State is serialized to byte arrays on reads and writes, which adds CPU overhead compared to heap-based access.
Best for:
- Jobs with very large state that exceeds available JVM heap
- Long windows (hours or days)
- All high-availability production setups with large state

Limitations:
- Each read and write requires (de-)serialization, making it slower than HashMapStateBackend
- Maximum key size and value size are each limited to 2 GB (a RocksDB JNI limitation)
- ListState that uses merge operations can silently accumulate values exceeding 2 GB and fail on the next read
ForStStateBackend (experimental)
The ForStStateBackend is based on the ForSt project (an extension of RocksDB) and is designed for disaggregated state management. It can store SST files on remote filesystems such as HDFS or S3, enabling state sizes that exceed local TaskManager disk capacity.
Choosing the right state backend
| Consideration | HashMapStateBackend | EmbeddedRocksDBStateBackend |
|---|---|---|
| State size | Fits in JVM heap | Any size (disk-backed) |
| Read/write latency | Lowest (in-memory) | Higher (serialization + disk I/O) |
| Throughput | Highest | Lower |
| Incremental checkpoints | No | Yes |
| Large windows | Not recommended | Yes |
| Rescale support | Yes | Yes |
Use HashMapStateBackend when your state fits in memory and you need the lowest latency. Use EmbeddedRocksDBStateBackend when state is large, when you want incremental checkpoints, or when running long-window aggregations.
Configuring the state backend
Cluster-wide default
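A minimal config.yaml sketch of a cluster-wide default. The option names below are from the Flink 1.x configuration reference (`state.backend.type` replaced the older `state.backend` key in 1.19), and the checkpoint path is a placeholder:

```yaml
# Default state backend for all jobs on this cluster: 'hashmap' or 'rocksdb'
state.backend.type: hashmap
# Durable storage for checkpoint data (placeholder path)
state.checkpoints.dir: hdfs:///flink/checkpoints
```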
Per-job configuration
A state backend can also be set per job, in Java or Python, by calling setStateBackend (set_state_backend in PyFlink) on the execution environment.
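A Java sketch of per-job configuration. Package names vary across Flink versions; the imports below are for Flink 1.x:

```java
import org.apache.flink.contrib.streaming.state.EmbeddedRocksDBStateBackend;
import org.apache.flink.runtime.state.hashmap.HashMapStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class PerJobStateBackend {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Heap-based backend: fastest access, state must fit in the JVM heap
        env.setStateBackend(new HashMapStateBackend());

        // RocksDB backend: disk-backed; 'true' also enables incremental checkpoints
        env.setStateBackend(new EmbeddedRocksDBStateBackend(true));
    }
}
```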
To use EmbeddedRocksDBStateBackend in your IDE, or when not relying on the cluster’s Flink distribution, add the Maven dependency:
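For example, using the `flink-statebackend-rocksdb` artifact (match the version property to your Flink distribution):

```xml
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-statebackend-rocksdb</artifactId>
    <version>${flink.version}</version>
    <scope>provided</scope>
</dependency>
```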
RocksDB incremental checkpoints
Incremental checkpoints record only the state changes since the last completed checkpoint, rather than producing a full snapshot each time. This can dramatically reduce checkpoint size and duration for large state. Incremental checkpoints can be enabled in config.yaml or in Java code.
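A config.yaml sketch (option names from the Flink 1.x configuration reference):

```yaml
state.backend.type: rocksdb
# Record only deltas since the last completed checkpoint
state.backend.incremental: true
```

In Java, passing `true` to the `EmbeddedRocksDBStateBackend` constructor, as in `new EmbeddedRocksDBStateBackend(true)`, has the same effect.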
When incremental checkpoints are enabled, the “Checkpointed Data Size” shown in the Web UI represents only the delta for that checkpoint, not the total state size.
Whether incremental checkpoints also speed up recovery depends on where the bottleneck lies:
- CPU/IOPS bottleneck: incremental recovery is faster (no need to rebuild RocksDB tables from scratch)
- Network bandwidth bottleneck: incremental recovery may be slower (must fetch all accumulated deltas)
RocksDB memory management
Flink manages RocksDB memory by default, allocating it from the TaskManager’s managed memory budget. This ensures RocksDB stays within the memory limits enforced by the container runtime (YARN, Kubernetes, Docker).

RocksDB timers
Timers (for event-time windows and ProcessFunction callbacks) are stored in RocksDB by default, which scales well for large numbers of timers. For jobs with few timers, storing them on the JVM heap can improve performance:
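In config.yaml (option name from the Flink 1.x configuration reference):

```yaml
# Store timers on the JVM heap instead of in RocksDB
state.backend.rocksdb.timer-service.factory: HEAP
```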
Advanced RocksDB tuning
For expert-level tuning, implement a RocksDBOptionsFactory to control column family options:
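A sketch assuming the Flink 1.x `RocksDBOptionsFactory` interface; the tuning values are illustrative, not recommendations:

```java
import java.util.Collection;

import org.apache.flink.contrib.streaming.state.RocksDBOptionsFactory;
import org.rocksdb.ColumnFamilyOptions;
import org.rocksdb.DBOptions;

public class MyOptionsFactory implements RocksDBOptionsFactory {

    @Override
    public DBOptions createDBOptions(DBOptions currentOptions,
                                     Collection<AutoCloseable> handlesToClose) {
        // Illustrative: raise the number of background flush/compaction threads
        return currentOptions.setMaxBackgroundJobs(4);
    }

    @Override
    public ColumnFamilyOptions createColumnOptions(ColumnFamilyOptions currentOptions,
                                                   Collection<AutoCloseable> handlesToClose) {
        // Illustrative: 64 MB write buffer per column family (one per registered state)
        return currentOptions.setWriteBufferSize(64 * 1024 * 1024);
    }
}
```

Register the factory on the backend with `backend.setRocksDBOptions(new MyOptionsFactory())` before passing the backend to the execution environment.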
Changelog state backend
The changelog feature reduces checkpoint latency by continuously uploading state changes to durable storage, so that only the small remaining portion of the changelog needs to be uploaded at each checkpoint boundary. It is supported by both HashMapStateBackend and EmbeddedRocksDBStateBackend. Enable it:
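A config.yaml sketch (option names from the Flink 1.x changelog documentation; the `dstl.dfs.base-path` value is a placeholder):

```yaml
state.backend.changelog.enabled: true
state.backend.changelog.storage: filesystem
# Durable location for changelog segments (placeholder path)
dstl.dfs.base-path: s3://my-bucket/changelog
```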
