What flame graphs show
Flame graphs answer questions like:
- Which methods are consuming the most CPU right now?
- How much time is spent in user code vs. Flink framework code vs. serialization?
- What call chain leads to the hot method?
- Which tasks are blocked on I/O or lock acquisition?
Enabling flame graphs
Flame graphs are disabled by default to avoid any sampling overhead on production systems.
Generating a flame graph in the Web UI
Select an operator
Click on the operator you want to profile in the job graph. A panel opens on the right side.
Open the Flame Graph tab
Click the Flame Graph tab in the operator detail panel. Flink begins collecting stack samples from all task threads running that operator.
Flame graph types
The Web UI offers three flame graph views, selectable from the drop-down at the top of the pane:
- On-CPU
- Off-CPU
- Mixed
On-CPU
Shows only threads in RUNNABLE or NEW state. This visualises threads that are actively using CPU. Use this to find CPU hotspots: tight loops, expensive computations, serialization overhead.
Thread states included: RUNNABLE, NEW
Off-CPU
Shows only threads that are blocked or waiting: lock contention, I/O waits, sleeps. Use this to find where time is lost without consuming CPU.
Thread states included: BLOCKED, WAITING, TIMED_WAITING
Mixed
Shows all threads regardless of state, combining the on-CPU and off-CPU views in one graph.
Sampling process
Flink collects stack traces entirely within the JVM. Only Java-level method calls are visible; native system calls appear at the JVM boundary.
By default, flame graphs are constructed at the operator level: all task threads for the selected operator are sampled in parallel and their stack traces are combined. If one parallel subtask is the bottleneck but others are not, the bottleneck may be averaged out.
Starting with Flink 1.17, you can drill down to the subtask level:
- Select the operator in the job graph
- In the operator detail panel, click on a specific subtask
- The flame graph shows only that subtask’s threads
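The sample-and-aggregate approach described above can be illustrated outside of Flink: take repeated JVM stack-trace snapshots, fold each stack into a root-first frame string, and count how often each folded stack occurs. The wider a frame, the more samples contained it. A minimal sketch (not Flink's actual implementation; the class name is invented):

```java
import java.util.*;

public class StackSampler {
    // Fold a stack trace into "root;...;leaf" form. Java's array has the
    // deepest frame at index 0, so iterate in reverse to put the root first.
    static String fold(StackTraceElement[] stack) {
        StringBuilder sb = new StringBuilder();
        for (int i = stack.length - 1; i >= 0; i--) {
            if (sb.length() > 0) sb.append(';');
            sb.append(stack[i].getClassName()).append('.').append(stack[i].getMethodName());
        }
        return sb.toString();
    }

    // Take `numSamples` snapshots of all live threads, `delayMillis` apart,
    // and count occurrences of each folded stack.
    static Map<String, Integer> sample(int numSamples, long delayMillis) throws InterruptedException {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i < numSamples; i++) {
            for (StackTraceElement[] stack : Thread.getAllStackTraces().values()) {
                if (stack.length == 0) continue;
                counts.merge(fold(stack), 1, Integer::sum);
            }
            Thread.sleep(delayMillis);
        }
        return counts;
    }

    public static void main(String[] args) throws InterruptedException {
        // Print the five most frequently observed stacks in this JVM.
        sample(10, 5).entrySet().stream()
            .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
            .limit(5)
            .forEach(e -> System.out.println(e.getValue() + "  " + e.getKey()));
    }
}
```

The folded-stack-with-count output is the same shape most flame graph renderers consume; Flink performs this aggregation across all task threads of the selected operator (or subtask).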
Interpreting flame graphs
Reading the graph
- X-axis: proportion of time (wider = more time spent in that method across all samples)
- Y-axis: call stack depth (higher = deeper in the call chain)
- Colour: random, used only to visually distinguish adjacent frames (not significant)
- Flat top: a wide, flat top-of-stack frame is the actual hot method where CPU time is spent
- Wide base: a wide base frame appears in many call chains but delegates to narrower children
Common patterns in Flink
| Pattern | Likely cause |
|---|---|
| Wide map() or processElement() frames | CPU-intensive user code |
| Wide serialization frames (InstantiationUtil, KryoSerializer) | Serialization overhead; consider custom serializers or Avro/Protobuf |
| Wide RocksDB frames (RocksIterator.next(), RocksDB.get()) | State access bottleneck; increase managed memory or tune RocksDB |
| Wide network frames (PartitionRequestClient, NettyMessage) | Network back-pressure or slow downstream |
| Tall off-CPU stacks with Object.wait() or LockSupport.park() | Threads blocked waiting; check for lock contention |
| Wide GC frames in off-CPU graph | Frequent GC pauses; check heap sizing and GC configuration |
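The flat-top versus wide-base distinction can be made concrete. From folded stack samples, a frame's total width counts every sample it appears in anywhere in the stack, while its self width counts only samples where it is the top frame. A hypothetical sketch (names invented):

```java
import java.util.*;

public class FrameWidths {
    // counts: folded stack ("root;child;leaf") -> number of samples.
    // Returns per-frame {total, self} sample counts.
    static Map<String, int[]> widths(Map<String, Integer> counts) {
        Map<String, int[]> out = new HashMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            String[] frames = e.getKey().split(";");
            int n = e.getValue();
            Set<String> seen = new HashSet<>(); // count each frame once per stack
            for (String f : frames) {
                if (seen.add(f)) out.computeIfAbsent(f, k -> new int[2])[0] += n;
            }
            out.get(frames[frames.length - 1])[1] += n; // top-of-stack: self time
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("main;map;serialize", 70); // hot leaf
        counts.put("main;map", 10);
        counts.put("main;sink", 20);
        Map<String, int[]> w = widths(counts);
        // "main" is a wide base (total=100, self=0): it appears everywhere but
        // only delegates. "serialize" is the flat top (total=70, self=70):
        // that is where the CPU time is actually spent.
        System.out.println("main      total=" + w.get("main")[0] + " self=" + w.get("main")[1]);
        System.out.println("serialize total=" + w.get("serialize")[0] + " self=" + w.get("serialize")[1]);
    }
}
```

When scanning a Flink flame graph, look for wide flat tops first (the hot methods), then walk down their call chains to see which operator or framework path leads there.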
Example: diagnosing a serialization bottleneck
If the on-CPU flame graph shows wide frames in Kryo serialization:
- Identify which state or output type is being serialized by Kryo
- Register the type with Flink’s type system:
  env.registerType(MyClass.class)
- Or switch to an explicit Avro or Protobuf serializer for that type
- Regenerate the flame graph to confirm improvement
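The registration step above, in context. This is a sketch requiring a Flink dependency; MyEvent stands in for your own type, and disabling generic types additionally makes any remaining Kryo fallback fail fast at job submission instead of silently reappearing:

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// Register the type so Flink's type system handles it instead of generic Kryo.
env.registerType(MyEvent.class);

// Optional: fail at submission time if any type still falls back to Kryo.
env.getConfig().disableGenericTypes();
```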
Configuration
| Setting | Default | Description |
|---|---|---|
| rest.flamegraph.enabled | false | Enable flame graph collection |
| rest.flamegraph.sample-interval | 50 ms | Interval between stack trace samples |
| rest.flamegraph.delay-between-samples | 50 ms | Delay between successive samples |
| rest.flamegraph.num-samples | 100 | Number of samples per flame graph refresh |
| rest.flamegraph.cleanup-interval | 10 min | How long to keep cached flame graph data |
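These options go in the cluster configuration file (flink-conf.yaml, or config.yaml on newer releases) and take effect on JobManager restart. A sketch that enables flame graphs with illustrative, non-default sampling values:

```yaml
rest.flamegraph.enabled: true
rest.flamegraph.delay-between-samples: 25 ms
rest.flamegraph.num-samples: 200
rest.flamegraph.cleanup-interval: 10 min
```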

