Flame graphs are a visualisation of stack trace samples that lets you immediately see which code paths consume the most CPU time in a running Flink job. Flink has supported flame graphs natively in the Web UI since version 1.13.

What flame graphs show

Flame graphs answer questions like:
  • Which methods are consuming the most CPU right now?
  • How much time is spent in user code vs. Flink framework code vs. serialization?
  • What call chain leads to the hot method?
  • Which tasks are blocked on I/O or lock acquisition?
Flame graphs are constructed by sampling thread stack traces repeatedly. Each frame in the stack is represented as a horizontal bar. The width of a bar is proportional to how frequently that frame appeared across all samples—wider bars are hotter.
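The aggregation behind this can be sketched in a few lines: repeatedly record each thread's stack, then count how often each root-to-frame path occurs. A minimal, self-contained sketch (the sampled stacks and method names here are hard-coded stand-ins, not Flink's actual collector):

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FlameGraphSketch {
    /** Folds sampled stacks (root first) into per-path sample counts.
        The count for a path is proportional to that frame's bar width. */
    static Map<String, Integer> fold(List<List<String>> samples) {
        Map<String, Integer> widths = new LinkedHashMap<>();
        for (List<String> stack : samples) {
            StringBuilder path = new StringBuilder();
            for (String frame : stack) {
                if (path.length() > 0) path.append(';');
                path.append(frame);
                widths.merge(path.toString(), 1, Integer::sum);
            }
        }
        return widths;
    }

    public static void main(String[] args) {
        // Hypothetical samples: three of four land in processElement,
        // so its bar would span 75% of the graph's width.
        Map<String, Integer> widths = fold(List.of(
            List.of("Task.run", "StreamMap.processElement", "MyMapper.map"),
            List.of("Task.run", "StreamMap.processElement", "MyMapper.map"),
            List.of("Task.run", "StreamMap.processElement", "KryoSerializer.serialize"),
            List.of("Task.run", "Network.flush")));
        widths.forEach((path, count) -> System.out.println(count + "  " + path));
    }
}
```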

Enabling flame graphs

Flame graphs are disabled by default to avoid any sampling overhead on production systems.
# config.yaml
rest.flamegraph.enabled: true
Restart the cluster or the relevant component after changing this setting.
Stack trace sampling has a small but nonzero CPU overhead. Enable flame graphs in development and pre-production environments. In production, enable them only during active incident investigation and disable them again afterwards.

Generating a flame graph in the Web UI

  1. Open the job graph: navigate to your running job in the Flink Web UI at http://jobmanager:8081.
  2. Select an operator: click the operator you want to profile in the job graph. A panel opens on the right side.
  3. Open the Flame Graph tab: click the Flame Graph tab in the operator detail panel. Flink begins collecting stack samples from all task threads running that operator.
  4. Wait for samples to accumulate: the flame graph refreshes as new samples are collected. Wait a few seconds for enough samples to produce a meaningful visualisation.
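Behind the tab, the Web UI polls Flink's REST API: flame graph data is served from the /jobs/&lt;jobID&gt;/vertices/&lt;vertexID&gt;/flamegraph endpoint, so you can also fetch it programmatically. A small sketch that only builds the request URI (the host and IDs are placeholders; pass the result to any HTTP client):

```java
import java.net.URI;

public class FlameGraphEndpoint {
    /** Builds the REST URI the Web UI polls for flame graph data.
        The type query parameter selects the view (e.g. ON_CPU). */
    static URI flameGraphUri(String host, String jobId, String vertexId, String type) {
        return URI.create(String.format(
            "http://%s/jobs/%s/vertices/%s/flamegraph?type=%s",
            host, jobId, vertexId, type));
    }

    public static void main(String[] args) {
        // Placeholder IDs; look up real ones via /jobs and the job's vertices.
        System.out.println(flameGraphUri(
            "jobmanager:8081", "a1b2c3", "v001", "ON_CPU"));
    }
}
```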

Flame graph types

The Web UI offers three flame graph views, selectable from the drop-down at the top of the pane:
  • On-CPU: shows only threads in RUNNABLE or NEW state, i.e. threads actively using CPU. Use this to find CPU hotspots: tight loops, expensive computations, serialization overhead.
  • Off-CPU: shows only threads in BLOCKED, WAITING, or TIMED_WAITING state. Use this to find where threads spend time waiting: lock contention, blocking I/O, back-pressure.
  • Mixed: combines samples from all thread states in a single view.

Sampling process

Flink collects stack traces entirely within the JVM. Only Java-level method calls are visible; native system calls appear at the JVM boundary. By default, flame graphs are constructed at the operator level: all task threads for the selected operator are sampled in parallel and their stack traces are combined. If one parallel subtask is the bottleneck but others are not, the bottleneck may be averaged out. Starting with Flink 1.17, you can drill down to the subtask level:
  1. Select the operator in the job graph
  2. In the operator detail panel, click on a specific subtask
  3. The flame graph shows only that subtask’s threads
Use subtask-level flame graphs when you suspect data skew is causing one parallel instance to be much hotter than others.
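To see why the operator-level view can hide skew, consider per-subtask busy-sample counts: the operator-level graph merges them, flattening the outlier that a subtask-level flame graph would expose. A small illustrative sketch (the sample counts are made up):

```java
public class SkewSketch {
    /** Returns the index of the subtask with the most busy samples,
        printing it alongside the mean for comparison. */
    static int hottestSubtask(int[] busySamples) {
        double mean = 0;
        for (int s : busySamples) mean += s;
        mean /= busySamples.length;
        int hottest = 0;
        for (int i = 1; i < busySamples.length; i++) {
            if (busySamples[i] > busySamples[hottest]) hottest = i;
        }
        System.out.printf("mean=%.1f samples, hottest subtask=%d (%d samples)%n",
                mean, hottest, busySamples[hottest]);
        return hottest;
    }

    public static void main(String[] args) {
        // Four parallel subtasks; subtask 2 receives a hot key.
        // The operator-level view would report a mean of 33.5 samples,
        // hiding that subtask 2 alone accounts for 96.
        hottestSubtask(new int[]{12, 15, 96, 11});
    }
}
```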

Interpreting flame graphs

Reading the graph

  • X-axis: proportion of time (wider = more time spent in that method across all samples)
  • Y-axis: call stack depth (higher = deeper in the call chain)
  • Colour: random, used only to visually distinguish adjacent frames (not significant)
  • Flat top: a wide, flat top-of-stack frame is the actual hot method where CPU time is spent
  • Wide base: a wide base frame appears in many call chains but delegates to narrower children
Common patterns and their likely causes:
  • Wide map() or processElement() frames: CPU-intensive user code
  • Wide serialization frames (InstantiationUtil, KryoSerializer): serialization overhead; consider custom serializers or Avro/Protobuf
  • Wide RocksDB frames (RocksIterator.next(), RocksDB.get()): state access bottleneck; increase managed memory or tune RocksDB
  • Wide network frames (PartitionRequestClient, NettyMessage): network back-pressure or a slow downstream operator
  • Tall off-CPU stacks with Object.wait() or LockSupport.park(): threads blocked waiting; check for lock contention
  • Wide GC frames in the off-CPU graph: frequent GC pauses; check heap sizing and GC configuration

Example: diagnosing a serialization bottleneck

If the on-CPU flame graph shows wide frames in Kryo serialization:
  1. Identify which state or output type is being serialized by Kryo
  2. Register the type with Flink’s type system: env.registerType(MyClass.class)
  3. Or switch to an explicit Avro or Protobuf serializer for that type
  4. Regenerate the flame graph to confirm improvement

Configuration

Flame graph behaviour is controlled by the following settings:
  • rest.flamegraph.enabled (default: false): enable flame graph collection
  • rest.flamegraph.delay-between-samples (default: 50 ms): delay between successive stack trace samples
  • rest.flamegraph.num-samples (default: 100): number of samples per flame graph refresh
  • rest.flamegraph.cleanup-interval (default: 10 min): how long to keep cached flame graph data
For finer-grained profiles, reduce rest.flamegraph.delay-between-samples to 10–20 ms. This increases sampling frequency but also increases overhead.
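These settings also determine how long one refresh takes: with the defaults, 100 samples spaced 50 ms apart means roughly 100 × 50 ms = 5 s of sampling per flame graph. A sketch of that arithmetic (assuming samples are taken sequentially within a refresh):

```java
public class SamplingBudget {
    /** Approximate wall-clock time to collect one flame graph refresh. */
    static long refreshMillis(int numSamples, long delayBetweenSamplesMs) {
        return numSamples * delayBetweenSamplesMs;
    }

    public static void main(String[] args) {
        // Defaults: rest.flamegraph.num-samples = 100,
        // rest.flamegraph.delay-between-samples = 50 ms.
        System.out.println(refreshMillis(100, 50) + " ms");  // prints "5000 ms"
    }
}
```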
