- Adaptive Scheduler: for streaming jobs, adjusts parallelism within a running cluster
- Adaptive Batch Scheduler: for batch jobs, see Adaptive Batch Execution
Adaptive Scheduler
The Adaptive Scheduler monitors available task slots and adjusts job parallelism automatically:
- If fewer slots are available than the configured parallelism (e.g., a TaskManager failed), the scheduler reduces parallelism and keeps the job running.
- When new slots become available, the job scales back up to the configured parallelism.
- In Reactive Mode, the configured parallelism is treated as infinity: the job always uses all available resources.
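The rules above can be sketched as a small decision function. This is a simplified model, not Flink's implementation: the name `target_parallelism` is hypothetical, and the sketch assumes one slot per parallel subtask while ignoring slot sharing groups, per-operator settings, and maximum parallelism.

```python
def target_parallelism(configured: int, available_slots: int, reactive: bool = False) -> int:
    """Return the parallelism a job would run at under the simplified rules.

    Hypothetical helper for illustration only; it is not part of Flink.
    """
    if configured < 1:
        raise ValueError("configured parallelism must be at least 1")
    if available_slots < 0:
        raise ValueError("available_slots must not be negative")
    if reactive:
        # Reactive Mode treats the configured parallelism as infinity,
        # so every available slot is used.
        return available_slots
    # Otherwise scale down when slots are missing, and never exceed
    # the configured parallelism when extra slots appear.
    return min(configured, available_slots)


if __name__ == "__main__":
    print(target_parallelism(8, 6))                 # a TaskManager was lost
    print(target_parallelism(8, 10))                # spare slots stay unused
    print(target_parallelism(8, 10, reactive=True)) # Reactive Mode uses all slots
```

A result of 0 in this model means no slots are available, in which case the job cannot run until resources return.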
Enabling the Adaptive Scheduler
The Adaptive Scheduler is for streaming jobs only. Batch jobs use the Adaptive Batch Scheduler automatically.
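As a minimal sketch, the scheduler is selected through the `jobmanager.scheduler` option set to `adaptive`. The helper below only renders flat `key: value` lines in the style of a Flink configuration file; the function name `render_flink_conf` is hypothetical, and the name and location of the configuration file depend on your Flink version and deployment.

```python
def render_flink_conf(options: dict) -> str:
    """Render options as flat 'key: value' lines (hypothetical helper)."""
    return "\n".join(f"{key}: {value}" for key, value in options.items())


# Select the Adaptive Scheduler for streaming jobs.
adaptive_scheduler_options = {"jobmanager.scheduler": "adaptive"}

if __name__ == "__main__":
    print(render_flink_conf(adaptive_scheduler_options))
```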
Adaptive Scheduler configuration
Limitations
- Streaming jobs only. Batch jobs use the Adaptive Batch Scheduler.
- No partial failover: when a task fails, the entire job restarts (not just the affected region). This increases recovery time for embarrassingly parallel jobs compared to the default scheduler.
- Each scaling event triggers a job restart, incrementing task attempt counters.
Reactive Mode
Reactive Mode is a special configuration of the Adaptive Scheduler designed for single-job application clusters. In Reactive Mode:
- The configured parallelism is ignored; Flink uses all available slots.
- Adding a TaskManager scales up; removing one scales down.
- Job restarts use the latest completed checkpoint — no manual savepoint is needed.
Getting started with Reactive Mode
Reactive Mode configuration
resource-wait-timeout defaults to -1 (wait forever for resources). resource-stabilization-timeout defaults to 0 (start as soon as enough resources are available).
Recommendations for Reactive Mode
- Set a restart strategy. If no restart strategy is configured and a failure occurs, Reactive Mode fails the job instead of restarting.
- When scaling down, stop TaskManagers with SIGTERM so they can shut down cleanly. If a TaskManager process is killed forcibly (SIGKILL), Flink waits for the heartbeat timeout (about 50 seconds by default) before redeploying the job at lower parallelism.
- Reduce heartbeat.timeout cautiously: setting it too low can cause spurious failures during GC pauses or network hiccups.
Reactive Mode limitations
- Only supported in standalone Application mode (including Docker and standalone Kubernetes Application clusters).
- Not supported with active resource providers (native Kubernetes, YARN).
- Not supported for session clusters.
- Inherits all Adaptive Scheduler limitations.
Externalized Declarative Resource Management
Starting with Flink 1.18, you can dynamically update the parallelism bounds of a running job via the REST API without restarting it. This is aimed at two deployment scenarios:
- Session clusters where multiple jobs compete for resources and you need per-job resource control.
- Application clusters on native Kubernetes where you want Flink to scale up/down automatically while retaining explicit parallelism control.
Externalized Declarative Resource Management is an MVP feature. The Flink community actively seeks user feedback via the mailing lists.
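A request that updates parallelism bounds might be built as in the sketch below. It assumes the endpoint is `PUT /jobs/<job-id>/resource-requirements` and that the body maps each job vertex ID to a `parallelism` object with `lowerBound` and `upperBound`; check both against the REST API reference for your Flink version. The helper name, the REST address, the job ID, and the vertex ID are placeholders, not real values.

```python
import json
import urllib.request


def build_resource_requirements_request(rest_url, job_id, bounds):
    """Build (but do not send) a PUT request with per-vertex parallelism bounds.

    bounds maps a job vertex ID to a (lower, upper) tuple.
    Hypothetical helper; endpoint and body shape are assumptions.
    """
    body = {}
    for vertex_id, (lower, upper) in bounds.items():
        if lower < 1 or upper < lower:
            raise ValueError(f"invalid bounds for vertex {vertex_id}: {lower}..{upper}")
        body[vertex_id] = {"parallelism": {"lowerBound": lower, "upperBound": upper}}
    return urllib.request.Request(
        url=f"{rest_url.rstrip('/')}/jobs/{job_id}/resource-requirements",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


if __name__ == "__main__":
    request = build_resource_requirements_request(
        "http://localhost:8081",   # placeholder REST address
        "<job-id>",                # placeholder job ID
        {"<vertex-id>": (1, 4)},   # placeholder vertex ID
    )
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
    # To send it against a running cluster:
    # urllib.request.urlopen(request)
```

Building the request separately from sending it makes it easy to inspect the payload before applying it to a live job.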

