Flink's DataStream API offers two execution modes: STREAMING and BATCH. A third setting, AUTOMATIC, lets Flink choose between them based on whether all of the job's sources are bounded.
## When to use each mode
STREAMING is the default and the only mode for unbounded jobs. It processes records as soon as they arrive, maintains running state, and uses watermarks to reason about event time. Use it for continuous pipelines that run indefinitely.

BATCH is an optimization for bounded jobs, where all input is known up front. It applies strategies borrowed from traditional batch frameworks: sequential task scheduling, blocking network shuffles, and sort-based state management. The final output of a BATCH job equals what STREAMING would produce, but resource usage and failure recovery are more efficient.

You cannot use BATCH mode with unbounded sources: if any source is unbounded, the job must run in STREAMING mode.
## Configuring execution mode
You can set the mode via the command line or in code. The command-line approach is preferred because it keeps application code configuration-free and lets you reuse the same JAR in both modes:

- Command line
- Programmatic
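As a sketch of the two approaches (the JAR and class names are illustrative, not from the original document; the CLI flag `-Dexecution.runtime-mode` and the `setRuntimeMode` call are Flink's standard mechanisms):

```java
import org.apache.flink.api.common.RuntimeExecutionMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ModeConfig {
    public static void main(String[] args) throws Exception {
        // Preferred: select the mode at submission time, keeping the JAR mode-agnostic:
        //   bin/flink run -Dexecution.runtime-mode=BATCH my-job.jar
        //
        // Programmatic alternative; hard-codes the mode into the application:
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setRuntimeMode(RuntimeExecutionMode.BATCH);
        // ... build the pipeline, then call env.execute();
    }
}
```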
## Behavioral differences
### Task scheduling and network shuffle
In STREAMING mode, all tasks must be online simultaneously. Records flow directly from upstream tasks to downstream tasks through pipelined (in-memory) channels, which enables low-latency processing of continuous data.

In BATCH mode, Flink breaks the job into stages separated by shuffle boundaries (such as keyBy() or rebalance()). Each stage runs to completion before the next begins. Intermediate results are materialized to non-ephemeral storage so they can be read after upstream tasks finish.
For a job structured like this:
| Stage | Operators |
|---|---|
| 1 | source, map1, map2 |
| 2 | map3 |
| 3 | map4, sink |
Stages 1 and 2 are separated by rebalance(), stages 2 and 3 by keyBy(). Stage 1 completes before Stage 2 starts.
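A pipeline that would produce the staging above might look like the following sketch (the operator bodies and data are illustrative placeholders, not from the original example):

```java
import org.apache.flink.api.common.RuntimeExecutionMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class StageExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setRuntimeMode(RuntimeExecutionMode.BATCH);

        env.fromElements("a", "b", "c")      // source ┐
           .map(String::toUpperCase)         // map1   │ Stage 1
           .map(s -> s + "!")                // map2   ┘
           .rebalance()                      // shuffle boundary: Stage 1 | Stage 2
           .map(String::trim)                // map3     Stage 2
           .keyBy(s -> s)                    // shuffle boundary: Stage 2 | Stage 3
           .map(s -> "out:" + s)             // map4   ┐ Stage 3
           .print();                         // sink   ┘

        env.execute("stage-example");
    }
}
```

In BATCH mode the `rebalance()` and `keyBy()` calls become blocking shuffles, so Stage 1 materializes its output completely before Stage 2 is even scheduled.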
### State management
In STREAMING mode, state is stored in the configured state backend (HashMapStateBackend or EmbeddedRocksDBStateBackend), and checkpoints persist that state to durable storage. In BATCH mode, the state backend configuration is ignored: Flink groups records by key using an external sort and processes all records for a single key before moving to the next. Only one key's state is live in memory at a time, which dramatically reduces memory requirements for large keyed state.

### Order of processing
STREAMING mode makes no ordering guarantees: records are processed as they arrive. BATCH mode guarantees a specific ordering when an operator with multiple inputs mixes input types:

- Broadcast inputs are processed first.
- Regular (non-keyed) inputs are processed second.
- Keyed inputs are processed last, with all records for a single key fully processed before moving to the next key.
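The guarantee can be pictured with a small, self-contained simulation (plain Java, no Flink dependency; the method and data names are illustrative):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class BatchInputOrdering {
    // Simulates BATCH-mode input ordering for a multi-input operator:
    // broadcast records first, then regular records, then keyed records,
    // with each key fully processed before the next key starts.
    static List<String> order(List<String> broadcast, List<String> regular,
                              Map<String, List<String>> keyed) {
        List<String> out = new ArrayList<>(broadcast);       // 1. broadcast inputs
        out.addAll(regular);                                 // 2. regular inputs
        // 3. keyed inputs, one complete key at a time (TreeMap fixes a key order)
        new TreeMap<>(keyed).forEach((key, values) -> out.addAll(values));
        return out;
    }

    public static void main(String[] args) {
        List<String> result = order(
            List.of("rule-1"),                        // broadcast input
            List.of("side-a"),                        // regular (non-keyed) input
            Map.of("k2", List.of("k2-x", "k2-y"),     // keyed input
                   "k1", List.of("k1-x")));
        System.out.println(result); // [rule-1, side-a, k1-x, k2-x, k2-y]
    }
}
```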
### Event time and watermarks
In STREAMING mode, Flink uses watermarks as a heuristic for event-time progress: because events arrive out of order, the system can never be certain it has seen all events for a given time period. In BATCH mode, the entire input is available up front, so Flink treats this as "perfect watermarks": it processes records in event-time order and fires all event-time timers at the end of input. Custom WatermarkStrategy and WatermarkGenerator implementations are ignored, but the TimestampAssigner portion of a WatermarkStrategy still runs to assign timestamps to records.
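The split can be seen in a typical strategy definition. A hedged sketch (the `Event` record is an illustrative type, not from the original document):

```java
import java.time.Duration;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;

// Illustrative event type with an embedded event-time timestamp.
record Event(String id, long timestampMillis) {}

class WatermarkSetup {
    static WatermarkStrategy<Event> strategy() {
        return WatermarkStrategy
            // Watermark generation: used in STREAMING, ignored in BATCH.
            .<Event>forBoundedOutOfOrderness(Duration.ofSeconds(5))
            // Timestamp assignment: applied in BOTH modes.
            .withTimestampAssigner((event, recordTs) -> event.timestampMillis());
    }
}
```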
### Processing time
In STREAMING mode, processing time is wall-clock time, and processing-time timers fire at the wall-clock time for which they were scheduled. In BATCH mode, processing time does not advance during job execution; all processing-time timers fire at the end of input, as if time were fast-forwarded to infinity once every record has been processed.

### Failure recovery
In STREAMING mode, Flink restores from the most recent checkpoint, and all running tasks restart from that checkpoint's state. In BATCH mode, checkpointing is disabled; instead, Flink uses the materialized intermediate results as recovery points. Only the failed stage and its predecessors are restarted, not the entire job, which is more efficient than checkpoint-based recovery for bounded workloads.

## Important limitations in BATCH mode
If you need a transactional sink in BATCH mode, use a sink that implements the Unified Sink API (FLIP-143) rather than a legacy SinkFunction with two-phase commit.
## Example: reading a bounded file in BATCH mode
BatchWordCount.java
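The example's code did not survive in this copy; the following is a minimal sketch of what such a class could look like, assuming Flink 1.15+ with flink-connector-files on the classpath (the input path is illustrative):

```java
import org.apache.flink.api.common.RuntimeExecutionMode;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.connector.file.src.FileSource;
import org.apache.flink.connector.file.src.reader.TextLineInputFormat;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

public class BatchWordCount {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // The file source is bounded, so BATCH mode is allowed.
        env.setRuntimeMode(RuntimeExecutionMode.BATCH);

        FileSource<String> source = FileSource
            .forRecordStreamFormat(new TextLineInputFormat(), new Path("input.txt"))
            .build();

        env.fromSource(source, WatermarkStrategy.noWatermarks(), "file-input")
           .flatMap((String line, Collector<Tuple2<String, Integer>> out) -> {
               for (String word : line.toLowerCase().split("\\W+")) {
                   out.collect(Tuple2.of(word, 1));
               }
           })
           .returns(Types.TUPLE(Types.STRING, Types.INT)) // lambda loses generic types
           .keyBy(t -> t.f0)
           .sum(1)
           .print();

        env.execute("BatchWordCount");
    }
}
```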
## Writing custom operators for BATCH mode
Custom operators must not assume that watermarks are monotonically increasing across keys. In BATCH mode, the watermark resets to MIN_VALUE between keys, so do not cache the last seen watermark in an operator field and assume it will only grow.
Timers fire in key order first, then in timestamp order within each key. Operations that manually change the current key are not supported in BATCH mode.
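As an illustration of the watermark caveat, a hedged fragment from a hypothetical custom operator (the field is invented for illustration; the fragment assumes an operator extending AbstractStreamOperator):

```java
// Hypothetical cache field, shown only to illustrate the bug.
private long lastSeenWatermark = Long.MIN_VALUE;

@Override
public void processWatermark(Watermark mark) throws Exception {
    // WRONG in BATCH mode: when the runtime moves on to the next key, the
    // incoming watermark drops back to Long.MIN_VALUE, so a "monotonic"
    // cache like this silently carries progress over from the previous key.
    lastSeenWatermark = Math.max(lastSeenWatermark, mark.getTimestamp());
    super.processWatermark(mark);
}
```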
