Reference architecture
Every Flink cluster consists of three core components: a client, a JobManager, and one or more TaskManagers.

- The client takes your application code, compiles it into a JobGraph, and submits it to the JobManager.
- The JobManager coordinates the distributed execution: it schedules tasks, tracks distributed checkpoints, and reacts to failures.
- TaskManagers (also called workers) execute the actual operators: sources, transformations, and sinks.
Deployment modes
Flink supports two modes for submitting jobs, which differ in cluster lifecycle, resource isolation, and where the application’s main() method runs.
- Application mode: a dedicated cluster is created per application.
- Session mode: a pre-existing, long-running cluster is shared by multiple jobs.
In Application mode, Flink creates a dedicated cluster for a single application. The application’s main() method executes on the JobManager, which means:

- User JARs must be available on the classpath of all Flink components (typically in the usrlib/ folder or bundled into the container image).
- Dependencies are not shipped via RPC from the client, making submission faster and cheaper.
- The cluster shuts down when the application finishes.
- Resource isolation is strong: each application has its own JobManager.
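As a sketch, an Application-mode submission to a native Kubernetes cluster might look like the following. The cluster ID, image name, and JAR path are placeholders, and exact configuration key names can vary between Flink versions; note that the JAR is referenced with a local:// path, i.e. it is expected inside the image rather than uploaded by the client.

```sh
# Sketch: Application mode on native Kubernetes (all names are placeholders).
./bin/flink run-application \
    --target kubernetes-application \
    -Dkubernetes.cluster-id=my-application \
    -Dkubernetes.container.image=my-registry/my-flink-app:latest \
    local:///opt/flink/usrlib/my-app.jar
```

Because main() runs on the JobManager, no user code travels from the machine issuing this command; the client only triggers cluster creation.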
Resource providers
The JobManager has different implementations depending on where you deploy Flink. Choose the one that matches your infrastructure.

Standalone
Launch Flink directly as JVM processes. The simplest setup with no external dependencies. Suitable for local testing and on-premises clusters.
YARN
Deploy Flink on Apache Hadoop YARN. Flink dynamically allocates and releases TaskManager containers from YARN’s ResourceManager.
Kubernetes
Native Kubernetes integration. Flink directly talks to the Kubernetes API to spawn and release pods as needed.
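For illustration, a native Kubernetes session cluster can be started and targeted roughly as follows; the cluster ID and JAR are placeholders.

```sh
# Sketch: start a session cluster on native Kubernetes, then submit to it.
./bin/kubernetes-session.sh -Dkubernetes.cluster-id=my-session
./bin/flink run --target kubernetes-session \
    -Dkubernetes.cluster-id=my-session ./my-job.jar
```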
External components
The following components are optional but important for production deployments.

| Component | Purpose |
|---|---|
| High Availability Service | Enables JobManager failover. Supported backends: ZooKeeper and Kubernetes. |
| File storage | Used for checkpointing and savepoints. Flink supports HDFS, S3, GCS, Azure Blob Storage, and local filesystems. |
| Metrics storage | Flink components emit internal metrics. Supported reporters: Prometheus, Graphite, InfluxDB, and others. |
| Data sources and sinks | External systems like Apache Kafka, Amazon S3, Elasticsearch, and Apache Cassandra that your jobs read from or write to. |
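A sketch of how some of these components are wired up in conf/config.yaml, shown with flat keys (bucket names and the reporter name are placeholders; key names may differ slightly across Flink versions):

```yaml
# File storage for checkpoints and savepoints (placeholder bucket)
state.checkpoints.dir: s3://my-bucket/checkpoints
state.savepoints.dir: s3://my-bucket/savepoints

# Metrics reporting to Prometheus
metrics.reporter.prom.factory.class: org.apache.flink.metrics.prometheus.PrometheusReporterFactory
metrics.reporter.prom.port: 9249
```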
Repeatable resource cleanup
When a job reaches a globally terminal state (finished, cancelled, or failed), Flink cleans up all associated external resources. If cleanup fails, Flink retries according to a configurable retry strategy. If the maximum number of retries is exhausted, the job is left in a dirty state and its artifacts must be cleaned up manually. Restarting the same job (using the same job ID) will restart the cleanup process rather than re-execute the job.

There is a known issue where CompletedCheckpoints that failed to be deleted during normal checkpoint management are not covered by repeatable cleanup. These must be deleted manually. See FLINK-26606 for details.
Explore deployment options
Configuration
Configure Flink using conf/config.yaml, environment variables, and dynamic properties.
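A minimal conf/config.yaml might look like the sketch below (hostname and values are illustrative; flat dotted keys are shown, and the same options can be overridden per submission with -D dynamic properties):

```yaml
# Sketch: minimal cluster configuration (illustrative values).
jobmanager.rpc.address: jobmanager-host
taskmanager.numberOfTaskSlots: 4
parallelism.default: 2
```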
CLI
Use the flink command-line interface to submit, monitor, and manage jobs.
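Common CLI interactions, sketched below; the example JAR ships with the Flink distribution, while the job ID and bucket are placeholders you would substitute:

```sh
# Submit a job in detached mode
./bin/flink run --detached ./examples/streaming/TopSpeedWindowing.jar

# List running and scheduled jobs
./bin/flink list

# Trigger a savepoint, then cancel (placeholders for job ID and path)
./bin/flink savepoint <jobId> s3://my-bucket/savepoints
./bin/flink cancel <jobId>
```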
High availability
Set up JobManager HA with ZooKeeper or Kubernetes to eliminate single points of failure.
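As a sketch, Kubernetes-backed HA needs only a few configuration entries (the storage directory and cluster ID are placeholders; key names follow recent Flink versions):

```yaml
# Sketch: Kubernetes-based JobManager high availability.
high-availability.type: kubernetes
high-availability.storageDir: s3://my-bucket/flink-ha
kubernetes.cluster-id: my-cluster
```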
Memory configuration
Understand and tune Flink’s memory model for TaskManagers and JobManagers.
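At the coarsest level, memory is tuned by setting the total process size per component, as in this sketch (sizes are illustrative, not recommendations):

```yaml
# Sketch: coarse-grained memory sizing.
jobmanager.memory.process.size: 1600m
taskmanager.memory.process.size: 4g
# Finer-grained options exist for individual memory components, e.g.:
# taskmanager.memory.managed.fraction: 0.4
```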
Elastic scaling
Automatically adjust job parallelism at runtime using Reactive Mode and the Adaptive Scheduler.
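A configuration sketch for both options (Reactive Mode requires a standalone cluster in Application mode):

```yaml
# Sketch: enable the Adaptive Scheduler explicitly...
jobmanager.scheduler: adaptive
# ...or enable Reactive Mode, which implies it (standalone Application mode only):
# scheduler-mode: reactive
```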
Security
Secure Flink’s network communication with SSL/TLS and authenticate with Kerberos.
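A sketch of the relevant configuration surface; all paths and the principal are placeholders, and a real setup additionally needs keystore/truststore passwords:

```yaml
# Sketch: internal TLS between Flink components.
security.ssl.internal.enabled: true
security.ssl.internal.keystore: /path/to/keystore.jks
security.ssl.internal.truststore: /path/to/truststore.jks

# Sketch: Kerberos authentication (placeholder keytab and principal).
security.kerberos.login.keytab: /path/to/flink.keytab
security.kerberos.login.principal: flink-user
```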

