Flink is a versatile framework that supports many different deployment scenarios. This page describes the building blocks of a Flink cluster and the available deployment options.

Reference architecture

Every Flink cluster consists of three core components: a client, a JobManager, and one or more TaskManagers.
  • The client takes your application code, compiles it into a JobGraph, and submits it to the JobManager.
  • The JobManager coordinates the distributed execution: it schedules tasks, tracks distributed checkpoints, and reacts to failures.
  • TaskManagers (also called workers) execute the actual operators — sources, transformations, and sinks.

Deployment modes

Flink supports two modes for submitting jobs, Application mode and Session mode. They differ in cluster lifecycle, resource isolation, and where the application's main() method runs.
In Application mode, Flink creates a dedicated cluster for a single application. The application’s main() method executes on the JobManager, which means:
  • User JARs must be available on the classpath of all Flink components (typically in the usrlib/ folder or bundled into the container image).
  • Dependencies are not shipped via RPC from the client, making submission faster and cheaper.
  • The cluster shuts down when the application finishes.
  • Resource isolation is strong: each application has its own JobManager.
# Submit an application-mode job to Kubernetes
./bin/flink run-application \
    --target kubernetes-application \
    -Dkubernetes.cluster-id=my-app-cluster \
    -Dkubernetes.container.image.ref=my-flink-image \
    local:///opt/flink/usrlib/my-job.jar
In Application mode, high availability is supported only for applications that call execute() once. If an application uses executeAsync() to run multiple jobs concurrently and any of them is cancelled, all jobs are stopped and the JobManager shuts down.
In Session mode, by contrast, jobs are submitted to a pre-existing, long-running cluster that is shared by multiple applications. Submission is lightweight because no cluster needs to be spun up per job, but isolation is weaker: jobs compete for TaskManager resources, a failing TaskManager affects every job with tasks running on it, and the shared JobManager carries the bookkeeping load of all running jobs.

Resource providers

The JobManager's resource management component has a different implementation for each environment Flink runs in. Choose the resource provider that matches your infrastructure.

Standalone

Launch Flink directly as JVM processes. The simplest setup with no external dependencies. Suitable for local testing and on-premises clusters.
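
As a sketch, a minimal conf/config.yaml for a small standalone cluster might look like the following (host name, memory sizes, and slot counts are illustrative):

```yaml
jobmanager.rpc.address: master-host      # illustrative host name
jobmanager.memory.process.size: 1600m
taskmanager.memory.process.size: 1728m
taskmanager.numberOfTaskSlots: 2
parallelism.default: 2
```

The TaskManager hosts are listed in conf/workers, and the cluster is started with ./bin/start-cluster.sh.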

YARN

Deploy Flink on Apache Hadoop YARN. Flink dynamically allocates and releases TaskManager containers from YARN’s ResourceManager.
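
A sketch of config.yaml settings relevant for a YARN deployment (the application name and queue are illustrative):

```yaml
execution.target: yarn-application
yarn.application.name: my-flink-job      # illustrative name
yarn.application.queue: default
jobmanager.memory.process.size: 1600m
taskmanager.memory.process.size: 4g
```

With these settings in place, the CLI picks up the YARN target without extra flags; HADOOP_CLASSPATH must be set so Flink can find the YARN client libraries.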

Kubernetes

Native Kubernetes integration. Flink directly talks to the Kubernetes API to spawn and release pods as needed.

External components

The following components are optional but important for production deployments.
  • High Availability Service: enables JobManager failover. Supported backends: ZooKeeper and Kubernetes.
  • File storage: used for checkpoints and savepoints. Flink supports HDFS, S3, GCS, Azure Blob Storage, and local filesystems.
  • Metrics storage: Flink components emit internal metrics. Supported reporters: Prometheus, Graphite, InfluxDB, and others.
  • Data sources and sinks: external systems such as Apache Kafka, Amazon S3, Elasticsearch, and Apache Cassandra that your jobs read from or write to.
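
As a sketch, wiring several of these components together through conf/config.yaml might look like this (the S3 bucket name and port are illustrative):

```yaml
# JobManager failover via the Kubernetes HA backend
high-availability.type: kubernetes
high-availability.storageDir: s3://my-bucket/flink/ha    # illustrative bucket

# Durable storage for checkpoints and savepoints
state.checkpoints.dir: s3://my-bucket/flink/checkpoints
state.savepoints.dir: s3://my-bucket/flink/savepoints

# Export metrics to Prometheus
metrics.reporter.prom.factory.class: org.apache.flink.metrics.prometheus.PrometheusReporterFactory
metrics.reporter.prom.port: 9249
```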

Repeatable resource cleanup

When a job reaches a globally terminal state (finished, cancelled, or failed), Flink cleans up all associated external resources. If cleanup fails, Flink retries according to a configurable retry strategy. If the maximum number of retries is exhausted, the job is left in a dirty state and its artifacts must be cleaned up manually. Restarting the same job (using the same job ID) will restart the cleanup process rather than re-execute the job.
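The retry behavior is configurable via the cleanup-strategy options; a sketch using the exponential-delay strategy (backoff values illustrative):

```yaml
cleanup-strategy: exponential-delay
cleanup-strategy.exponential-delay.initial-backoff: 1 s
cleanup-strategy.exponential-delay.max-backoff: 1 h
cleanup-strategy.exponential-delay.attempts: 10
```

A fixed-delay strategy is also available; setting the strategy to none disables retries entirely.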
There is a known issue where CompletedCheckpoints that failed to be deleted during normal checkpoint management are not covered by repeatable cleanup. These must be deleted manually. See FLINK-26606 for details.

Explore deployment options

Configuration

Configure Flink using conf/config.yaml, environment variables, and dynamic properties.

CLI

Use the flink command-line interface to submit, monitor, and manage jobs.

High availability

Set up JobManager HA with ZooKeeper or Kubernetes to eliminate single points of failure.

Memory configuration

Understand and tune Flink’s memory model for TaskManagers and JobManagers.

Elastic scaling

Automatically adjust job parallelism at runtime using Reactive Mode and the Adaptive Scheduler.

Security

Secure Flink’s network communication with SSL/TLS and authenticate with Kerberos.
