Apache Flink is an open-source, distributed stream processing framework built for high-throughput, low-latency data pipelines. It provides unified stream and batch processing on a single runtime, with exactly-once state consistency guarantees and native support for event-time semantics.

Local Installation

Download and run Flink locally in minutes

DataStream Quickstart

Build your first streaming pipeline with the DataStream API

Table API & SQL Quickstart

Query streams and tables with SQL and the Table API

Core Concepts

Understand Flink’s architecture and programming model

Unified Processing

Single runtime for both streaming and batch workloads — no separate systems to manage.

Exactly-Once Guarantees

Built-in fault tolerance with checkpointing ensures exactly-once state consistency even after failures.
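As an illustration, checkpointing can be enabled cluster-wide in the configuration file. This is a minimal sketch: the interval, backend, and bucket path are illustrative values, not recommendations.

```yaml
# flink-conf.yaml — illustrative values only
execution.checkpointing.interval: 10s
execution.checkpointing.mode: EXACTLY_ONCE
state.backend: rocksdb
# Hypothetical bucket; point this at your own durable storage
state.checkpoints.dir: s3://my-bucket/flink-checkpoints
```

The same options can also be set programmatically per job via `StreamExecutionEnvironment#enableCheckpointing`.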

Event-Time Processing

Native support for event-time semantics, using watermarks to handle out-of-order and late-arriving data.

Stateful Computations

Rich state primitives (ValueState, ListState, MapState) backed by pluggable state backends including RocksDB.

High Throughput & Low Latency

Millions of events per second with millisecond latency — designed for demanding production workloads.

SQL & Table API

Declarative SQL and Table API for streaming and batch queries, with standard SQL support built on Apache Calcite.
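As a sketch of what this looks like, the table and query below are hypothetical (the `orders` topic, its fields, and the connector options are assumptions): a watermarked Kafka stream aggregated per minute of event time with a tumbling-window table-valued function.

```sql
-- Hypothetical source table; connector options are illustrative
CREATE TABLE orders (
    order_id   STRING,
    amount     DECIMAL(10, 2),
    order_time TIMESTAMP(3),
    WATERMARK FOR order_time AS order_time - INTERVAL '5' SECOND
) WITH (
    'connector' = 'kafka',
    'topic'     = 'orders',
    'properties.bootstrap.servers' = 'localhost:9092',
    'format'    = 'json'
);

-- Per-minute revenue computed over event time
SELECT
    window_start,
    SUM(amount) AS revenue
FROM TABLE(
    TUMBLE(TABLE orders, DESCRIPTOR(order_time), INTERVAL '1' MINUTES))
GROUP BY window_start, window_end;
```

The `WATERMARK` clause in the DDL is what ties the query to event time: results are emitted once the watermark passes the end of each window, regardless of arrival order.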

Choose your API

Flink provides multiple levels of abstraction to suit different use cases:
The DataStream API is Flink’s core API for building complex streaming and batch data pipelines in Java or Scala. It gives you full control over state, time, and fault tolerance.
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// Read lines from a local socket (e.g. started with `nc -lk 9999`)
DataStream<String> text = env.socketTextStream("localhost", 9999);

DataStream<Tuple2<String, Integer>> wordCounts = text
    .flatMap((String line, Collector<Tuple2<String, Integer>> out) -> {
        for (String word : line.split("\\s+")) {
            if (!word.isEmpty()) {
                out.collect(Tuple2.of(word, 1));
            }
        }
    })
    // Java lambdas erase generic types, so declare the result type explicitly
    .returns(Types.TUPLE(Types.STRING, Types.INT))
    .keyBy(t -> t.f0)
    .sum(1);

wordCounts.print();
env.execute("Word Count");

DataStream API Overview

Get started with the DataStream API

Deployment options

Flink runs on a variety of cluster environments:

Standalone

Deploy on any cluster without a resource manager

Kubernetes

Native Kubernetes integration, or managed deployments via the Flink Kubernetes Operator

YARN

Run Flink jobs on Apache Hadoop YARN clusters

Key resources

Configuration reference

All configuration options for Flink clusters and jobs

Checkpoints & savepoints

Fault tolerance and operational state management

Metrics & monitoring

Monitor cluster health and job performance

Connectors

Connect Flink to Kafka, filesystems, JDBC, and more
