Apache Beam runners execute your pipeline on different processing backends. The runner you choose determines where and how your pipeline executes.

What is a Runner?

A runner translates your Beam pipeline into API calls for a specific backend execution engine. Each runner has its own capabilities and performance characteristics and is optimized for different use cases.

Available Runners

Apache Beam provides several runners for different execution environments:

Local Execution

DirectRunner

Executes pipelines locally on your machine. Ideal for development, testing, and debugging.

PrismRunner

A modern, portable local runner written in Go. It starts quickly and is excellent for testing.

Distributed Execution

Google Cloud Dataflow

Fully managed service on Google Cloud Platform with autoscaling and optimization.

Apache Flink

Stream and batch processing on Apache Flink clusters.

Apache Spark

Execute pipelines on Apache Spark clusters for batch and streaming.

Choosing a Runner

Consider these factors when selecting a runner:

Development & Testing

  • DirectRunner: Best for local development and testing
  • PrismRunner: Fast local testing with portable architecture

Production Workloads

Managed Service
  • DataflowRunner: Fully managed, no cluster management, automatic optimization
Self-Managed
  • FlinkRunner: Strong streaming capabilities, exactly-once processing
  • SparkRunner: Leverage existing Spark infrastructure

Key Considerations

| Feature    | DirectRunner   | PrismRunner    | DataflowRunner   | FlinkRunner    | SparkRunner  |
|------------|----------------|----------------|------------------|----------------|--------------|
| Execution  | Local          | Local          | Cloud            | Cluster        | Cluster      |
| Scaling    | Single machine | Single machine | Autoscaling      | Manual         | Manual       |
| Management | None           | None           | Fully managed    | Self-managed   | Self-managed |
| Streaming  | Limited        | Yes            | Yes              | Excellent      | Yes          |
| Batch      | Yes            | Yes            | Yes              | Yes            | Yes          |
| Best For   | Testing        | Development    | Production (GCP) | Streaming apps | Spark users  |

Setting the Runner

Specify the runner when creating your pipeline options:

```java
import org.apache.beam.runners.dataflow.DataflowRunner;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

PipelineOptions options = PipelineOptionsFactory.create();
options.setRunner(DataflowRunner.class);

Pipeline p = Pipeline.create(options);
```
Or specify it via the command line:

```bash
# Java
mvn compile exec:java -Dexec.mainClass=com.example.MyPipeline \
  -Dexec.args="--runner=DataflowRunner"

# Python
python my_pipeline.py --runner=DataflowRunner

# Go
go run my_pipeline.go --runner=dataflow
```

Runner Capabilities

Not all runners support all Beam features. The Beam Capability Matrix documents which features each runner supports.

Common Capabilities

  • Bounded/Unbounded PCollections: All runners support bounded data, most support unbounded
  • ParDo: Supported by all runners
  • GroupByKey: Supported by all runners
  • Windowing: Support varies by runner
  • State & Timers: Supported by the DirectRunner and the major distributed runners; exact coverage varies by SDK and runner (see the Capability Matrix)

Next Steps

DirectRunner

Get started with local development

DataflowRunner

Deploy to Google Cloud

FlinkRunner

Run on Apache Flink

SparkRunner

Execute on Apache Spark
