Apache Beam runners execute your pipeline on different processing backends. The runner you choose determines where and how your pipeline executes.

What is a Runner?

A runner translates your Beam pipeline into API calls for a specific backend execution engine. Each runner has its own capabilities and performance characteristics and is optimized for different use cases.

Available Runners

Apache Beam provides several runners for different execution environments:

Local Execution

DirectRunner

Executes pipelines locally on your machine. Ideal for development, testing, and debugging.

PrismRunner

A modern, portable local runner written in Go. It starts quickly and is excellent for testing.

Distributed Execution

Google Cloud Dataflow

Fully managed service on Google Cloud Platform with autoscaling and optimization.

Apache Flink

Stream and batch processing on Apache Flink clusters.

Apache Spark

Execute pipelines on Apache Spark clusters for batch and streaming.

Choosing a Runner

Consider these factors when selecting a runner:

Development & Testing

  • DirectRunner: Best for local development and testing
  • PrismRunner: Fast local testing with portable architecture

Production Workloads

Managed Service
  • DataflowRunner: Fully managed, no cluster management, automatic optimization
Self-Managed
  • FlinkRunner: Strong streaming capabilities, exactly-once processing
  • SparkRunner: Leverage existing Spark infrastructure

Key Considerations

| Feature    | DirectRunner   | PrismRunner    | DataflowRunner   | FlinkRunner    | SparkRunner  |
|------------|----------------|----------------|------------------|----------------|--------------|
| Execution  | Local          | Local          | Cloud            | Cluster        | Cluster      |
| Scaling    | Single machine | Single machine | Autoscaling      | Manual         | Manual       |
| Management | None           | None           | Fully managed    | Self-managed   | Self-managed |
| Streaming  | Limited        | Yes            | Yes              | Excellent      | Yes          |
| Batch      | Yes            | Yes            | Yes              | Yes            | Yes          |
| Best For   | Testing        | Development    | Production (GCP) | Streaming apps | Spark users  |

Setting the Runner

Specify the runner when creating your pipeline options:

```java
import org.apache.beam.runners.dataflow.DataflowRunner;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

PipelineOptions options = PipelineOptionsFactory.create();
options.setRunner(DataflowRunner.class);

Pipeline p = Pipeline.create(options);
```
Or specify it via the command line:

```bash
# Java
mvn compile exec:java -Dexec.mainClass=com.example.MyPipeline \
  -Dexec.args="--runner=DataflowRunner"

# Python
python my_pipeline.py --runner=DataflowRunner

# Go
go run my_pipeline.go --runner=dataflow
```

Runner Capabilities

Not all runners support all Beam features. The Beam Capability Matrix documents which features each runner supports.

Common Capabilities

  • Bounded/Unbounded PCollections: All runners support bounded data, most support unbounded
  • ParDo: Supported by all runners
  • GroupByKey: Supported by all runners
  • Windowing: Support varies by runner
  • State & Timers: Supported by the DirectRunner and the major distributed runners; exact coverage varies by SDK and runner (see the Capability Matrix)

Next Steps

DirectRunner

Get started with local development

DataflowRunner

Deploy to Google Cloud

FlinkRunner

Run on Apache Flink

SparkRunner

Execute on Apache Spark
