What is a Runner?
A runner translates your Beam pipeline into the API of a backend execution engine. Each runner has different capabilities and performance characteristics and is optimized for different use cases.
Available Runners
Apache Beam provides several runners for different execution environments:
Local Execution
DirectRunner
Executes pipelines locally on your machine. Ideal for development, testing, and debugging.
PrismRunner
Modern portable local runner authored in Go. Fast startup and excellent for testing.
Distributed Execution
Google Cloud Dataflow
Fully managed service on Google Cloud Platform with autoscaling and optimization.
Apache Flink
Stream and batch processing on Apache Flink clusters.
Apache Spark
Execute pipelines on Apache Spark clusters for batch and streaming.
Choosing a Runner
Consider these factors when selecting a runner:
Development & Testing
- DirectRunner: Best for local development and testing
- PrismRunner: Fast local testing with portable architecture
Production Workloads
- DataflowRunner: Fully managed service, no cluster management, automatic optimization
- FlinkRunner: Strong streaming capabilities, exactly-once processing
- SparkRunner: Leverage existing Spark infrastructure
Key Considerations
| Feature | DirectRunner | PrismRunner | DataflowRunner | FlinkRunner | SparkRunner |
|---|---|---|---|---|---|
| Execution | Local | Local | Cloud | Cluster | Cluster |
| Scaling | Single machine | Single machine | Autoscaling | Manual | Manual |
| Management | None | None | Fully managed | Self-managed | Self-managed |
| Streaming | Limited | Yes | Yes | Excellent | Yes |
| Batch | Yes | Yes | Yes | Yes | Yes |
| Best For | Testing | Development | Production (GCP) | Streaming apps | Spark users |
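The comparison above can also be encoded as a small lookup, which may help when narrowing down candidates by deployment constraints. This is only a sketch: `RUNNERS` and `candidate_runners` are hypothetical names, not part of Beam, and the values are copied from the table rather than read from any Beam API.

```python
# Hypothetical lookup mirroring the comparison table; not a Beam API.
RUNNERS = {
    "DirectRunner": {"execution": "Local", "scaling": "Single machine", "management": "None"},
    "PrismRunner": {"execution": "Local", "scaling": "Single machine", "management": "None"},
    "DataflowRunner": {"execution": "Cloud", "scaling": "Autoscaling", "management": "Fully managed"},
    "FlinkRunner": {"execution": "Cluster", "scaling": "Manual", "management": "Self-managed"},
    "SparkRunner": {"execution": "Cluster", "scaling": "Manual", "management": "Self-managed"},
}


def candidate_runners(execution=None, management=None):
    """Return runner names whose table entries match every filter that is set."""
    matches = []
    for name, traits in RUNNERS.items():
        if execution is not None and traits["execution"] != execution:
            continue
        if management is not None and traits["management"] != management:
            continue
        matches.append(name)
    return matches
```

For example, `candidate_runners(execution="Cluster")` lists the two self-managed cluster runners from the table.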
Setting the Runner
Specify the runner when creating your pipeline options. The Java, Python, and Go SDKs each accept it as the --runner pipeline option, for example --runner=DirectRunner.
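As a minimal sketch in Python, the runner can be passed to `PipelineOptions` as a `--runner` flag. The helper names `runner_args` and `run_word_lengths` are hypothetical and exist only for this illustration. The Beam import is deferred into the function, so the flag helper works without the `apache-beam` package installed.

```python
def runner_args(runner, **options):
    """Build Beam-style flags, e.g. ['--runner=DirectRunner']."""
    args = [f"--runner={runner}"]
    for key, value in options.items():
        args.append(f"--{key}={value}")
    return args


def run_word_lengths(argv):
    """Run a tiny pipeline on whichever runner the flags in argv select.

    Requires the apache-beam package to be installed.
    """
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    # The runner is chosen entirely by the options; the pipeline code
    # itself does not change between runners.
    options = PipelineOptions(argv)
    with beam.Pipeline(options=options) as pipeline:
        (
            pipeline
            | "Create" >> beam.Create(["direct", "prism", "flink"])
            | "Length" >> beam.Map(len)
            | "Print" >> beam.Map(print)
        )
```

Calling `run_word_lengths(runner_args("DirectRunner"))` should run the pipeline locally. Distributed runners generally need further options. DataflowRunner, for instance, typically expects project, region, and temp location settings, which can be passed as extra keyword arguments to `runner_args`.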
Runner Capabilities
Not all runners support all Beam features. The Beam Capability Matrix documents which features each runner supports.
Common Capabilities
- Bounded/Unbounded PCollections: All runners support bounded data; most also support unbounded data
- ParDo: Supported by all runners
- GroupByKey: Supported by all runners
- Windowing: Support varies by runner
- State & Timers: Support varies by runner and SDK; check the Capability Matrix before relying on them
Next Steps
DirectRunner
Get started with local development
DataflowRunner
Deploy to Google Cloud
FlinkRunner
Run on Apache Flink
SparkRunner
Execute on Apache Spark