Orchestrators
An orchestrator is a special kind of backend that manages the running of each step of the pipeline. You can think of it as the ‘root’ of any pipeline job that you run during your experimentation.
Overview
The orchestrator is responsible for:
- Executing pipeline steps in the correct order based on dependencies
- Managing the execution environment for each step
- Handling failures and retries
- Scheduling pipeline runs (if supported)
- Coordinating distributed execution across multiple workers
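The first responsibility above, running steps in dependency order, can be sketched with a topological sort. The step names and dependency graph below are illustrative, not taken from any particular framework:

```python
# Sketch: how an orchestrator derives an execution order from step
# dependencies. Step names here are hypothetical examples.
from graphlib import TopologicalSorter

# Each step maps to the set of steps it depends on.
pipeline = {
    "load_data": set(),
    "train": {"load_data"},
    "evaluate": {"train"},
    "deploy": {"evaluate", "train"},
}

def execution_order(steps: dict[str, set[str]]) -> list[str]:
    """Return a valid sequential execution order for the step graph."""
    return list(TopologicalSorter(steps).static_order())

order = execution_order(pipeline)
print(order)  # e.g. ['load_data', 'train', 'evaluate', 'deploy']
```

Distributed orchestrators apply the same ordering, but dispatch independent steps to separate workers instead of running them one by one.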
Available Orchestrators
Local Orchestrator
The local orchestrator runs pipelines sequentially on your local machine. It’s included out of the box and perfect for development and testing.
Best for:
- Local development and debugging
- Quick prototyping
- Small-scale experiments
- CI/CD testing
Local Docker Orchestrator
Runs each pipeline step in a separate Docker container on your local machine. This provides better isolation and reproducibility than the local orchestrator.
Requirements:
- Docker installed and running locally
- Container registry component in your stack
Best for:
- Testing containerized pipelines locally
- Ensuring reproducibility across environments
- Debugging container-based workflows
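Conceptually, this orchestrator wraps each step in a `docker run` invocation. The sketch below assembles such a command; the image name and step entrypoint are hypothetical stand-ins, not any specific tool's actual invocation:

```python
# Sketch: a local Docker orchestrator launches each step in its own
# container. The image name and entrypoint module are hypothetical.
def docker_run_command(step_name: str, image: str) -> list[str]:
    """Build the `docker run` argument list for a single pipeline step."""
    return [
        "docker", "run", "--rm",                 # remove the container on exit
        image,
        "python", "-m", "pipeline.entrypoint",   # hypothetical step entrypoint
        "--step", step_name,
    ]

cmd = docker_run_command("train", "my-pipeline:latest")
print(" ".join(cmd))
```

Because every step starts from the same image, the environment each step sees is fixed by the image build rather than by whatever happens to be installed on the host.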
Kubernetes Orchestrator
Executes pipeline steps as Kubernetes pods in a Kubernetes cluster.
Requirements:
- Kubernetes cluster access
- Container registry component
- Configured kubectl context
Best for:
- Production workloads
- Scalable pipeline execution
- Multi-tenant environments
- Cloud-native deployments
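Under the hood, "one step per pod" means generating a pod manifest for each step and submitting it to the cluster. This sketch builds a minimal manifest as a plain dict; the naming scheme, labels, and image are illustrative assumptions, not any specific orchestrator's output:

```python
# Sketch: a Kubernetes orchestrator submits each pipeline step as a pod.
# Names, labels, and the image below are hypothetical.
def step_pod_manifest(run_id: str, step_name: str, image: str) -> dict:
    """Return a minimal Kubernetes Pod manifest for one pipeline step."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": f"{run_id}-{step_name}",
            "labels": {"pipeline-run": run_id, "pipeline-step": step_name},
        },
        "spec": {
            "restartPolicy": "Never",  # retries are the orchestrator's job
            "containers": [{
                "name": "step",
                "image": image,
                "args": ["--step", step_name],
            }],
        },
    }

manifest = step_pod_manifest("run-42", "train", "registry.example.com/pipeline:latest")
```

Labeling pods with the run and step name is what makes per-run monitoring and cleanup (`kubectl get pods -l pipeline-run=...`) practical in multi-tenant clusters.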
Kubeflow Orchestrator
Uses Kubeflow Pipelines to orchestrate workflows on Kubernetes.
Key features:
- Native Kubeflow Pipelines UI
- Advanced scheduling capabilities
- Experiment tracking integration
- Kubernetes-native execution
Airflow Orchestrator
Integrates with Apache Airflow to leverage its powerful scheduling and monitoring capabilities.
Key features:
- Complex scheduling with cron expressions
- Rich monitoring and alerting
- Extensive plugin ecosystem
- Battle-tested at scale
Best for:
- Scheduled pipeline runs
- Complex workflow dependencies
- Organizations already using Airflow
- Production ML platforms
Cloud Orchestrators
Vertex AI Orchestrator
Google Cloud’s managed ML orchestration service
SageMaker Orchestrator
AWS SageMaker Pipelines for orchestration
Azure ML Orchestrator
Azure Machine Learning pipelines
Databricks Orchestrator
Databricks workflows for orchestration
Choosing an Orchestrator
Consider these factors when selecting an orchestrator:

| Factor | Local | Kubernetes | Cloud Services |
|---|---|---|---|
| Setup Complexity | None | Medium | Low-Medium |
| Scalability | Limited | High | High |
| Cost | Free | Infrastructure | Pay-per-use |
| Scheduling | No | Yes | Yes |
| Monitoring | Basic | Good | Excellent |
| Best For | Development | Production (self-hosted) | Production (managed) |
Switching Orchestrators
You can easily switch orchestrators by creating a new stack that uses a different orchestrator component; your pipeline definitions stay the same.
Static vs Dynamic Pipelines
Orchestrators handle two types of pipelines:
- Static pipelines: the execution graph is known before the pipeline starts; all steps and their dependencies are defined upfront.
- Dynamic pipelines: the execution graph can change during runtime based on step outputs; these require orchestrators that support dynamic DAG generation.
Most orchestrators support static pipelines; dynamic pipeline support varies by orchestrator.
Resource Configuration
You can configure the compute resources (such as CPU, memory, and GPUs) available to individual pipeline steps.
Scheduling Pipelines
Orchestrators that support scheduling allow you to run pipelines on a recurring schedule, for example via cron expressions.
Custom Orchestrators
You can build custom orchestrators by extending the `BaseOrchestrator` class.
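The exact `BaseOrchestrator` interface depends on the framework you are extending; the sketch below only illustrates the general shape, with hypothetical method names: subclasses implement how a single step runs, while the base class handles dependency ordering.

```python
# Sketch of a custom orchestrator. The method names are illustrative
# stand-ins, not a specific framework's actual BaseOrchestrator API.
from abc import ABC, abstractmethod
from graphlib import TopologicalSorter

class BaseOrchestrator(ABC):
    """Minimal stand-in for a framework's orchestrator base class."""

    @abstractmethod
    def run_step(self, step_name: str) -> None:
        """Execute a single pipeline step (override in subclasses)."""

    def run_pipeline(self, steps: dict[str, set[str]]) -> list[str]:
        """Run all steps in dependency order and return the order used."""
        order = list(TopologicalSorter(steps).static_order())
        for name in order:
            self.run_step(name)
        return order

class LoggingOrchestrator(BaseOrchestrator):
    """Toy orchestrator that records which steps it 'ran'."""

    def __init__(self) -> None:
        self.executed: list[str] = []

    def run_step(self, step_name: str) -> None:
        self.executed.append(step_name)

orchestrator = LoggingOrchestrator()
orchestrator.run_pipeline({"load": set(), "train": {"load"}})
print(orchestrator.executed)  # ['load', 'train']
```

A real implementation would replace `run_step` with a submission to your execution backend (a subprocess, a container, a pod) and add failure handling, but the division of labor stays the same.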
Next Steps
Artifact Stores
Configure storage for pipeline artifacts
Container Registries
Set up container image storage
