# Deployment Platforms
Metaflow supports multiple orchestration platforms for production deployments:

- **AWS Step Functions**: deploy to AWS Step Functions for serverless orchestration
- **Argo Workflows**: deploy to Kubernetes-native Argo Workflows
- **Apache Airflow**: generate Airflow DAGs from Metaflow flows
## Key Concepts
### Production Tokens
Production tokens are used to organize and secure production deployments. Each deployment is associated with a production token that:

- Creates a unique namespace for the flow's runs
- Controls access to redeploy or modify the flow
- Ensures that multiple deployments don't conflict
### Namespaces
Production runs are organized in namespaces with the format `production:<token>`. This namespace is what you use to analyze production results in notebooks.
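For example, a notebook can point the Metaflow Client API at the production namespace and fetch the latest run. A minimal sketch, assuming a flow named `TrainingFlow`; the token shown is a placeholder for the one printed when the flow was deployed:

```python
def production_namespace(token: str) -> str:
    # Production namespaces follow the format "production:<token>"
    return f"production:{token}"

def latest_production_run(flow_name: str, token: str):
    # Requires metaflow to be installed and a configured metadata
    # service; the import is deferred so the sketch stays importable.
    from metaflow import Flow, namespace

    namespace(production_namespace(token))  # switch to the production namespace
    return Flow(flow_name).latest_run       # latest run under this token

# In a notebook you would then inspect artifacts, e.g.:
# run = latest_production_run("TrainingFlow", "5ca1ab1e")
# print(run.data)
```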
### Project Branches
When using the `@project` decorator, Metaflow automatically manages namespaces for the project's different branches, keeping production, staging, and per-user deployments isolated from one another.
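As an illustrative sketch of how this isolation works: each branch is deployed under its own name, conventionally of the form `<project>.<branch>.<flow>`. The project name `fraud_model` and the branch names below are hypothetical:

```python
def deployment_name(project: str, branch: str, flow: str) -> str:
    # With @project, each branch gets its own deployment name, so
    # production, staging, and user branches never overwrite each other.
    # The "<project>.<branch>.<flow>" shape shown here is illustrative.
    return f"{project}.{branch}.{flow}"

print(deployment_name("fraud_model", "prod", "TrainingFlow"))
# → fraud_model.prod.TrainingFlow  (a --production deployment)
print(deployment_name("fraud_model", "user.alice", "TrainingFlow"))
# → fraud_model.user.alice.TrainingFlow  (alice's personal branch)
```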
## Deployment Workflow
A typical production deployment workflow:

1. Develop and test locally
2. Deploy to staging
3. Test the production deployment
4. Deploy to production
5. Monitor execution
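The steps above map roughly onto Metaflow CLI invocations like the following (Step Functions shown; the `argo-workflows` and `airflow` subcommands are analogous, and `training_flow.py` is a placeholder flow that uses `@project` so the branch flags apply):

```python
# Illustrative command sequence for the workflow above.
deployment_steps = [
    "python training_flow.py run",                                      # 1. develop and test locally
    "python training_flow.py --branch staging step-functions create",   # 2. deploy to staging
    "python training_flow.py --branch staging step-functions trigger",  # 3. test the deployment
    "python training_flow.py --production step-functions create",       # 4. deploy to production
    # 5. monitor execution via the Metaflow UI or the Client API
]
```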
## Best Practices
### Use version control

Always commit your flow code to version control before deploying. This ensures reproducibility and enables rollbacks if needed.
### Test with small datasets first

Before deploying to production, test your flow with a representative subset of data to catch potential issues early.
### Set resource limits

Use decorators like `@resources` to set appropriate CPU and memory limits for production workloads.
### Handle failures gracefully

Use the `@retry` and `@catch` decorators to handle transient failures and make your flows more resilient.
### Tag your deployments

Use `--tag` to annotate deployments with version numbers or commit hashes for easier tracking.

## Configuration
Production deployments require proper configuration of:

- **Datastore**: S3, Azure Blob Storage, or Google Cloud Storage for artifact storage
- **Metadata service**: for tracking run metadata and lineage
- **Compute environment**: AWS Batch, Kubernetes, or other compute platforms
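As a sketch, this configuration typically lives in `~/.metaflowconfig/config.json`. The bucket name, service URL, and Batch queue below are placeholders, and the exact set of keys depends on your Metaflow version and cloud:

```json
{
    "METAFLOW_DEFAULT_DATASTORE": "s3",
    "METAFLOW_DATASTORE_SYSROOT_S3": "s3://my-metaflow-bucket/metaflow",
    "METAFLOW_DEFAULT_METADATA": "service",
    "METAFLOW_SERVICE_URL": "https://metaflow-service.example.com",
    "METAFLOW_BATCH_JOB_QUEUE": "my-metaflow-queue"
}
```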
## Next Steps
- **Scheduling Flows**: learn how to schedule flows with cron expressions
- **Event Triggering**: trigger flows based on events and upstream dependencies
- **Monitoring**: monitor and debug production flows
- **Orchestrators**: deep dive into specific orchestrator platforms
