Supported Cloud Platforms
AWS
Full-featured support with Batch, S3, Step Functions, and Secrets Manager
Azure
Native integration with Blob Storage and Key Vault
GCP
Support for Cloud Storage and Secret Manager
Cloud-Native Features
Metaflow integrates with cloud services to provide:
Compute
- Elastic scaling: Run compute-intensive steps on cloud resources with CPUs and GPUs
- Container support: Use custom Docker images for dependencies and environments
- Distributed computing: Gang-scheduled multi-node parallel processing
Storage
- Object storage: Automatic artifact persistence to S3, Azure Blob Storage, or GCS
- Data tools: Fast parallel data access for large datasets
- Versioning: Immutable data lineage tracked across all runs
Orchestration
- Production workflows: Deploy to AWS Step Functions, Argo Workflows, or Airflow
- Event triggering: React to cloud events and schedule executions
- Reliability: Built-in retry logic and failure recovery
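To make the retry behaviour mentioned above concrete, here is a minimal standard-library sketch of retry-on-failure. It is an illustration of the idea only, not Metaflow's implementation; the `retry` helper, its `delay_seconds` parameter, and `flaky_step` are hypothetical names chosen for this example. In a flow you would normally rely on Metaflow's own `@retry` step decorator instead.

```python
import functools
import time


def retry(times: int = 3, delay_seconds: float = 0.0):
    """Re-run the wrapped function up to `times` extra times if it raises.

    Hypothetical helper for illustration; not part of Metaflow.
    """

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(times + 1):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    # Out of retries: surface the original error.
                    if attempt == times:
                        raise
                    time.sleep(delay_seconds)

        return wrapper

    return decorator


calls = {"count": 0}


@retry(times=3)
def flaky_step() -> str:
    """Simulates a step that hits two transient failures before succeeding."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("transient failure")
    return "ok"


if __name__ == "__main__":
    print(flaky_step(), "after", calls["count"], "attempts")
```

Retries are only safe for steps that can be re-run without side effects, which is also the property that makes a workload a good fit for interruptible compute.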
Security
- Secrets management: Integrate with AWS Secrets Manager, Azure Key Vault, or GCP Secret Manager
- IAM integration: Use cloud-native identity and access management
- Encryption: Server-side encryption for data at rest
Quick Start
Choose your cloud platform to get started:
Select Your Cloud
Configure Metaflow for your preferred cloud provider:
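As one possible starting point, the sketch below writes a separate Metaflow configuration profile per provider. The `METAFLOW_*` key names reflect Metaflow's configuration settings as I understand them and should be verified against your installed version. The bucket, container, and storage account names are placeholders, and `write_profile` is a hypothetical helper, not part of Metaflow.

```python
import json
import tempfile
from pathlib import Path

# Datastore settings per provider. All bucket, container, and account
# names are placeholders; the key names are assumptions to verify.
PROVIDER_SETTINGS = {
    "aws": {
        "METAFLOW_DEFAULT_DATASTORE": "s3",
        "METAFLOW_DATASTORE_SYSROOT_S3": "s3://my-metaflow-bucket/metaflow",
    },
    "azure": {
        "METAFLOW_DEFAULT_DATASTORE": "azure",
        "METAFLOW_DATASTORE_SYSROOT_AZURE": "my-container/metaflow",
        "METAFLOW_AZURE_STORAGE_BLOB_SERVICE_ENDPOINT": (
            "https://myaccount.blob.core.windows.net/"
        ),
    },
    "gcp": {
        "METAFLOW_DEFAULT_DATASTORE": "gs",
        "METAFLOW_DATASTORE_SYSROOT_GS": "gs://my-metaflow-bucket/metaflow",
    },
}


def write_profile(provider: str, config_dir: Path) -> Path:
    """Write config_<provider>.json into config_dir and return its path."""
    if provider not in PROVIDER_SETTINGS:
        raise ValueError(f"unknown provider: {provider!r}")
    config_dir.mkdir(parents=True, exist_ok=True)
    path = config_dir / f"config_{provider}.json"
    path.write_text(json.dumps(PROVIDER_SETTINGS[provider], indent=2))
    return path


if __name__ == "__main__":
    # Written to a temporary directory here so nothing real is overwritten;
    # Metaflow reads profiles from ~/.metaflowconfig by default.
    demo_dir = Path(tempfile.mkdtemp()) / ".metaflowconfig"
    for name in PROVIDER_SETTINGS:
        print(write_profile(name, demo_dir))
```

With profiles like these in place, a specific one can be selected per invocation through the `METAFLOW_PROFILE` environment variable, for example `METAFLOW_PROFILE=aws python flow.py run`, where `flow.py` stands for your own flow file.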
Architecture
Metaflow’s multi-cloud architecture separates concerns:
Layered Design
- API Layer: Consistent Python API across all clouds
- Plugin Layer: Cloud-specific implementations for compute, storage, and secrets
- Infrastructure Layer: Native cloud services (Batch, S3, etc.)
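The three layers above can be pictured with a small, self-contained sketch. It does not mirror Metaflow's internal classes: `ObjectStore`, `InMemoryStore`, `PLUGINS`, and `save_artifact` are hypothetical names, and the in-memory store stands in for real cloud services so the example runs anywhere.

```python
from abc import ABC, abstractmethod


class ObjectStore(ABC):
    """Plugin-layer interface: what the API layer expects from any backend."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> str:
        """Store data under key and return its URL."""

    @abstractmethod
    def get(self, key: str) -> bytes:
        """Return the data stored under key."""


class InMemoryStore(ObjectStore):
    """Stand-in for the infrastructure layer; a real plugin would call a cloud SDK."""

    def __init__(self, scheme: str, root: str):
        self.scheme = scheme
        self.root = root
        self._blobs = {}

    def _url(self, key: str) -> str:
        return f"{self.scheme}://{self.root}/{key}"

    def put(self, key: str, data: bytes) -> str:
        url = self._url(key)
        self._blobs[url] = data
        return url

    def get(self, key: str) -> bytes:
        return self._blobs[self._url(key)]


# Plugin registry: one factory per cloud (URL schemes are illustrative).
PLUGINS = {
    "aws": lambda: InMemoryStore("s3", "demo-bucket"),
    "azure": lambda: InMemoryStore("azure", "demo-container"),
    "gcp": lambda: InMemoryStore("gs", "demo-bucket"),
}


def save_artifact(store: ObjectStore, name: str, payload: bytes) -> str:
    """API layer: identical call regardless of which plugin backs the store."""
    return store.put(f"artifacts/{name}", payload)


if __name__ == "__main__":
    for cloud, make_store in PLUGINS.items():
        print(cloud, save_artifact(make_store(), "model", b"weights"))
```

The point of the design is that user code only touches `save_artifact`; switching clouds means choosing a different registry entry, not rewriting the caller.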
Code Portability
One of Metaflow’s key strengths is code portability. The same flow code runs locally or on AWS, Azure, or GCP with minimal configuration changes.
Hybrid Cloud
Metaflow supports hybrid cloud scenarios:
- Run development locally and production in the cloud
- Split workloads across multiple clouds
- Use different clouds for different steps in the same flow
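In practice, moving a flow between local development and cloud execution is mostly a matter of how it is launched. The sketch below builds the command line for each target using Metaflow's `run --with` option. The `run_command` helper and the file name `flow.py` are hypothetical, and the available targets depend on how your deployment is configured.

```python
import shlex


def run_command(flow_file: str, target: str = "local") -> list[str]:
    """Return the argv for running flow_file on the given compute target.

    Hypothetical helper: it only builds the command, it does not run it.
    """
    argv = ["python", flow_file, "run"]
    if target == "local":
        return argv
    if target in ("batch", "kubernetes"):
        # `--with <decorator>` attaches the decorator to every step at run time,
        # so the flow source itself stays unchanged.
        return argv + ["--with", target]
    raise ValueError(f"unsupported target: {target!r}")


if __name__ == "__main__":
    for target in ("local", "batch", "kubernetes"):
        print(shlex.join(run_command("flow.py", target)))
```

Per the comparison table in this document, `batch` corresponds to AWS, while Azure and GCP deployments run on Kubernetes.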
Best Practices
Choose the Right Cloud Service
- Use managed compute services (Batch, Kubernetes) for scalability
- Leverage object storage (S3, Blob, GCS) for artifacts
- Use cloud-native secrets managers for credentials
Optimize Costs
- Right-size compute resources with the `@resources` decorator
- Use spot instances for fault-tolerant workloads
- Implement data lifecycle policies for storage
Ensure Security
- Never hardcode credentials—use secrets managers
- Follow least-privilege IAM principles
- Enable encryption for data at rest and in transit
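As a small illustration of the first point, the sketch below reads a credential from the environment and fails fast when it is missing, rather than embedding it in code. It assumes the secret has already been placed in an environment variable, which is how Metaflow's `@secrets` decorator exposes fetched secrets as far as I understand it. `require_secret`, `MissingSecretError`, and `DB_PASSWORD` are hypothetical names.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when an expected secret is not present in the environment."""


def require_secret(name: str) -> str:
    """Return the secret stored in environment variable `name`.

    Fails loudly instead of falling back to a hardcoded default.
    """
    value = os.environ.get(name)
    if not value:
        raise MissingSecretError(
            f"secret {name!r} is not set; check your secrets manager configuration"
        )
    return value


if __name__ == "__main__":
    # Simulate what a secrets integration would do before the step body runs.
    os.environ["DB_PASSWORD"] = "example-only"
    password = require_secret("DB_PASSWORD")
    # Never print the secret itself; report only that it was found.
    print("DB_PASSWORD loaded:", bool(password))
```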
Monitor Performance
- Track resource utilization metrics
- Monitor cloud service quotas and limits
- Use Metaflow Cards for visualization
Cloud Provider Comparison
| Feature | AWS | Azure | GCP |
|---|---|---|---|
| Compute | AWS Batch | Kubernetes | Kubernetes |
| Storage | S3 | Blob Storage | Cloud Storage |
| Orchestration | Step Functions | Argo Workflows | Argo Workflows |
| Secrets | Secrets Manager | Key Vault | Secret Manager |
| Container Registry | ECR | ACR | GCR |
| Maturity | ⭐⭐⭐ Full | ⭐⭐ Good | ⭐⭐ Good |
Next Steps
AWS Setup
Configure Metaflow for AWS with Batch and Step Functions
Azure Setup
Set up Azure Blob Storage and Key Vault integration
GCP Setup
Configure GCP Cloud Storage and Secret Manager
Kubernetes
Run Metaflow on any Kubernetes cluster
