Why Containerization Matters
Containerized orchestrators (Kubernetes, AWS SageMaker, GCP Vertex AI, etc.) run pipeline steps in isolated Docker containers. This ensures:
Reproducibility
Same environment across development, staging, and production
Isolation
Steps run in clean environments without conflicting dependencies
Portability
Run anywhere Docker runs - local, cloud, or on-premise
Version Control
Track exact versions of all dependencies and code
Docker Settings
Configure Docker builds using DockerSettings:
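As a minimal sketch, assuming a recent ZenML release where `DockerSettings` is importable from `zenml.config` and is attached to a pipeline under the `"docker"` settings key, a configuration might look like the following. The step, the pipeline name, and the pinned requirement are illustrative only.

```python
from zenml import pipeline, step
from zenml.config import DockerSettings

# Describe what the image for this pipeline should contain.
docker_settings = DockerSettings(requirements=["pandas==2.0.0"])


@step
def load_data() -> int:
    """Placeholder step; a real step would load or transform data."""
    return 42


# Every step of this pipeline uses the settings unless a step overrides them.
@pipeline(settings={"docker": docker_settings})
def training_pipeline() -> None:
    load_data()
```

The settings only take effect when the pipeline runs on a stack whose orchestrator builds container images.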
The ContainerizedOrchestrator Base Class
Orchestrators that run steps in containers inherit from ContainerizedOrchestrator.
Image Building Process
ZenML’s image building follows this process:
Requirements Installation Order
Dependencies are installed in this specific order:
Common Configuration Patterns
Basic Configuration
Simple setup with explicit requirements:
Using Requirements File
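Instead of a list of packages, the `requirements` attribute of `DockerSettings` can point at a requirements file. The sketch below assumes a `requirements.txt` exists at that relative path; the path is a placeholder.

```python
from zenml.config import DockerSettings

# A string is treated as a path to a requirements file,
# a list as individual package specifiers.
docker_settings = DockerSettings(requirements="requirements.txt")
```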
Custom Base Image
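A custom base image is set through the `parent_image` attribute. The image reference below is a hypothetical placeholder for an image you have already built and pushed to a registry your orchestrator can pull from.

```python
from zenml.config import DockerSettings

# ZenML builds its own layers on top of this parent image.
# "my-registry.example.com/ml-base:1.0" is a made-up reference.
docker_settings = DockerSettings(
    parent_image="my-registry.example.com/ml-base:1.0",
)
```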
Custom Dockerfile
For complete control:
System Dependencies
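System packages can be requested through the `apt_packages` attribute. This sketch assumes a Debian-based parent image where `apt` is available; the package names are examples only.

```python
from zenml.config import DockerSettings

# Installed with apt during the image build, before Python requirements.
docker_settings = DockerSettings(apt_packages=["git", "libgomp1"])
```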
Environment Variables
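Environment variables are passed as a dictionary through the `environment` attribute. The variable names and values below are illustrative.

```python
from zenml.config import DockerSettings

# These variables are set in the built image, so avoid putting secrets here.
docker_settings = DockerSettings(
    environment={"LOG_LEVEL": "INFO", "OMP_NUM_THREADS": "4"},
)
```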
Integration Requirements
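The requirements of ZenML integrations can be pulled in through `required_integrations`. The sketch assumes the integration is identified by the same short name used with `zenml integration install`, here `sklearn`.

```python
from zenml.config import DockerSettings

# Installs the packages that the listed ZenML integrations depend on.
docker_settings = DockerSettings(required_integrations=["sklearn"])
```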
Step-Specific Docker Settings
Different steps can use different images:
Dynamic Pipelines and Docker
For dynamic pipelines, the orchestration container needs its own image. The orchestration container:
- Runs the pipeline function to discover steps
- Submits discovered steps to the orchestration backend
- Typically needs modest resources (CPU and memory only)
Advanced Build Options
Build Options
Customize Docker build behavior:
Package Installers
Choose between pip and uv:
Code Inclusion
Control how code is made available:
- Download from code repository (if configured and allowed)
- Download from artifact store (if allowed)
- Include in Docker image (if allowed)
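The three options above read as a fallback chain. The following plain-Python sketch models that decision, assuming the options are tried in the order listed; the function and parameter names are hypothetical and are not ZenML's API.

```python
def choose_code_source(
    *,
    code_repo_configured: bool,
    allow_code_repo: bool = True,
    allow_artifact_store: bool = True,
    allow_in_image: bool = True,
) -> str:
    """Return the first permitted way of getting code into the container."""
    # 1. Code repository: must be both configured and allowed.
    if code_repo_configured and allow_code_repo:
        return "code_repository"
    # 2. Artifact store: only needs to be allowed.
    if allow_artifact_store:
        return "artifact_store"
    # 3. Bake the code into the Docker image itself.
    if allow_in_image:
        return "docker_image"
    raise RuntimeError("No permitted way to make code available.")
```

Disabling every option leaves no way to ship the code, which is why the sketch raises an error instead of returning a default.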
Running as Non-Root User
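A non-root user can be requested through the `user` attribute of `DockerSettings`. The user name below is a placeholder, and the sketch assumes your orchestrator and any mounted volumes permit running as that user.

```python
from zenml.config import DockerSettings

# The container entrypoint runs as this user instead of root.
docker_settings = DockerSettings(user="zenml")
```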
Testing Docker Builds Locally
Test your Docker configuration before running on cloud infrastructure:
Best Practices
Pin Versions
Always specify exact versions for reproducibility:
pandas==2.0.0, not pandas
Minimize Image Size
Use slim base images and only install necessary packages
Layer Caching
Order Dockerfile commands from least to most frequently changing
Security Scanning
Regularly update base images and scan for vulnerabilities
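The version-pinning practice above can be checked mechanically before a build. This is a small standalone sketch, not part of ZenML: the `unpinned` helper is hypothetical and only recognizes simple `name==version` specifiers, optionally with extras and an environment marker.

```python
import re

# Package name, optional [extras], "==", then a version without wildcards,
# optionally followed by an environment marker after ";".
_PINNED = re.compile(
    r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]+\])?==[^*\s;]+(\s*;.*)?$"
)


def unpinned(requirements: list[str]) -> list[str]:
    """Return the requirement strings that are not pinned to an exact version."""
    return [r for r in requirements if not _PINNED.match(r.strip())]


print(unpinned(["pandas==2.0.0", "pandas", "numpy>=1.24"]))
# -> ['pandas', 'numpy>=1.24']
```

A check like this could run in CI against the list that is passed to the Docker configuration.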
Security Considerations
Multi-Stage Builds
For complex build requirements:
Next Steps
Custom Orchestrators
Build orchestrators that work with Docker images
Resource Configuration
Configure resources for containerized steps
Dynamic Pipelines
Understand orchestration containers for dynamic pipelines
Custom Materializers
Handle data between containerized steps
