Installation
- sagemaker>=2.237.3,<3.0.0 - SageMaker SDK
- kubernetes - Kubernetes Python client
- aws-profile-manager - AWS profile management
- pytz>=2021.1 - Timezone support
Available Components
The AWS integration provides these stack components:
SageMaker Orchestrator
Execute pipelines using AWS SageMaker Pipelines
SageMaker Step Operator
Run individual steps on SageMaker Training or Processing jobs
ECR Container Registry
Store Docker images in Amazon Elastic Container Registry
AWS Image Builder
Build container images using AWS services
Authentication
There are three ways to authenticate with AWS:
1. Service Connector (Recommended)
2. Explicit Credentials
3. Default AWS Configuration
If no credentials are provided, ZenML uses the default AWS configuration from:
- Environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
- AWS credentials file (~/.aws/credentials)
- IAM role (when running on EC2/ECS/Lambda)
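The fallback sources above can be sketched as a small lookup. This is only an illustration of the order in which they are listed, not ZenML's or boto3's actual resolution code; `detect_credential_source` and its return labels are hypothetical names introduced for the example.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def detect_credential_source(
    env: Optional[Mapping[str, str]] = None,
    credentials_file: Optional[Path] = None,
) -> str:
    """Report which default AWS credential source would apply (hypothetical helper)."""
    env = os.environ if env is None else env
    if credentials_file is None:
        credentials_file = Path.home() / ".aws" / "credentials"

    # 1. Environment variables: both halves of the key pair must be set.
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        return "environment"
    # 2. Shared credentials file on disk.
    if credentials_file.is_file():
        return "credentials_file"
    # 3. Nothing found locally: an attached IAM role is the remaining option.
    return "iam_role"
```

Calling `detect_credential_source()` with no arguments inspects the current process environment and home directory, which can help explain why a stack picks up unexpected credentials.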
SageMaker Orchestrator
The SageMaker orchestrator runs your complete pipeline as a SageMaker Pipeline.
Configuration
- execution_role - IAM role ARN with SageMaker permissions
- region - AWS region (defaults to default AWS config)
- bucket - S3 bucket for artifacts (defaults to sagemaker-{region}-{account-id})
- scheduler_role - IAM role ARN for scheduled pipelines
- aws_profile - AWS profile name from ~/.aws/credentials
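To make the default bucket name concrete, here is a minimal sketch that applies the sagemaker-{region}-{account-id} pattern when no bucket is configured. `resolve_bucket` is a hypothetical helper, and the region and account ID in the usage line are placeholder values.

```python
from typing import Optional


def resolve_bucket(bucket: Optional[str], region: str, account_id: str) -> str:
    """Return the explicit bucket if given, else the documented default name."""
    if bucket:
        return bucket
    # Default pattern from the configuration list: sagemaker-{region}-{account-id}
    return f"sagemaker-{region}-{account_id}"


# Placeholder values, for illustration only.
print(resolve_bucket(None, "us-east-1", "123456789012"))
# prints: sagemaker-us-east-1-123456789012
```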
IAM Permissions
The execution role needs these permissions:
Step-Level Settings
Customize individual steps with SageMaker-specific settings:
- instance_type - EC2 instance type (default: ml.m5.xlarge for training, ml.t3.medium for processing)
- volume_size_in_gb - EBS volume size (default: 30)
- max_runtime_in_seconds - Maximum execution time (default: 86400 = 24 hours)
- execution_role - Override orchestrator’s execution role
- environment - Environment variables for the container
- tags - AWS tags for the job
- synchronous - Wait for pipeline completion (default: True)
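The defaults listed above can be summarized in a plain dataclass. This is a standalone sketch for reasoning about the values, not ZenML's settings class; `StepSettingsSketch` and its `job_kind` field are hypothetical and exist only to model the two different instance-type defaults.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Default instance types per job kind, as stated in the settings list.
DEFAULT_INSTANCE_TYPES = {
    "training": "ml.m5.xlarge",
    "processing": "ml.t3.medium",
}


@dataclass
class StepSettingsSketch:
    """Hypothetical mirror of the step-level settings and their defaults."""

    job_kind: str = "training"  # hypothetical field: "training" or "processing"
    instance_type: Optional[str] = None
    volume_size_in_gb: int = 30
    max_runtime_in_seconds: int = 86400  # 24 hours
    execution_role: Optional[str] = None
    environment: Dict[str, str] = field(default_factory=dict)
    tags: Dict[str, str] = field(default_factory=dict)
    synchronous: bool = True

    def resolved_instance_type(self) -> str:
        """Use the explicit instance type, else the default for the job kind."""
        return self.instance_type or DEFAULT_INSTANCE_TYPES[self.job_kind]


# Override only what differs from the defaults.
gpu_step = StepSettingsSketch(instance_type="ml.p3.2xlarge", volume_size_in_gb=100)
```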
Instance Types
Common SageMaker instance types:
| Instance Type | vCPUs | RAM | GPU | Use Case |
|---|---|---|---|---|
| ml.t3.medium | 2 | 4 GB | - | Light preprocessing |
| ml.m5.xlarge | 4 | 16 GB | - | Standard training |
| ml.m5.4xlarge | 16 | 64 GB | - | Large-scale preprocessing |
| ml.p3.2xlarge | 8 | 61 GB | 1x V100 | Deep learning training |
| ml.p3.8xlarge | 32 | 244 GB | 4x V100 | Distributed training |
| ml.g4dn.xlarge | 4 | 16 GB | 1x T4 | Inference/light training |
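As an illustration of how the table might be used, the sketch below encodes its rows and filters them by minimum requirements. The numbers are copied from the table; `InstanceSpec` and `matching_instances` are hypothetical names, and the filter says nothing about price or regional availability.

```python
from typing import List, NamedTuple


class InstanceSpec(NamedTuple):
    name: str
    vcpus: int
    ram_gb: int
    gpus: int


# Rows copied from the instance type table above.
INSTANCE_TABLE = [
    InstanceSpec("ml.t3.medium", 2, 4, 0),
    InstanceSpec("ml.m5.xlarge", 4, 16, 0),
    InstanceSpec("ml.m5.4xlarge", 16, 64, 0),
    InstanceSpec("ml.p3.2xlarge", 8, 61, 1),
    InstanceSpec("ml.p3.8xlarge", 32, 244, 4),
    InstanceSpec("ml.g4dn.xlarge", 4, 16, 1),
]


def matching_instances(
    min_vcpus: int = 0, min_ram_gb: int = 0, min_gpus: int = 0
) -> List[str]:
    """Return the names of table rows that meet all minimums, in table order."""
    return [
        spec.name
        for spec in INSTANCE_TABLE
        if spec.vcpus >= min_vcpus
        and spec.ram_gb >= min_ram_gb
        and spec.gpus >= min_gpus
    ]
```

For example, `matching_instances(min_gpus=1)` returns the three GPU rows, while `matching_instances(min_vcpus=16, min_ram_gb=64)` narrows the table to ml.m5.4xlarge and ml.p3.8xlarge.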
SageMaker Step Operator
The step operator runs individual steps as SageMaker jobs while orchestrating locally or with another orchestrator.
Configuration
Usage
ECR Container Registry
Store Docker images in Amazon Elastic Container Registry.
Configuration
{account-id}.dkr.ecr.{region}.amazonaws.com
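The registry URI pattern above can be filled in programmatically. The following is a minimal sketch; `ecr_registry_uri` is a hypothetical helper, and the 12-digit check reflects the standard AWS account ID format.

```python
import re


def ecr_registry_uri(account_id: str, region: str) -> str:
    """Build an ECR registry URI of the form {account-id}.dkr.ecr.{region}.amazonaws.com."""
    # AWS account IDs are 12-digit numbers; reject anything else early.
    if not re.fullmatch(r"\d{12}", account_id):
        raise ValueError(f"expected a 12-digit AWS account ID, got {account_id!r}")
    if not region:
        raise ValueError("region must not be empty")
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com"


# Placeholder account ID and region, for illustration only.
print(ecr_registry_uri("123456789012", "eu-west-1"))
# prints: 123456789012.dkr.ecr.eu-west-1.amazonaws.com
```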
Usage
ZenML automatically pushes and pulls images from ECR when building Docker images for remote execution.
Complete Stack Example
Here’s a complete production-ready AWS stack:
Best Practices
Use IAM Roles for EC2/ECS
When running ZenML from EC2 or ECS, attach an IAM role to the instance instead of using access keys:
Separate Execution Roles
Use different execution roles for different environments:
- Development: Limited permissions, small instance types
- Staging: Broader permissions for testing
- Production: Full permissions, all instance types
Use S3 Bucket Policy
Restrict S3 bucket access to specific roles:
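One possible shape for such a policy is sketched below as a Python dict that can be serialized to JSON. The bucket name, role ARN, and chosen actions are placeholders, and `bucket_policy_for_roles` is a hypothetical helper. Note that an Allow statement alone does not block principals that already have access through their own IAM policies; a stricter setup would also need explicit Deny statements.

```python
import json
from typing import Any, Dict, List


def bucket_policy_for_roles(bucket: str, role_arns: List[str]) -> Dict[str, Any]:
    """Build a bucket policy granting object and list access to the given roles."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowListedRoles",
                "Effect": "Allow",
                "Principal": {"AWS": role_arns},
                # Illustrative action set; adjust to what the pipelines need.
                "Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
                # ListBucket applies to the bucket ARN, object actions to its keys.
                "Resource": [
                    f"arn:aws:s3:::{bucket}",
                    f"arn:aws:s3:::{bucket}/*",
                ],
            }
        ],
    }


# Placeholder bucket and role ARN, for illustration only.
policy = bucket_policy_for_roles(
    "my-zenml-artifacts",
    ["arn:aws:iam::123456789012:role/zenml-production"],
)
policy_json = json.dumps(policy, indent=2)
```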
Tag Resources
Use tags for cost tracking and resource management:
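As a sketch, tags kept as a simple mapping can be converted to the Key/Value list that AWS APIs such as boto3's SageMaker client expect in their `Tags` arguments. The tag names and values here are placeholders, `to_aws_tags` is a hypothetical helper, and whether a given ZenML setting takes the mapping or the list form is an assumption to verify against its documentation.

```python
from typing import Dict, List


def to_aws_tags(tags: Dict[str, str]) -> List[Dict[str, str]]:
    """Convert {"name": "value"} pairs to AWS-style [{"Key": ..., "Value": ...}]."""
    # Sort by key so the output is stable across runs.
    return [{"Key": key, "Value": value} for key, value in sorted(tags.items())]


# Placeholder tags for cost tracking, for illustration only.
cost_tags = to_aws_tags({"team": "ml-platform", "environment": "production"})
```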
Common Issues
Access Denied Errors
If you see “Access Denied” errors, check:
- IAM role has correct permissions
- Trust relationship allows SageMaker to assume the role
- S3 bucket policy allows access from the role
- ECR repository policy allows image pulls
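For the trust relationship check in particular, the role's trust policy needs a statement that lets the SageMaker service principal call `sts:AssumeRole`. The sketch below builds such a policy and checks an existing one; both function names are hypothetical, and the check deliberately ignores conditions and Deny statements.

```python
from typing import Any, Dict, List

SAGEMAKER_PRINCIPAL = "sagemaker.amazonaws.com"


def sagemaker_trust_policy() -> Dict[str, Any]:
    """Minimal trust policy allowing SageMaker to assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": SAGEMAKER_PRINCIPAL},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def _as_list(value: Any) -> List[Any]:
    """IAM allows a single string or a list in these fields; normalize to a list."""
    return value if isinstance(value, list) else [value]


def allows_sagemaker(trust_policy: Dict[str, Any]) -> bool:
    """Return True if any Allow statement lets SageMaker call sts:AssumeRole."""
    for statement in _as_list(trust_policy.get("Statement", [])):
        principal = statement.get("Principal", {})
        if not isinstance(principal, dict):
            continue  # e.g. a "*" principal; not handled in this sketch
        services = _as_list(principal.get("Service", []))
        actions = _as_list(statement.get("Action", []))
        if (
            statement.get("Effect") == "Allow"
            and SAGEMAKER_PRINCIPAL in services
            and "sts:AssumeRole" in actions
        ):
            return True
    return False
```

Running `allows_sagemaker` against the trust policy document of the execution role gives a quick first check before digging into S3 or ECR policies.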
Instance Type Not Available
Some instance types are not available in all regions. If you get an error:
- Check instance type availability
- Request quota increase in AWS Service Quotas
- Use a different instance type
Pipeline Execution Timeout
If pipelines time out:
- Increase max_runtime_in_seconds in step settings
- Use synchronous=False for long-running pipelines
- Monitor execution in the SageMaker console
Next Steps
SageMaker Documentation
Detailed SageMaker orchestrator guide
S3 Artifact Store
Configure S3 for artifact storage
Service Connectors
Advanced authentication options
Remote Execution
Production deployment patterns
