Artifact Stores
In ZenML, the inputs and outputs that go through any step are treated as artifacts. An Artifact Store is where these artifacts get stored. Every ZenML stack requires an artifact store component.
Overview
The artifact store is responsible for:
- Persisting step outputs and pipeline artifacts
- Loading step inputs from previous executions
- Providing versioned artifact storage
- Enabling artifact sharing across pipeline runs
- Supporting data lineage and provenance tracking
How Artifacts Work
When a pipeline step produces output, ZenML:
- Serializes the output using a materializer
- Stores the serialized data in the artifact store
- Records metadata about the artifact in the metadata store
- Makes the artifact available to downstream steps
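The mechanics above can be illustrated with a toy sketch. This is not ZenML's actual implementation; the directory layout, metadata format, and function names here are invented purely for illustration:

```python
import json
import pickle
import tempfile
from pathlib import Path

# Toy stand-ins for the real components: a "materializer" that serializes
# with pickle, an on-disk "artifact store", and a JSON "metadata store".
store_root = Path(tempfile.mkdtemp())

def save_artifact(name: str, value) -> Path:
    """Serialize a step output and record metadata about where it lives."""
    artifact_dir = store_root / name
    artifact_dir.mkdir(parents=True, exist_ok=True)
    data_path = artifact_dir / "data.pkl"
    data_path.write_bytes(pickle.dumps(value))  # steps 1-2: serialize + store
    meta = {"name": name, "uri": str(data_path)}
    (artifact_dir / "meta.json").write_text(json.dumps(meta))  # step 3: record metadata
    return data_path

def load_artifact(name: str):
    """A downstream step loads the artifact back via its recorded URI."""
    meta = json.loads((store_root / name / "meta.json").read_text())
    return pickle.loads(Path(meta["uri"]).read_bytes())  # step 4: make available

save_artifact("trained_model", {"weights": [0.1, 0.2]})
print(load_artifact("trained_model"))  # -> {'weights': [0.1, 0.2]}
```

In the real system, materializers handle type-aware serialization and the metadata store lives in the ZenML server, but the save/record/load cycle is the same.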
Available Artifact Stores
Local Artifact Store
Stores artifacts on your local file system. Included out of the box - no installation required.
Configuration: ~/.config/zenml/local_stores/<uuid>
Use cases:
- Local development and testing
- Single-machine workflows
- Quick prototyping
- CI/CD pipelines on single runners
Limitations:
- Not accessible from remote orchestrators
- Limited to single machine
- No built-in versioning or redundancy
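As a sketch, registering a local artifact store and using it in a stack looks like this (the stack and component names are placeholders):

```shell
# Register a local artifact store (the flavor ships with ZenML itself)
zenml artifact-store register local_store --flavor=local

# Use it in a stack alongside the default orchestrator
zenml stack register local_stack -o default -a local_store
zenml stack set local_stack
```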
S3 Artifact Store
Stores artifacts in Amazon S3 buckets. Installation: requires the ZenML S3 integration.
Authentication options:
- AWS credentials file (~/.aws/credentials)
- Environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
- IAM roles (when running on AWS infrastructure)
- ZenML service connectors
Use cases:
- Production AWS deployments
- Scalable artifact storage
- Multi-region access
- Integration with other AWS services
Features:
- Object versioning (when enabled on the bucket)
- Encryption at rest
- Access control via IAM
- Lifecycle policies for cost optimization
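A hedged registration sketch (the bucket and component names are placeholders):

```shell
# Install the S3 integration, which provides the s3 artifact store flavor
zenml integration install s3 -y

# Register the store against an existing bucket
zenml artifact-store register s3_store --flavor=s3 --path=s3://my-zenml-artifacts

# Add it to a stack
zenml stack register s3_stack -o default -a s3_store
```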
GCS Artifact Store
Stores artifacts in Google Cloud Storage buckets. Installation: requires the ZenML GCP integration.
Authentication options:
- Service account key file
- Application default credentials
- GOOGLE_APPLICATION_CREDENTIALS environment variable
- ZenML service connectors
Use cases:
- Production GCP deployments
- Integration with Vertex AI
- Multi-region redundancy
- Google Cloud ecosystem integration
Features:
- Object versioning
- Fine-grained access control
- Strong consistency
- Nearline/Coldline storage classes
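A hedged registration sketch (the bucket name is a placeholder; note the GCS flavor is named gcp):

```shell
# The GCS artifact store flavor is part of the GCP integration
zenml integration install gcp -y
zenml artifact-store register gcs_store --flavor=gcp --path=gs://my-zenml-artifacts
```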
Azure Blob Storage Artifact Store
Stores artifacts in Azure Blob Storage containers. Installation: requires the ZenML Azure integration.
Authentication options:
- Connection string
- Account key
- Azure AD credentials
- ZenML service connectors
Use cases:
- Azure-based ML infrastructure
- Integration with Azure ML
- Enterprise Azure deployments
- Compliance requirements for Azure
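A hedged registration sketch (the container name is a placeholder):

```shell
# The Azure Blob Storage flavor is part of the Azure integration
zenml integration install azure -y
zenml artifact-store register azure_store --flavor=azure --path=az://my-zenml-artifacts
```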
Choosing an Artifact Store
| Factor | Local | S3 | GCS | Azure |
|---|---|---|---|---|
| Setup | None | Easy | Easy | Easy |
| Cost | Free | Pay-per-use | Pay-per-use | Pay-per-use |
| Scalability | Limited | Unlimited | Unlimited | Unlimited |
| Remote Access | No | Yes | Yes | Yes |
| Encryption | No | Yes | Yes | Yes |
| Best For | Development | AWS infra | GCP infra | Azure infra |
Working with Artifacts
Accessing Artifacts
You can access artifacts from any pipeline run through the ZenML Python client.
Artifact Lineage
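As an illustrative sketch of tracing a run's artifacts back through the client (the pipeline and step names are placeholders, and the exact response shapes vary between ZenML versions):

```python
from zenml.client import Client

client = Client()

# Fetch the most recent run of a pipeline (the name is a placeholder)
run = client.get_pipeline("training_pipeline").last_run

# Walk each step's output artifacts: the URI points into the artifact
# store, and load() deserializes the value with its materializer
for step_name, step in run.steps.items():
    for output_name, artifact in step.outputs.items():
        print(f"{step_name}.{output_name} -> {artifact.uri}")

# Load one specific output back into memory
model = run.steps["trainer"].outputs["model"].load()
```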
ZenML automatically tracks artifact lineage, recording which pipeline run and step produced each artifact and which downstream steps consumed it.
Artifact Storage Path
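The exact layout varies with the ZenML version and store flavor; an illustrative (not authoritative) layout for an S3-backed store:

```
s3://my-zenml-artifacts/
    <pipeline_name>/
        <run_id>/
            <step_name>/
                <output_name>/
                    data        # serialized payload written by the materializer
```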
Artifacts are stored with a structured path, giving every artifact version a unique location inside the store.
Migration Between Artifact Stores
To migrate artifacts between stores:
- Create a new stack that uses the target artifact store
- Re-run pipelines or copy artifacts manually:
- Option A: Re-run pipelines with the new stack
- Option B: Use cloud storage transfer tools (aws s3 sync, gsutil rsync, etc.)
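A hedged sketch of both options (the bucket, stack, component names, and the run_pipeline.py script are placeholders):

```shell
# 1. Register the target artifact store and a stack that uses it
zenml artifact-store register new_s3_store --flavor=s3 --path=s3://new-bucket
zenml stack register migrated_stack -o default -a new_s3_store
zenml stack set migrated_stack

# Option A: re-run pipelines so new outputs land in the new store
python run_pipeline.py

# Option B: bulk-copy existing objects with the cloud provider's CLI
aws s3 sync s3://old-bucket s3://new-bucket
```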
Artifact Store Authentication
Using Service Connectors
ZenML service connectors provide secure, centralized authentication to cloud resources. Benefits:
- Centralized credential management
- Automatic credential rotation
- Fine-grained access control
- Audit logging
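As a sketch (the connector and store names are placeholders; available auth methods depend on the connector type):

```shell
# Register an AWS service connector, auto-detecting local credentials
zenml service-connector register aws_connector --type aws --auto-configure

# Attach the connector to an existing S3 artifact store
zenml artifact-store connect s3_store --connector aws_connector
```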
Direct Authentication
For local development, you can rely on credentials configured through the cloud provider CLIs (for example, aws configure, gcloud auth application-default login, or az login).
Performance Considerations
Large Artifacts
For large artifacts (models, datasets):
- Use cloud artifact stores (S3, GCS, Azure) instead of local
- Enable multipart uploads for files >5GB
- Consider artifact compression
- Use appropriate storage classes for infrequent access
Access Patterns
Optimize based on access patterns:
- Frequent access: Standard storage tier
- Infrequent access: Nearline/Infrequent Access tier
- Archival: Coldline/Archive tier
Network Transfer
Minimize network transfer costs:
- Co-locate the artifact store in the same region as the orchestrator
- Use regional endpoints when available
- Consider caching for frequently accessed artifacts
Custom Artifact Stores
You can implement a custom artifact store by extending the BaseArtifactStore base class.
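A hedged skeleton of such a subclass (the class names MyArtifactStoreConfig and MyArtifactStore and the mystore:// scheme are invented for illustration; the abstract method set follows the BaseArtifactStore interface in recent ZenML releases and may differ in your version):

```python
from typing import Any, ClassVar, List

from zenml.artifact_stores import BaseArtifactStore, BaseArtifactStoreConfig

class MyArtifactStoreConfig(BaseArtifactStoreConfig):
    """Config declaring which URI schemes this store can handle."""

    SUPPORTED_SCHEMES: ClassVar[List[str]] = ["mystore://"]

class MyArtifactStore(BaseArtifactStore):
    """Custom store: filesystem-style primitives over your storage backend."""

    def open(self, path, mode: str = "r") -> Any:
        ...  # return a file-like object for the given URI

    def exists(self, path) -> bool:
        ...  # report whether the URI exists in the backend

    def makedirs(self, path) -> None:
        ...  # create a directory, including missing parents

    def listdir(self, path) -> List[Any]:
        ...  # list the entries under a directory URI

    # ...plus the remaining abstract methods: copyfile, glob, isdir,
    # mkdir, remove, rename, rmtree, stat, walk
```

The store is then exposed to the CLI through a flavor class and registered with zenml artifact-store flavor register, pointing at the flavor's import path.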
Next Steps
Orchestrators
Configure pipeline orchestration
Container Registries
Set up container image storage
