Overview
In Metaflow, any attribute you set on self inside a step becomes a data artifact that is automatically persisted and accessible in subsequent steps. Understanding how artifacts work is crucial for building efficient flows.
Basic Artifacts
Creating Artifacts
Any attribute assignment on self creates an artifact.
Artifact Persistence
Artifacts are stored in the datastore:
- Saved to the datastore at step completion
- Loaded from the datastore at the next step start
- Cached in memory for fast access
Supported Data Types
Metaflow can serialize most Python objects.
Ephemeral Attributes
Some attributes are not persisted:
- Internal Metaflow attributes (start with _)
- Properties like index and input (computed per step)
- Flow metadata structures
Artifact Size Considerations
Small Artifacts
Small artifacts (< 1MB) are handled efficiently.
Large Artifacts
For large data (> 100MB), consider optimization strategies such as IncludeFile or external storage.
Using IncludeFile for Data Files
Artifacts in Branches
Artifact Propagation
Artifacts from before a split are available in all branches.
Merge Artifacts
In a join step, merge artifacts from parallel branches with merge_artifacts.
Conflict Resolution
Handle conflicting artifacts explicitly when branches assign different values to the same name.
Artifacts in Foreach
Artifacts Created in Foreach
Each foreach task creates its own artifacts.
Accessing Parent Artifacts
Artifacts from before the foreach are available in every foreach task.
Internal Artifacts
Metaflow creates some internal artifacts, such as _graph_info.
Artifact Best Practices
1. Be Selective
Only save what you need.
2. Use Clear Names
3. Document Complex Artifacts
4. Avoid Serialization Issues
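Because artifacts are pickled, a quick way to spot problem objects is to try pickling them before assigning them to self. The helper below is a hypothetical sketch using only the standard library; is_picklable is not part of Metaflow.

```python
import pickle
import threading


def is_picklable(obj):
    """Return True if obj can be serialized with pickle."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, TypeError, AttributeError):
        return False


# Plain data structures pickle without trouble.
print(is_picklable({"rows": [1, 2, 3]}))

# Locks and lambdas are typical examples of objects that cannot be pickled.
print(is_picklable(threading.Lock()))
print(is_picklable(lambda x: x + 1))
```

Objects that fail this check (open connections, locks, lambdas) are usually better kept as local variables inside the step.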
Don’t store objects that can’t be pickled.
5. Don’t Serialize the Flow
Never assign self to an artifact.
Artifact Metadata
Metaflow stores metadata about artifacts. Flow-level metadata is kept in the _graph_info artifact and includes:
- Flow file name
- Parameter definitions
- Constants
- Step structure
- Decorator information
External Storage Patterns
For very large data, use external storage.
S3 Pattern
Custom Storage Pattern
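A custom storage pattern can be as simple as writing the object somewhere yourself and storing only a reference as the artifact. The sketch below uses the local filesystem as a stand-in for whatever storage you use; save_external and load_external are hypothetical helpers, not Metaflow APIs.

```python
import pickle
import tempfile
import uuid
from pathlib import Path


def save_external(obj, root):
    """Pickle obj into a new file under root and return its path as a string."""
    path = Path(root) / f"{uuid.uuid4().hex}.pkl"
    path.write_bytes(pickle.dumps(obj))
    return str(path)


def load_external(ref):
    """Load an object previously written by save_external."""
    return pickle.loads(Path(ref).read_bytes())


# Stand-alone demonstration with a temporary directory as the storage root.
root = tempfile.mkdtemp()
ref = save_external({"weights": [0.1] * 1000}, root)
restored = load_external(ref)
print(ref, len(restored["weights"]))
```

Inside a step, the idea would be to assign only the reference, for example self.model_ref = save_external(model, root), and call load_external(self.model_ref) in a later step. A local directory only works when all steps run on the same machine; remote execution needs shared storage.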
Inspecting Artifacts
You can inspect artifacts from completed runs.
Performance Tips
1. Minimize Artifact Size
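To judge whether an artifact is worth keeping, it can help to estimate its serialized size first. The sketch below uses the length of the pickled bytes as an approximation; pickled_size is a hypothetical helper and the sample data is made up.

```python
import pickle


def pickled_size(obj):
    """Approximate the stored size of an object by the length of its pickle."""
    return len(pickle.dumps(obj))


# Raw records versus the small summary that later steps actually need.
raw_records = [{"id": i, "value": i * 0.5} for i in range(10_000)]
summary = {
    "count": len(raw_records),
    "mean": sum(r["value"] for r in raw_records) / len(raw_records),
}

raw_bytes = pickled_size(raw_records)
summary_bytes = pickled_size(summary)
print(raw_bytes, summary_bytes)
```

Keeping raw_records in a local variable and assigning only summary to self keeps the artifact small.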
2. Use Efficient Formats
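As one illustration of how the in-memory representation affects serialized size, the sketch below compares a plain list of floats with a typed array from the standard library. The array module is only a stand-in chosen to keep the example dependency-free; columnar or binary formats from other libraries follow the same idea.

```python
import pickle
from array import array

values = [i * 0.5 for i in range(10_000)]

# A list pickles each float individually; a typed array pickles one packed buffer.
as_list = pickle.dumps(values)
as_array = pickle.dumps(array("d", values))

print(len(as_list), len(as_array))
```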
3. Lazy Loading
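One possible reading of lazy loading is to store a lightweight reference and load the heavy object only when it is first used. The LazyValue class below is a hypothetical sketch built on the standard library, not a Metaflow feature.

```python
import pickle
import tempfile
from pathlib import Path


class LazyValue:
    """Holds a file path and loads the pickled object on first access."""

    def __init__(self, path):
        self.path = str(path)
        self._value = None
        self._loaded = False

    def get(self):
        # Read from disk only once; later calls reuse the cached object.
        if not self._loaded:
            self._value = pickle.loads(Path(self.path).read_bytes())
            self._loaded = True
        return self._value

    def __getstate__(self):
        # When the wrapper itself is pickled, keep only the path.
        return {"path": self.path}

    def __setstate__(self, state):
        self.path = state["path"]
        self._value = None
        self._loaded = False


# Stand-alone demonstration with a temporary file.
data_path = Path(tempfile.mkdtemp()) / "big.pkl"
data_path.write_bytes(pickle.dumps(list(range(100_000))))

lazy = LazyValue(data_path)
loaded_before = lazy._loaded
total = sum(lazy.get())
print(loaded_before, lazy._loaded, total)
```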
Next Steps
FlowSpec
Deep dive into the FlowSpec base class
Branching
Manage artifacts across parallel branches
Foreach
Handle artifacts in dynamic foreach loops
Parameters
Use Parameters and Configs as special artifacts
