Your First Flow with Conda

This guide will help you create your first Metaflow flow using the enhanced Conda decorator from the Netflix Extensions.

Step 1: Create a simple flow

Create a file called hello_conda.py with a basic flow that uses different pandas versions:
hello_conda.py
from metaflow import FlowSpec, step, conda

class HelloCondaFlow(FlowSpec):
    
    @conda(libraries={"pandas": "1.4.0"}, python=">=3.8,<3.9")
    @step
    def start(self):
        import pandas as pd
        assert pd.__version__ == "1.4.0"
        print("Step 'start': Pandas version is %s" % pd.__version__)
        self.next(self.end)
        
    @conda(libraries={"pandas": "1.5.0"}, python=">=3.8,<3.9")
    @step
    def end(self):
        import pandas as pd
        assert pd.__version__ == "1.5.0"
        print("Step 'end': Pandas version is %s" % pd.__version__)

if __name__ == "__main__":
    HelloCondaFlow()
Each step can have its own isolated environment with different package versions!
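The assertions above work because pandas exposes `__version__`, but not every package does. As a minimal sketch of a more general check (the helper name `installed_version` is our own, not part of Metaflow), you can query package metadata via the standard library instead:

```python
from importlib.metadata import version, PackageNotFoundError

def installed_version(package_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None

# A distribution that is not installed yields None rather than an exception.
print(installed_version("surely-not-a-real-package"))  # → None
```

This works for any installed distribution, whether it came from Conda or PyPI.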
Step 2: Run the flow

Execute the flow with the --environment=conda flag:
python hello_conda.py --environment=conda run
Expected output:
Metaflow executing HelloCondaFlow

Resolving 2 environments ... done in 27 seconds.

Workflow starting (run-id 1)
    Using existing Conda environment (42a4ed94b63f)
    [1/start/12345 (pid 1234)] Task is starting.
    [1/start/12345 (pid 1234)] Step 'start': Pandas version is 1.4.0
    [1/start/12345 (pid 1234)] Task finished successfully.
    
    Using existing Conda environment (3e07a415e776)
    [1/end/12346 (pid 1235)] Task is starting.
    [1/end/12346 (pid 1235)] Step 'end': Pandas version is 1.5.0
    [1/end/12346 (pid 1235)] Task finished successfully.
    
Done!
The first run will resolve and cache environments. Subsequent runs will reuse cached environments and be much faster!

Using PyPI Packages

The Netflix Extensions provide a dedicated @pypi decorator for pure Python package environments:
pypi_example.py
from metaflow import FlowSpec, step, pypi

class PyPiFlow(FlowSpec):
    
    @pypi(packages={"pandas": "1.4.0"}, python=">=3.8,<3.9")
    @step
    def start(self):
        import pandas as pd
        print(f"Using pandas {pd.__version__} from PyPI")
        self.next(self.end)
        
    @pypi(packages={"pandas": "1.5.0"}, python=">=3.8,<3.9")
    @step
    def end(self):
        import pandas as pd
        print(f"Using pandas {pd.__version__} from PyPI")

if __name__ == "__main__":
    PyPiFlow()
python pypi_example.py --environment=conda run

Mixing Conda and PyPI Packages

You can combine Conda packages (for system libraries) with PyPI packages:
mixed_example.py
from metaflow import FlowSpec, step, conda, pypi

class MixedPackagesFlow(FlowSpec):
    
    @conda(libraries={"ffmpeg": ""})
    @pypi(packages={"ffmpeg-python": "0.2.0"})
    @step
    def start(self):
        import ffmpeg
        import subprocess
        
        # Use ffmpeg executable from Conda
        result = subprocess.run(["ffmpeg", "-version"], capture_output=True)
        assert result.returncode == 0, "ffmpeg executable not found"
        print("FFmpeg is available!")
        
        # Use the ffmpeg-python library from PyPI; it does not export
        # __version__, so read the installed distribution metadata instead
        from importlib.metadata import version
        print(f"ffmpeg-python version: {version('ffmpeg-python')}")
        
        self.next(self.end)
        
    @step
    def end(self):
        print("Video processing complete!")

if __name__ == "__main__":
    MixedPackagesFlow()
Mixing Conda and PyPI packages uses conda-lock for resolution, which may be slower but provides maximum flexibility.

Flow-Level Decorators

Use @conda_base or @pypi_base to set default dependencies for all steps:
flow_level_example.py
from metaflow import FlowSpec, step, conda_base, conda

@conda_base(libraries={"numpy": "1.21.5"}, python=">=3.8,<3.9")
class BaseDecoratorFlow(FlowSpec):
    
    @step
    def start(self):
        import numpy as np
        # This step inherits numpy 1.21.5 from @conda_base
        print(f"NumPy version: {np.__version__}")
        assert np.__version__ == "1.21.5"
        self.next(self.process)
    
    @conda(libraries={"numpy": "1.21.6"})
    @step
    def process(self):
        import numpy as np
        # This step overrides with numpy 1.21.6
        print(f"NumPy version: {np.__version__}")
        assert np.__version__ == "1.21.6"
        self.next(self.end)
    
    @conda(disabled=True)
    @step
    def end(self):
        # This step runs in your local environment (no Conda)
        print("Running in local environment")

if __name__ == "__main__":
    BaseDecoratorFlow()
  • @conda_base applies default packages to all steps
  • Step-level decorators override flow-level settings
  • Use disabled=True to opt specific steps out of Conda

Using Requirements Files

You can also define environments using traditional requirements.txt or environment.yml files:
requirements.txt
numpy==1.21.5
pandas>=1.4.0,<2.0.0
scikit-learn==1.0.2
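The equivalent environment.yml might look like the following sketch (the channel and version pins are illustrative, not prescribed by the extension):

```yaml
name: ml-env
channels:
  - conda-forge
dependencies:
  - python>=3.8,<3.9
  - numpy=1.21.5
  - pandas>=1.4.0,<2.0.0
  - scikit-learn=1.0.2
```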

Resolve and Cache Environments

# Using requirements.txt
metaflow environment resolve --python ">=3.8,<3.9" -r requirements.txt

# Using environment.yml
metaflow environment resolve --python ">=3.8,<3.9" -f environment.yml
Once resolved, these environments are cached and can be reused across flows using named environments.

Named Environments

Create reusable environments with aliases:
Step 1: Resolve and name an environment

metaflow environment resolve \
  --python ">=3.8,<3.9" \
  --alias my-org/my-team/ml-env:v1 \
  -f environment.yml
Step 2: Use the named environment in your flow

named_env_example.py
from metaflow import FlowSpec, step, named_env

class NamedEnvFlow(FlowSpec):
    
    @named_env(name="my-org/my-team/ml-env:v1")
    @step
    def start(self):
        import numpy as np
        print(f"Using pre-resolved environment with NumPy {np.__version__}")
        self.next(self.end)
        
    @step
    def end(self):
        print("Done!")

if __name__ == "__main__":
    NamedEnvFlow()
Named environments are perfect for:
  • Sharing environments across teams
  • Ensuring consistent environments across flows
  • Quick environment reuse without re-resolution

Running on Remote Compute

The extension works seamlessly with Metaflow’s remote execution:
# Run on AWS Batch
python hello_conda.py --environment=conda run --with batch

# Run on Kubernetes
python hello_conda.py --environment=conda run --with kubernetes
Environments are resolved locally and automatically hydrated on remote nodes. Packages are downloaded from your configured cloud storage (S3/Azure/GCS).
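This assumes a datastore is already configured. For S3, for example, the relevant settings (normally written by `metaflow configure aws`; the bucket name below is a placeholder) look like:

```
export METAFLOW_DEFAULT_DATASTORE=s3
export METAFLOW_DATASTORE_SYSROOT_S3=s3://your-bucket/metaflow
```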

Inspecting Environments

View detailed information about resolved environments:
# Show environment for a specific step
metaflow environment show --pathspec HelloCondaFlow/1/start

# Resolve all environments in a flow without running it
python hello_conda.py --environment=conda environment resolve
Example output:
### Environment for step start ###
Environment full hash: 42a4ed94b63f12e1:a3b104c4ce221535
Arch: linux-64
Resolved on: 2024-03-09 10:30:15
Resolved by: alice

User-requested packages:
  conda::pandas==1.4.0
  conda::boto3>=1.14.0
  conda::python>=3.8,<3.9

Conda packages installed:
  pandas==1.4.0
  numpy==1.22.3
  python==3.8.17
  ...

Creating Development Environments

Create local Conda environments for debugging:
metaflow environment create \
  --name my-debug-env \
  --install-notebook \
  --pathspec HelloCondaFlow/1/start
This creates:
  • A local Conda environment named my-debug-env
  • A Jupyter kernel with the same name
  • Access to all step artifacts
Use this for debugging failed runs or exploring artifacts in the exact environment they were created in!
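Assuming Jupyter and Conda are on your PATH, you can sanity-check the result from a shell (the commands below are standard Jupyter/Conda tooling, not part of the extension):

```
# Confirm the kernel was registered
jupyter kernelspec list

# Or activate the environment directly
conda activate my-debug-env
python -c "import pandas; print(pandas.__version__)"
```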

Advanced: Pure PyPI with Conda System Packages

Install system tools via Conda while keeping Python packages pure PyPI:
requirements.txt
--conda-pkg ffmpeg
--conda-pkg git-lfs

ffmpeg-python==0.2.0
transnetv2 @ git+https://github.com/soCzech/TransNetV2.git@main
Git repositories and local packages only work when resolving for the same architecture you’re running on (no cross-platform resolution).

Next Steps

  • Full Documentation: explore advanced features and detailed documentation
  • Debug Extension: learn about the Jupyter debugging integration
  • Configuration: fine-tune performance and behavior
  • Join Slack: get help from the Metaflow community

Common Patterns

Data Science Stack

from metaflow import FlowSpec, step, conda_base

@conda_base(
    libraries={
        "numpy": "1.21.5",
        "pandas": "1.4.0",
        "scikit-learn": "1.0.2",
        "matplotlib": "3.5.0"
    },
    python=">=3.8,<3.9"
)
class DataScienceFlow(FlowSpec):
    # All steps inherit the data science stack
    pass

Machine Learning with GPU

@pypi(packages={"torch": "1.12.0"})
@step
def train(self):
    import torch
    print(f"CUDA available: {torch.cuda.is_available()}")
    # Training code here

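Installing a CUDA-enabled package does not by itself allocate a GPU; on remote backends you also request one from the scheduler. Assuming the step above lives in a file called gpu_flow.py (a hypothetical name), a run might look like:

```
# Request one GPU per task on Kubernetes
python gpu_flow.py --environment=conda run --with kubernetes:gpu=1

# Or on AWS Batch
python gpu_flow.py --environment=conda run --with batch:gpu=1
```

Equivalently, you can annotate the step itself with core Metaflow's @resources(gpu=1) decorator instead of passing attributes via --with.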
Different Requirements per Step

from metaflow import FlowSpec, step, conda

class MultiStepFlow(FlowSpec):
    
    @conda(libraries={"pandas": "1.4.0"})
    @step
    def load_data(self):
        # Light environment for data loading
        pass
    
    @conda(libraries={
        "pandas": "1.4.0",
        "scikit-learn": "1.0.2",
        "xgboost": "1.6.0"
    })
    @step
    def train_model(self):
        # Heavy environment for training
        pass
Each step gets exactly what it needs - no more, no less!
