# Your First Flow with Conda

This guide will help you create your first Metaflow flow using the enhanced Conda decorator from the Netflix Extensions.

## Create a simple flow
Create a file called `hello_conda.py` with a basic flow that uses different pandas versions:

```python
from metaflow import FlowSpec, step, conda


class HelloCondaFlow(FlowSpec):

    @conda(libraries={"pandas": "1.4.0"}, python=">=3.8,<3.9")
    @step
    def start(self):
        import pandas as pd

        assert pd.__version__ == "1.4.0"
        print("Step 'start': Pandas version is %s" % pd.__version__)
        self.next(self.end)

    @conda(libraries={"pandas": "1.5.0"}, python=">=3.8,<3.9")
    @step
    def end(self):
        import pandas as pd

        assert pd.__version__ == "1.5.0"
        print("Step 'end': Pandas version is %s" % pd.__version__)


if __name__ == "__main__":
    HelloCondaFlow()
```
Each step can have its own isolated environment with different package versions!
## Run the flow

Execute the flow with the `--environment=conda` flag:

```bash
python hello_conda.py --environment=conda run
```
Expected output:

```
Metaflow executing HelloCondaFlow
Resolving 2 environments ... done in 27 seconds.
Workflow starting (run-id 1)
Using existing Conda environment (42a4ed94b63f)
[1/start/12345 (pid 1234)] Task is starting.
[1/start/12345 (pid 1234)] Step 'start': Pandas version is 1.4.0
[1/start/12345 (pid 1234)] Task finished successfully.
Using existing Conda environment (3e07a415e776)
[1/end/12346 (pid 1235)] Task is starting.
[1/end/12346 (pid 1235)] Step 'end': Pandas version is 1.5.0
[1/end/12346 (pid 1235)] Task finished successfully.
Done!
```
The first run will resolve and cache environments. Subsequent runs will reuse cached environments and be much faster!
## Using PyPI Packages

The Netflix Extensions provide a dedicated `@pypi` decorator for pure Python package environments:

```python
from metaflow import FlowSpec, step, pypi


class PyPiFlow(FlowSpec):

    @pypi(packages={"pandas": "1.4.0"}, python=">=3.8,<3.9")
    @step
    def start(self):
        import pandas as pd

        print(f"Using pandas {pd.__version__} from PyPI")
        self.next(self.end)

    @pypi(packages={"pandas": "1.5.0"}, python=">=3.8,<3.9")
    @step
    def end(self):
        import pandas as pd

        print(f"Using pandas {pd.__version__} from PyPI")


if __name__ == "__main__":
    PyPiFlow()
```

```bash
python pypi_example.py --environment=conda run
```
## Mixing Conda and PyPI Packages

You can combine Conda packages (for system libraries) with PyPI packages:

```python
from metaflow import FlowSpec, step, conda, pypi


class MixedPackagesFlow(FlowSpec):

    @conda(libraries={"ffmpeg": ""})
    @pypi(packages={"ffmpeg-python": "0.2.0"})
    @step
    def start(self):
        import ffmpeg
        import subprocess

        # Use the ffmpeg executable from Conda
        result = subprocess.run(["ffmpeg", "-version"], capture_output=True)
        print("FFmpeg is available!")
        # Use the ffmpeg-python library from PyPI
        print(f"ffmpeg-python version: {ffmpeg.__version__}")
        self.next(self.end)

    @step
    def end(self):
        print("Video processing complete!")


if __name__ == "__main__":
    MixedPackagesFlow()
```

Mixing Conda and PyPI packages uses conda-lock for resolution, which may be slower but provides maximum flexibility.
## Flow-Level Decorators

Use `@conda_base` or `@pypi_base` to set default dependencies for all steps:

```python
from metaflow import FlowSpec, step, conda_base, conda


@conda_base(libraries={"numpy": "1.21.5"}, python=">=3.8,<3.9")
class BaseDecoratorFlow(FlowSpec):

    @step
    def start(self):
        import numpy as np

        # This step inherits numpy 1.21.5 from @conda_base
        print(f"NumPy version: {np.__version__}")
        assert np.__version__ == "1.21.5"
        self.next(self.process)

    @conda(libraries={"numpy": "1.21.6"})
    @step
    def process(self):
        import numpy as np

        # This step overrides the base with numpy 1.21.6
        print(f"NumPy version: {np.__version__}")
        assert np.__version__ == "1.21.6"
        self.next(self.end)

    @conda(disabled=True)
    @step
    def end(self):
        # This step runs in your local environment (no Conda)
        print("Running in local environment")


if __name__ == "__main__":
    BaseDecoratorFlow()
```
- `@conda_base` applies default packages to all steps
- Step-level decorators override flow-level settings
- Use `disabled=True` to opt specific steps out of Conda
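`@pypi_base` works the same way for PyPI-only flows. Here is a minimal sketch; the `requests` package and version pin are illustrative, not taken from this guide, and the flow is only runnable under the Metaflow runtime:

```python
from metaflow import FlowSpec, step, pypi_base


# Illustrative: every step inherits this PyPI package set
@pypi_base(packages={"requests": "2.28.0"}, python=">=3.8,<3.9")
class PyPiBaseFlow(FlowSpec):

    @step
    def start(self):
        import requests

        # Inherited from @pypi_base, no step-level decorator needed
        print(f"requests version: {requests.__version__}")
        self.next(self.end)

    @step
    def end(self):
        print("Done!")


if __name__ == "__main__":
    PyPiBaseFlow()
```

As with `@conda_base`, a step-level `@pypi` decorator overrides the flow-level defaults.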
## Using Requirements Files

You can also define environments using traditional `requirements.txt` or `environment.yml` files:

```text
# requirements.txt
numpy==1.21.5
pandas>=1.4.0,<2.0.0
scikit-learn==1.0.2
```
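For comparison, an `environment.yml` covering roughly the same dependencies might look like this (illustrative sketch, not taken from this guide; the channel choice is an assumption):

```yaml
# environment.yml
channels:
  - conda-forge
dependencies:
  - numpy==1.21.5
  - pandas>=1.4.0,<2.0.0
  - scikit-learn==1.0.2
```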
## Resolve and Cache Environments

```bash
# Using requirements.txt
metaflow environment resolve --python ">=3.8,<3.9" -r requirements.txt

# Using environment.yml
metaflow environment resolve --python ">=3.8,<3.9" -f environment.yml
```
Once resolved, these environments are cached and can be reused across flows using named environments.
## Named Environments

Create reusable environments with aliases:

```bash
# Resolve and name an environment
metaflow environment resolve \
    --python ">=3.8,<3.9" \
    --alias my-org/my-team/ml-env:v1 \
    -f environment.yml
```
Use the named environment in your flow:

```python
from metaflow import FlowSpec, step, named_env


class NamedEnvFlow(FlowSpec):

    @named_env(name="my-org/my-team/ml-env:v1")
    @step
    def start(self):
        import numpy as np

        print(f"Using pre-resolved environment with NumPy {np.__version__}")
        self.next(self.end)

    @step
    def end(self):
        print("Done!")


if __name__ == "__main__":
    NamedEnvFlow()
```
Named environments are perfect for:

- Sharing environments across teams
- Ensuring consistent environments across flows
- Quick environment reuse without re-resolution
## Running on Remote Compute

The extension works seamlessly with Metaflow's remote execution:

```bash
# Run on AWS Batch
python hello_conda.py --environment=conda run --with batch

# Run on Kubernetes
python hello_conda.py --environment=conda run --with kubernetes
```
Environments are resolved locally and automatically hydrated on remote nodes. Packages are downloaded from your configured cloud storage (S3/Azure/GCS).
## Inspecting Environments

View detailed information about resolved environments:

```bash
# Show environment for a specific step
metaflow environment show --pathspec HelloCondaFlow/1/start

# Show all environments in a flow
python hello_conda.py --environment=conda environment resolve
```
Example output:

```
### Environment for step start ###
Environment full hash: 42a4ed94b63f12e1:a3b104c4ce221535
Arch: linux-64
Resolved on: 2024-03-09 10:30:15
Resolved by: alice

User-requested packages:
  conda::pandas==1.4.0
  conda::boto3>=1.14.0
  conda::python>=3.8,<3.9

Conda packages installed:
  pandas==1.4.0
  numpy==1.22.3
  python==3.8.17
  ...
```
## Creating Development Environments

Create local Conda environments for debugging:

```bash
metaflow environment create \
    --name my-debug-env \
    --install-notebook \
    --pathspec HelloCondaFlow/1/start
```
This creates:

- A local Conda environment named `my-debug-env`
- A Jupyter kernel with the same name
- Access to all step artifacts
Use this for debugging failed runs or exploring artifacts in the exact environment they were created in!
## Advanced: Pure PyPI with Conda System Packages

Install system tools via Conda while keeping Python packages pure PyPI:

```text
# requirements.txt
--conda-pkg ffmpeg
--conda-pkg git-lfs
ffmpeg-python==0.2.0
transnetv2 @ git+https://github.com/soCzech/TransNetV2.git#main
```
Git repositories and local packages only work when resolving for the same architecture you’re running on (no cross-platform resolution).
## Next Steps

- **Full Documentation**: Explore advanced features and detailed documentation
- **Debug Extension**: Learn about the Jupyter debugging integration
- **Configuration**: Fine-tune performance and behavior
- **Join Slack**: Get help from the Metaflow community
## Common Patterns

### Data Science Stack

```python
from metaflow import FlowSpec, step, conda_base


@conda_base(
    libraries={
        "numpy": "1.21.5",
        "pandas": "1.4.0",
        "scikit-learn": "1.0.2",
        "matplotlib": "3.5.0",
    },
    python=">=3.8,<3.9",
)
class DataScienceFlow(FlowSpec):
    # All steps inherit the data science stack
    pass
```
### Machine Learning with GPU

```python
@pypi(packages={"torch": "1.12.0"})
@step
def train(self):
    import torch

    print(f"CUDA available: {torch.cuda.is_available()}")
    # Training code here
```
### Different Requirements per Step

```python
class MultiStepFlow(FlowSpec):

    @conda(libraries={"pandas": "1.4.0"})
    @step
    def load_data(self):
        # Light environment for data loading
        pass

    @conda(libraries={
        "pandas": "1.4.0",
        "scikit-learn": "1.0.2",
        "xgboost": "1.6.0",
    })
    @step
    def train_model(self):
        # Heavy environment for training
        pass
```

Each step gets exactly what it needs - no more, no less!