The @pypi decorator allows you to specify PyPI packages required for a specific step in your Metaflow flow. This decorator augments any attributes set in the flow-level @pypi_base decorator.

Overview

Use @pypi to declare step-specific PyPI dependencies. The decorator resolves packages from PyPI (pypi.org by default) and creates an isolated environment for step execution.
from metaflow import FlowSpec, step, pypi

class MyFlow(FlowSpec):
    
    @pypi(packages={"requests": "2.28.0", "beautifulsoup4": "4.11.1"})
    @step
    def start(self):
        import requests
        from bs4 import BeautifulSoup
        response = requests.get("https://example.com")
        print(response.status_code)
        self.next(self.end)
    
    @step
    def end(self):
        print("Done!")

if __name__ == "__main__":
    MyFlow()
Run with:
python myflow.py --environment=conda run

When to Use

  • Pure PyPI packages: When you need packages only available on PyPI
  • Step-specific dependencies: Different steps need different PyPI packages
  • Version overrides: Override flow-level versions from @pypi_base
  • Mixed with Conda: Combine with @conda for hybrid environments
  • Faster resolution: PyPI-only environments resolve faster than mixed Conda/PyPI

Parameters

packages
Dict[str, str]
default:"{}"
Dictionary of PyPI packages to include. Keys are package names, values are version constraints.
Version constraints use PEP 440 syntax:
  • Simple pinned versions: "2.28.0"
  • Range constraints: ">=2.0,<3.0"
  • Compatible release: "~=2.28.0" (equivalent to >=2.28.0,<2.29.0)
  • Minimum version: ">=2.0"
  • Empty string for latest: ""
@pypi(packages={
    "requests": "2.28.0",              # Exact version
    "django": ">=3.2,<4.0",            # Range
    "numpy": "~=1.21.0",               # Compatible release
    "pandas": "",                      # Latest available
})
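The compatible-release operator is the least obvious of these forms. As a rough illustration of the equivalence stated above, the sketch below expands the version after `~=` into an explicit range. The helper name `expand_compatible_release` is hypothetical and not part of Metaflow, and it only handles purely numeric versions such as "2.28.0".

```python
def expand_compatible_release(version: str) -> str:
    """Rewrite the version that follows '~=' as an explicit range.

    Simplified sketch: only numeric release segments are supported.
    """
    parts = version.split(".")
    if len(parts) < 2 or not all(p.isdigit() for p in parts):
        raise ValueError(f"unsupported version for ~=: {version!r}")
    # Drop the last segment and bump the one before it.
    upper = [int(p) for p in parts[:-1]]
    upper[-1] += 1
    # Pad with zeros so the upper bound has as many segments as the input.
    upper += [0] * (len(parts) - len(upper))
    return f">={version},<{'.'.join(str(n) for n in upper)}"


print(expand_compatible_release("2.28.0"))  # >=2.28.0,<2.29.0
print(expand_compatible_release("1.21.0"))  # >=1.21.0,<1.22.0
```

So "~=1.21.0" accepts patch releases of 1.21 but not 1.22.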
extra_indices
List[str]
default:"[]"
Additional PyPI package indices to search. By default, only pypi.org is used.
@pypi(
    packages={"my-internal-pkg": "1.0.0"},
    extra_indices=["https://pypi.company.com/simple"]
)
python
str
default:"None"
Python version constraint for the environment. If not specified, the current Python version is used.
Can be a specific version or a constraint expression:
  • "3.8.12" - Exact version
  • ">=3.8,<3.9" - Range constraint
  • "<3.11" - Upper bound only
@pypi(packages={"django": "4.0"}, python=">=3.8,<3.9")
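To see how a constraint expression relates to a concrete interpreter version, a minimal checker might look like the sketch below. It is not how Metaflow evaluates the `python` parameter. It is a simplified illustration that supports only comma-separated clauses with `>=`, `<=`, `==`, `>`, `<`, or a bare exact version, and it ignores pre-releases and wildcards. The function name `python_satisfies` is an assumption.

```python
import sys
from typing import Tuple

# Two-character operators come first so ">=" is not mistaken for ">".
_OPS = {
    ">=": lambda a, b: a >= b,
    "<=": lambda a, b: a <= b,
    "==": lambda a, b: a == b,
    ">": lambda a, b: a > b,
    "<": lambda a, b: a < b,
}


def _parse(version: str) -> Tuple[int, ...]:
    return tuple(int(p) for p in version.strip().split("."))


def python_satisfies(version: str, constraint: str) -> bool:
    """Return True if `version` meets every clause in `constraint`."""
    current = _parse(version)
    for clause in constraint.split(","):
        clause = clause.strip()
        for op in _OPS:
            if clause.startswith(op):
                target = _parse(clause[len(op):])
                break
        else:
            # A bare version such as "3.8.12" is treated as an exact pin.
            op, target = "==", _parse(clause)
        # Pad the shorter tuple with zeros so 3.8 compares like 3.8.0.
        width = max(len(current), len(target))
        a = current + (0,) * (width - len(current))
        b = target + (0,) * (width - len(target))
        if not _OPS[op](a, b):
            return False
    return True


running = ".".join(str(n) for n in sys.version_info[:3])
print(running, python_satisfies(running, ">=3.8,<3.9"))
```

Running a check like this before submitting a flow can show early whether the local interpreter would already satisfy the constraint.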
disabled
bool
default:"False"
If set to True, disables the managed Conda environment for this step; the step runs in the external (system) Python environment instead.
@pypi(disabled=True)
@step
def use_system_env(self):
    # Runs in system Python, not managed environment
    pass

Deprecated Parameters

sources
List[str]
default:"[]"
DEPRECATED - Use extra_indices instead. Additional PyPI package sources.
name
str
default:"None"
DEPRECATED - Use @named_env(name=...) instead. Reference to a named environment.
pathspec
str
default:"None"
DEPRECATED - Use @named_env(pathspec=...) instead. Reference to the pathspec of an existing step.
fetch_at_exec
bool
default:"False"
DEPRECATED - Use @named_env(fetch_at_exec=...) instead. Fetches the environment at execution time.
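
When migrating older flows, it can help to see where each deprecated argument ends up. The sketch below follows the deprecation notes above and splits a dictionary of old-style @pypi arguments into arguments for @pypi and for @named_env. The helper `migrate_pypi_kwargs` is hypothetical and not part of Metaflow. It only rearranges plain dictionaries.

```python
from typing import Any, Dict

# Deprecated @pypi parameter -> (target decorator, replacement parameter),
# taken from the deprecation notes above.
_REPLACEMENTS = {
    "sources": ("pypi", "extra_indices"),
    "name": ("named_env", "name"),
    "pathspec": ("named_env", "pathspec"),
    "fetch_at_exec": ("named_env", "fetch_at_exec"),
}


def migrate_pypi_kwargs(kwargs: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Split old-style @pypi arguments by the decorator that now owns them."""
    result: Dict[str, Dict[str, Any]] = {"pypi": {}, "named_env": {}}
    for key, value in kwargs.items():
        decorator, new_key = _REPLACEMENTS.get(key, ("pypi", key))
        if new_key == "extra_indices":
            # Fold `sources` and `extra_indices` into one de-duplicated list.
            merged = result["pypi"].setdefault("extra_indices", [])
            for url in value:
                if url not in merged:
                    merged.append(url)
        else:
            result[decorator][new_key] = value
    return result


old = {
    "packages": {"requests": "2.28.0"},
    "sources": ["https://pypi.company.com/simple"],
    "name": "team/base-env:v1",
}
print(migrate_pypi_kwargs(old))
```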

Usage Examples

Basic PyPI Dependencies

Specify simple PyPI package requirements:
from metaflow import FlowSpec, step, pypi

class WebScraperFlow(FlowSpec):
    
    @pypi(packages={
        "requests": "2.28.0",
        "beautifulsoup4": "4.11.1"
    })
    @step
    def start(self):  # Metaflow requires an entry step named "start"
        import requests
        from bs4 import BeautifulSoup
        
        response = requests.get("https://api.example.com/data")
        data = response.json()
        self.data = data
        self.next(self.end)
    
    @step
    def end(self):
        print(f"Fetched {len(self.data)} items")

if __name__ == "__main__":
    WebScraperFlow()

Version Constraints

Use various version constraint formats:
@pypi(packages={
    "django": ">=3.2,<4.0",        # Range
    "celery": "~=5.2.0",           # Compatible release
    "redis": ">=4.0",              # Minimum version
    "requests": "2.28.0",          # Exact version
    "boto3": ""                    # Latest
})
@step
def process(self):
    import django
    import celery
    # All packages available
    self.next(self.end)

Different Packages Across Steps

Use different PyPI packages in different steps:
from metaflow import FlowSpec, step, pypi

class DataPipeline(FlowSpec):
    
    @pypi(packages={"requests": "2.28.0"})
    @step
    def start(self):  # Metaflow requires an entry step named "start"
        import requests
        self.data = requests.get("https://api.example.com").json()
        self.next(self.process)
    
    @pypi(packages={"polars": "0.17.0"})
    @step
    def process(self):
        import polars as pl
        df = pl.DataFrame(self.data)
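        # Store plain lists rather than polars Series, so later steps
        # without polars installed can still load the artifact
        self.processed = df.to_dict(as_series=False)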
        self.next(self.visualize)
    
    @pypi(packages={"plotly": "5.14.0"})
    @step
    def visualize(self):
        import plotly.express as px
        # Create visualizations
        self.next(self.end)
    
    @step
    def end(self):
        pass

Custom Package Index

Use private or custom PyPI indices:
@pypi(
    packages={
        "my-company-lib": "2.0.0",
        "requests": "2.28.0"
    },
    extra_indices=[
        "https://pypi.company.com/simple",
        "https://artifacts.internal.net/pypi/simple"
    ]
)
@step
def use_internal_packages(self):
    import my_company_lib
    import requests
    # Both internal and public packages
    self.next(self.end)

Combining with Flow-Level Decorator

Override or extend @pypi_base dependencies:
from metaflow import FlowSpec, step, pypi, pypi_base

@pypi_base(packages={"requests": "2.27.0", "boto3": "1.26.0"})
class MyFlow(FlowSpec):
    
    @step
    def start(self):  # entry step; uses only the flow-level packages
        import requests
        import boto3
        # requests 2.27.0, boto3 1.26.0 from base
        self.next(self.use_override)
    
    @pypi(packages={"requests": "2.28.0", "django": "4.0"})
    @step
    def use_override(self):
        import requests
        import boto3
        import django
        # requests 2.28.0 (overridden)
        # boto3 1.26.0 (inherited)
        # django 4.0 (added)
        self.next(self.end)
    
    @step
    def end(self):
        pass

Mixing PyPI and Conda

Combine PyPI packages with Conda packages:
from metaflow import FlowSpec, step, conda, pypi

class HybridFlow(FlowSpec):
    
    @conda(libraries={"numpy": "1.21.5"})
    @pypi(packages={"requests": "2.28.0"})
    @step
    def start(self):
        import numpy as np
        import requests
        # numpy from Conda, requests from PyPI
        self.next(self.end)
    
    @step
    def end(self):
        pass

Development Dependencies

Use different versions for testing:
class TestFlow(FlowSpec):
    
    @pypi(packages={"mypackage": "1.0.0"})
    @step
    def start(self):  # entry step; tests version 1.0.0
        import mypackage
        assert mypackage.__version__ == "1.0.0"
        self.next(self.test_v2)
    
    @pypi(packages={"mypackage": "2.0.0"})
    @step
    def test_v2(self):
        import mypackage
        assert mypackage.__version__ == "2.0.0"
        self.next(self.end)
    
    @step
    def end(self):
        print("All version tests passed")
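
Not every package defines a `__version__` attribute, so the assertions above may fail for reasons unrelated to the environment. One alternative is to read the installed distribution's metadata through the standard library's `importlib.metadata` (Python 3.8+). This is a minimal sketch, and the helper names `installed_version` and `assert_version` are illustrative.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


def assert_version(distribution: str, expected: str) -> None:
    """Fail with a readable message when the environment has another version."""
    found = installed_version(distribution)
    if found != expected:
        raise AssertionError(
            f"{distribution}: expected {expected}, found {found}"
        )
```

Inside a step, `assert_version("mypackage", "1.0.0")` would then stand in for the `__version__` comparison. Note that the argument is the distribution name (as published on PyPI), which can differ from the import name.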

Interaction with Other Decorators

With @conda

Mix Conda and PyPI packages for maximum flexibility:
from metaflow import FlowSpec, step, conda, pypi

class MLFlow(FlowSpec):
    
    @conda(libraries={"pytorch": "2.0.0"})
    @pypi(packages={"transformers": "4.28.0"})
    @step
    def start(self):  # entry step; runs the training code
        import torch
        from transformers import AutoModel
        # PyTorch from Conda, Transformers from PyPI
        self.next(self.end)
    
    @step
    def end(self):
        pass

With @pypi_base

Extend flow-level PyPI dependencies:
@pypi_base(packages={"requests": "2.28.0"})
class ExtendedFlow(FlowSpec):
    
    @step
    def start(self):  # entry step; uses only the flow-level packages
        import requests
        # Only requests available
        self.next(self.use_extended)
    
    @pypi(packages={"beautifulsoup4": "4.11.1"})
    @step
    def use_extended(self):
        import requests
        from bs4 import BeautifulSoup
        # requests (base) + beautifulsoup4 (step)
        self.next(self.end)
    
    @step
    def end(self):
        pass

With @named_env

Extend a named environment with PyPI packages:
@named_env(name="team/base-env:v1")
@pypi(packages={"fastapi": "0.95.0"})
@step
def api_endpoint(self):
    # Uses base-env plus fastapi
    from fastapi import FastAPI
    self.next(self.end)

Pure PyPI vs Mixed Mode

Pure PyPI Mode

Faster resolution; dependencies are resolved with pip:
@pypi_base(packages={"pandas": "1.5.0"})
class PurePypiFlow(FlowSpec):
    @pypi(packages={"numpy": "1.21.5"})
    @step
    def start(self):
        # All packages from PyPI
        # Fast resolution with pip
        self.next(self.end)
    
    @step
    def end(self):
        pass

Mixed Mode

More flexible; dependencies are resolved with Poetry:
@conda_base(libraries={"pytorch": "2.0.0"})
@pypi_base(packages={"transformers": "4.28.0"})
class MixedFlow(FlowSpec):
    @step
    def start(self):
        # Conda + PyPI packages
        # Slower resolution with poetry
        self.next(self.end)
    
    @step
    def end(self):
        pass
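
The distinction between the two modes comes down to whether a step's combined requirements include any Conda packages. The sketch below encodes only what this page states (pip for pure PyPI, poetry for mixed). The function `resolution_mode` is hypothetical, and the Conda-only case is left without a tool because this page does not describe it.

```python
from typing import Mapping, Optional, Tuple


def resolution_mode(
    conda_packages: Mapping[str, str],
    pypi_packages: Mapping[str, str],
) -> Tuple[str, Optional[str]]:
    """Classify a step's requirements as (mode, resolver tool)."""
    if pypi_packages and not conda_packages:
        return ("pure-pypi", "pip")
    if pypi_packages and conda_packages:
        return ("mixed", "poetry")
    # No PyPI packages at all: not covered by this page.
    return ("conda-only", None)


print(resolution_mode({}, {"pandas": "1.5.0", "numpy": "1.21.5"}))
print(resolution_mode({"pytorch": "2.0.0"}, {"transformers": "4.28.0"}))
```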

Merge Behavior

When combining @pypi with @pypi_base:
  1. Step-level overrides flow-level: Package versions in @pypi override those in @pypi_base
  2. Python version is overridden: Step-level python parameter overrides flow-level
  3. Indices are merged: Extra indices from both decorators are combined
  4. Disabled takes precedence: Setting disabled=True at step level disables the environment
@pypi_base(
    packages={"requests": "2.27.0", "boto3": "1.26.0"},
    extra_indices=["https://pypi.internal.com/simple"],
    python="3.8"
)
class MergeExample(FlowSpec):
    
    @pypi(
        packages={"requests": "2.28.0"},  # Overrides requests
        extra_indices=["https://artifacts.company.com/simple"],
        python="3.9"  # Overrides Python
    )
    @step
    def start(self):
        import requests
        import boto3
        # requests: 2.28.0 (overridden)
        # boto3: 1.26.0 (inherited)
        # python: 3.9 (overridden)
        # searches both package indices
        self.next(self.end)
    
    @step
    def end(self):
        pass
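
The four merge rules can be modeled in a few lines of plain Python. This is a sketch of the documented behavior, not Metaflow's implementation: the `PypiSpec` dataclass and `merge` function are hypothetical, and both the ordering and de-duplication of indices and the handling of a flow-level `disabled` value are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class PypiSpec:
    packages: Dict[str, str] = field(default_factory=dict)
    extra_indices: List[str] = field(default_factory=list)
    python: Optional[str] = None
    disabled: bool = False


def merge(base: PypiSpec, step: PypiSpec) -> PypiSpec:
    """Combine flow-level (base) and step-level settings."""
    # Rule 3: indices from both decorators are combined
    # (flow-level first, duplicates dropped: an assumption).
    indices = list(base.extra_indices)
    for url in step.extra_indices:
        if url not in indices:
            indices.append(url)
    return PypiSpec(
        # Rule 1: step-level package versions override flow-level ones.
        packages={**base.packages, **step.packages},
        extra_indices=indices,
        # Rule 2: a step-level python value overrides the flow-level one.
        python=step.python if step.python is not None else base.python,
        # Rule 4: disabled=True at step level disables the environment.
        disabled=step.disabled,
    )


base = PypiSpec(
    packages={"requests": "2.27.0", "boto3": "1.26.0"},
    extra_indices=["https://pypi.internal.com/simple"],
    python="3.8",
)
step_level = PypiSpec(
    packages={"requests": "2.28.0"},
    extra_indices=["https://artifacts.company.com/simple"],
    python="3.9",
)
print(merge(base, step_level))
```

With the values from the example above, the merged result has requests 2.28.0, boto3 1.26.0, Python 3.9, and both indices.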

Requirements

  • Must use --environment=conda when running the flow
  • Requires Conda/Mamba/Micromamba installed (even for pure PyPI mode)
  • Remote execution (AWS Batch, Kubernetes) automatically handles environment creation
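
Because a Conda-family tool is required even for pure PyPI mode, a quick pre-flight check can save a failed run. The sketch below looks for the conventional executable names on PATH using the standard library. Which of them Metaflow actually uses depends on configuration not covered here, and the helper name `available_resolvers` is an assumption.

```python
import shutil
from typing import Iterable, List

# Conventional executable names; adjust if your installation differs.
CANDIDATES = ("conda", "mamba", "micromamba")


def available_resolvers(candidates: Iterable[str] = CANDIDATES) -> List[str]:
    """Return the candidate executables that are found on PATH."""
    return [name for name in candidates if shutil.which(name)]


found = available_resolvers()
if found:
    print("Found:", ", ".join(found))
else:
    print("No Conda, Mamba, or Micromamba executable found on PATH")
```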

Notes

  • Pure PyPI mode (no Conda packages) uses pip for faster resolution
  • Mixed mode (Conda + PyPI) uses poetry for dependency resolution
  • Version constraints use PEP 440 syntax
  • Environments are hermetic and reproducible
  • All transitive dependencies are locked for reproducibility
