The @pypi_base decorator allows you to specify PyPI packages that apply to all steps in your Metaflow flow by default. Individual steps can override or augment these dependencies using @pypi.

Overview

Use @pypi_base to declare flow-wide PyPI dependencies. This decorator sets default packages and Python versions that all steps inherit, reducing repetition in your flow definition.
from metaflow import FlowSpec, step, pypi_base, pypi

@pypi_base(packages={"requests": "2.28.0", "boto3": "1.26.0"}, python=">=3.8,<3.9")
class MyFlow(FlowSpec):
    
    @step
    def start(self):
        import requests
        import boto3
        # Both packages available
        self.next(self.process)
    
    @pypi(packages={"beautifulsoup4": "4.11.1"})
    @step
    def process(self):
        import requests
        import boto3
        from bs4 import BeautifulSoup
        # Inherits requests and boto3, adds beautifulsoup4
        self.next(self.end)
    
    @step
    def end(self):
        # Also has requests and boto3
        pass

if __name__ == "__main__":
    MyFlow()

When to Use

  • Common dependencies: When most or all steps need the same PyPI packages
  • Reduce repetition: Avoid specifying the same packages on every step
  • Default Python version: Set a consistent Python version across the flow
  • Base environment: Establish a foundation that steps can extend
  • Pure PyPI flows: For flows that don’t need Conda packages

Parameters

packages
Dict[str, str]
default:"{}"
Dictionary of PyPI packages to include in all steps by default. Keys are package names, values are version constraints.
Version constraints use PEP 440 syntax:
  • Simple pinned versions: "2.28.0"
  • Range constraints: ">=2.0,<3.0"
  • Compatible release: "~=2.28.0"
  • Minimum version: ">=2.0"
  • Empty string for latest: ""
@pypi_base(packages={
    "requests": "2.28.0",
    "django": ">=3.2,<4.0",
    "celery": "~=5.2.0",
    "boto3": ""
})
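To make the constraint forms above concrete, here is a minimal sketch of how each packages entry could be rendered as a pip-style requirement string. The helper to_requirement is hypothetical and is not part of Metaflow; how Metaflow translates these values internally is an assumption here.

```python
def to_requirement(name: str, constraint: str) -> str:
    """Render one packages entry as a pip-style requirement string.

    Hypothetical helper for illustration; not a Metaflow API.
    """
    constraint = constraint.strip()
    if not constraint:
        # Empty string: no constraint, so the resolver picks the latest version.
        return name
    if constraint[0].isdigit():
        # A bare version such as "2.28.0" is treated as an exact pin.
        return f"{name}=={constraint}"
    # Anything else already starts with a PEP 440 operator (>=, <, ~=, ...).
    return f"{name}{constraint}"


packages = {
    "requests": "2.28.0",
    "django": ">=3.2,<4.0",
    "celery": "~=5.2.0",
    "boto3": "",
}
requirements = [to_requirement(name, spec) for name, spec in packages.items()]
print(requirements)
```

Reading the dictionary this way can help when comparing a flow's dependencies against an existing requirements.txt.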
extra_indices
List[str]
default:"[]"
Additional PyPI package indices to search. By default, only pypi.org is used.
@pypi_base(
    packages={"internal-lib": "1.0.0"},
    extra_indices=["https://pypi.company.com/simple"]
)
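For readers more familiar with pip, extra_indices plays a role similar to pip's --extra-index-url option. The sketch below builds the corresponding pip arguments; pip_index_args is a hypothetical helper, and whether Metaflow invokes pip in exactly this way is an assumption.

```python
from typing import List


def pip_index_args(extra_indices: List[str]) -> List[str]:
    """Build pip arguments that add each URL as an extra index.

    Hypothetical helper for illustration; not a Metaflow API.
    """
    args: List[str] = []
    for url in extra_indices:
        # pypi.org stays the primary index; each URL is searched in addition.
        args += ["--extra-index-url", url]
    return args


print(pip_index_args(["https://pypi.company.com/simple"]))
```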
python
str
default:"None"
Python version constraint for all steps by default. If not specified, the current Python version is used.
Can be a specific version or a constraint expression:
  • "3.8.12" - Exact version
  • ">=3.8,<3.9" - Range constraint
  • "<3.11" - Upper bound only
@pypi_base(packages={"django": "4.0"}, python=">=3.8,<3.9")
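The three constraint forms above can be checked against a concrete interpreter version with a small comparison routine. This is a simplified sketch, not the resolver Metaflow uses: satisfies is a hypothetical helper, it supports only numeric versions and the operators shown, and it treats a bare version as an exact match after zero-padding (so "3.8" matches only 3.8.0 here).

```python
import re
from typing import Tuple

# One clause: an optional comparison operator followed by a dotted numeric version.
_CLAUSE = re.compile(r"^(>=|<=|==|!=|>|<)?\s*(\d+(?:\.\d+)*)$")


def _parse(version: str) -> Tuple[int, ...]:
    return tuple(int(part) for part in version.split("."))


def satisfies(version: str, constraint: str) -> bool:
    """Return True if version meets every comma-separated clause."""
    candidate = _parse(version)
    for clause in constraint.split(","):
        match = _CLAUSE.match(clause.strip())
        if match is None:
            raise ValueError(f"unsupported clause: {clause!r}")
        op = match.group(1) or "=="  # a bare version means an exact match
        bound = _parse(match.group(2))
        # Pad both tuples to the same length so "3.8" compares like "3.8.0".
        width = max(len(candidate), len(bound))
        a = candidate + (0,) * (width - len(candidate))
        b = bound + (0,) * (width - len(bound))
        ok = {
            ">=": a >= b, "<=": a <= b, ">": a > b,
            "<": a < b, "==": a == b, "!=": a != b,
        }[op]
        if not ok:
            return False
    return True


print(satisfies("3.8.12", ">=3.8,<3.9"))
print(satisfies("3.9.1", ">=3.8,<3.9"))
```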
disabled
bool
default:"False"
If set to True, disables the managed environment for all steps by default. Individual steps can still enable it using @pypi.
@pypi_base(disabled=True)
class MyFlow(FlowSpec):
    @step
    def use_system(self):
        # Runs in system environment
        pass
    
    @pypi(packages={"requests": "2.28.0"})
    @step
    def use_managed(self):
        # Explicitly enables managed environment
        pass
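The rule in the example above (the flow-level default applies unless a step carries its own @pypi) can be written as a small decision function. This is a sketch of the described behavior only; environment_is_managed is a hypothetical name, and None stands for a step without a @pypi decorator.

```python
from typing import Optional


def environment_is_managed(base_disabled: bool, step_disabled: Optional[bool]) -> bool:
    """Decide whether a step runs in a managed environment.

    step_disabled is None when the step has no @pypi decorator; otherwise it
    is that decorator's disabled value (False unless set explicitly).
    """
    if step_disabled is None:
        # No step-level decorator: the flow-level default applies.
        return not base_disabled
    # A step-level decorator takes precedence over the flow-level default.
    return not step_disabled


# use_system: no @pypi, base disabled, so it runs in the system environment.
print(environment_is_managed(base_disabled=True, step_disabled=None))
# use_managed: @pypi(packages=...) with the default disabled=False.
print(environment_is_managed(base_disabled=True, step_disabled=False))
```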

Deprecated Parameters

sources
List[str]
default:"[]"
DEPRECATED - Use extra_indices instead.
Additional PyPI package sources.
name
str
default:"None"
DEPRECATED - Use @named_env_base(name=...) instead.
Reference to a named environment.
pathspec
str
default:"None"
DEPRECATED - Use @named_env_base(pathspec=...) instead.
Reference to the pathspec of an existing step.
fetch_at_exec
bool
default:"False"
DEPRECATED - Use @named_env_base(fetch_at_exec=...) instead.
Fetch the environment at execution time.

Usage Examples

Basic Flow-Wide Dependencies

Set common PyPI packages for all steps:
from metaflow import FlowSpec, step, pypi_base

@pypi_base(packages={
    "requests": "2.28.0",
    "boto3": "1.26.0",
    "pandas": "1.5.0"
}, python=">=3.8,<3.9")
class APIDataPipeline(FlowSpec):
    
    @step
    def start(self):
        import requests
        import boto3
        import pandas as pd
        # All base packages available
        self.next(self.process)
    
    @step
    def process(self):
        import pandas as pd
        # Base packages still available
        self.next(self.end)
    
    @step
    def end(self):
        pass

Extending Base with Step-Specific Packages

Add additional packages to specific steps:
from metaflow import FlowSpec, step, pypi_base, pypi

@pypi_base(packages={
    "requests": "2.28.0",
    "pandas": "1.5.0"
})
class WebScraperFlow(FlowSpec):
    
    @step
    def start(self):
        import requests
        # Only base packages
        self.next(self.scrape, self.analyze)
    
    @pypi(packages={"beautifulsoup4": "4.11.1"})
    @step
    def scrape(self):
        import requests
        from bs4 import BeautifulSoup
        # Base + beautifulsoup4
        self.next(self.join)
    
    @pypi(packages={"plotly": "5.14.0"})
    @step
    def analyze(self):
        import pandas as pd
        import plotly.express as px
        # Base + plotly
        self.next(self.join)
    
    @step
    def join(self, inputs):
        self.next(self.end)
    
    @step
    def end(self):
        pass

Overriding Base Versions

Step-level decorators override base versions:
from metaflow import FlowSpec, step, pypi_base, pypi

@pypi_base(
    packages={"requests": "2.27.0", "boto3": "1.24.0"},
    python="3.8"
)
class VersionOverride(FlowSpec):
    
    @step
    def use_base(self):
        import requests
        # requests 2.27.0, boto3 1.24.0, python 3.8
        self.next(self.use_override)
    
    @pypi(
        packages={"requests": "2.28.0"},  # Overrides requests
        python="3.9"                       # Overrides Python
    )
    @step
    def use_override(self):
        import requests
        import boto3
        # requests 2.28.0 (overridden)
        # boto3 1.24.0 (inherited)
        # python 3.9 (overridden)
        self.next(self.end)
    
    @step
    def end(self):
        pass

Combining with Conda Base

Mix Conda and PyPI base dependencies:
from metaflow import FlowSpec, step, conda_base, pypi_base

@conda_base(libraries={"numpy": "1.21.5"})
@pypi_base(packages={"requests": "2.28.0"})
class HybridFlow(FlowSpec):
    
    @step
    def start(self):
        import numpy as np
        import requests
        # Both Conda and PyPI base packages
        self.next(self.end)
    
    @step
    def end(self):
        pass

Custom Package Indices

Use private PyPI indices:
from metaflow import FlowSpec, step, pypi_base

@pypi_base(
    packages={
        "internal-utils": "2.0.0",
        "requests": "2.28.0"
    },
    extra_indices=[
        "https://pypi.company.com/simple",
        "https://artifacts.internal.net/pypi/simple"
    ]
)
class InternalToolsFlow(FlowSpec):
    
    @step
    def start(self):
        import internal_utils
        import requests
        # Both internal and public packages
        self.next(self.end)
    
    @step
    def end(self):
        pass

Pure PyPI Flow

Create a flow using only PyPI packages (faster resolution):
from metaflow import FlowSpec, step, pypi_base, pypi

@pypi_base(packages={
    "django": ">=3.2,<4.0",
    "celery": "~=5.2.0",
    "redis": ">=4.0",
    "psycopg2-binary": "2.9.0"
}, python="3.9")
class WebAppFlow(FlowSpec):
    
    @step
    def start(self):
        import django
        import celery
        # Pure PyPI environment, fast resolution
        self.next(self.deploy)
    
    @pypi(packages={"gunicorn": "20.1.0"})
    @step
    def deploy(self):
        # Add deployment-specific packages
        self.next(self.end)
    
    @step
    def end(self):
        pass

Disabling by Default

Disable the managed environment for all steps except those that need it:
from metaflow import FlowSpec, step, pypi_base, pypi

@pypi_base(disabled=True)
class SelectiveFlow(FlowSpec):
    
    @step
    def use_system(self):
        # Runs in system Python
        import subprocess
        subprocess.run(["some-system-tool"])
        self.next(self.use_managed)
    
    @pypi(packages={"requests": "2.28.0"})
    @step
    def use_managed(self):
        # Explicitly enables managed environment
        import requests
        self.next(self.end)
    
    @step
    def end(self):
        # Back to system Python
        pass

Merge Behavior

When combining @pypi_base with step-level decorators:

Package Merging

@pypi_base(packages={"requests": "2.27.0", "boto3": "1.26.0"})
class MergeExample(FlowSpec):
    
    @pypi(packages={"requests": "2.28.0", "django": "4.0"})
    @step
    def start(self):
        # Result:
        # - requests: 2.28.0 (overridden by step)
        # - boto3: 1.26.0 (inherited from base)
        # - django: 4.0 (added by step)
        pass
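The result listed in the comments above matches an ordinary dictionary update in which step-level entries win on conflicts. A minimal sketch follows; merge_packages is a hypothetical helper used only to illustrate the rule, not Metaflow's implementation.

```python
from typing import Dict


def merge_packages(base: Dict[str, str], step: Dict[str, str]) -> Dict[str, str]:
    """Combine flow-level and step-level packages; step entries take precedence."""
    merged = dict(base)   # start from the @pypi_base packages
    merged.update(step)   # @pypi entries override or add to them
    return merged


base = {"requests": "2.27.0", "boto3": "1.26.0"}
step = {"requests": "2.28.0", "django": "4.0"}
print(merge_packages(base, step))
# {'requests': '2.28.0', 'boto3': '1.26.0', 'django': '4.0'}
```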

Python Version Override

@pypi_base(python="3.8")
class PythonOverride(FlowSpec):
    
    @step
    def use_base_python(self):
        # Uses Python 3.8
        pass
    
    @pypi(python="3.10")
    @step
    def use_different_python(self):
        # Uses Python 3.10 (overrides base)
        pass

Index Merging

Extra indices from both decorators are combined:
@pypi_base(
    extra_indices=["https://pypi.internal.com/simple"]
)
class IndexMerge(FlowSpec):
    
    @pypi(
        packages={"special-lib": "1.0.0"},
        extra_indices=["https://artifacts.company.com/simple"]
    )
    @step
    def start(self):
        # Searches both internal and company artifact indices
        pass
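Unlike packages, index lists are combined rather than overridden. The sketch below shows one way such a combination could work; merge_indices is a hypothetical helper, and both the ordering (base first) and the removal of duplicates are assumptions, not documented Metaflow behavior.

```python
from typing import List


def merge_indices(base: List[str], step: List[str]) -> List[str]:
    """Combine index URLs from both decorators, keeping first-seen order."""
    merged: List[str] = []
    for url in [*base, *step]:
        if url not in merged:  # assumption: duplicate URLs are listed once
            merged.append(url)
    return merged


print(merge_indices(
    ["https://pypi.internal.com/simple"],
    ["https://artifacts.company.com/simple"],
))
```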

Interaction with Other Decorators

With @conda_base

Combine for hybrid Conda/PyPI environments:
@conda_base(libraries={"numpy": "1.21.5", "pytorch": "2.0.0"})
@pypi_base(packages={"transformers": "4.28.0", "datasets": "2.11.0"})
class MLFlow(FlowSpec):
    @step
    def start(self):
        import numpy as np
        import torch
        from transformers import AutoModel
        # Conda packages: numpy, pytorch
        # PyPI packages: transformers, datasets
        self.next(self.end)
    
    @step
    def end(self):
        pass

With @named_env_base

You cannot combine @pypi_base with @named_env_base at the flow level:
# This will cause an error:
@pypi_base(packages={"requests": "2.28.0"})
@named_env_base(name="team/base-env")  # ERROR: Conflicting decorators
class BadFlow(FlowSpec):
    pass
Instead, extend at step level:
@named_env_base(name="team/base-env")
class GoodFlow(FlowSpec):
    @pypi(packages={"requests": "2.28.0"})
    @step
    def start(self):
        # Extends named environment with requests
        pass

Step-Level Overrides

Fine-grained control with step decorators:
@pypi_base(packages={"requests": "2.28.0"})
class OverrideFlow(FlowSpec):
    
    @step
    def default_step(self):
        # Uses base packages
        pass
    
    @pypi(packages={"beautifulsoup4": "4.11.1"})
    @step
    def extended_step(self):
        # Base + beautifulsoup4
        pass
    
    @pypi(packages={"requests": "2.27.0"}, python="3.7")
    @step
    def overridden_step(self):
        # Different requests version and Python
        pass
    
    @pypi(disabled=True)
    @step
    def system_step(self):
        # Disables managed environment
        pass

Pure PyPI vs Mixed Mode

Pure PyPI (Faster)

Only @pypi_base, no Conda:
@pypi_base(packages={"pandas": "1.5.0", "numpy": "1.21.5"})
class PurePyPIFlow(FlowSpec):
    # Fast resolution with pip
    # All packages from PyPI
    pass

Mixed Mode (Flexible)

Combines @conda_base and @pypi_base:
@conda_base(libraries={"cudatoolkit": "11.7"})
@pypi_base(packages={"tensorflow": "2.12.0"})
class MixedFlow(FlowSpec):
    # Slower resolution with poetry
    # More flexibility with non-Python packages
    pass

Requirements

  • Must use --environment=conda when running the flow
  • Requires Conda/Mamba/Micromamba installed (even for pure PyPI)
  • Applied at the class level (before the flow class definition)

Notes

  • Pure PyPI mode (no Conda packages) uses pip for faster resolution
  • Mixed mode (with Conda) uses poetry for dependency resolution
  • Version constraints use PEP 440 syntax
  • Step-level @pypi always overrides conflicting @pypi_base settings
  • Environments are resolved once and cached for reuse
  • Compatible with all Metaflow execution environments
