The @pypi_base decorator allows you to specify PyPI packages that apply to all steps in your Metaflow flow by default. Individual steps can override or augment these dependencies using @pypi.
Overview
Use @pypi_base to declare flow-wide PyPI dependencies. This decorator sets default packages and Python versions that all steps inherit, reducing repetition in your flow definition.
from metaflow import FlowSpec, step, pypi_base, pypi

@pypi_base(packages={"requests": "2.28.0", "boto3": "1.26.0"}, python=">=3.8,<3.9")
class MyFlow(FlowSpec):
    @step
    def start(self):
        import requests
        import boto3
        # Both packages available
        self.next(self.process)

    @pypi(packages={"beautifulsoup4": "4.11.1"})
    @step
    def process(self):
        import requests
        import boto3
        from bs4 import BeautifulSoup
        # Inherits requests and boto3, adds beautifulsoup4
        self.next(self.end)

    @step
    def end(self):
        # Also has requests and boto3
        pass

if __name__ == "__main__":
    MyFlow()
When to Use
- Common dependencies: When most/all steps need the same PyPI packages
- Reduce repetition: Avoid specifying the same packages on every step
- Default Python version: Set a consistent Python version across the flow
- Base environment: Establish a foundation that steps can extend
- Pure PyPI flows: For flows that don’t need Conda packages
Parameters
packages
Dict[str, str]
default: {}
Dictionary of PyPI packages to include in all steps by default. Keys are package names, values are version constraints. Version constraints use PEP 440 syntax:
- Simple pinned versions: "2.28.0"
- Range constraints: ">=2.0,<3.0"
- Compatible release: "~=2.28.0"
- Minimum version: ">=2.0"
- Empty string for latest: ""
@pypi_base(packages={
    "requests": "2.28.0",
    "django": ">=3.2,<4.0",
    "celery": "~=5.2.0",
    "boto3": ""
})
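As a rough illustration of how the entries above read as pip-style requirement strings, the sketch below joins each package name with its constraint. The helper `to_requirement` is hypothetical and not part of Metaflow. It assumes that a bare version such as "2.28.0" means an exact pin and that an empty string means no constraint.

```python
def to_requirement(name: str, constraint: str) -> str:
    """Hypothetical helper: render one `packages` entry as a requirement string."""
    constraint = constraint.strip()
    if not constraint:
        # Empty string: no constraint, so the resolver is free to pick the latest.
        return name
    if constraint[0].isdigit():
        # Bare version: assumed here to mean an exact pin.
        return f"{name}=={constraint}"
    # Anything else already starts with a PEP 440 operator (>=, <, ~=, ...).
    return f"{name}{constraint}"


packages = {
    "requests": "2.28.0",
    "django": ">=3.2,<4.0",
    "celery": "~=5.2.0",
    "boto3": "",
}
requirements = [to_requirement(name, spec) for name, spec in packages.items()]
print(requirements)
```

Metaflow's actual translation of these entries may differ; the sketch only shows how each constraint form is meant to be read.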
extra_indices
Additional PyPI package indices to search. By default, only pypi.org is used.
@pypi_base(
    packages={"internal-lib": "1.0.0"},
    extra_indices=["https://pypi.company.com/simple"]
)
python
Python version constraint for all steps by default. If not specified, the current Python version is used. Can be a specific version or a constraint expression:
- "3.8.12" - Exact version
- ">=3.8,<3.9" - Range constraint
- "<3.11" - Upper bound only
@pypi_base(packages={"django": "4.0"}, python=">=3.8,<3.9")
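To make the constraint forms concrete, here is a minimal sketch of checking a Python version against such an expression. The functions `parse_version` and `satisfies` are hypothetical and simplified: they handle only numeric dotted versions and the operators `>=`, `<=`, `==`, `>`, `<`, and they treat a bare version as an exact match. This is not how Metaflow resolves Python versions.

```python
import operator
from typing import Tuple

# Two-character operators come first so ">=" is not mistaken for ">".
_OPERATORS = {
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
    ">": operator.gt,
    "<": operator.lt,
}


def parse_version(text: str) -> Tuple[int, int, int]:
    """Pad a dotted version such as "3.8" to a (major, minor, micro) tuple."""
    parts = [int(piece) for piece in text.strip().split(".")][:3]
    parts += [0] * (3 - len(parts))
    return (parts[0], parts[1], parts[2])


def satisfies(version: str, constraint: str) -> bool:
    """Return True if `version` meets every comma-separated clause."""
    candidate = parse_version(version)
    for clause in constraint.split(","):
        clause = clause.strip()
        for symbol, compare in _OPERATORS.items():
            if clause.startswith(symbol):
                bound = parse_version(clause[len(symbol):])
                break
        else:
            # No operator: treat the clause as an exact version.
            compare, bound = operator.eq, parse_version(clause)
        if not compare(candidate, bound):
            return False
    return True


print(satisfies("3.8.12", ">=3.8,<3.9"))
print(satisfies("3.9.1", ">=3.8,<3.9"))
```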
disabled
If set to True, disables the managed environment for all steps by default. Individual steps can still enable it using @pypi.
@pypi_base(disabled=True)
class MyFlow(FlowSpec):
    @step
    def use_system(self):
        # Runs in system environment
        pass

    @pypi(packages={"requests": "2.28.0"})
    @step
    def use_managed(self):
        # Explicitly enables managed environment
        pass
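The rule described above can be summarized in a few lines. The function below is a hypothetical model of that rule, not Metaflow code. It assumes that a step without its own @pypi decorator follows the flow-level setting, and that a step-level @pypi decides for itself.

```python
from typing import Optional


def managed_environment_enabled(
    base_disabled: bool, step_disabled: Optional[bool] = None
) -> bool:
    """Hypothetical model of the `disabled` rule.

    `step_disabled` is None when the step has no @pypi decorator;
    otherwise it is the step decorator's `disabled` value.
    """
    if step_disabled is None:
        # No step-level decorator: inherit the flow-level default.
        return not base_disabled
    # A step-level @pypi takes precedence over @pypi_base.
    return not step_disabled


# @pypi_base(disabled=True) with an undecorated step: system environment.
print(managed_environment_enabled(base_disabled=True))
# @pypi_base(disabled=True) with @pypi(packages=...): managed environment.
print(managed_environment_enabled(base_disabled=True, step_disabled=False))
```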
Deprecated Parameters
DEPRECATED - Use extra_indices instead. Additional PyPI package sources.
DEPRECATED - Use @named_env_base(name=...) instead. Reference to a named environment.
DEPRECATED - Use @named_env_base(pathspec=...) instead. Reference to the pathspec of an existing step.
DEPRECATED - Use @named_env_base(fetch_at_exec=...) instead. Fetches the environment at execution time.
Usage Examples
Basic Flow-Wide Dependencies
Set common PyPI packages for all steps:
from metaflow import FlowSpec, step, pypi_base

@pypi_base(packages={
    "requests": "2.28.0",
    "boto3": "1.26.0",
    "pandas": "1.5.0"
}, python=">=3.8,<3.9")
class APIDataPipeline(FlowSpec):
    @step
    def start(self):
        import requests
        import boto3
        import pandas as pd
        # All base packages available
        self.next(self.process)

    @step
    def process(self):
        import pandas as pd
        # Base packages still available
        self.next(self.end)

    @step
    def end(self):
        pass
Extending Base with Step-Specific Packages
Add additional packages to specific steps:
from metaflow import FlowSpec, step, pypi_base, pypi

@pypi_base(packages={
    "requests": "2.28.0",
    "pandas": "1.5.0"
})
class WebScraperFlow(FlowSpec):
    @step
    def start(self):
        import requests
        # Only base packages
        self.next(self.scrape, self.analyze)

    @pypi(packages={"beautifulsoup4": "4.11.1"})
    @step
    def scrape(self):
        import requests
        from bs4 import BeautifulSoup
        # Base + beautifulsoup4
        self.next(self.join)

    @pypi(packages={"plotly": "5.14.0"})
    @step
    def analyze(self):
        import pandas as pd
        import plotly.express as px
        # Base + plotly
        self.next(self.join)

    @step
    def join(self, inputs):
        self.next(self.end)

    @step
    def end(self):
        pass
Overriding Base Versions
Step-level decorators override base versions:
from metaflow import FlowSpec, step, pypi_base, pypi

@pypi_base(
    packages={"requests": "2.27.0", "boto3": "1.24.0"},
    python="3.8"
)
class VersionOverride(FlowSpec):
    @step
    def start(self):
        import requests
        # Base settings: requests 2.27.0, boto3 1.24.0, python 3.8
        self.next(self.use_override)

    @pypi(
        packages={"requests": "2.28.0"},  # Overrides requests
        python="3.9"  # Overrides Python
    )
    @step
    def use_override(self):
        import requests
        import boto3
        # requests 2.28.0 (overridden)
        # boto3 1.24.0 (inherited)
        # python 3.9 (overridden)
        self.next(self.end)

    @step
    def end(self):
        pass
Combining with Conda Base
Mix Conda and PyPI base dependencies:
from metaflow import FlowSpec, step, conda_base, pypi_base

@conda_base(libraries={"numpy": "1.21.5"})
@pypi_base(packages={"requests": "2.28.0"})
class HybridFlow(FlowSpec):
    @step
    def start(self):
        import numpy as np
        import requests
        # Both Conda and PyPI base packages
        self.next(self.end)

    @step
    def end(self):
        pass
Custom Package Indices
Use private PyPI indices:
from metaflow import FlowSpec, step, pypi_base

@pypi_base(
    packages={
        "internal-utils": "2.0.0",
        "requests": "2.28.0"
    },
    extra_indices=[
        "https://pypi.company.com/simple",
        "https://artifacts.internal.net/pypi/simple"
    ]
)
class InternalToolsFlow(FlowSpec):
    @step
    def start(self):
        import internal_utils
        import requests
        # Both internal and public packages
        self.next(self.end)

    @step
    def end(self):
        pass
Pure PyPI Flow
Create a flow using only PyPI packages (faster resolution):
from metaflow import FlowSpec, step, pypi_base, pypi

@pypi_base(packages={
    "django": ">=3.2,<4.0",
    "celery": "~=5.2.0",
    "redis": ">=4.0",
    "psycopg2-binary": "2.9.0"
}, python="3.9")
class WebAppFlow(FlowSpec):
    @step
    def start(self):
        import django
        import celery
        # Pure PyPI environment, fast resolution
        self.next(self.deploy)

    @pypi(packages={"gunicorn": "20.1.0"})
    @step
    def deploy(self):
        # Add deployment-specific packages
        self.next(self.end)

    @step
    def end(self):
        pass
Disabling by Default
Disable managed environment for all steps except those that need it:
from metaflow import FlowSpec, step, pypi_base, pypi

@pypi_base(disabled=True)
class SelectiveFlow(FlowSpec):
    @step
    def start(self):
        # Runs in system Python
        import subprocess
        subprocess.run(["some-system-tool"])
        self.next(self.use_managed)

    @pypi(packages={"requests": "2.28.0"})
    @step
    def use_managed(self):
        # Explicitly enables managed environment
        import requests
        self.next(self.end)

    @step
    def end(self):
        # Back to system Python
        pass
Merge Behavior
When @pypi_base is combined with step-level @pypi decorators, their settings merge as follows.
Package Merging
@pypi_base(packages={"requests": "2.27.0", "boto3": "1.26.0"})
class MergeExample(FlowSpec):
    @pypi(packages={"requests": "2.28.0", "django": "4.0"})
    @step
    def start(self):
        # Result:
        # - requests: 2.28.0 (overridden by step)
        # - boto3: 1.26.0 (inherited from base)
        # - django: 4.0 (added by step)
        pass
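The merge result listed in the comments above behaves like a dictionary update in which step-level entries win. The sketch below models that with plain dictionaries. `merge_packages` is a hypothetical name used for illustration and is not a Metaflow function.

```python
from typing import Dict


def merge_packages(base: Dict[str, str], step: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical model: step entries override or extend the base entries."""
    # Later keys win in a dict unpacking, so step versions replace base versions.
    return {**base, **step}


base_packages = {"requests": "2.27.0", "boto3": "1.26.0"}
step_packages = {"requests": "2.28.0", "django": "4.0"}
merged = merge_packages(base_packages, step_packages)
print(merged)
```

Neither input dictionary is modified, which mirrors the fact that other steps still see the unmodified base packages.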
Python Version Override
@pypi_base(python="3.8")
class PythonOverride(FlowSpec):
    @step
    def use_base_python(self):
        # Uses Python 3.8
        pass

    @pypi(python="3.10")
    @step
    def use_different_python(self):
        # Uses Python 3.10 (overrides base)
        pass
Index Merging
Extra indices from both decorators are combined:
@pypi_base(
    extra_indices=["https://pypi.internal.com/simple"]
)
class IndexMerge(FlowSpec):
    @pypi(
        packages={"special-lib": "1.0.0"},
        extra_indices=["https://artifacts.company.com/simple"]
    )
    @step
    def start(self):
        # Searches both internal and company artifact indices
        pass
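One way to picture the combined index list is an order-preserving union of the two lists. The helper `merge_indices` below is hypothetical. Placing base indices before step indices and dropping duplicates are both assumptions made for this sketch, since the ordering Metaflow uses is not stated here.

```python
from typing import List


def merge_indices(base: List[str], step: List[str]) -> List[str]:
    """Hypothetical model: union of both lists, first occurrence kept."""
    # dict.fromkeys preserves insertion order and discards repeated URLs.
    return list(dict.fromkeys(base + step))


base_indices = ["https://pypi.internal.com/simple"]
step_indices = [
    "https://artifacts.company.com/simple",
    "https://pypi.internal.com/simple",  # duplicate of the base index
]
combined = merge_indices(base_indices, step_indices)
print(combined)
```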
Interaction with Other Decorators
With @conda_base
Combine for hybrid Conda/PyPI environments:
@conda_base(libraries={"numpy": "1.21.5", "pytorch": "2.0.0"})
@pypi_base(packages={"transformers": "4.28.0", "datasets": "2.11.0"})
class MLFlow(FlowSpec):
    @step
    def start(self):
        import numpy as np
        import torch
        from transformers import AutoModel
        # Conda packages: numpy, pytorch
        # PyPI packages: transformers, datasets
        self.next(self.end)

    @step
    def end(self):
        pass
With @named_env_base
@pypi_base cannot be combined with @named_env_base at the flow level:
# This will cause an error:
@pypi_base(packages={"requests": "2.28.0"})
@named_env_base(name="team/base-env")  # ERROR: Conflicting decorators
class BadFlow(FlowSpec):
    pass
Instead, extend at step level:
@named_env_base(name="team/base-env")
class GoodFlow(FlowSpec):
    @pypi(packages={"requests": "2.28.0"})
    @step
    def start(self):
        # Extends named environment with requests
        pass
Step-Level Overrides
Fine-grained control with step decorators:
@pypi_base(packages={"requests": "2.28.0"})
class OverrideFlow(FlowSpec):
    @step
    def default_step(self):
        # Uses base packages
        pass

    @pypi(packages={"beautifulsoup4": "4.11.1"})
    @step
    def extended_step(self):
        # Base + beautifulsoup4
        pass

    @pypi(packages={"requests": "2.27.0"}, python="3.7")
    @step
    def overridden_step(self):
        # Different requests version and Python
        pass

    @pypi(disabled=True)
    @step
    def system_step(self):
        # Disables managed environment
        pass
Pure PyPI vs Mixed Mode
Pure PyPI (Faster)
Only @pypi_base, no Conda:
@pypi_base(packages={"pandas": "1.5.0", "numpy": "1.21.5"})
class PurePyPIFlow(FlowSpec):
    # Fast resolution with pip
    # All packages from PyPI
    pass
Mixed Mode (Flexible)
Combines @conda_base and @pypi_base:
@conda_base(libraries={"cudatoolkit": "11.7"})
@pypi_base(packages={"tensorflow": "2.12.0"})
class MixedFlow(FlowSpec):
    # Slower resolution with poetry
    # More flexibility with non-Python packages
    pass
Requirements
- Must use --environment=conda when running the flow
- Requires Conda/Mamba/Micromamba installed (even for pure PyPI)
- Applied at the class level (before the flow class definition)
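As a small sketch of the first requirement, the snippet below assembles the command line for running a flow with the Conda environment enabled. The file name `my_flow.py` and the helper `build_run_command` are hypothetical. The command is only built and printed here, not executed.

```python
import sys
from typing import List


def build_run_command(flow_file: str) -> List[str]:
    """Hypothetical helper: argv for running a flow with --environment=conda."""
    # The top-level --environment option goes before the `run` command.
    return [sys.executable, flow_file, "--environment=conda", "run"]


command = build_run_command("my_flow.py")  # placeholder file name
print(" ".join(command))
# To launch it for real: subprocess.run(command, check=True)
```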
Notes
- Pure PyPI mode (no Conda packages) uses pip for faster resolution
- Mixed mode (with Conda) uses poetry for dependency resolution
- Version constraints use PEP 440 syntax
- Step-level @pypi always overrides conflicting @pypi_base settings
- Environments are resolved once and cached for reuse
- Compatible with all Metaflow execution environments
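To illustrate why identical specifications can share one cached environment, the sketch below derives a deterministic key from a specification. This is purely an illustration: `environment_key` and its hashing scheme are assumptions made for this example and do not describe how Metaflow identifies or stores cached environments.

```python
import hashlib
import json
from typing import Dict, Sequence


def environment_key(
    packages: Dict[str, str], python: str, extra_indices: Sequence[str] = ()
) -> str:
    """Hypothetical cache key: a hash of the normalized specification."""
    spec = {
        "packages": packages,
        "python": python,
        "extra_indices": list(extra_indices),
    }
    # sort_keys makes the key independent of the order packages were listed in.
    payload = json.dumps(spec, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


key_a = environment_key({"requests": "2.28.0", "boto3": "1.26.0"}, "3.9")
key_b = environment_key({"boto3": "1.26.0", "requests": "2.28.0"}, "3.9")
key_c = environment_key({"requests": "2.27.0", "boto3": "1.26.0"}, "3.9")
print(key_a == key_b)  # same specification, different listing order
print(key_a == key_c)  # different requests version
```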