The Metaflow Netflix Extensions allow you to create pure PyPI environments without Conda packages, offering a lightweight and familiar way to manage Python dependencies.

Overview

Pure PyPI environments use only packages from the Python Package Index (PyPI), resolved using the standard pip tool. This mode is ideal when:
  • You only need Python packages
  • You want faster resolution times compared to mixed environments
  • You’re already familiar with the requirements.txt format
  • Your packages are available as wheels on PyPI

Using the @pypi Decorator

The @pypi decorator provides a straightforward way to specify PyPI-only dependencies:
simplecondaflow-pypi.py
from metaflow import FlowSpec, step, pypi

class CondaTestFlowPypi(FlowSpec):
    
    @pypi(packages={"pandas": "1.4.0"}, python=">=3.8,<3.9")
    @step
    def start(self):
        import pandas as pd
        assert pd.__version__ == "1.4.0"
        print("I am in start and Pandas version is %s" % pd.__version__)
        self.next(self.end)
        
    @pypi(packages={"pandas": "1.5.0"}, python=">=3.8,<3.9")
    @step
    def end(self):
        import pandas as pd
        assert pd.__version__ == "1.5.0"
        print("I am in end and Pandas version is %s" % pd.__version__)

if __name__ == "__main__":
    CondaTestFlowPypi()
Run with:
python simplecondaflow-pypi.py --environment=conda run

Decorator Parameters

The @pypi (step-level) and @pypi_base (flow-level) decorators accept:
  • packages: Dictionary of package names to version constraints
    • Simple version: {"pandas": "1.4.0"}
    • Complex constraints: {"numpy": ">=1.20,<2.0"}
    • Floating version: {"requests": ""} (latest compatible version)
  • python: Python version constraint
    • Simple: "3.8.17"
    • Range: ">=3.8,<3.9"
  • extra_indices: List of additional PyPI indices
  • disabled: Boolean to disable the environment for a step
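To make the three `packages` forms concrete, here is a minimal sketch of how each might translate into a pip requirement specifier. The helper `to_requirement_lines` is hypothetical and written only for illustration. It assumes a bare version string means an exact `==` pin (as the flow above asserts) and is not taken from the Netflix Extensions source.

```python
def to_requirement_lines(packages):
    """Turn a {name: constraint} dict into pip-style requirement strings.

    Hypothetical helper for illustration; not part of Metaflow.
    """
    lines = []
    for name, constraint in packages.items():
        constraint = constraint.strip()
        if not constraint:
            # Floating version: leave the version open so pip picks one.
            lines.append(name)
        elif constraint[0].isdigit():
            # Simple version such as "1.4.0": assumed to mean an exact pin.
            lines.append(f"{name}=={constraint}")
        else:
            # Already a specifier such as ">=1.20,<2.0": pass it through.
            lines.append(f"{name}{constraint}")
    return lines


if __name__ == "__main__":
    example = {"pandas": "1.4.0", "numpy": ">=1.20,<2.0", "requests": ""}
    for line in to_requirement_lines(example):
        print(line)
```

Running it prints `pandas==1.4.0`, `numpy>=1.20,<2.0`, and `requests`, one per line.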

Using requirements.txt

For more complex dependency specifications, use a requirements.txt file:
req_numpy.txt
numpy==1.21.5
Resolve the environment:
metaflow environment resolve --python ">=3.8,<3.9" -r req_numpy.txt
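The quotes around `>=3.8,<3.9` matter in a shell, because `<` and `>` would otherwise be parsed as redirections. If you drive the resolution from a Python script, passing an argument list to `subprocess` sidesteps quoting entirely. This is a minimal sketch: `resolve_command` and `resolve` are hypothetical helper names, and the only flags used are the ones shown in the command above.

```python
import shlex
import subprocess


def resolve_command(python_spec, requirements_file):
    """Build the argument list for `metaflow environment resolve`.

    Hypothetical helper; it uses only the flags shown in the command above.
    """
    return [
        "metaflow", "environment", "resolve",
        "--python", python_spec,
        "-r", requirements_file,
    ]


def resolve(python_spec, requirements_file, dry_run=True):
    """Print the command (dry run) or execute it without going through a shell."""
    cmd = resolve_command(python_spec, requirements_file)
    if dry_run:
        # shlex.join adds the quoting you would need when typing it by hand.
        print(shlex.join(cmd))
        return None
    # A list argument means no shell, so "<" and ">" are never treated as redirections.
    return subprocess.run(cmd, check=True)


if __name__ == "__main__":
    resolve(">=3.8,<3.9", "req_numpy.txt")
```

With the defaults it prints `metaflow environment resolve --python '>=3.8,<3.9' -r req_numpy.txt`. Passing `dry_run=False` runs it, which requires Metaflow with the Netflix Extensions installed.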

Supported Options

Metaflow supports a subset of requirements.txt syntax:
Standard package syntax:
# Exact version
pandas==1.4.0

# Version constraints
numpy>=1.20,<2.0

# Package with extras
apache-airflow[aiobotocore]

# Git repository
clip @ git+https://github.com/openai/CLIP.git@d50d76daa670286dd6cacf3bcd80b5e4823fc8e1

# Local package
foo @ file:///tmp/build_foo_pkg
The following are not supported:
  • -i or --index-url (use METAFLOW_CONDA_DEFAULT_PYPI_SOURCE config instead)
  • Environment markers (e.g., package==1.0; python_version < '3.8')
  • Constraints files (-c option)
  • Editable packages (-e option)
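Unsupported lines are easiest to catch before resolution, so a small pre-flight check can flag them. The sketch below is a hypothetical linter (`find_unsupported` is not part of Metaflow) that looks only for the four cases listed above. It treats `#` as a comment marker only at the start of a line or after whitespace, which is pip's rule, so Git URLs such as `...TransNetV2.git#main` are left intact.

```python
import re

# The unsupported options listed above, mapped to a short reason.
UNSUPPORTED_OPTIONS = {
    "-i": "index URL (use METAFLOW_CONDA_DEFAULT_PYPI_SOURCE instead)",
    "--index-url": "index URL (use METAFLOW_CONDA_DEFAULT_PYPI_SOURCE instead)",
    "-c": "constraints file",
    "--constraint": "constraints file",
    "-e": "editable package",
    "--editable": "editable package",
}

# pip treats '#' as a comment only at line start or after whitespace.
_COMMENT = re.compile(r"(^|\s)#.*$")


def find_unsupported(text):
    """Return (line_number, reason) pairs for lines Metaflow would not accept.

    Hypothetical pre-flight linter; not part of Metaflow.
    """
    problems = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = _COMMENT.sub("", raw).strip()
        if not line:
            continue
        # Handle both "-i URL" and "--index-url=URL" spellings.
        option = line.split()[0].split("=", 1)[0]
        if option in UNSUPPORTED_OPTIONS:
            problems.append((lineno, UNSUPPORTED_OPTIONS[option]))
        elif not line.startswith("-") and ";" in line:
            # A semicolon introduces an environment marker.
            problems.append((lineno, "environment marker"))
    return problems


if __name__ == "__main__":
    sample = """\
pandas==1.4.0
transnetv2 @ git+https://github.com/soCzech/TransNetV2.git#main
-i https://example.com/simple
package==1.0; python_version < '3.8'
-e ./local_pkg
"""
    for lineno, reason in find_unsupported(sample):
        print(f"line {lineno}: {reason}")
```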

Non-Wheel Packages

Building from Source

When a package isn’t available as a wheel, Metaflow will automatically build it:
requirements-build.txt
--conda-pkg git-lfs
# Needs LFS to build
transnetv2 @ git+https://github.com/soCzech/TransNetV2.git#main

# Source-only distribution
outlier-detector==0.0.3
Building packages requires that you’re resolving on the same architecture as the target. Cross-platform builds are not supported due to potential compilation issues.
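A quick way to see whether a source build is possible is to compare the resolving machine's platform with the target's. This sketch assumes conda-style platform names (`linux-64`, `osx-arm64`, and so on) and a hand-written mapping from Python's `platform.system()` and `platform.machine()` values. `current_platform` and `can_build_for` are hypothetical helpers, and the target name you pass in is an assumption about your remote compute.

```python
import platform

# Assumed mapping from Python's platform values to conda-style platform names.
_PLATFORM_NAMES = {
    ("Linux", "x86_64"): "linux-64",
    ("Linux", "aarch64"): "linux-aarch64",
    ("Darwin", "x86_64"): "osx-64",
    ("Darwin", "arm64"): "osx-arm64",
}


def current_platform(system=None, machine=None):
    """Return a conda-style name for this machine (or for the given values)."""
    system = system or platform.system()
    machine = machine or platform.machine()
    # Fall back to a generic "system-machine" string for unlisted combinations.
    return _PLATFORM_NAMES.get((system, machine), f"{system.lower()}-{machine.lower()}")


def can_build_for(target, system=None, machine=None):
    """Source builds only make sense when the resolving platform equals the target."""
    return current_platform(system, machine) == target


if __name__ == "__main__":
    # "linux-64" is an assumed target; substitute the platform of your remote compute.
    target = "linux-64"
    if can_build_for(target):
        print(f"OK: resolving on {target}")
    else:
        print(f"Resolve on a {target} machine to build non-wheel packages")
```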

Architecture Restrictions

Non-wheel packages come with a few constraints and behaviors to keep in mind:
1. Same Architecture Requirement

You cannot resolve on your Mac laptop for execution on AWS Batch if a package requires building: the machine doing the resolution and the target must share the same architecture.
2. Supported Sources

Source builds work for:
  • Git repositories
  • Local directories
  • Source distributions (.tar.gz, .zip)
3. Caching Built Wheels

Once built, Metaflow caches the wheel in cloud storage (S3, Azure, or GS) so that later resolutions reuse it instead of rebuilding the package.
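The source types listed above can often be recognized from the requirement line itself. The following heuristic is a hypothetical sketch (`likely_needs_build` is not a Metaflow function). It returns `True` for Git URLs, local directories, and source archives, `False` for direct links to a `.whl`, and `None` when the line alone cannot tell, as with `outlier-detector==0.0.3`, whose source-only status is known only by querying the package index.

```python
import re

# name[extras] @ url  (a PEP 508 direct reference)
_DIRECT_REF = re.compile(r"^\s*[A-Za-z0-9][A-Za-z0-9._-]*\s*(\[[^\]]*\])?\s*@\s*(\S+)")
_SDIST_SUFFIXES = (".tar.gz", ".tgz", ".zip")


def likely_needs_build(requirement):
    """Guess whether a requirements.txt line will be built from source.

    Returns True or False when the line itself decides it, and None when
    the answer depends on what the package index offers. Hypothetical
    heuristic; not part of Metaflow.
    """
    match = _DIRECT_REF.match(requirement)
    if match is None:
        # e.g. "outlier-detector==0.0.3": only the index knows if a wheel exists.
        return None
    url = match.group(2)
    if url.startswith("git+"):
        return True  # Git checkouts are built from source
    path = url.split("#", 1)[0]
    if path.endswith(".whl"):
        return False  # direct link to a prebuilt wheel
    if path.endswith(_SDIST_SUFFIXES) or url.startswith("file://"):
        return True  # source archive or local directory
    return None
```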

Combining with Non-Python Packages

Use the --conda-pkg extension to add system tools:
requirements-ffmpeg.txt
--conda-pkg ffmpeg
ffmpeg-python
This creates a pure PyPI environment with Python packages resolved by pip, while still including the ffmpeg binary from Conda.
This approach is useful when you want the speed and simplicity of pip resolution but need system utilities like git-lfs, ffmpeg, or other non-Python tools.
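Conceptually, a file like requirements-ffmpeg.txt carries two lists: Conda packages for system tools and lines for pip. The hypothetical `split_requirements` helper below separates them to illustrate that split. It is not how the Netflix Extensions parse the file, and it only handles the space-separated `--conda-pkg NAME` form used on this page.

```python
def split_requirements(text):
    """Split requirements text into (conda_packages, pip_lines).

    Hypothetical helper that only illustrates the idea; it understands the
    "--conda-pkg NAME" form used on this page and skips blank and comment lines.
    """
    conda_packages, pip_lines = [], []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("--conda-pkg "):
            # Everything after the option is the Conda package spec.
            conda_packages.append(line[len("--conda-pkg "):].strip())
        else:
            pip_lines.append(line)
    return conda_packages, pip_lines


if __name__ == "__main__":
    ffmpeg_reqs = "--conda-pkg ffmpeg\nffmpeg-python\n"
    conda, pip = split_requirements(ffmpeg_reqs)
    print("conda:", conda)
    print("pip:", pip)
```

For requirements-ffmpeg.txt this prints `conda: ['ffmpeg']` and `pip: ['ffmpeg-python']`.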

Performance Tips

Use Wheels When Possible

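Packages available as wheels resolve faster, and as long as a wheel exists for the target platform, they can be resolved from a different platform (unlike packages that must be built).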

Pin Versions

Exact versions (==) resolve faster than ranges and ensure reproducibility.

Minimize Extras

Only include package extras you actually need (e.g., apache-airflow[aiobotocore] rather than apache-airflow[all]).

Cache Built Wheels

Let Metaflow cache built wheels in S3/Azure/GS to avoid rebuilding.
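The Pin Versions tip can be checked mechanically. This hypothetical sketch (`unpinned` is not a Metaflow function) reports requirement lines that are not exact `name==version` pins, skipping options such as `--pre` and direct URL references, which are pinned by their URL or Git ref instead. Wildcard pins like `==1.*` are reported as unpinned because they still match a range of versions.

```python
import re

_COMMENT = re.compile(r"(^|\s)#.*$")
# name[extras]==version, with no wildcard, range, or marker after the version
_EXACT_PIN = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]*\])?==[^,;\s*]+$")


def unpinned(text):
    """Return requirement lines that are not exact name==version pins.

    Hypothetical checker for illustration; not part of Metaflow.
    """
    flagged = []
    for raw in text.splitlines():
        line = _COMMENT.sub("", raw).strip()
        if not line or line.startswith("-") or "@" in line:
            # Skip blanks, options such as --pre, and direct URL references.
            continue
        if not _EXACT_PIN.match(line.replace(" ", "")):
            flagged.append(line)
    return flagged
```

Applied to the complete example that follows, it flags only `numpy>=1.20,<2.0`.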

Example: Complete Requirements File

Here’s a comprehensive example showing various features:
requirements-complete.txt
# Standard packages
pandas==1.5.0
numpy>=1.20,<2.0
scikit-learn==1.2.0

# Package with extras
apache-airflow[aiobotocore]==2.5.0

# Additional index for private packages
--extra-index-url https://my-company.com/pypi/simple

# Pre-release versions allowed
--pre

# System tool from Conda
--conda-pkg git-lfs>=2.0

# Git repository (same architecture only)
# custom-ml-lib @ git+https://github.com/company/[email protected]
Uncomment the Git repository line only if resolving on the same architecture as the target execution environment.
