The Metaflow Netflix Extensions enhance the core Metaflow package with improved Conda support and debugging capabilities.

Prerequisites

  • Metaflow: Version 2.16.0 or later
  • Python: 3.7.2 or higher (tested on Python 3.7-3.13)
  • Operating System: macOS (Intel/ARM) or Linux

Installation

1. Install via pip

Install the extension alongside your existing Metaflow installation:
pip install metaflow-netflixext
The extension automatically integrates with Metaflow once installed.
2. Verify installation

Check that Metaflow still imports cleanly with the extension installed:
python -c "from metaflow import FlowSpec; print('Extension loaded successfully')"
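The import check above only confirms that Metaflow itself imports. As a complementary check, the following minimal sketch asks Python's package metadata whether the extension's distribution is present. It assumes the distribution name matches the pip package name `metaflow-netflixext`; the helper `installed_version` is a hypothetical name, and `importlib.metadata` needs Python 3.8 or later.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Assumption: the distribution name is the same as the pip package name.
version = installed_version("metaflow-netflixext")
if version is None:
    print("metaflow-netflixext is not installed")
else:
    print("metaflow-netflixext " + version)
```

This only reports that the package is installed, not that every feature is configured correctly.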

Configuration

Configuration options are defined in your Metaflow configuration file. All configuration variables should be prefixed with METAFLOW_ when setting them in your profile or environment.
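As an illustration of the prefix rule, the sketch below turns option names as they are written in this guide into the environment variable names you would export. The helper `to_env_vars` is hypothetical and only handles simple scalar values; it is not part of Metaflow.

```python
def to_env_vars(options, prefix="METAFLOW_"):
    """Prefix option names and stringify values for use as environment variables."""
    env = {}
    for name, value in options.items():
        # Leave names alone if they already carry the prefix.
        key = name if name.startswith(prefix) else prefix + name
        env[key] = str(value)
    return env


options = {
    "CONDA_S3ROOT": "s3://my-bucket/conda_env",
    "CONDA_LOCK_TIMEOUT": 3600,
}
for key, value in sorted(to_env_vars(options).items()):
    print("export {}={}".format(key, value))
```

For example, `CONDA_LOCK_TIMEOUT` becomes `METAFLOW_CONDA_LOCK_TIMEOUT`, which is the form used in the troubleshooting section.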

Core Configuration Options

For Metaflow versions prior to v2.10, configuration values must be set directly in the configuration file due to limitations in the OSS implementation of decorators. This limitation is removed in v2.10+.

Cloud Storage Root

Specify where cached packages and environments are stored:
# For S3
CONDA_S3ROOT = "s3://my-bucket/conda_env"

# For Azure Blob Storage
CONDA_AZUREROOT = "azure://my-container/conda_env"

# For Google Cloud Storage
CONDA_GSROOT = "gs://my-bucket/conda_env"
To avoid conflicts, do not point these at the same prefix used by the standard Metaflow Conda implementation.
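A quick way to sanity-check this before deploying is to compare the two storage roots. The sketch below is a hypothetical helper, not part of the extension; it also treats nested prefixes as conflicting, which is stricter than the rule stated above. The `standard_root` value is an illustrative placeholder for whatever prefix your standard Conda setup uses.

```python
def prefixes_conflict(a, b):
    """Return True if two storage URLs are equal or one is nested inside the other."""
    a_parts = a.rstrip("/").split("/")
    b_parts = b.rstrip("/").split("/")
    shorter = min(len(a_parts), len(b_parts))
    # Compare whole path components so "conda" does not match "conda_env".
    return a_parts[:shorter] == b_parts[:shorter]


extension_root = "s3://my-bucket/conda_env"
standard_root = "s3://my-bucket/conda"  # hypothetical value for illustration
print(prefixes_conflict(extension_root, standard_root))
```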

Dependency Resolvers

Configure which tools to use for resolving different types of environments:
# For Conda packages (default: "mamba")
# Options: "mamba", "conda", "micromamba"
CONDA_DEPENDENCY_RESOLVER = "mamba"

# For pure PyPI environments (default: "uv")
# Options: "pip", "uv", "none"
CONDA_PYPI_DEPENDENCY_RESOLVER = "uv"

# For mixed Conda/PyPI environments (default: "conda-lock")
# Options: "conda-lock", "none"
CONDA_MIXED_DEPENDENCY_RESOLVER = "conda-lock"
  • mamba is recommended for faster resolution times
  • Set the PyPI or mixed resolver to "none" to disable support for that type of environment
  • uv provides faster PyPI resolution than pip

Package Cache Directories

# Directory for cached packages (default: "packages")
CONDA_PACKAGES_DIRNAME = "packages"

# Directory for cached environments (default: "envs")
CONDA_ENVS_DIRNAME = "envs"

Remote Execution

# Directory containing remote installer binaries
CONDA_REMOTE_INSTALLER_DIRNAME = "conda-remote"

# Architecture-specific installer (use {arch} placeholder)
CONDA_REMOTE_INSTALLER = "micromamba-{arch}"
If CONDA_REMOTE_INSTALLER_DIRNAME is not set, the latest version of micromamba will be downloaded automatically on remote environments.
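To see which installer names the {arch} placeholder implies, the sketch below expands the template for each architecture. It assumes the placeholder is substituted in the style of Python's `str.format` and that the architecture names are those listed in CONDA_ALL_ARCHS; the helper `installer_names` is hypothetical.

```python
def installer_names(template, archs):
    """Expand an {arch} template into one installer name per architecture."""
    return [template.format(arch=arch) for arch in archs]


# Assumption: these are the architectures you need binaries for.
for name in installer_names("micromamba-{arch}", ["linux-64", "osx-64", "osx-arm64"]):
    print(name)
```

A binary with each resulting name would then be expected in the directory named by CONDA_REMOTE_INSTALLER_DIRNAME.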

Performance Options

Package Format Preference

# Prefer .conda format for speed gains (default: "none")
# Options: ".tar.bz2", ".conda", "none"
CONDA_PREFERRED_FORMAT = ".conda"
Using the .conda format provides significant performance improvements. It requires either micromamba or conda-package-handling to be installed.

Remote Environment Lookup

# Control remote environment caching behavior (default: ":none:")
# Options: ":none:", ":username:", ":any:", or comma-separated usernames
CONDA_USE_REMOTE_LATEST = ":username:"
  • :none: - Always resolve new environments locally
  • :username: - Check for cached environments by the current user
  • :any: - Use any cached environment that matches
  • user1,user2 - Check for environments cached by specific users
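The four forms above can be read as a filter on whose cached environments are considered. The following sketch illustrates that reading only; `users_to_check` is a hypothetical helper and does not reflect how the extension parses the value internally.

```python
def users_to_check(setting, current_user):
    """Interpret a CONDA_USE_REMOTE_LATEST value.

    Returns an empty list for ":none:" (no remote lookup), None for ":any:"
    (no user filter), and otherwise the list of usernames to look up.
    """
    if setting == ":none:":
        return []
    if setting == ":any:":
        return None
    if setting == ":username:":
        return [current_user]
    # Anything else is treated as a comma-separated list of usernames.
    return [user.strip() for user in setting.split(",") if user.strip()]


for value in (":none:", ":username:", ":any:", "user1,user2"):
    print(value, "->", users_to_check(value, current_user="alice"))
```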

PyPI Configuration

# Default PyPI mirror (optional)
CONDA_DEFAULT_PYPI_SOURCE = "https://pypi.org/simple"

# Lock timeout in seconds (default: 3600)
CONDA_LOCK_TIMEOUT = 3600

Local Environment Configuration

# Directory for local Conda distributions
CONDA_LOCAL_DIST_DIRNAME = "conda-local"

# Local distribution tarball name
CONDA_LOCAL_DIST = "conda-{arch}.tgz"

# Installation path for local distributions
CONDA_LOCAL_PATH = "/usr/local/libexec/metaflow-conda"
If CONDA_LOCAL_DIST_DIRNAME is not set, Metaflow will use the Conda executable from your PATH.

System Dependencies

# Default system packages by architecture
CONDA_SYS_DEFAULT_PACKAGES = {
    "linux-64": {"__glibc": "2.27"},
}

# GPU-specific packages (when GPU resources are requested)
CONDA_SYS_DEFAULT_GPU_PACKAGES = {
    "__cuda": "11.8=0"
}

# All supported architectures
CONDA_ALL_ARCHS = ["linux-64", "osx-64", "osx-arm64"]

Platform-Specific Setup

Azure Configuration

1. Create blob container

Manually create the blob container specified in CONDA_AZUREROOT:
az storage container create --name my-container --account-name my-storage-account
2. Grant permissions

Assign the Storage Blob Data Contributor role to service principals or user accounts:
az role assignment create \
  --role "Storage Blob Data Contributor" \
  --assignee <principal-id> \
  --scope /subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Storage/storageAccounts/<account-name>
See the Azure documentation for more details.

S3 Configuration

Ensure your AWS credentials have read/write permissions to the S3 bucket specified in CONDA_S3ROOT:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::my-bucket/conda_env/*",
        "arn:aws:s3:::my-bucket"
      ]
    }
  ]
}

Google Cloud Storage Configuration

Ensure your GCS credentials have Storage Object Admin permissions:
gcloud storage buckets add-iam-policy-binding gs://my-bucket \
  --member="user:<user-email>" \
  --role="roles/storage.objectAdmin"

Required Packages

Your local Conda environment needs these packages based on the features you want to use:

Base Requirements

conda install conda
conda install "mamba>=1.4.0" "micromamba>=1.4.0"
micromamba is strongly recommended and required for handling mixed .conda and .tar.bz2 packages.

Pure PyPI Support

conda install "pip>=23.0"
# Or for faster resolution
pip install uv

Mixed Package Support

conda install "conda-lock>=2.1.0"

Package Format Conversion

For converting between .tar.bz2 and .conda formats:
conda install "conda-package-handling>=1.9.0"
# Or use micromamba (preferred)

Troubleshooting

Enable Debug Logging

Set the debug flag to get detailed output:
export METAFLOW_DEBUG_CONDA=1
python my_flow.py run

File Locking Issues

If Metaflow appears stuck waiting for a lock file:
  1. Check for stale lock files in your Conda directories
  2. Adjust the timeout: export METAFLOW_CONDA_LOCK_TIMEOUT=7200
  3. Manually remove lock files if safe to do so
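For the first step, a small script can list candidate stale lock files before you decide whether to remove anything. This is a sketch under assumptions: the `.lock` suffix and the directory to scan are guesses for illustration, since this guide does not specify how or where the extension names its lock files. The helper `find_stale_locks` is hypothetical and only reports files; it never deletes them.

```python
import os
import time


def find_stale_locks(root, max_age_seconds=3600, suffix=".lock"):
    """List files under root ending in suffix that were last modified
    more than max_age_seconds ago."""
    cutoff = time.time() - max_age_seconds
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(suffix):
                continue
            path = os.path.join(dirpath, name)
            try:
                if os.path.getmtime(path) < cutoff:
                    stale.append(path)
            except OSError:
                # The file disappeared between listing and stat; skip it.
                continue
    return sorted(stale)


# Hypothetical directory; replace it with the Conda directory you actually use.
for path in find_stale_locks(os.path.expanduser("~/conda-directory-to-check")):
    print(path)
```

Matching `max_age_seconds` to your CONDA_LOCK_TIMEOUT value keeps the report consistent with how long Metaflow is willing to wait.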

Mixed Package Format Issues

Conda/mamba cannot create environments with mixed .conda and .tar.bz2 packages in offline mode. Install micromamba to resolve this issue.

Uninstallation

To revert to the standard Metaflow Conda implementation:
pip uninstall metaflow-netflixext
It’s safe to switch between implementations. Ensure they use different caching prefixes to avoid conflicts.
