Mixed environments allow you to combine packages from both Conda and PyPI repositories, giving you access to the full ecosystem of Python packages while leveraging Conda’s ability to manage non-Python dependencies.
## Overview

Mixed mode uses conda-lock (via Poetry) to resolve dependencies from both ecosystems. This is useful when:

- You need packages only available in one ecosystem
- You want Conda's superior handling of complex dependencies (e.g., TensorFlow with CUDA)
- You need non-Python system libraries
- You want reproducible environments across both package managers

Mixed mode is slower than pure PyPI or pure Conda resolution due to the complexity of cross-ecosystem dependency resolution.
## Using Decorators

Combine `@conda` and `@pypi` decorators on the same step:

```python
from metaflow import FlowSpec, step, conda, pypi

class MixedEnvFlow(FlowSpec):

    @conda(libraries={"numpy": "1.21.5"}, python=">=3.8,<3.9")
    @pypi(packages={"tensorflow": "2.7.4"})
    @step
    def start(self):
        import numpy as np
        import tensorflow as tf
        print(f"NumPy {np.__version__} from Conda")
        print(f"TensorFlow {tf.__version__} from PyPI")
        self.next(self.end)

    @step
    def end(self):
        print("Done")

if __name__ == "__main__":
    MixedEnvFlow()
```
## Flow-Level Base Environments

Use `@conda_base` and `@pypi_base` for dependencies shared across all steps:

```python
from metaflow import FlowSpec, step, conda, pypi, conda_base, pypi_base

@conda_base(libraries={"numpy": "1.21.5"}, python=">=3.8,<3.9")
@pypi_base(packages={"requests": "2.28.0"})
class MixedEnvFlow(FlowSpec):

    @pypi(packages={"pandas": "1.5.0"})
    @step
    def start(self):
        # Has numpy (from conda_base), requests (from pypi_base),
        # and pandas (from step decorator)
        import numpy, requests, pandas
        self.next(self.end)

    @conda(libraries={"scipy": "1.9.0"})
    @step
    def end(self):
        # Has numpy, scipy, and requests
        # Step-level overrides flow-level
        import numpy, scipy, requests

if __name__ == "__main__":
    MixedEnvFlow()
```

Step-level decorators override flow-level decorators. For example, specifying `python="3.9"` at the step level overrides the flow-level Python version.
## Using environment.yml

The `environment.yml` format provides a more structured way to specify mixed dependencies:

```yaml
channels:
  - conda-forge
  - defaults
dependencies:
  - pandas>=1.0.0
  - numpy=1.21.5
  - python>=3.8,<3.9
  - pip:
      - tensorflow==2.7.4
      - apache-airflow[aiobotocore]
```

Resolve with:

```shell
metaflow environment resolve -f env_mixed.yml
```
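To see how the two dependency kinds separate, here is an illustrative sketch of the parsed structure (plain dicts stand in for the YAML above; this is not Metaflow's internal representation):

```python
# Parsed form of the environment.yml above, as a plain dict
# (the shape a YAML parser would produce; parsing itself is omitted)
env = {
    "channels": ["conda-forge", "defaults"],
    "dependencies": [
        "pandas>=1.0.0",
        "numpy=1.21.5",
        "python>=3.8,<3.9",
        {"pip": ["tensorflow==2.7.4", "apache-airflow[aiobotocore]"]},
    ],
}

# String entries are Conda specs; the nested dict holds PyPI requirements
conda_deps = [d for d in env["dependencies"] if isinstance(d, str)]
pypi_deps = [p for d in env["dependencies"]
             if isinstance(d, dict) for p in d.get("pip", [])]

print(conda_deps)  # ['pandas>=1.0.0', 'numpy=1.21.5', 'python>=3.8,<3.9']
print(pypi_deps)   # ['tensorflow==2.7.4', 'apache-airflow[aiobotocore]']
```

The Conda specs go to the Conda solver, while everything under `pip:` is handed to the PyPI side of the resolution.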
## Syntax Details

### Channels
Specify Conda channels to search:

```yaml
channels:
  - conda-forge
  - defaults
  - my-company-channel
```

Channels are searched in order, so list them by priority:

```yaml
channels:
  - conda-forge # Highest priority
  - defaults
```
### Conda Dependencies

List Conda packages with version constraints:

```yaml
dependencies:
  - python=3.8
  - numpy=1.21.5
  - pandas>=1.0.0,<2.0.0
  - scikit-learn         # Any version
  - conda-forge::pytorch # From specific channel
```

Version operators:

- `=` or `==` - Exact version
- `>=`, `<=`, `>`, `<` - Comparisons
- `>=1.0,<2.0` - Combined constraints
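These operators behave like ordinary componentwise version comparisons, with commas acting as AND. A minimal illustrative matcher (not conda's real one, which also handles wildcards, build strings, and fuzzy `=` matching):

```python
def _parse(v):
    # Numeric dotted versions only (e.g. "1.21.5"); real conda specs
    # also allow wildcards and build strings, which are ignored here
    return tuple(int(x) for x in v.split("."))

def satisfies(version, constraint):
    # A combined constraint like ">=1.0,<2.0" is a comma-separated AND
    for clause in constraint.split(","):
        for op in ("==", ">=", "<=", "=", ">", "<"):
            if clause.startswith(op):
                bound = _parse(clause[len(op):])
                v = _parse(version)
                ok = {"==": v == bound, "=": v == bound,
                      ">=": v >= bound, "<=": v <= bound,
                      ">": v > bound, "<": v < bound}[op]
                if not ok:
                    return False
                break
    return True

print(satisfies("1.21.5", ">=1.0,<2.0"))  # True
print(satisfies("2.1.0", ">=1.0,<2.0"))   # False
```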
### PyPI Dependencies

Nest PyPI packages under `pip:`:

```yaml
dependencies:
  - python=3.8
  - numpy=1.21.5
  - pip:
      - tensorflow==2.7.4
      - torch==1.12.0
      - apache-airflow[aiobotocore]
```

In mixed mode, PyPI packages cannot be from Git repositories or local directories. Only wheels and source tarballs from PyPI indices are supported.
### PyPI Indices

Add custom PyPI indices:

```yaml
channels:
  - conda-forge
pypi-indices:
  - https://my-company.com/pypi/simple
  - https://my-private-pypi.org/simple
dependencies:
  - python=3.8
  - pip:
      - my-private-package==1.0.0
```
## Using conda-lock

Under the hood, mixed environments use conda-lock with Poetry. The resolution process:

1. **Generate TOML Configuration** - Metaflow converts your requirements into a `pyproject.toml` file that conda-lock understands.
2. **Resolve with conda-lock** - conda-lock calls both Conda and Poetry to resolve the full dependency tree, ensuring compatibility between both ecosystems.
3. **Generate Explicit Specification** - The result is an explicit list of all packages with exact versions and download URLs.
4. **Build Non-Wheel Packages** - Any PyPI source packages are built into wheels and cached for reuse.
## Requirements

To use mixed environments, your environment needs:

```shell
conda install "conda-lock>=2.1.0"
```

conda-lock is not required on remote execution nodes—only on the machine where you resolve environments.
## Restrictions and Limitations

Mixed mode has several important restrictions due to conda-lock limitations:

### Package Source Restrictions

Not supported:

- Git repositories
- Local directories
- Editable packages
- Non-wheel builds

Supported:

- PyPI wheels
- Source tarballs from PyPI
- Packages with extras
- All Conda packages
### Example of Unsupported Packages

```yaml
dependencies:
  - python=3.8
  - numpy=1.21.5
  - pip:
      # ❌ This will FAIL - Git repos not supported in mixed mode
      - my-package @ git+https://github.com/user/repo.git@main
      # ❌ This will FAIL - Local directories not supported
      - local-pkg @ file:///path/to/local/package
      # ✅ This works - regular PyPI package
      - tensorflow==2.7.4
```
If you need Git repositories or local packages, use pure PyPI mode instead (see PyPI Packages).
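A quick pre-check for the unsupported forms above could look like the sketch below (illustrative only; Metaflow's actual validation is more thorough and also rejects editable installs):

```python
def unsupported_in_mixed_mode(requirement: str) -> bool:
    # Direct references to Git repos or local paths are rejected in
    # mixed mode; plain named PyPI requirements pass. This is a sketch,
    # not Metaflow's real validation logic.
    req = requirement.lower()
    return "git+" in req or "file://" in req

reqs = [
    "my-package @ git+https://github.com/user/repo.git@main",
    "local-pkg @ file:///path/to/local/package",
    "tensorflow==2.7.4",
]
for r in reqs:
    status = "unsupported" if unsupported_in_mixed_mode(r) else "ok"
    print(f"{r}: {status}")
```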
### Channel Priority

When using channels with `::` notation or extra channels, Metaflow sets flexible channel priority:

```yaml
channels:
  - conda-forge
  - pytorch
  - defaults
dependencies:
  - python=3.8
  - numpy=1.21.5
  - pytorch::pytorch=1.12.0 # Forces pytorch channel
  - cudatoolkit>=11.0
```

The `::` notation (e.g., `pytorch::pytorch`) forces a package to come from a specific channel, overriding channel priority.
### Virtual Packages

Metaflow automatically includes system virtual packages for Linux:

```yaml
subdirs:
  linux-64:
    packages:
      __glibc: "2.27"
      __cuda: "11.2"
```

You can override these using `--sys-pkg` in `requirements.txt` or the `sys:` section in `environment.yml`.
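For example, an override might look like the following sketch. The exact schema of the `sys:` section is an assumption here; treat the layout as hypothetical and check your Metaflow version's documentation:

```yaml
# Hypothetical override: pin different virtual package versions
sys:
  __glibc: "2.31"
  __cuda: "12.1"
```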
### Resolution Time

Mixed environments are slower to resolve:

- Pure PyPI: ~10-30 seconds
- Pure Conda: ~30-60 seconds
- Mixed: ~60-120 seconds

This is due to conda-lock coordinating between two ecosystems.

### When to Use Mixed

Use mixed mode only when you need it:

- Packages only available in one ecosystem
- Complex system dependencies (CUDA, MKL, etc.)
- Specific compiler-optimized versions

Otherwise, prefer pure PyPI or pure Conda for speed.

### Optimization Tips

- Pin versions exactly when possible
- Minimize the number of packages
- Use named environments to cache resolutions
- Prefer the conda-forge channel for consistency
## Complete Example

Here's a real-world mixed environment for ML workloads:

```yaml
channels:
  - conda-forge
  - defaults
pypi-indices:
  - https://download.pytorch.org/whl/cu116
dependencies:
  - python=3.9
  # Scientific computing from Conda (optimized builds)
  - numpy=1.23.0
  - scipy=1.9.0
  - pandas=1.4.0
  # CUDA from Conda
  - cudatoolkit=11.6
  # Deep learning from PyPI
  - pip:
      - torch==1.12.0+cu116
      - torchvision==0.13.0+cu116
      - transformers==4.20.0
      - wandb==0.13.0
```

Resolve and create an alias:

```shell
metaflow environment resolve \
  -f env_ml_mixed.yml \
  --alias mlp/ml-team/torch-cuda:v1
```
Then use it in your flow:

```python
from metaflow import FlowSpec, step, named_env

class MLFlow(FlowSpec):

    @named_env(name="mlp/ml-team/torch-cuda:v1")
    @step
    def train(self):
        import torch
        import transformers
        print(f"CUDA available: {torch.cuda.is_available()}")
        self.next(self.end)

    @step
    def end(self):
        pass

if __name__ == "__main__":
    MLFlow()
```