
Command Syntax

The Debug Extension provides a debug command group with subcommands for debugging tasks:
metaflow debug task [OPTIONS] PATHSPEC

Basic Usage

To debug a specific task:
metaflow debug task FlowName/123/step_name/456 --metaflow-root-dir ~/notebooks/debug_task

Command Options

Required Arguments

PATHSPEC

The pathspec identifying which task to debug. Can be a full or partial pathspec:
  • Full pathspec: FlowName/RunId/StepName/TaskId
  • Step pathspec: FlowName/RunId/StepName (uses the step's single task)
  • Run pathspec: FlowName/RunId (uses the end step)
  • Flow name: FlowName (uses the end step of the latest successful run)
When using partial pathspecs, the extension will resolve them to a unique task. If multiple tasks exist (e.g., from a foreach), you must specify the full pathspec.
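The four accepted shapes can be summarized with a small helper. This is an illustrative sketch only, not the extension's code (the real resolution logic is quoted under Pathspec Resolution below):

```python
# Classify a (partial) pathspec by its number of components.
def classify_pathspec(pathspec: str) -> str:
    kinds = {1: "flow", 2: "run", 3: "step", 4: "task"}
    cleaned = pathspec.strip("/")
    if not cleaned:
        raise ValueError("Empty pathspec")
    n = len(cleaned.split("/"))
    if n not in kinds:
        raise ValueError(f"Invalid pathspec: {pathspec!r}")
    return kinds[n]

print(classify_pathspec("FlowName"))                    # flow
print(classify_pathspec("FlowName/123"))                # run
print(classify_pathspec("FlowName/123/step_name"))      # step
print(classify_pathspec("FlowName/123/step_name/456"))  # task
```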

--metaflow-root-dir

Required. The root directory where all generated files will be placed:
--metaflow-root-dir ~/notebooks/debug_task
This directory will contain:
  • The extracted code package
  • Debug scripts and notebooks
  • Escape trampolines
  • Stub generators

Optional Flags

--inspect

Enables inspection mode, allowing you to examine the state of a task after it has finished running:
metaflow debug task FlowName/123/step_name/456 \
  --metaflow-root-dir ~/debug \
  --inspect
In inspect mode, the generated notebook will include code to load and examine all artifacts produced by the task.
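Conceptually, the inspect-mode cells iterate the task and collect its artifacts. The sketch below illustrates the idea with a hypothetical helper and a stand-in task object, so it runs without a Metaflow deployment; in the real notebook, the task comes from the Metaflow client, where iterating a Task yields DataArtifact objects:

```python
def summarize_artifacts(task):
    """Return {artifact_name: value} for each artifact on the task.

    Assumes each element has .id (the artifact name) and .data (the
    deserialized value), as Metaflow client DataArtifacts do.
    """
    return {artifact.id: artifact.data for artifact in task}

# Hypothetical stand-in so the sketch is self-contained:
class _FakeArtifact:
    def __init__(self, id, data):
        self.id, self.data = id, data

fake_task = [_FakeArtifact("rmse", 0.42), _FakeArtifact("n_estimators", 100)]
print(summarize_artifacts(fake_task))  # {'rmse': 0.42, 'n_estimators': 100}
```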

--override-env

Use a different named environment instead of the task’s original environment:
metaflow debug task FlowName/123/step_name/456 \
  --metaflow-root-dir ~/debug \
  --override-env my_custom_env
The named environment must already exist. Create it using metaflow environment create --name my_custom_env.

--override-env-from-pathspec

Use the environment from a different task instead of the current task’s environment:
metaflow debug task FlowName/123/step_name/456 \
  --metaflow-root-dir ~/debug \
  --override-env-from-pathspec FlowName/100/other_step/789
This is useful when you want to debug a task with a different set of dependencies or package versions.
Only one of --override-env or --override-env-from-pathspec can be specified at a time.
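The mutual-exclusivity rule amounts to a simple check, sketched here for clarity (illustrative only, not the extension's actual validation code):

```python
# At most one environment override may be given.
def validate_overrides(override_env=None, override_env_from_pathspec=None):
    if override_env and override_env_from_pathspec:
        raise ValueError(
            "Specify only one of --override-env or --override-env-from-pathspec"
        )
    return override_env or override_env_from_pathspec  # may be None

print(validate_overrides(override_env="my_custom_env"))  # my_custom_env
```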

--quiet / --no-quiet

Suppress or show informational messages during execution:
metaflow debug --quiet task FlowName/123/step_name/456 --metaflow-root-dir ~/debug
Default: --no-quiet (messages are shown)

Pathspec Resolution

The extension resolves partial pathspecs to a single concrete task, so you do not have to spell out every component:

Resolution Logic

From debug_cmd.py:182-219:
path_components = pathspec.split("/")
if not path_components:
    raise CommandException("Provide either a flow, run, step or task to debug")

if len(path_components) < 2:
    # Flow name only - get latest successful run
    r = Flow(path_components[0]).latest_successful_run
    if r is None:
        raise CommandException(
            "Flow {} can only be specified if there is a successful run in the "
            "current namespace.".format(path_components[0])
        )
    path_components.append(r.id)

if len(path_components) < 3:
    # No step specified - use 'end' step
    path_components.append("end")

if len(path_components) < 4:
    # Enforce single task
    cur_task = None
    for t in r[path_components[2]]:
        if cur_task is not None:
            raise CommandException(
                "Step {} does not refer to a single task -- please specify the "
                "task unambiguously".format("/".join(path_components))
            )
        cur_task = t

Examples

# Explicitly specify all components
metaflow debug task HousePricePredictionFlow/1199/fit_gbrt_for_given_param/150671013 \
  --metaflow-root-dir ~/debug

Generated Files and Artifacts

When you run the debug command, several files are generated in the specified root directory:

Directory Structure

~/notebooks/debug_task/
├── code_package.tar.gz          # Downloaded code package
├── <flow_files>.py              # Extracted flow code
├── _escape_trampolines/         # Python path overrides
│   └── metaflow/
├── debug_stub_generator.py      # Artifact stub generator
├── current_stub_generator.py    # Current object stub generator
├── <pathspec>_debug.py          # Debug script
└── debug.ipynb                  # Debug notebook

Debug Script

The debug script (<pathspec>_debug.py) contains:
  • Imports from the task’s step code
  • Stubs for accessing artifacts via self
  • Helper functions for inspecting data
Example script structure:
# Generated debug script
import sys
import os

# Add escape trampolines to path
sys.path.insert(0, os.path.join(os.path.dirname(__file__), '_escape_trampolines'))

# Import stub generators
from debug_stub_generator import generate_debug_stub
from current_stub_generator import generate_current_stub

# Create 'self' object with artifact access
self = generate_debug_stub('FlowName/123/step_name/456')

# Your step code can now be executed

Debug Notebook

The generated debug.ipynb notebook includes:
  1. Setup cells - Import statements and environment configuration
  2. Stub generation - Code to create the self object
  3. Step code - Your original step function code, ready to execute
  4. Artifact inspection - Helper cells for examining data
The notebook’s kernel is automatically configured to use the recreated Conda environment, ensuring all dependencies are available.

Escape Trampolines

The _escape_trampolines directory contains Python modules that override Metaflow internals to enable artifact access outside of a running task. This is generated using:
from metaflow.plugins.env_escape import generate_trampolines
generate_trampolines(trampoline_dir)
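The underlying mechanism is plain sys.path shadowing: a module placed earlier on sys.path takes precedence over an installed module of the same name. This self-contained demonstration (not the extension's code; the module name is made up) shows the trick:

```python
import os
import sys
import tempfile

# Write a "trampoline" module into a temporary directory.
trampoline_dir = tempfile.mkdtemp()
with open(os.path.join(trampoline_dir, "shadowed_module.py"), "w") as f:
    f.write("VALUE = 'from trampoline'\n")

# Putting the directory first on sys.path makes its modules win the
# import lookup -- the same trick the debug scripts use for Metaflow.
sys.path.insert(0, trampoline_dir)
import shadowed_module

print(shadowed_module.VALUE)  # from trampoline
```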

Using the Debug Notebook

Once the debug environment is set up, you can start debugging:

Launching Jupyter

cd ~/notebooks/debug_task
jupyter notebook debug.ipynb
The notebook will automatically use the correct kernel with the recreated environment.

Accessing Artifacts

In the notebook, artifacts are accessible through the self object:
# Access input artifacts
print(self.input['n_estimators'])
print(self.input['learning_rate'])

# Access data artifacts
print(self.features.shape)
print(self.labels.shape)

# Access any artifact from the task
print(self.index)

Re-executing Step Code

You can re-run your step code with modifications:
# Original step code
from sklearn import ensemble
from sklearn.model_selection import cross_val_score
import numpy as np

estimator = ensemble.GradientBoostingRegressor(
    n_estimators=self.input['n_estimators'],
    learning_rate=self.input['learning_rate'],
    max_depth=self.input['max_depth'],
    min_samples_split=2,
    loss='ls'
)

estimator.fit(self.features, self.labels)

mses = cross_val_score(
    estimator, self.features, self.labels,
    cv=5, scoring='neg_mean_squared_error'
)
rmse = np.sqrt(-mses).mean()

print(f"RMSE: {rmse}")

Experimenting with Parameters

Try different hyperparameters without re-running the entire flow:
# Experiment with different min_samples_split
for min_samples in [2, 3, 5, 10]:
    estimator = ensemble.GradientBoostingRegressor(
        n_estimators=self.input['n_estimators'],
        learning_rate=self.input['learning_rate'],
        max_depth=self.input['max_depth'],
        min_samples_split=min_samples,
        loss='ls'
    )
    estimator.fit(self.features, self.labels)
    mses = cross_val_score(
        estimator, self.features, self.labels,
        cv=5, scoring='neg_mean_squared_error'
    )
    rmse = np.sqrt(-mses).mean()
    print(f"min_samples_split={min_samples}: RMSE={rmse:.4f}")

Environment Configuration

The extension automatically configures the Python environment:

Environment Creation

From debug_utils.py:148-171:
conda_command = [
    "metaflow",
    "environment",
    "--quiet",
    "create",
    "--name",
    mf_env_name,
    "--install-notebook",
    "--force",
]
conda_command.extend(mf_env)

result = subprocess.run(
    conda_command,
    check=True,
    capture_output=True,
)

# Parse the python path from stderr
for line in result.stderr.decode().split("\n"):
    if "/bin/python" in line:
        mf_python_path = line
        break
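The stderr scan above can be isolated into a small helper, shown here as an illustrative sketch (the sample output below is made up):

```python
def find_python_path(stderr_text):
    """Return the first stderr line containing '/bin/python', or None."""
    for line in stderr_text.split("\n"):
        if "/bin/python" in line:
            return line
    return None

sample = "Creating environment...\n/tmp/envs/dbg/bin/python\nDone"
print(find_python_path(sample))  # /tmp/envs/dbg/bin/python
```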

Jupyter Kernel Configuration

The extension updates the Jupyter kernel’s configuration to include:
  • The escape trampolines in PYTHONPATH
  • Any environment variables needed by the packaged Metaflow version
From debug_cmd.py:315-349:
def _update_kernel_pythonpath(kernelspec_path, metaflow_root_dir):
    kernel_json_path = os.path.join(kernelspec_path, "kernel.json")
    with open(kernel_json_path, "r") as f:
        kernel_json = json.load(f)

    _ = kernel_json.setdefault("env", {})["PYTHONPATH"] = os.path.abspath(
        os.path.join(metaflow_root_dir, "_escape_trampolines")
    )

    for key, value in MetaflowCodeContent.get_env_vars_for_packaged_metaflow(
        metaflow_root_dir
    ).items():
        if key.endswith(":"):
            # Override existing value
            kernel_json["env"][key[:-1]] = value
        elif key not in kernel_json["env"]:
            kernel_json["env"][key] = value
        else:
            # Prepend to existing value
            kernel_json["env"][key] = f"{value}:{kernel_json['env'][key]}"
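The three merge rules can be seen in isolation in this restatement of the branches above (an illustrative sketch, not the extension's API; the MF_CONFIG key is a made-up example):

```python
def merge_env_var(env, key, value):
    if key.endswith(":"):
        env[key[:-1]] = value             # trailing ':' means override
    elif key not in env:
        env[key] = value                  # new key: set it
    else:
        env[key] = f"{value}:{env[key]}"  # existing key: prepend
    return env

env = {"PYTHONPATH": "/existing"}
merge_env_var(env, "PYTHONPATH", "/packaged")  # prepend to existing
merge_env_var(env, "MF_CONFIG:", "/override")  # forced override
print(env)  # {'PYTHONPATH': '/packaged:/existing', 'MF_CONFIG': '/override'}
```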

Limitations

The Debug Extension has the following limitations:
  • Only supports tasks executed remotely (not local runs)
  • Requires the new Conda V2 decorator (included in this package)
  • Only works with tasks that have a code package in metadata

Troubleshooting

Error: “Task does not have code-package”

This error occurs when trying to debug a task that was run locally. The debug extension requires a code package, which is only created for remote executions. Solution: Only debug tasks executed on remote compute (e.g., with @batch, @kubernetes, or other remote decorators).

Error: “Conda environment type not supported”

This error occurs when the task used an older version of the Conda decorator. Solution: Re-run your flow with the enhanced Conda V2 decorator included in this package.

Kernel Not Found

If Jupyter cannot find the kernel:
  1. Check that the environment was created: metaflow environment list
  2. Verify the kernel was installed: jupyter kernelspec list
  3. Manually specify the kernel in Jupyter if needed

Missing Dependencies

If dependencies are missing in the notebook:
  1. Verify the correct environment is active: Check the kernel name in Jupyter
  2. Check environment creation logs for errors
  3. Try recreating the environment with --force

Next Steps

See Examples

Explore complete working examples of debugging different scenarios
