
What is the Debug Extension?

The Debug Extension lets you debug an executed Metaflow step in an isolated Jupyter notebook instance with the step's dependencies configured automatically. It uses the enhanced Conda decorator to recreate the environment used during step execution, so you can investigate failures, inspect artifacts, and iterate on your code.
The Debug Extension currently works only with the version of Conda included in this package (Conda V2). It does not support the standard OSS Metaflow Conda decorator.

Key Capabilities

The Debug Extension provides powerful debugging capabilities for your Metaflow workflows:
1. Environment Recreation

Automatically downloads and recreates the exact Conda/pip environment used during the original step execution, ensuring consistent dependency versions.

2. Code Package Retrieval

Downloads your code package from the datastore, giving you access to the exact code that was executed.

3. Artifact Access

Generates stubs to seamlessly access all artifacts and data from the run, allowing you to inspect intermediate results and debug issues.

4. Interactive Notebook

Creates a Jupyter notebook pre-configured with imports, artifact stubs, and your step code, ready for line-by-line debugging.

How It Works with Conda

The Debug Extension tightly integrates with the enhanced Conda V2 decorator included in this package:

Environment Detection

When you debug a task, the extension:
  1. Checks the environment type - Determines if the task used a Conda environment by inspecting the conda_env_id metadata
  2. Resolves the environment - If a Conda environment was used, it identifies the specific environment ID from the task metadata
  3. Recreates the environment - Uses metaflow environment create to recreate the exact environment locally
From debug_utils.py:74-96:
def fetch_environment_type(task: Task) -> str:
    """
    Fetches the conda environment type for the task.
    """
    conda_env_id = task.metadata_dict.get("conda_env_id", "").strip()
    if conda_env_id == "":
        return "default"
    elif conda_env_id.startswith("[") and conda_env_id.endswith("]"):
        return "new_conda"
    raise CommandException(
        "The conda environment type for the task is not supported. Please use the new conda decorators available in "
        "Metaflow."
    )
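
The branching in the excerpt above can be tried without a Metaflow installation. The following is a minimal sketch, not the extension's own code: classify_conda_env_id is a hypothetical helper that takes the raw metadata string instead of a Task, raises ValueError in place of CommandException, and the sample values are made up for illustration.

```python
def classify_conda_env_id(conda_env_id: str) -> str:
    """Mirror the excerpt's branching on a raw metadata string.

    Hypothetical stand-in: takes a plain string instead of a Task and
    raises ValueError where the excerpt raises CommandException.
    """
    value = conda_env_id.strip()
    if value == "":
        # No conda_env_id recorded: the task did not use a Conda environment.
        return "default"
    if value.startswith("[") and value.endswith("]"):
        # Bracketed value: treated as an environment from the Conda V2 decorator.
        return "new_conda"
    raise ValueError(f"Unsupported conda_env_id format: {value!r}")


# Made-up metadata values, for illustration only.
samples = ["", "   ", '["abc123", "def456"]', "legacy-env-name"]
results = []
for raw in samples:
    try:
        results.append(classify_conda_env_id(raw))
    except ValueError:
        results.append("unsupported")
```

An empty or whitespace-only value maps to the default environment, a bracketed value maps to the new Conda type, and anything else is rejected.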

Named Environments

The extension creates a named environment based on the pathspec, making it easy to reuse the same environment for debugging multiple tasks from the same step:
mf_env_name = path_spec_formatted.lower()  # e.g., "flowname_123_stepname"
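
As a rough illustration of how such a name could be derived, the sketch below assumes that the flow, run, and step components of the pathspec are joined with underscores before lowercasing. The helper format_env_name and that joining rule are assumptions inferred from the example comment above, not the extension's confirmed implementation.

```python
def format_env_name(pathspec: str) -> str:
    """Build an environment name from a pathspec (hypothetical helper).

    Assumption: only the flow, run, and step components are used, joined
    with underscores, so every task of one step maps to the same name.
    """
    parts = [p for p in pathspec.strip("/").split("/") if p]
    if len(parts) < 3:
        raise ValueError("expected at least flow/run/step in the pathspec")
    path_spec_formatted = "_".join(parts[:3])
    return path_spec_formatted.lower()


# Two tasks of the same step resolve to the same environment name.
name_a = format_env_name("FlowName/123/StepName/456")
name_b = format_env_name("FlowName/123/StepName/789")
```

Under this assumption, both tasks share one environment name, which matches the reuse behavior described above.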

Environment Overrides

You can override the default environment behavior:
  • --override-env - Use a different named environment instead of recreating the task’s environment
  • --override-env-from-pathspec - Use the environment from a different task pathspec
Only one of --override-env or --override-env-from-pathspec can be specified at a time.
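
The mutual-exclusion rule can be sketched as a small validation function. This is an illustrative sketch only: pick_environment_source and its return values are hypothetical and do not reflect how the extension parses its options internally.

```python
from typing import Optional, Tuple


def pick_environment_source(
    override_env: Optional[str] = None,
    override_env_from_pathspec: Optional[str] = None,
) -> Tuple[str, Optional[str]]:
    """Return (source, value) describing which environment to use.

    Hypothetical helper: the labels "named", "pathspec", and "task"
    are illustrative, not part of the extension.
    """
    if override_env is not None and override_env_from_pathspec is not None:
        raise ValueError(
            "Only one of --override-env or --override-env-from-pathspec "
            "can be specified at a time."
        )
    if override_env is not None:
        return ("named", override_env)
    if override_env_from_pathspec is not None:
        return ("pathspec", override_env_from_pathspec)
    # No override given: recreate the debugged task's own environment.
    return ("task", None)
```

Checking for the conflict before anything else means an invalid combination fails early, before any environment is downloaded or created.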

Architecture

The Debug Extension consists of several key components:

Command Interface

The main entry point is the metaflow debug task command, which accepts:
  • A pathspec (flow/run/step/task)
  • A root directory for generated files
  • Optional environment overrides
  • Optional inspect mode flag

Pathspec Resolution

The extension resolves partial pathspecs according to how many components are given:
  • Flow name only → Uses the latest successful run in your namespace and the end step
  • Flow/Run → Uses the end step
  • Flow/Run/Step → Uses the unique task (errors if there are multiple tasks from a foreach)
  • Full pathspec → Uses the specified task directly
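
The rules above depend only on how many components the pathspec has. The sketch below shows that dispatch in isolation. The helper describe_resolution is hypothetical and returns a description string; it does not query Metaflow for runs or tasks.

```python
def describe_resolution(pathspec: str) -> str:
    """Describe how a partial pathspec would be resolved (illustrative only)."""
    parts = [p for p in pathspec.strip("/").split("/") if p]
    if len(parts) == 1:
        return "latest successful run in namespace, end step"
    if len(parts) == 2:
        return "given run, end step"
    if len(parts) == 3:
        return "given step, unique task (error if a foreach produced several)"
    if len(parts) == 4:
        return "given task"
    raise ValueError(f"expected 1 to 4 pathspec components, got {len(parts)}")


# Made-up pathspecs, for illustration only.
for example in ["MyFlow", "MyFlow/123", "MyFlow/123/train", "MyFlow/123/train/456"]:
    print(example, "->", describe_resolution(example))
```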

Generated Artifacts

The extension generates several files in the specified root directory:
  1. Code package - The extracted tarball of your flow code
  2. Debug script - A Python script with artifact stubs and imports
  3. Debug notebook - A Jupyter notebook pre-configured for debugging
  4. Escape trampolines - Python modules that allow accessing Metaflow internals
  5. Stub generators - Helper scripts for creating artifact access stubs

Use Cases

Debugging Failed Steps

When a step fails in production, use the debug extension to:
  • Recreate the exact environment where the failure occurred
  • Access the input artifacts that caused the failure
  • Step through the code line-by-line to identify the root cause

Inspecting Successful Runs

Even for successful runs, the debug extension helps you:
  • Examine intermediate artifacts
  • Validate model outputs
  • Experiment with different parameters using the same data

Iterative Development

During development, use the debug extension to:
  • Test changes to your step logic with real data
  • Validate that new code works with previously computed artifacts
  • Experiment with hyperparameters on actual training data

Requirements

The Debug Extension requires:
  • Metaflow v2.8.3 or later
  • The Netflix Extensions package installed
  • The enhanced Conda V2 decorator (included in this package)
  • Tasks that were executed remotely (not local runs)

Next Steps

Usage Guide

Learn the command syntax and options for debugging tasks

Examples

See complete working examples of debugging different scenarios
