
Basic Debugging Example

This example shows how to debug a machine learning step that failed during model training.

The Original Flow

houseprice_flow.py
from metaflow import FlowSpec, step, conda

class HousePricePredictionFlow(FlowSpec):
    
    @conda(libraries={'scikit-learn': '1.0.2', 'numpy': '1.21.0'})
    @step
    def start(self):
        """Load and prepare data"""
        import numpy as np
        from sklearn.datasets import fetch_california_housing
        
        data = fetch_california_housing()
        self.features = data.data
        self.labels = data.target
        self.next(self.fit_gbrt_for_given_param, foreach='params')
    
    @property
    def params(self):
        return [
            {'n_estimators': 50, 'learning_rate': 0.1, 'max_depth': 3},
            {'n_estimators': 100, 'learning_rate': 0.05, 'max_depth': 5},
            {'n_estimators': 200, 'learning_rate': 0.01, 'max_depth': 7},
        ]
    
    @conda(libraries={'scikit-learn': '1.0.2', 'numpy': '1.21.0'})
    @step
    def fit_gbrt_for_given_param(self):
        """Fit GBRT with given parameters"""
        from sklearn import ensemble
        from sklearn.model_selection import cross_val_score
        import numpy as np
        
        estimator = ensemble.GradientBoostingRegressor(
            n_estimators=self.input['n_estimators'],
            learning_rate=self.input['learning_rate'],
            max_depth=self.input['max_depth'],
            min_samples_split=2,
            loss='ls'
        )
        
        estimator.fit(self.features, self.labels)
        
        mses = cross_val_score(
            estimator, self.features, self.labels,
            cv=5, scoring='neg_mean_squared_error'
        )
        rmse = np.sqrt(-mses).mean()
        
        self.fit = dict(
            index=int(self.index),
            params=self.input,
            rmse=rmse,
            estimator=estimator
        )
        
        self.next(self.select_best_model)
    
    @step
    def select_best_model(self, inputs):
        """Select the best model based on RMSE"""
        # Persist only the picklable fit dict, not the whole input object
        self.best_model = min(inputs, key=lambda x: x.fit['rmse']).fit
        self.next(self.end)
    
    @step
    def end(self):
        """End of flow"""
        print(f"Best RMSE: {self.best_model['rmse']}")

if __name__ == '__main__':
    HousePricePredictionFlow()

Debugging a Specific Task

Suppose the task with pathspec HousePricePredictionFlow/1199/fit_gbrt_for_given_param/150671013 produced unexpected results. Here’s how to debug it:
1. Run the debug command

metaflow debug task HousePricePredictionFlow/1199/fit_gbrt_for_given_param/150671013 \
  --metaflow-root-dir ~/notebooks/debug_task
This will:
  • Download the code package
  • Recreate the Conda environment
  • Generate debug scripts and notebook
2. Launch Jupyter

cd ~/notebooks/debug_task
jupyter notebook debug.ipynb
3. Access artifacts in the notebook

# The 'self' object gives you access to all artifacts
print("Parameters used:", self.input)
print("Features shape:", self.features.shape)
print("Labels shape:", self.labels.shape)
print("Task index:", self.index)
4. Re-execute the step code

# Import the required libraries (available in the Conda env)
from sklearn import ensemble
from sklearn.model_selection import cross_val_score
import numpy as np

# Re-run with the same parameters
estimator = ensemble.GradientBoostingRegressor(
    n_estimators=self.input['n_estimators'],
    learning_rate=self.input['learning_rate'],
    max_depth=self.input['max_depth'],
    min_samples_split=2,
    loss='ls'
)

estimator.fit(self.features, self.labels)

mses = cross_val_score(
    estimator, self.features, self.labels,
    cv=5, scoring='neg_mean_squared_error'
)
rmse = np.sqrt(-mses).mean()

print(f"RMSE: {rmse}")

Using Different Pathspec Formats

The debug command accepts various pathspec formats for convenience.

Full Task Pathspec

Most explicit - specifies exactly which task to debug:
metaflow debug task HousePricePredictionFlow/1199/fit_gbrt_for_given_param/150671013 \
  --metaflow-root-dir ~/debug/full

Step Pathspec (Single Task)

If a step has only one task, you can omit the task ID:
# For a step without foreach - automatically resolves to the single task
metaflow debug task HousePricePredictionFlow/1199/start \
  --metaflow-root-dir ~/debug/step
This will fail if the step has multiple tasks (e.g., from a foreach). In that case, you must specify the full pathspec.

Run Pathspec

Debug the end step of a specific run:
# Automatically uses the 'end' step
metaflow debug task HousePricePredictionFlow/1199 \
  --metaflow-root-dir ~/debug/run

Flow Name Only

Debug the end step of the latest successful run:
# Uses latest successful run in your namespace
metaflow debug task HousePricePredictionFlow \
  --metaflow-root-dir ~/debug/latest
This only works if there is at least one successful run in your namespace. Otherwise, you’ll get an error.
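All of these forms are prefixes of the same slash-separated hierarchy: flow, run, step, task. A small helper makes the structure explicit (a hypothetical illustration, not part of the debug CLI, which resolves partial pathspecs for you):

```python
def parse_pathspec(pathspec):
    """Split a Metaflow pathspec into its hierarchical components.

    Accepts any prefix: flow, flow/run, flow/run/step, or flow/run/step/task.
    """
    keys = ['flow_name', 'run_id', 'step_name', 'task_id']
    parts = pathspec.strip('/').split('/')
    return dict(zip(keys, parts))

full = parse_pathspec('HousePricePredictionFlow/1199/fit_gbrt_for_given_param/150671013')
print(full['step_name'])   # fit_gbrt_for_given_param

run_only = parse_pathspec('HousePricePredictionFlow/1199')
print('task_id' in run_only)   # False: the debug command fills in the rest
```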

Debugging a Failed Step

When a step fails, you can use the debug extension to investigate the failure.

Scenario: Division by Zero Error

failing_flow.py
from metaflow import FlowSpec, step, conda

class DataProcessingFlow(FlowSpec):
    
    @conda(libraries={'pandas': '1.3.0'})
    @step
    def start(self):
        import pandas as pd
        self.data = pd.DataFrame({
            'value': [100, 200, 0, 300],
            'divisor': [10, 20, 0, 30]  # Note: zero divisor!
        })
        self.next(self.process)
    
    @conda(libraries={'pandas': '1.3.0'})
    @step
    def process(self):
        import pandas as pd
        # Plain Python division raises ZeroDivisionError on the zero divisor
        self.data['result'] = [
            v / d for v, d in zip(self.data['value'], self.data['divisor'])
        ]
        self.next(self.end)
    
    @step
    def end(self):
        print(self.data)

if __name__ == '__main__':
    DataProcessingFlow()

Debugging the Failure

1. Identify the failed task

# After the flow fails, note the pathspec
# Example: DataProcessingFlow/1234/process/567890
2. Debug the failed task

metaflow debug task DataProcessingFlow/1234/process/567890 \
  --metaflow-root-dir ~/debug/failure
3. Investigate in the notebook

# Check the input data
print(self.data)
# Output:
#    value  divisor
# 0    100       10
# 1    200       20
# 2      0        0  <- Problem row!
# 3    300       30

# Reproduce the failing operation
try:
    result = [v / d for v, d in zip(self.data['value'], self.data['divisor'])]
    print(result)
except ZeroDivisionError as e:
    print(f"Error: {e}")  # division by zero
4. Test a fix

# Test a fix that handles zero divisors
import pandas as pd
import numpy as np

self.data['result'] = np.where(
    self.data['divisor'] == 0,
    np.nan,  # or some default value
    self.data['value'] / self.data['divisor']
)

print(self.data)
# Output:
#    value  divisor  result
# 0    100       10    10.0
# 1    200       20    10.0
# 2      0        0     NaN
# 3    300       30    10.0
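The same guard can be verified standalone, outside the debug notebook. A sketch with the same toy data; `np.errstate` is added here to silence the runtime warning that the eager `value / divisor` evaluation inside `np.where` would otherwise emit:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'value': [100, 200, 0, 300],
                     'divisor': [10, 20, 0, 30]})

# Replace zero-divisor results with NaN instead of propagating inf/NaN silently
with np.errstate(divide='ignore', invalid='ignore'):
    data['result'] = np.where(data['divisor'] == 0,
                              np.nan,
                              data['value'] / data['divisor'])

print(data['result'].tolist())   # [10.0, 10.0, nan, 10.0]
```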

Inspecting Artifacts

Use the --inspect flag to examine the final state of a completed task.

Example: Inspecting Model Artifacts

metaflow debug task ModelTrainingFlow/5678/train/123456 \
  --metaflow-root-dir ~/debug/inspect \
  --inspect
In the generated notebook:
# Access the trained model
print(type(self.fit['estimator']))
print(self.fit['params'])
print(f"RMSE: {self.fit['rmse']}")

# Examine model internals
print("Feature importances:", self.fit['estimator'].feature_importances_)
print("Number of estimators:", len(self.fit['estimator'].estimators_))

# Make predictions on the training data
import numpy as np
predictions = self.fit['estimator'].predict(self.features)
residuals = self.labels - predictions
print(f"Mean residual: {np.mean(residuals)}")
print(f"Std residual: {np.std(residuals)}")

# Visualize (assumes matplotlib is available in the debug environment)
import matplotlib.pyplot as plt
plt.scatter(predictions, self.labels, alpha=0.5)
plt.plot([predictions.min(), predictions.max()], 
         [predictions.min(), predictions.max()], 'r--', lw=2)
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.title('Predictions vs Actual')
plt.show()

Experimenting with Hyperparameters

Use the debug extension to quickly test different hyperparameters on the same data.
# In the debug notebook
from sklearn import ensemble
from sklearn.model_selection import cross_val_score
import numpy as np
import pandas as pd

results = []

# Test different combinations
for n_est in [50, 100, 200]:
    for lr in [0.01, 0.05, 0.1]:
        for depth in [3, 5, 7]:
            estimator = ensemble.GradientBoostingRegressor(
                n_estimators=n_est,
                learning_rate=lr,
                max_depth=depth,
                min_samples_split=2,
                loss='ls'
            )
            
            estimator.fit(self.features, self.labels)
            
            mses = cross_val_score(
                estimator, self.features, self.labels,
                cv=5, scoring='neg_mean_squared_error'
            )
            rmse = np.sqrt(-mses).mean()
            
            results.append({
                'n_estimators': n_est,
                'learning_rate': lr,
                'max_depth': depth,
                'rmse': rmse
            })
            print(f"n_est={n_est}, lr={lr}, depth={depth}: RMSE={rmse:.4f}")

# Find the best combination
results_df = pd.DataFrame(results)
print("\nBest parameters:")
print(results_df.loc[results_df['rmse'].idxmin()])
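The triple-nested loop above can also be written with itertools.product, which keeps the search space in one place. A sketch with a hypothetical stand-in scorer so it runs anywhere; in the notebook you would substitute the cross-validated RMSE computation:

```python
from itertools import product

# Same search space as the nested loops above
grid = {
    'n_estimators': [50, 100, 200],
    'learning_rate': [0.01, 0.05, 0.1],
    'max_depth': [3, 5, 7],
}

# Hypothetical stand-in for the cross-validated RMSE; replace with the
# real fit-and-score loop when running in the debug notebook.
def score(params):
    return params['learning_rate'] * params['max_depth'] / params['n_estimators']

combos = [dict(zip(grid, values)) for values in product(*grid.values())]
best = min(combos, key=score)
print(len(combos), best)   # 27 combinations
```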

Using Environment Overrides

Override with a Named Environment

Use a custom environment instead of the task’s original environment:
1. Create a named environment

# Create an environment with different package versions
metaflow environment create \
  --name debug_env \
  --install-notebook \
  -r requirements.txt
2. Debug with the override

metaflow debug task FlowName/123/step/456 \
  --metaflow-root-dir ~/debug/override \
  --override-env debug_env
This is useful when:
  • You want to test a fix with updated dependencies
  • You need additional debugging tools not in the original environment
  • You want to use a consistent environment across multiple debug sessions

Override with Another Task’s Environment

Use the environment from a different task:
# Debug task 456 using the environment from task 789
metaflow debug task FlowName/123/step_a/456 \
  --metaflow-root-dir ~/debug/env_swap \
  --override-env-from-pathspec FlowName/100/step_b/789
This is useful when:
  • You want to test how a task would behave with different dependencies
  • You’re investigating environment-specific issues
  • You want to use a known-good environment for debugging

Advanced: Debugging Foreach Steps

When debugging steps with foreach, you must specify the full pathspec including the task ID.
foreach_flow.py
from metaflow import FlowSpec, step, conda

class ParallelProcessingFlow(FlowSpec):
    
    @step
    def start(self):
        self.datasets = ['dataset_a', 'dataset_b', 'dataset_c']
        self.next(self.process, foreach='datasets')
    
    @conda(libraries={'pandas': '1.3.0'})
    @step
    def process(self):
        import pandas as pd
        self.dataset_name = self.input
        # Process the dataset
        self.result = f"Processed {self.dataset_name}"
        self.next(self.join)
    
    @step
    def join(self, inputs):
        self.results = [inp.result for inp in inputs]
        self.next(self.end)
    
    @step
    def end(self):
        print(self.results)

if __name__ == '__main__':
    ParallelProcessingFlow()

Debugging a Specific Foreach Task

# This will fail - step has 3 tasks
metaflow debug task ParallelProcessingFlow/999/process \
  --metaflow-root-dir ~/debug/foreach
# Error: Step does not refer to a single task

# Must specify which task
metaflow debug task ParallelProcessingFlow/999/process/111 \
  --metaflow-root-dir ~/debug/foreach_a

metaflow debug task ParallelProcessingFlow/999/process/222 \
  --metaflow-root-dir ~/debug/foreach_b

metaflow debug task ParallelProcessingFlow/999/process/333 \
  --metaflow-root-dir ~/debug/foreach_c
In each debug notebook:
# Check which dataset this task processed
print(f"Processing: {self.dataset_name}")
print(f"Task index: {self.index}")
print(f"Result: {self.result}")
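The fan-out itself is easy to reason about: each foreach task receives one element of the split list as self.input and its position as self.index. A minimal sketch of that mapping in plain Python (task IDs are assigned by Metaflow at run time, so they are not predictable here):

```python
datasets = ['dataset_a', 'dataset_b', 'dataset_c']

# One (index, input) pair per foreach task, mirroring the process step above
tasks = [{'index': i, 'input': name, 'result': f'Processed {name}'}
         for i, name in enumerate(datasets)]

for task in tasks:
    print(task['index'], task['input'], '->', task['result'])
```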

Debugging with Complex Dependencies

The debug extension works seamlessly with complex Conda environments.
complex_deps_flow.py
from metaflow import FlowSpec, step, conda

class VideoProcessingFlow(FlowSpec):
    
    @conda(
        libraries={'opencv': '4.5.3', 'pillow': '8.3.0'},  # conda package name is 'opencv'
        python='3.8'
    )
    @step
    def start(self):
        import cv2
        from PIL import Image
        import numpy as np
        
        # Process video frame
        self.frame = np.random.rand(480, 640, 3) * 255
        self.frame = self.frame.astype(np.uint8)
        
        self.next(self.analyze)
    
    @conda(
        libraries={
            'tensorflow': '2.6.0',
            'opencv': '4.5.3'
        },
        python='3.8'
    )
    @step
    def analyze(self):
        import tensorflow as tf
        import cv2
        
        # Run inference
        self.analysis_result = "Detected objects"
        self.next(self.end)
    
    @step
    def end(self):
        print(self.analysis_result)

if __name__ == '__main__':
    VideoProcessingFlow()
Debug the analyze step:
metaflow debug task VideoProcessingFlow/777/analyze/888 \
  --metaflow-root-dir ~/debug/video
In the notebook, both TensorFlow and OpenCV are available:
import tensorflow as tf
import cv2
import numpy as np

print(f"TensorFlow version: {tf.__version__}")
print(f"OpenCV version: {cv2.__version__}")

# Access the frame from the previous step
print(f"Frame shape: {self.frame.shape}")

# Re-run analysis with different parameters
processed_frame = cv2.GaussianBlur(self.frame, (5, 5), 0)
print(f"Processed frame shape: {processed_frame.shape}")

Complete Debugging Workflow

Here’s a complete end-to-end debugging workflow:
1. Run your flow

python my_flow.py run --with batch
2. Identify the task to debug

# Get the pathspec from the output or use:
metaflow list flows
metaflow show MyFlow
3. Start debugging

metaflow debug task MyFlow/123/problematic_step/456 \
  --metaflow-root-dir ~/debug/investigation
4. Open the notebook

cd ~/debug/investigation
jupyter notebook debug.ipynb
5. Investigate and fix

  • Examine artifacts
  • Re-run code with modifications
  • Test potential fixes
  • Document findings
6. Update your flow

  • Apply the fix to your flow code
  • Re-run the flow
  • Verify the fix works

Tips and Best Practices

Best Practices for Debugging
  1. Use descriptive root directories - Name them after the issue you’re investigating
  2. Keep debug sessions organized - Use separate directories for different investigations
  3. Document your findings - Add markdown cells to the notebook with your observations
  4. Test fixes incrementally - Verify each change works before moving to the next
  5. Use version control - Commit your debug notebooks to track investigation progress
Common Pitfalls
  • Don’t forget the --metaflow-root-dir parameter - it’s required
  • Remember that only remote tasks can be debugged (not local runs)
  • When using foreach, always specify the complete task pathspec
  • Environment overrides only work with existing named environments
