
Basic Debugging Example

This example shows how to debug a machine learning step that failed during model training.

The Original Flow

houseprice_flow.py
from metaflow import FlowSpec, step, conda

class HousePricePredictionFlow(FlowSpec):
    
    @conda(libraries={'scikit-learn': '1.0.2', 'numpy': '1.21.0'})
    @step
    def start(self):
        """Load and prepare data"""
        import numpy as np
        from sklearn.datasets import fetch_california_housing
        
        data = fetch_california_housing()
        self.features = data.data
        self.labels = data.target
        self.next(self.fit_gbrt_for_given_param, foreach='params')
    
    @property
    def params(self):
        return [
            {'n_estimators': 50, 'learning_rate': 0.1, 'max_depth': 3},
            {'n_estimators': 100, 'learning_rate': 0.05, 'max_depth': 5},
            {'n_estimators': 200, 'learning_rate': 0.01, 'max_depth': 7},
        ]
    
    @conda(libraries={'scikit-learn': '1.0.2', 'numpy': '1.21.0'})
    @step
    def fit_gbrt_for_given_param(self):
        """Fit GBRT with given parameters"""
        from sklearn import ensemble
        from sklearn.model_selection import cross_val_score
        import numpy as np
        
        estimator = ensemble.GradientBoostingRegressor(
            n_estimators=self.input['n_estimators'],
            learning_rate=self.input['learning_rate'],
            max_depth=self.input['max_depth'],
            min_samples_split=2,
            loss='ls'
        )
        
        estimator.fit(self.features, self.labels)
        
        mses = cross_val_score(
            estimator, self.features, self.labels,
            cv=5, scoring='neg_mean_squared_error'
        )
        rmse = np.sqrt(-mses).mean()
        
        self.fit = dict(
            index=int(self.index),
            params=self.input,
            rmse=rmse,
            estimator=estimator
        )
        
        self.next(self.select_best_model)
    
    @step
    def select_best_model(self, inputs):
        """Select the best model based on RMSE"""
        # Persist only the picklable fit dict, not the whole input object
        self.best_model = min(inputs, key=lambda x: x.fit['rmse']).fit
        self.next(self.end)
    
    @step
    def end(self):
        """End of flow"""
        print(f"Best RMSE: {self.best_model['rmse']}")

if __name__ == '__main__':
    HousePricePredictionFlow()

Debugging a Specific Task

Suppose the task with pathspec HousePricePredictionFlow/1199/fit_gbrt_for_given_param/150671013 produced unexpected results. Here’s how to debug it:
1. Run the debug command

metaflow debug task HousePricePredictionFlow/1199/fit_gbrt_for_given_param/150671013 \
  --metaflow-root-dir ~/notebooks/debug_task
This will:
  • Download the code package
  • Recreate the Conda environment
  • Generate debug scripts and notebook
2. Launch Jupyter

cd ~/notebooks/debug_task
jupyter notebook debug.ipynb
3. Access artifacts in the notebook

# The 'self' object gives you access to all artifacts
print("Parameters used:", self.input)
print("Features shape:", self.features.shape)
print("Labels shape:", self.labels.shape)
print("Task index:", self.index)
4. Re-execute the step code

# Import the required libraries (available in the Conda env)
from sklearn import ensemble
from sklearn.model_selection import cross_val_score
import numpy as np

# Re-run with the same parameters
estimator = ensemble.GradientBoostingRegressor(
    n_estimators=self.input['n_estimators'],
    learning_rate=self.input['learning_rate'],
    max_depth=self.input['max_depth'],
    min_samples_split=2,
    loss='ls'
)

estimator.fit(self.features, self.labels)

mses = cross_val_score(
    estimator, self.features, self.labels,
    cv=5, scoring='neg_mean_squared_error'
)
rmse = np.sqrt(-mses).mean()

print(f"RMSE: {rmse}")

Using Different Pathspec Formats

The debug command accepts various pathspec formats for convenience.

Full Task Pathspec

Most explicit - specifies exactly which task to debug:
metaflow debug task HousePricePredictionFlow/1199/fit_gbrt_for_given_param/150671013 \
  --metaflow-root-dir ~/debug/full

Step Pathspec (Single Task)

If a step has only one task, you can omit the task ID:
# For a step without foreach - automatically resolves to the single task
metaflow debug task HousePricePredictionFlow/1199/start \
  --metaflow-root-dir ~/debug/step
This will fail if the step has multiple tasks (e.g., from a foreach). In that case, you must specify the full pathspec.

Run Pathspec

Debug the end step of a specific run:
# Automatically uses the 'end' step
metaflow debug task HousePricePredictionFlow/1199 \
  --metaflow-root-dir ~/debug/run

Flow Name Only

Debug the end step of the latest successful run:
# Uses latest successful run in your namespace
metaflow debug task HousePricePredictionFlow \
  --metaflow-root-dir ~/debug/latest
This only works if there is at least one successful run in your namespace. Otherwise, you’ll get an error.
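All of these forms are prefixes of the same slash-separated hierarchy: flow, run, step, task. A small helper makes the structure explicit (a hypothetical illustration, not part of the debug CLI, which resolves partial pathspecs for you):

```python
def parse_pathspec(pathspec):
    """Split a Metaflow pathspec into its hierarchical components.

    Accepts any prefix: flow, flow/run, flow/run/step, or flow/run/step/task.
    """
    keys = ['flow_name', 'run_id', 'step_name', 'task_id']
    parts = pathspec.strip('/').split('/')
    return dict(zip(keys, parts))

full = parse_pathspec('HousePricePredictionFlow/1199/fit_gbrt_for_given_param/150671013')
print(full['step_name'])   # fit_gbrt_for_given_param

run_only = parse_pathspec('HousePricePredictionFlow/1199')
print('task_id' in run_only)   # False: the debug command fills in the rest
```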

Debugging a Failed Step

When a step fails, you can use the debug extension to investigate the failure.

Scenario: Division by Zero Error

failing_flow.py
from metaflow import FlowSpec, step, conda

class DataProcessingFlow(FlowSpec):
    
    @conda(libraries={'pandas': '1.3.0'})
    @step
    def start(self):
        import pandas as pd
        self.data = pd.DataFrame({
            'value': [100, 200, 0, 300],
            'divisor': [10, 20, 0, 30]  # Note: zero divisor!
        })
        self.next(self.process)
    
    @conda(libraries={'pandas': '1.3.0'})
    @step
    def process(self):
        import pandas as pd
        # Plain Python division raises ZeroDivisionError on the zero divisor
        self.data['result'] = [
            v / d for v, d in zip(self.data['value'], self.data['divisor'])
        ]
        self.next(self.end)
    
    @step
    def end(self):
        print(self.data)

if __name__ == '__main__':
    DataProcessingFlow()

Debugging the Failure

1. Identify the failed task

# After the flow fails, note the pathspec
# Example: DataProcessingFlow/1234/process/567890
2. Debug the failed task

metaflow debug task DataProcessingFlow/1234/process/567890 \
  --metaflow-root-dir ~/debug/failure
3. Investigate in the notebook

# Check the input data
print(self.data)
# Output:
#    value  divisor
# 0    100       10
# 1    200       20
# 2      0        0  <- Problem row!
# 3    300       30

# Reproduce the failing operation
try:
    result = [v / d for v, d in zip(self.data['value'], self.data['divisor'])]
    print(result)
except ZeroDivisionError as e:
    print(f"Error: {e}")  # division by zero
4. Test a fix

# Test a fix that handles zero divisors
import pandas as pd
import numpy as np

self.data['result'] = np.where(
    self.data['divisor'] == 0,
    np.nan,  # or some default value
    self.data['value'] / self.data['divisor']
)

print(self.data)
# Output:
#    value  divisor  result
# 0    100       10    10.0
# 1    200       20    10.0
# 2      0        0     NaN
# 3    300       30    10.0
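The same guard can be verified standalone, outside the debug notebook. A sketch with the same toy data; `np.errstate` is added here to silence the runtime warning that the eager `value / divisor` evaluation inside `np.where` would otherwise emit:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'value': [100, 200, 0, 300],
                     'divisor': [10, 20, 0, 30]})

# Replace zero-divisor results with NaN instead of propagating inf/NaN silently
with np.errstate(divide='ignore', invalid='ignore'):
    data['result'] = np.where(data['divisor'] == 0,
                              np.nan,
                              data['value'] / data['divisor'])

print(data['result'].tolist())   # [10.0, 10.0, nan, 10.0]
```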

Inspecting Artifacts

Use the --inspect flag to examine the final state of a completed task.

Example: Inspecting Model Artifacts

metaflow debug task ModelTrainingFlow/5678/train/123456 \
  --metaflow-root-dir ~/debug/inspect \
  --inspect
In the generated notebook:
# Access the trained model
print(type(self.fit['estimator']))
print(self.fit['params'])
print(f"RMSE: {self.fit['rmse']}")

# Examine model internals
print("Feature importances:", self.fit['estimator'].feature_importances_)
print("Number of estimators:", len(self.fit['estimator'].estimators_))

# Make predictions on the training data
import numpy as np
predictions = self.fit['estimator'].predict(self.features)
residuals = self.labels - predictions
print(f"Mean residual: {np.mean(residuals)}")
print(f"Std residual: {np.std(residuals)}")

# Visualize (assumes matplotlib is available in the debug environment)
import matplotlib.pyplot as plt
plt.scatter(predictions, self.labels, alpha=0.5)
plt.plot([predictions.min(), predictions.max()], 
         [predictions.min(), predictions.max()], 'r--', lw=2)
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.title('Predictions vs Actual')
plt.show()

Experimenting with Hyperparameters

Use the debug extension to quickly test different hyperparameters on the same data.
# In the debug notebook
from sklearn import ensemble
from sklearn.model_selection import cross_val_score
import numpy as np
import pandas as pd

results = []

# Test different combinations
for n_est in [50, 100, 200]:
    for lr in [0.01, 0.05, 0.1]:
        for depth in [3, 5, 7]:
            estimator = ensemble.GradientBoostingRegressor(
                n_estimators=n_est,
                learning_rate=lr,
                max_depth=depth,
                min_samples_split=2,
                loss='ls'
            )
            
            estimator.fit(self.features, self.labels)
            
            mses = cross_val_score(
                estimator, self.features, self.labels,
                cv=5, scoring='neg_mean_squared_error'
            )
            rmse = np.sqrt(-mses).mean()
            
            results.append({
                'n_estimators': n_est,
                'learning_rate': lr,
                'max_depth': depth,
                'rmse': rmse
            })
            print(f"n_est={n_est}, lr={lr}, depth={depth}: RMSE={rmse:.4f}")

# Find the best combination
results_df = pd.DataFrame(results)
print("\nBest parameters:")
print(results_df.loc[results_df['rmse'].idxmin()])
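The triple-nested loop above can also be written with itertools.product, which keeps the search space in one place. A sketch with a hypothetical stand-in scorer so it runs anywhere; in the notebook you would substitute the cross-validated RMSE computation:

```python
from itertools import product

# Same search space as the nested loops above
grid = {
    'n_estimators': [50, 100, 200],
    'learning_rate': [0.01, 0.05, 0.1],
    'max_depth': [3, 5, 7],
}

# Hypothetical stand-in for the cross-validated RMSE; replace with the
# real fit-and-score loop when running in the debug notebook.
def score(params):
    return params['learning_rate'] * params['max_depth'] / params['n_estimators']

combos = [dict(zip(grid, values)) for values in product(*grid.values())]
best = min(combos, key=score)
print(len(combos), best)   # 27 combinations
```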

Using Environment Overrides

Override with a Named Environment

Use a custom environment instead of the task’s original environment:
1. Create a named environment

# Create an environment with different package versions
metaflow environment create \
  --name debug_env \
  --install-notebook \
  -r requirements.txt
2. Debug with the override

metaflow debug task FlowName/123/step/456 \
  --metaflow-root-dir ~/debug/override \
  --override-env debug_env
This is useful when:
  • You want to test a fix with updated dependencies
  • You need additional debugging tools not in the original environment
  • You want to use a consistent environment across multiple debug sessions

Override with Another Task’s Environment

Use the environment from a different task:
# Debug task 456 using the environment from task 789
metaflow debug task FlowName/123/step_a/456 \
  --metaflow-root-dir ~/debug/env_swap \
  --override-env-from-pathspec FlowName/100/step_b/789
This is useful when:
  • You want to test how a task would behave with different dependencies
  • You’re investigating environment-specific issues
  • You want to use a known-good environment for debugging

Advanced: Debugging Foreach Steps

When debugging steps with foreach, you must specify the full pathspec including the task ID.
foreach_flow.py
from metaflow import FlowSpec, step, conda

class ParallelProcessingFlow(FlowSpec):
    
    @step
    def start(self):
        self.datasets = ['dataset_a', 'dataset_b', 'dataset_c']
        self.next(self.process, foreach='datasets')
    
    @conda(libraries={'pandas': '1.3.0'})
    @step
    def process(self):
        import pandas as pd
        self.dataset_name = self.input
        # Process the dataset
        self.result = f"Processed {self.dataset_name}"
        self.next(self.join)
    
    @step
    def join(self, inputs):
        self.results = [inp.result for inp in inputs]
        self.next(self.end)
    
    @step
    def end(self):
        print(self.results)

if __name__ == '__main__':
    ParallelProcessingFlow()

Debugging a Specific Foreach Task

# This will fail - step has 3 tasks
metaflow debug task ParallelProcessingFlow/999/process \
  --metaflow-root-dir ~/debug/foreach
# Error: Step does not refer to a single task

# Must specify which task
metaflow debug task ParallelProcessingFlow/999/process/111 \
  --metaflow-root-dir ~/debug/foreach_a

metaflow debug task ParallelProcessingFlow/999/process/222 \
  --metaflow-root-dir ~/debug/foreach_b

metaflow debug task ParallelProcessingFlow/999/process/333 \
  --metaflow-root-dir ~/debug/foreach_c
In each debug notebook:
# Check which dataset this task processed
print(f"Processing: {self.dataset_name}")
print(f"Task index: {self.index}")
print(f"Result: {self.result}")
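The fan-out itself is easy to reason about: each foreach task receives one element of the split list as self.input and its position as self.index. A minimal sketch of that mapping in plain Python (task IDs are assigned by Metaflow at run time, so they are not predictable here):

```python
datasets = ['dataset_a', 'dataset_b', 'dataset_c']

# One (index, input) pair per foreach task, mirroring the process step above
tasks = [{'index': i, 'input': name, 'result': f'Processed {name}'}
         for i, name in enumerate(datasets)]

for task in tasks:
    print(task['index'], task['input'], '->', task['result'])
```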

Debugging with Complex Dependencies

The debug extension works seamlessly with complex Conda environments.
complex_deps_flow.py
from metaflow import FlowSpec, step, conda

class VideoProcessingFlow(FlowSpec):
    
    @conda(
        libraries={'opencv': '4.5.3', 'pillow': '8.3.0'},  # conda package name is 'opencv'
        python='3.8'
    )
    @step
    def start(self):
        import cv2
        from PIL import Image
        import numpy as np
        
        # Process video frame
        self.frame = np.random.rand(480, 640, 3) * 255
        self.frame = self.frame.astype(np.uint8)
        
        self.next(self.analyze)
    
    @conda(
        libraries={
            'tensorflow': '2.6.0',
            'opencv': '4.5.3'
        },
        python='3.8'
    )
    @step
    def analyze(self):
        import tensorflow as tf
        import cv2
        
        # Run inference
        self.analysis_result = "Detected objects"
        self.next(self.end)
    
    @step
    def end(self):
        print(self.analysis_result)

if __name__ == '__main__':
    VideoProcessingFlow()
Debug the analyze step:
metaflow debug task VideoProcessingFlow/777/analyze/888 \
  --metaflow-root-dir ~/debug/video
In the notebook, both TensorFlow and OpenCV are available:
import tensorflow as tf
import cv2
import numpy as np

print(f"TensorFlow version: {tf.__version__}")
print(f"OpenCV version: {cv2.__version__}")

# Access the frame from the previous step
print(f"Frame shape: {self.frame.shape}")

# Re-run analysis with different parameters
processed_frame = cv2.GaussianBlur(self.frame, (5, 5), 0)
print(f"Processed frame shape: {processed_frame.shape}")

Complete Debugging Workflow

Here’s a complete end-to-end debugging workflow:
1. Run your flow

python my_flow.py run --with batch
2. Identify the task to debug

# Get the pathspec from the output or use:
metaflow list flows
metaflow show MyFlow
3. Start debugging

metaflow debug task MyFlow/123/problematic_step/456 \
  --metaflow-root-dir ~/debug/investigation
4. Open the notebook

cd ~/debug/investigation
jupyter notebook debug.ipynb
5. Investigate and fix

  • Examine artifacts
  • Re-run code with modifications
  • Test potential fixes
  • Document findings
6. Update your flow

  • Apply the fix to your flow code
  • Re-run the flow
  • Verify the fix works

Tips and Best Practices

Best Practices for Debugging
  1. Use descriptive root directories - Name them after the issue you’re investigating
  2. Keep debug sessions organized - Use separate directories for different investigations
  3. Document your findings - Add markdown cells to the notebook with your observations
  4. Test fixes incrementally - Verify each change works before moving to the next
  5. Use version control - Commit your debug notebooks to track investigation progress
Common Pitfalls
  • Don’t forget the --metaflow-root-dir parameter - it’s required
  • Remember that only remote tasks can be debugged (not local runs)
  • When using foreach, always specify the complete task pathspec
  • Environment overrides only work with existing named environments
