Basic Debugging Example
This example shows how to debug a machine learning step that failed during model training.
The Original Flow
```python
from metaflow import FlowSpec, step, conda

class HousePricePredictionFlow(FlowSpec):

    @conda(libraries={'scikit-learn': '1.0.2', 'numpy': '1.21.0'})
    @step
    def start(self):
        """Load and prepare data"""
        from sklearn.datasets import fetch_california_housing

        data = fetch_california_housing()
        self.features = data.data
        self.labels = data.target
        self.params = [
            {'n_estimators': 50, 'learning_rate': 0.1, 'max_depth': 3},
            {'n_estimators': 100, 'learning_rate': 0.05, 'max_depth': 5},
            {'n_estimators': 200, 'learning_rate': 0.01, 'max_depth': 7},
        ]
        self.next(self.fit_gbrt_for_given_param, foreach='params')

    @conda(libraries={'scikit-learn': '1.0.2', 'numpy': '1.21.0'})
    @step
    def fit_gbrt_for_given_param(self):
        """Fit GBRT with given parameters"""
        import numpy as np
        from sklearn import ensemble
        from sklearn.model_selection import cross_val_score

        estimator = ensemble.GradientBoostingRegressor(
            n_estimators=self.input['n_estimators'],
            learning_rate=self.input['learning_rate'],
            max_depth=self.input['max_depth'],
            min_samples_split=2,
            loss='squared_error'
        )
        estimator.fit(self.features, self.labels)
        mses = cross_val_score(
            estimator, self.features, self.labels,
            cv=5, scoring='neg_mean_squared_error'
        )
        rmse = np.sqrt(-mses).mean()
        self.fit = dict(
            index=int(self.index),
            params=self.input,
            rmse=rmse,
            estimator=estimator
        )
        self.next(self.select_best_model)

    @step
    def select_best_model(self, inputs):
        """Select the best model based on RMSE"""
        self.best_model = min(inputs, key=lambda x: x.fit['rmse']).fit
        self.next(self.end)

    @step
    def end(self):
        """End of flow"""
        print(f"Best RMSE: {self.best_model['rmse']}")

if __name__ == '__main__':
    HousePricePredictionFlow()
```
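In the `select_best_model` join, `min` over the inputs picks the lowest-RMSE fit. Isolated as plain Python with made-up RMSE values, the selection logic is:

```python
# Stand-ins for the join inputs: each foreach task contributed a 'fit' dict
fits = [
    {'index': 0, 'rmse': 0.62},
    {'index': 1, 'rmse': 0.55},
    {'index': 2, 'rmse': 0.71},
]
# Pick the fit with the smallest RMSE
best = min(fits, key=lambda f: f['rmse'])
print(best['index'])  # 1 -- the lowest-RMSE fit wins
```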
Debugging a Specific Task
Suppose the task with pathspec HousePricePredictionFlow/1199/fit_gbrt_for_given_param/150671013 produced unexpected results. Here’s how to debug it:
Run the debug command
```bash
metaflow debug task HousePricePredictionFlow/1199/fit_gbrt_for_given_param/150671013 \
    --metaflow-root-dir ~/notebooks/debug_task
```
This will:
- Download the code package
- Recreate the Conda environment
- Generate debug scripts and notebook
Launch Jupyter
```bash
cd ~/notebooks/debug_task
jupyter notebook debug.ipynb
```
Access artifacts in the notebook
```python
# The 'self' object gives you access to all artifacts
print("Parameters used:", self.input)
print("Features shape:", self.features.shape)
print("Labels shape:", self.labels.shape)
print("Task index:", self.index)
```
Re-execute the step code
```python
# Import the required libraries (available in the Conda env)
from sklearn import ensemble
from sklearn.model_selection import cross_val_score
import numpy as np

# Re-run with the same parameters
estimator = ensemble.GradientBoostingRegressor(
    n_estimators=self.input['n_estimators'],
    learning_rate=self.input['learning_rate'],
    max_depth=self.input['max_depth'],
    min_samples_split=2,
    loss='squared_error'
)
estimator.fit(self.features, self.labels)
mses = cross_val_score(
    estimator, self.features, self.labels,
    cv=5, scoring='neg_mean_squared_error'
)
rmse = np.sqrt(-mses).mean()
print(f"RMSE: {rmse}")
```
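The RMSE line deserves a note: `scoring='neg_mean_squared_error'` returns negated MSEs, so they are negated back before the square root. With stand-in scores:

```python
import numpy as np

# cross_val_score with scoring='neg_mean_squared_error' returns negated MSEs
mses = np.array([-0.49, -0.36, -0.64, -0.25, -0.81])  # stand-in values

# Negate back to positive MSEs, take the root per fold, then average
rmse = np.sqrt(-mses).mean()
print(rmse)  # (0.7 + 0.6 + 0.8 + 0.5 + 0.9) / 5 ~= 0.7
```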
The debug command accepts various pathspec formats for convenience.
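A full task pathspec is simply `flow_name/run_id/step_name/task_id`; the shorter forms below just drop trailing components:

```python
pathspec = "HousePricePredictionFlow/1199/fit_gbrt_for_given_param/150671013"

# The four slash-separated components of a full task pathspec
flow_name, run_id, step_name, task_id = pathspec.split('/')
print(step_name)  # fit_gbrt_for_given_param
print(task_id)    # 150671013
```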
Full Task Pathspec
Most explicit - specifies exactly which task to debug:
```bash
metaflow debug task HousePricePredictionFlow/1199/fit_gbrt_for_given_param/150671013 \
    --metaflow-root-dir ~/debug/full
```
Step Pathspec (Single Task)
If a step has only one task, you can omit the task ID:
```bash
# For a step without foreach - automatically resolves to the single task
metaflow debug task HousePricePredictionFlow/1199/start \
    --metaflow-root-dir ~/debug/step
```
This will fail if the step has multiple tasks (e.g., from a foreach). In that case, you must specify the full pathspec.
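The single-task rule can be sketched as follows (a hypothetical helper for illustration, not the extension's actual code; the task ID is made up):

```python
def resolve_step_pathspec(step_pathspec, task_ids):
    """Resolve a step-level pathspec to a full task pathspec,
    mirroring the single-task rule described above."""
    if len(task_ids) != 1:
        raise ValueError(f"{step_pathspec} does not refer to a single task")
    return f"{step_pathspec}/{task_ids[0]}"

# A step with exactly one task resolves cleanly; a foreach step would raise
print(resolve_step_pathspec("HousePricePredictionFlow/1199/start", ["150671012"]))
```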
Run Pathspec
Debug the end step of a specific run:
```bash
# Automatically uses the 'end' step
metaflow debug task HousePricePredictionFlow/1199 \
    --metaflow-root-dir ~/debug/run
```
Flow Name Only
Debug the end step of the latest successful run:
```bash
# Uses latest successful run in your namespace
metaflow debug task HousePricePredictionFlow \
    --metaflow-root-dir ~/debug/latest
```
This only works if there is at least one successful run in your namespace. Otherwise, you’ll get an error.
Debugging a Failed Step
When a step fails, you can use the debug extension to investigate the failure.
Scenario: Division by Zero Error
```python
from metaflow import FlowSpec, step, conda

class DataProcessingFlow(FlowSpec):

    @conda(libraries={'pandas': '1.3.0'})
    @step
    def start(self):
        import pandas as pd

        self.data = pd.DataFrame({
            'value': [100, 200, 0, 300],
            'divisor': [10, 20, 0, 30]  # Note: zero divisor!
        })
        self.next(self.process)

    @conda(libraries={'pandas': '1.3.0'})
    @step
    def process(self):
        # Row-wise Python division fails on the third row with a
        # ZeroDivisionError (unlike pandas' vectorized division,
        # which would silently yield inf or NaN)
        self.data['result'] = [
            v / d for v, d in zip(self.data['value'].tolist(),
                                  self.data['divisor'].tolist())
        ]
        self.next(self.end)

    @step
    def end(self):
        print(self.data)

if __name__ == '__main__':
    DataProcessingFlow()
```
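The failure is ordinary Python division by zero, which is easy to reproduce outside the flow (pandas' vectorized division, by contrast, returns `inf` or `NaN` rather than raising):

```python
# Plain Python division raises on a zero divisor
try:
    100 / 0
except ZeroDivisionError as e:
    print(f"Error: {e}")  # Error: division by zero
```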
Debugging the Failure
Identify the failed task
```text
# After the flow fails, note the pathspec
# Example: DataProcessingFlow/1234/process/567890
```
Debug the failed task
```bash
metaflow debug task DataProcessingFlow/1234/process/567890 \
    --metaflow-root-dir ~/debug/failure
```
Investigate in the notebook
```python
# Check the input data
print(self.data)
# Output:
#    value  divisor
# 0    100       10
# 1    200       20
# 2      0        0   <- Problem row!
# 3    300       30

# Test the failing operation
try:
    result = [
        v / d for v, d in zip(self.data['value'].tolist(),
                              self.data['divisor'].tolist())
    ]
    print(result)
except ZeroDivisionError as e:
    print(f"Error: {e}")
```
Test a fix
```python
# Test a fix that handles zero divisors
import numpy as np

self.data['result'] = np.where(
    self.data['divisor'] == 0,
    np.nan,  # or some default value
    self.data['value'] / self.data['divisor']
)
print(self.data)
# Output:
#    value  divisor  result
# 0    100       10    10.0
# 1    200       20    10.0
# 2      0        0     NaN
# 3    300       30    10.0
```
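An equivalent fix (a sketch) leans on the division semantics directly: the vectorized division never raises, so it is enough to normalize the resulting `inf` values to `NaN`. With the same stand-in data:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'value': [100, 200, 0, 300], 'divisor': [10, 20, 0, 30]})

# Vectorized division yields inf for x/0 and NaN for 0/0;
# fold both cases into NaN
data['result'] = (data['value'] / data['divisor']).replace(
    [np.inf, -np.inf], np.nan
)
print(data['result'].tolist())  # [10.0, 10.0, nan, 10.0]
```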
Inspecting Artifacts
Use the --inspect flag to examine the final state of a completed task.
Example: Inspecting Model Artifacts
```bash
metaflow debug task ModelTrainingFlow/5678/train/123456 \
    --metaflow-root-dir ~/debug/inspect \
    --inspect
```
In the generated notebook:
```python
# Access the trained model
print(type(self.fit['estimator']))
print(self.fit['params'])
print(f"RMSE: {self.fit['rmse']}")

# Examine model internals
print("Feature importances:", self.fit['estimator'].feature_importances_)
print("Number of estimators:", len(self.fit['estimator'].estimators_))

# Make predictions on the training data
import numpy as np

predictions = self.fit['estimator'].predict(self.features)
residuals = self.labels - predictions
print(f"Mean residual: {np.mean(residuals)}")
print(f"Std residual: {np.std(residuals)}")
```
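The residual summary applies to any prediction array; with stand-in numbers it also shows why the mean alone can mislead (opposite errors cancel, which is why the standard deviation matters too):

```python
import numpy as np

labels = np.array([2.0, 3.0, 4.0])
predictions = np.array([1.5, 3.5, 4.0])

residuals = labels - predictions  # [0.5, -0.5, 0.0]
print(np.mean(residuals))         # 0.0 -- the opposite errors cancel
print(np.std(residuals))          # ~0.408, revealing the spread
```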
```python
# Visualize
import matplotlib.pyplot as plt

plt.scatter(predictions, self.labels, alpha=0.5)
plt.plot([predictions.min(), predictions.max()],
         [predictions.min(), predictions.max()], 'r--', lw=2)
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.title('Predictions vs Actual')
plt.show()
```
Experimenting with Hyperparameters
Use the debug extension to quickly test different hyperparameters on the same data.
```python
# In the debug notebook
from sklearn import ensemble
from sklearn.model_selection import cross_val_score
import numpy as np
import pandas as pd

results = []

# Test different combinations
for n_est in [50, 100, 200]:
    for lr in [0.01, 0.05, 0.1]:
        for depth in [3, 5, 7]:
            estimator = ensemble.GradientBoostingRegressor(
                n_estimators=n_est,
                learning_rate=lr,
                max_depth=depth,
                min_samples_split=2,
                loss='squared_error'
            )
            # cross_val_score fits clones internally, so no explicit fit
            # is needed here
            mses = cross_val_score(
                estimator, self.features, self.labels,
                cv=5, scoring='neg_mean_squared_error'
            )
            rmse = np.sqrt(-mses).mean()
            results.append({
                'n_estimators': n_est,
                'learning_rate': lr,
                'max_depth': depth,
                'rmse': rmse
            })
            print(f"n_est={n_est}, lr={lr}, depth={depth}: RMSE={rmse:.4f}")

# Find the best combination
results_df = pd.DataFrame(results)
print("\nBest parameters:")
print(results_df.loc[results_df['rmse'].idxmin()])
```
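The `idxmin` selection at the end works on any results table. With stand-in sweep numbers:

```python
import pandas as pd

# Stand-in sweep results
results_df = pd.DataFrame([
    {'n_estimators': 50,  'learning_rate': 0.10, 'max_depth': 3, 'rmse': 0.61},
    {'n_estimators': 100, 'learning_rate': 0.05, 'max_depth': 5, 'rmse': 0.54},
    {'n_estimators': 200, 'learning_rate': 0.01, 'max_depth': 7, 'rmse': 0.58},
])

# idxmin gives the row label of the smallest RMSE; .loc pulls that row
best = results_df.loc[results_df['rmse'].idxmin()]
print(int(best['n_estimators']))  # 100
```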
Using Environment Overrides
Override with a Named Environment
Use a custom environment instead of the task’s original environment:
Create a named environment
```bash
# Create an environment with different package versions
metaflow environment create \
    --name debug_env \
    --install-notebook \
    -r requirements.txt
```
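For reference, a hypothetical requirements.txt for such a debug environment might pin the original libraries plus extra debugging tools (the file contents here are purely illustrative):

```text
# requirements.txt (illustrative)
scikit-learn==1.0.2
numpy==1.21.0
ipdb
```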
Debug with the override
```bash
metaflow debug task FlowName/123/step/456 \
    --metaflow-root-dir ~/debug/override \
    --override-env debug_env
```
This is useful when:
- You want to test a fix with updated dependencies
- You need additional debugging tools not in the original environment
- You want to use a consistent environment across multiple debug sessions
Override with Another Task’s Environment
Use the environment from a different task:
```bash
# Debug task 456 using the environment from task 789
metaflow debug task FlowName/123/step_a/456 \
    --metaflow-root-dir ~/debug/env_swap \
    --override-env-from-pathspec FlowName/100/step_b/789
```
This is useful when:
- You want to test how a task would behave with different dependencies
- You’re investigating environment-specific issues
- You want to use a known-good environment for debugging
Advanced: Debugging Foreach Steps
When debugging steps with foreach, you must specify the full pathspec including the task ID.
```python
from metaflow import FlowSpec, step, conda

class ParallelProcessingFlow(FlowSpec):

    @step
    def start(self):
        self.datasets = ['dataset_a', 'dataset_b', 'dataset_c']
        self.next(self.process, foreach='datasets')

    @conda(libraries={'pandas': '1.3.0'})
    @step
    def process(self):
        import pandas as pd  # available via the @conda decorator

        self.dataset_name = self.input
        # Process the dataset
        self.result = f"Processed {self.dataset_name}"
        self.next(self.join)

    @step
    def join(self, inputs):
        self.results = [inp.result for inp in inputs]
        self.next(self.end)

    @step
    def end(self):
        print(self.results)

if __name__ == '__main__':
    ParallelProcessingFlow()
```
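Each foreach task receives one element of the list as `self.input`, along with a distinct index. A plain-Python sketch of that fan-out:

```python
datasets = ['dataset_a', 'dataset_b', 'dataset_c']

# One task per element: 'index' identifies the task within the fan-out,
# 'input' is the element that task processes
tasks = [{'index': i, 'input': d, 'result': f"Processed {d}"}
         for i, d in enumerate(datasets)]
print(tasks[1]['result'])  # Processed dataset_b
```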
Debugging a Specific Foreach Task
```bash
# This will fail - step has 3 tasks
metaflow debug task ParallelProcessingFlow/999/process \
    --metaflow-root-dir ~/debug/foreach
# Error: Step does not refer to a single task

# Must specify which task
metaflow debug task ParallelProcessingFlow/999/process/111 \
    --metaflow-root-dir ~/debug/foreach_a
metaflow debug task ParallelProcessingFlow/999/process/222 \
    --metaflow-root-dir ~/debug/foreach_b
metaflow debug task ParallelProcessingFlow/999/process/333 \
    --metaflow-root-dir ~/debug/foreach_c
```
In each debug notebook:
```python
# Check which dataset this task processed
print(f"Processing: {self.dataset_name}")
print(f"Task index: {self.index}")
print(f"Result: {self.result}")
```
Debugging with Complex Dependencies
The debug extension works seamlessly with complex Conda environments.
```python
from metaflow import FlowSpec, step, conda

class VideoProcessingFlow(FlowSpec):

    @conda(
        libraries={'opencv': '4.5.3', 'pillow': '8.3.0'},
        python='3.8'
    )
    @step
    def start(self):
        import cv2
        from PIL import Image
        import numpy as np

        # Generate a stand-in video frame
        self.frame = np.random.rand(480, 640, 3) * 255
        self.frame = self.frame.astype(np.uint8)
        self.next(self.analyze)

    @conda(
        libraries={
            'tensorflow': '2.6.0',
            'opencv': '4.5.3'
        },
        python='3.8'
    )
    @step
    def analyze(self):
        import tensorflow as tf
        import cv2

        # Run inference
        self.analysis_result = "Detected objects"
        self.next(self.end)

    @step
    def end(self):
        print(self.analysis_result)

if __name__ == '__main__':
    VideoProcessingFlow()
```
Debug the analyze step:
```bash
metaflow debug task VideoProcessingFlow/777/analyze/888 \
    --metaflow-root-dir ~/debug/video
```
In the notebook, both TensorFlow and OpenCV are available:
```python
import tensorflow as tf
import cv2
import numpy as np

print(f"TensorFlow version: {tf.__version__}")
print(f"OpenCV version: {cv2.__version__}")

# Access the frame from the previous step
print(f"Frame shape: {self.frame.shape}")

# Re-run analysis with different parameters
processed_frame = cv2.GaussianBlur(self.frame, (5, 5), 0)
print(f"Processed frame shape: {processed_frame.shape}")
```
Complete Debugging Workflow
Here’s a complete end-to-end debugging workflow:
Run your flow
```bash
python my_flow.py run --with batch
```
Identify the task to debug
```python
# Get the pathspec from the run output, or look it up with the Client API
from metaflow import Flow

for task in Flow('MyFlow').latest_run['problematic_step']:
    print(task.pathspec)
```
Start debugging
```bash
metaflow debug task MyFlow/123/problematic_step/456 \
    --metaflow-root-dir ~/debug/investigation
```
Open the notebook
```bash
cd ~/debug/investigation
jupyter notebook debug.ipynb
```
Investigate and fix
- Examine artifacts
- Re-run code with modifications
- Test potential fixes
- Document findings
Update your flow
- Apply the fix to your flow code
- Re-run the flow
- Verify the fix works
Tips and Best Practices
Best Practices for Debugging
- Use descriptive root directories - Name them after the issue you’re investigating
- Keep debug sessions organized - Use separate directories for different investigations
- Document your findings - Add markdown cells to the notebook with your observations
- Test fixes incrementally - Verify each change works before moving to the next
- Use version control - Commit your debug notebooks to track investigation progress
Common Pitfalls
- Don’t forget the `--metaflow-root-dir` parameter - it’s required
- Remember that only remote tasks can be debugged (not local runs)
- When using foreach, always specify the complete task pathspec
- Environment overrides only work with existing named environments