
Overview

The Backtesting module provides comprehensive tools for evaluating machine learning model performance with advanced metrics and statistical analysis.

Backtester Class

Constructor

Backtester()

Initializes a new Backtester instance.

Returns: Backtester instance

Example:
from utils.backtesting import Backtester

backtester = Backtester()

Methods

calculate_metrics

calculate_metrics(actual: np.ndarray, predicted: np.ndarray) -> Dict

Calculates comprehensive evaluation metrics comparing actual vs predicted values.

Parameters:
  • actual (np.ndarray): Array of actual/true values
  • predicted (np.ndarray): Array of predicted values from the model

Returns: Dict containing the following metrics:
Metric               Type   Description
MAE                  float  Mean Absolute Error
RMSE                 float  Root Mean Squared Error
MAPE                 float  Mean Absolute Percentage Error (%)
Median_APE           float  Median Absolute Percentage Error (%)
R2_Score             float  R-squared coefficient of determination
Direction_Accuracy   float  Accuracy of predicted direction changes (%)
Max_Error            float  Maximum absolute error
Mean_Actual          float  Mean of actual values
Mean_Predicted       float  Mean of predicted values
Std_Actual           float  Standard deviation of actual values
Std_Predicted        float  Standard deviation of predicted values
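For reference, these metrics follow the standard textbook definitions. A minimal re-implementation in plain NumPy might look like the sketch below; this is illustrative only, not the module's actual source, and in particular the sign-based tie handling in the direction-accuracy calculation is an assumption:

```python
import numpy as np

def compute_metrics(actual: np.ndarray, predicted: np.ndarray) -> dict:
    """Hypothetical re-implementation of the standard metric formulas."""
    errors = actual - predicted
    abs_errors = np.abs(errors)
    ape = abs_errors / np.abs(actual) * 100  # absolute percentage errors

    # Direction accuracy: share of steps where the sign of the change matches
    direction_acc = np.mean(
        np.sign(np.diff(actual)) == np.sign(np.diff(predicted))
    ) * 100

    ss_res = np.sum(errors ** 2)                       # residual sum of squares
    ss_tot = np.sum((actual - actual.mean()) ** 2)     # total sum of squares

    return {
        "MAE": abs_errors.mean(),
        "RMSE": np.sqrt(np.mean(errors ** 2)),
        "MAPE": ape.mean(),
        "Median_APE": np.median(ape),
        "R2_Score": 1 - ss_res / ss_tot,
        "Direction_Accuracy": direction_acc,
        "Max_Error": abs_errors.max(),
        "Mean_Actual": actual.mean(),
        "Mean_Predicted": predicted.mean(),
        "Std_Actual": actual.std(),
        "Std_Predicted": predicted.std(),
    }

actual = np.array([100.0, 102, 105, 103, 108])
predicted = np.array([101.0, 103, 104, 105, 107])
m = compute_metrics(actual, predicted)
print(f"MAE={m['MAE']:.2f}  RMSE={m['RMSE']:.2f}  R2={m['R2_Score']:.4f}")
# → MAE=1.20  RMSE=1.26  R2=0.7849
```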
Example:
import numpy as np
from utils.backtesting import Backtester

# Sample data
actual_prices = np.array([100, 102, 105, 103, 108])
predicted_prices = np.array([101, 103, 104, 105, 107])

# Calculate metrics
backtester = Backtester()
metrics = backtester.calculate_metrics(actual_prices, predicted_prices)

print(f"MAE: ${metrics['MAE']:.2f}")
print(f"RMSE: ${metrics['RMSE']:.2f}")
print(f"MAPE: {metrics['MAPE']:.2f}%")
print(f"Direction Accuracy: {metrics['Direction_Accuracy']:.2f}%")
print(f"R² Score: {metrics['R2_Score']:.4f}")
Output:
MAE: $1.20
RMSE: $1.26
MAPE: 1.16%
Direction Accuracy: 75.00%
R² Score: 0.7849

format_metrics

format_metrics(metrics: Dict) -> str

Formats a metrics dictionary into a human-readable markdown string for display.

Parameters:
  • metrics (Dict): Dictionary of metrics returned from calculate_metrics()

Returns: str: Formatted markdown string with key metrics

Example:
from utils.backtesting import Backtester
import numpy as np

backtester = Backtester()

# Calculate metrics
actual = np.array([50000, 51000, 52000, 50500, 53000])
predicted = np.array([50500, 51200, 51800, 51000, 52800])

metrics = backtester.calculate_metrics(actual, predicted)
formatted_output = backtester.format_metrics(metrics)

print(formatted_output)
Output:
### 📊 Evaluation Metrics

**MAE (Mean Absolute Error):** $320.00

**RMSE (Root Mean Squared Error):** $352.14

**MAPE (Mean Absolute Percentage Error):** 0.63%

**Direction Accuracy:** 100.00%

**R² Score:** 0.8931
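The exact template used by format_metrics() is internal to the module. A comparable formatter that joins markdown lines from the metrics dictionary can be sketched as follows; the function name, labels, and layout here are illustrative assumptions, not the module's source:

```python
def format_metrics_sketch(metrics: dict) -> str:
    """Illustrative formatter; the real format_metrics() template may differ."""
    lines = [
        "### 📊 Evaluation Metrics",
        f"**MAE (Mean Absolute Error):** ${metrics['MAE']:.2f}",
        f"**RMSE (Root Mean Squared Error):** ${metrics['RMSE']:.2f}",
        f"**MAPE (Mean Absolute Percentage Error):** {metrics['MAPE']:.2f}%",
        f"**Direction Accuracy:** {metrics['Direction_Accuracy']:.2f}%",
        f"**R² Score:** {metrics['R2_Score']:.4f}",
    ]
    # Blank line between entries renders each metric as its own paragraph
    return "\n\n".join(lines)

report = format_metrics_sketch({
    "MAE": 320.0, "RMSE": 352.14, "MAPE": 0.63,
    "Direction_Accuracy": 100.0, "R2_Score": 0.8931,
})
print(report)
```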

Complete Usage Example

import numpy as np
import pandas as pd
from utils.backtesting import Backtester

# Initialize backtester
backtester = Backtester()

# Load your model predictions and actual values
actual_prices = np.array([
    45000, 46000, 47500, 46800, 48000, 
    49200, 48500, 50000, 51000, 50500
])

predicted_prices = np.array([
    45500, 46200, 47000, 47200, 48500,
    49000, 49100, 49800, 50800, 51200
])

# Calculate comprehensive metrics
metrics = backtester.calculate_metrics(actual_prices, predicted_prices)

# Display formatted results
print(backtester.format_metrics(metrics))

# Access individual metrics
print("\nModel Performance Summary:")
print(f"Average Error: ${metrics['MAE']:.2f}")
print(f"Direction Accuracy: {metrics['Direction_Accuracy']:.1f}%")
print(f"Model Fit (R²): {metrics['R2_Score']:.4f}")

if metrics['R2_Score'] > 0.8:
    print("✅ Model shows strong predictive power")
elif metrics['R2_Score'] > 0.6:
    print("⚠️ Model shows moderate predictive power")
else:
    print("❌ Model needs improvement")

Metrics Interpretation

Error Metrics

  • MAE (Mean Absolute Error): Average absolute difference between predictions and actual values. Lower is better.
  • RMSE (Root Mean Squared Error): Penalizes larger errors more heavily. Lower is better.
  • MAPE (Mean Absolute Percentage Error): Error as a percentage. Values under 10% indicate good performance.
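The practical difference between MAE and RMSE is easiest to see with an outlier: squaring inflates the contribution of the single large error to RMSE, while MAE weights all errors equally. A tiny demonstration with made-up error values:

```python
import numpy as np

errors = np.array([1.0, 1.0, 1.0, 10.0])  # three small errors, one outlier
mae = np.mean(np.abs(errors))             # (1 + 1 + 1 + 10) / 4 = 3.25
rmse = np.sqrt(np.mean(errors ** 2))      # sqrt((1 + 1 + 1 + 100) / 4) ≈ 5.07
print(mae, round(rmse, 2))                # → 3.25 5.07
```

When RMSE is much larger than MAE, a few big misses are dominating the error; when the two are close, errors are roughly uniform in size.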

Model Quality

  • R² Score: Ranges from -∞ to 1. Values closer to 1 indicate better fit.
    • R² > 0.9: Excellent
    • R² > 0.7: Good
    • R² > 0.5: Moderate
    • R² < 0.5: Poor
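R² compares the model's squared error against a naive predict-the-mean baseline, which is why it can drop below zero when the model does worse than that baseline. A small check using the standard formula (illustrative, not the module's code):

```python
import numpy as np

def r2_score(actual: np.ndarray, predicted: np.ndarray) -> float:
    """Standard R²: 1 - (residual sum of squares / total sum of squares)."""
    ss_res = np.sum((actual - predicted) ** 2)
    ss_tot = np.sum((actual - actual.mean()) ** 2)
    return 1 - ss_res / ss_tot

actual = np.array([1.0, 2.0, 3.0, 4.0])
good = r2_score(actual, np.array([1.1, 2.1, 2.9, 4.1]))  # close fit → near 1
bad = r2_score(actual, np.array([4.0, 1.0, 4.0, 1.0]))   # worse than mean → negative
print(round(good, 3), bad)  # → 0.992 -3.0
```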

Directional Accuracy

  • Direction Accuracy: Percentage of times the model correctly predicted price direction (up/down).
    • >60%: Better than random
    • >70%: Good for trading signals
    • >80%: Excellent
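Direction accuracy looks only at the sign of each step-to-step change, not its magnitude. One plausible way to compute it with np.sign and np.diff is sketched below; the module's exact handling of flat (zero-change) steps may differ:

```python
import numpy as np

def direction_accuracy(actual: np.ndarray, predicted: np.ndarray) -> float:
    """Percent of steps where predicted and actual changes share a sign."""
    return float(np.mean(
        np.sign(np.diff(actual)) == np.sign(np.diff(predicted))
    ) * 100)

actual = np.array([100.0, 102, 105, 103, 108])     # moves: up, up, down, up
predicted = np.array([101.0, 103, 104, 105, 107])  # moves: up, up, up,   up
print(direction_accuracy(actual, predicted))       # → 75.0 (3 of 4 moves match)
```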
