
Overview

The Hybrid model combines XGBoost (short-term) and Prophet (long-term) to produce a forecast for any time horizon. It selects one model or blends both, depending on the length of the forecast period.

Best for: All-purpose predictions from 1 hour to 1 month

Source: source/models/hybrid_model.py

How It Works

1. Train Both Models

Trains XGBoost and Prophet on your historical data, one after the other

2. Evaluate Horizon

Determines the prediction timeframe (short-term vs. long-term)

3. Select or Blend

  • ≤24 hours: XGBoost only
  • >24 to 72 hours: Both models with a weighted ensemble
  • >72 hours: Prophet-dominant ensemble, or Prophet only once the XGBoost weight reaches zero (168 hours)

4. Return Best Prediction

Returns the selected or blended forecast with confidence intervals

Model Selection Logic

The hybrid model uses different strategies based on prediction horizon:
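# periods: forecast horizon in periods (hours for hourly data)
if periods <= 24:
    # Short-term: XGBoost only
    use_model = "xgboost"

elif periods <= 72:
    # Medium-term: ensemble both models
    xgb_weight = max(0, 1 - (periods / 168))
    prophet_weight = 1 - xgb_weight
    use_model = "hybrid"

else:  # periods > 72
    # Long-term: Prophet-dominant ensemble; Prophet only once XGBoost's weight is 0
    xgb_weight = max(0, 1 - (periods / 168))
    prophet_weight = 1 - xgb_weight
    use_model = "hybrid" if xgb_weight > 0 else "prophet"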
The weighting formula gradually transitions from XGBoost to Prophet as the horizon increases.
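The branching above can also be packaged as a small standalone function that returns both the chosen strategy and the ensemble weights. This is a sketch, not the code in source/models/hybrid_model.py: the name `select_strategy` is hypothetical, and it assumes that horizons where the XGBoost weight reaches zero (168 periods or more) use Prophet alone, as the Example Weights table suggests.

```python
from typing import Dict, Optional, Tuple

# Periods at which the XGBoost weight reaches zero (one week of hourly data)
WEIGHT_HORIZON = 168


def select_strategy(periods: int) -> Tuple[str, Optional[Dict[str, float]]]:
    """Return the model to use and, for the ensemble, its weights.

    Hypothetical helper mirroring the selection logic described above.
    """
    if periods <= 0:
        raise ValueError("periods must be a positive integer")
    if periods <= 24:
        # Short-term: XGBoost only, no blending
        return "xgboost", None
    xgb_weight = max(0.0, 1 - periods / WEIGHT_HORIZON)
    if xgb_weight == 0.0:
        # Assumption: once XGBoost's weight hits zero, Prophet is used alone
        return "prophet", None
    # Medium and long term: weighted ensemble of both models
    return "hybrid", {"xgboost": xgb_weight, "prophet": 1 - xgb_weight}


for p in (12, 48, 96, 168):
    print(p, select_strategy(p))
```

Returning the weights alongside the name keeps the routing decision and the blend parameters in one place, so they cannot drift apart.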

Usage Example

import pandas as pd
from models.hybrid_model import HybridCryptoPredictor

# Initialize hybrid model
predictor = HybridCryptoPredictor()

# Load historical data
df = pd.read_csv('btc_hourly.csv', index_col='timestamp', parse_dates=True)

# Train both models at once
training_info = predictor.train(df)

print("XGBoost metrics:", training_info['xgboost'])
print("Prophet metrics:", training_info['prophet'])
print(f"Trained on {training_info['data_points']} data points")

# Predict next 48 hours (will use ensemble)
predictions = predictor.predict_future(df, periods=48)

print(f"Recommended model: {predictions['recommended']}")
print(f"Weights: {predictions.get('weights', 'N/A')}")

# Get best prediction
best = predictor.get_best_prediction(predictions)
print(best[['predicted_price', 'lower_bound', 'upper_bound']])

Prediction Output

The predict_future() method returns a dictionary with multiple predictions:
{
    'xgboost': pd.DataFrame,      # XGBoost predictions (if applicable)
    'prophet': pd.DataFrame,      # Prophet predictions (if applicable)
    'hybrid': pd.DataFrame,       # Ensemble predictions (if applicable)
    'recommended': str,           # Which model to use: 'xgboost', 'prophet', or 'hybrid'
    'weights': {                  # Ensemble weights (if hybrid)
        'xgboost': float,
        'prophet': float
    }
}

DataFrame Structure

Each prediction DataFrame contains:
| Column | Source | Description |
|---|---|---|
| predicted_price | Both | Point estimate |
| lower_bound | Both | Lower confidence bound (95%) |
| upper_bound | Both | Upper confidence bound (95%) |
| trend | Prophet only | Trend component |
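Because the returned dictionary only contains the keys that apply to the requested horizon, code that consumes it should check for each key before reading it. The sketch below gathers the first-step `predicted_price` from whichever model DataFrames are present. `first_step_prices` is a hypothetical helper, and the toy DataFrame stands in for real `predict_future()` output.

```python
import pandas as pd


def first_step_prices(predictions: dict) -> dict:
    """Return {model_name: first predicted_price} for each DataFrame present."""
    result = {}
    for name in ("xgboost", "prophet", "hybrid"):
        frame = predictions.get(name)
        # Skip models that were not run for this horizon
        if isinstance(frame, pd.DataFrame) and not frame.empty:
            result[name] = float(frame["predicted_price"].iloc[0])
    return result


# Toy stand-in for a short-horizon predict_future() result (XGBoost only)
index = pd.date_range("2024-01-01", periods=3, freq="60min")
toy = {
    "xgboost": pd.DataFrame(
        {
            "predicted_price": [100.0, 101.0, 102.0],
            "lower_bound": [98.0, 98.5, 99.0],
            "upper_bound": [102.0, 103.5, 105.0],
        },
        index=index,
    ),
    "recommended": "xgboost",
}
print(first_step_prices(toy), "recommended:", toy["recommended"])
```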

Ensemble Weighting Formula

For horizons longer than 24 periods, the hybrid model blends the two predictions using horizon-based weights:
# Calculate weights based on horizon
xgb_weight = max(0, 1 - (periods / 168))
prophet_weight = 1 - xgb_weight

# Weighted average of predictions
hybrid_price = (xgb_pred * xgb_weight) + (prophet_pred * prophet_weight)
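A self-contained version of this weighted average, assuming both models return a `predicted_price` series on the same timestamp index. Aligning on the shared timestamps (an inner join, which is our choice here) avoids silently producing NaN where the two forecasts do not overlap. `blend_prices` is a hypothetical name.

```python
import pandas as pd


def blend_prices(xgb_pred: pd.Series, prophet_pred: pd.Series, periods: int) -> pd.Series:
    """Weighted average of two forecasts using the horizon-based weight."""
    xgb_weight = max(0.0, 1 - periods / 168)
    prophet_weight = 1 - xgb_weight
    # Keep only timestamps present in both forecasts
    xgb_aligned, prophet_aligned = xgb_pred.align(prophet_pred, join="inner")
    return xgb_aligned * xgb_weight + prophet_aligned * prophet_weight


index = pd.date_range("2024-01-01", periods=3, freq="60min")
xgb_pred = pd.Series([100.0, 102.0, 104.0], index=index)
prophet_pred = pd.Series([107.0, 107.0, 107.0], index=index)
print(blend_prices(xgb_pred, prophet_pred, periods=48))
```

At 48 periods the XGBoost weight is 1 - 48/168 = 5/7, so the first blended value is (5 × 100 + 2 × 107) / 7 = 102.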

Example Weights

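| Periods | Hours | XGBoost Weight | Prophet Weight | Strategy |
|---|---|---|---|---|
| 12 | 12h | 93% | 7% | XGBoost only (formula weights not applied at ≤24) |
| 24 | 1d | 86% | 14% | XGBoost only (formula weights not applied at ≤24) |
| 48 | 2d | 71% | 29% | XGBoost dominant |
| 72 | 3d | 57% | 43% | Balanced |
| 96 | 4d | 43% | 57% | Prophet dominant |
| 168 | 7d | 0% | 100% | Pure Prophet |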
The weighting smoothly transitions from XGBoost to Prophet, avoiding sudden jumps in predictions.
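The percentages in the table follow directly from the formula. This short loop recomputes them, which is a quick way to re-check the table if the 168-period cutoff is ever changed. Rounding to whole percent is an assumption about how the table was produced, and `weight_table` is a hypothetical helper.

```python
def weight_table(horizons, cutoff=168):
    """Return (periods, xgboost %, prophet %) rows for the weighting formula."""
    rows = []
    for periods in horizons:
        xgb_weight = max(0.0, 1 - periods / cutoff)
        rows.append((periods, round(xgb_weight * 100), round((1 - xgb_weight) * 100)))
    return rows


for periods, xgb_pct, prophet_pct in weight_table([12, 24, 48, 72, 96, 168]):
    print(f"{periods:>4} periods: XGBoost {xgb_pct}%, Prophet {prophet_pct}%")
```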

Training Process

Sequential Training

The hybrid model trains both sub-models:
def train(self, df: pd.DataFrame) -> Dict:
    # 1. Train XGBoost (80/20 split)
    xgb_metrics = self.xgboost.train(df, train_size=0.8)
    
    # 2. Train Prophet (all data)
    prophet_metrics = self.prophet.train(df)
    
    # 3. Return combined metrics
    return {
        'xgboost': xgb_metrics,
        'prophet': prophet_metrics,
        'data_points': len(df)
    }
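For time series, an 80/20 split is normally chronological: the earliest 80% of rows train the model and the most recent 20% are held out, so the test period never precedes the training data. Whether the XGBoost wrapper splits exactly this way is an assumption; `chronological_split` is a hypothetical helper that illustrates the idea.

```python
import pandas as pd


def chronological_split(df: pd.DataFrame, train_size: float = 0.8):
    """Split a time-indexed DataFrame into train/test without shuffling."""
    if not 0 < train_size < 1:
        raise ValueError("train_size must be between 0 and 1")
    df = df.sort_index()  # make sure rows are in time order
    split_at = int(len(df) * train_size)
    return df.iloc[:split_at], df.iloc[split_at:]


index = pd.date_range("2024-01-01", periods=10, freq="60min")
prices = pd.DataFrame({"close": range(100, 110)}, index=index)
train, test = chronological_split(prices)
print(len(train), len(test), train.index.max() < test.index.min())
```

A shuffled split would leak future prices into training and make the test metrics look better than live performance.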

Training Time

  • XGBoost: 5-15 seconds (1000 data points)
  • Prophet: 10-30 seconds (1000 data points)
  • Total: ~15-45 seconds for full hybrid training
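Training happens once; subsequent predict_future() calls reuse the trained models. Because XGBoost forecasts iteratively, its prediction time grows with the horizon, while Prophet predictions stay fast.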

Getting the Best Prediction

Use get_best_prediction() to automatically select the optimal forecast:
# Predict multiple horizons
short_term = predictor.predict_future(df, periods=12)
medium_term = predictor.predict_future(df, periods=48) 
long_term = predictor.predict_future(df, periods=168)

# Get best for each
best_short = predictor.get_best_prediction(short_term)   # Uses XGBoost
best_medium = predictor.get_best_prediction(medium_term) # Uses Hybrid
best_long = predictor.get_best_prediction(long_term)     # Uses Prophet
This method implements the selection logic:
def get_best_prediction(self, predictions: Dict) -> pd.DataFrame:
    if 'hybrid' in predictions:
        return predictions['hybrid']
    elif predictions.get('recommended') == 'xgboost':
        return predictions['xgboost']
    else:
        return predictions['prophet']

Confidence Intervals

The hybrid model keeps the confidence intervals each model produces and, for the ensemble, blends them with the same weights as the point estimates:
# XGBoost intervals (statistical estimation)
from models.xgboost_model import create_prediction_intervals
xgb_with_intervals = create_prediction_intervals(xgb_predictions)

# Prophet intervals (native to Prophet)
# prophet_predictions already includes lower_bound and upper_bound

# Hybrid intervals (weighted average of both)
hybrid['lower_bound'] = (
    xgb['lower_bound'] * xgb_weight + 
    prophet['lower_bound'] * prophet_weight
)
hybrid['upper_bound'] = (
    xgb['upper_bound'] * xgb_weight + 
    prophet['upper_bound'] * prophet_weight
)
Confidence intervals widen as the prediction horizon increases, reflecting greater uncertainty.
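Because both weights are non-negative and sum to 1, averaging the bounds this way keeps the blended interval ordered: if each model satisfies lower_bound ≤ predicted_price ≤ upper_bound, so does the blend. An averaged interval is a heuristic, though, and is not guaranteed to have the nominal 95% coverage. The sketch below uses a hypothetical `blend_frames` to blend all three columns and checks the ordering on toy data.

```python
import pandas as pd

COLUMNS = ["predicted_price", "lower_bound", "upper_bound"]


def blend_frames(xgb: pd.DataFrame, prophet: pd.DataFrame, xgb_weight: float) -> pd.DataFrame:
    """Weighted average of the point estimate and both bounds."""
    if not 0.0 <= xgb_weight <= 1.0:
        raise ValueError("xgb_weight must be in [0, 1]")
    prophet_weight = 1.0 - xgb_weight
    return xgb[COLUMNS] * xgb_weight + prophet[COLUMNS] * prophet_weight


index = pd.date_range("2024-01-01", periods=2, freq="60min")
xgb = pd.DataFrame({"predicted_price": [100.0, 101.0],
                    "lower_bound": [98.0, 97.0],
                    "upper_bound": [102.0, 105.0]}, index=index)
prophet = pd.DataFrame({"predicted_price": [104.0, 104.5],
                        "lower_bound": [95.0, 94.0],
                        "upper_bound": [113.0, 115.0]}, index=index)

hybrid = blend_frames(xgb, prophet, xgb_weight=0.75)
# Verify lower <= point <= upper on every row
ordered = (hybrid["lower_bound"] <= hybrid["predicted_price"]) & (
    hybrid["predicted_price"] <= hybrid["upper_bound"]
)
print(hybrid)
print("intervals ordered:", bool(ordered.all()))
```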

Use Cases by Horizon

Short-term (1-24 hours). Recommended: XGBoost only
predictions = predictor.predict_future(df, periods=24)
# predictions['recommended'] == 'xgboost'
Why: XGBoost has superior accuracy for short-term predictions using recent price action and technical indicators.

Advanced: Accessing All Predictions

Inspect predictions from both models individually:
predictions = predictor.predict_future(df, periods=48)

# Access individual model predictions
if 'xgboost' in predictions:
    xgb_df = predictions['xgboost']
    print("XGBoost prediction:", xgb_df['predicted_price'].iloc[0])

if 'prophet' in predictions:
    prophet_df = predictions['prophet']
    print("Prophet prediction:", prophet_df['predicted_price'].iloc[0])
    print("Prophet trend:", prophet_df['trend'].iloc[0])

if 'hybrid' in predictions:
    hybrid_df = predictions['hybrid']
    print("Hybrid prediction:", hybrid_df['predicted_price'].iloc[0])

Comparison Example

import matplotlib.pyplot as plt

# Get predictions for 72 hours
predictions = predictor.predict_future(df, periods=72)

xgb = predictions['xgboost']['predicted_price']
prophet = predictions['prophet']['predicted_price']
hybrid = predictions['hybrid']['predicted_price']

# Plot all three
plt.figure(figsize=(12, 6))
plt.plot(xgb.index, xgb.values, label='XGBoost', alpha=0.7)
plt.plot(prophet.index, prophet.values, label='Prophet', alpha=0.7)
plt.plot(hybrid.index, hybrid.values, label='Hybrid', linewidth=2)
plt.legend()
plt.title('Model Comparison: 72-hour Forecast')
plt.show()

Performance Characteristics

MetricXGBoostProphetHybrid
Short-term (1-24h)⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐
Medium-term (1-3d)⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐
Long-term (1w+)⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐
Training SpeedFastSlowSlow
Prediction SpeedSlow (iterative)FastFast
Confidence IntervalsEstimatedNativeBoth
InterpretabilityLowHighMedium

Training Information

Retrieve detailed training metrics:
training_info = predictor.train(df)

# XGBoost metrics
print("\nXGBoost Performance:")
print(f"  Test MAPE: {training_info['xgboost']['test_mape']:.2f}%")
print(f"  Direction Accuracy: {training_info['xgboost']['test_direction_accuracy']:.2f}%")

# Prophet metrics  
print("\nProphet Performance:")
print(f"  MAPE: {training_info['prophet']['mape']:.2f}%")
print(f"  Direction Accuracy: {training_info['prophet']['direction_accuracy']:.2f}%")

# Data info
print(f"\nTrained on {training_info['data_points']} data points")

# Get detailed info
detailed = predictor.get_training_info()
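MAPE and direction accuracy are not defined on this page. The sketch below uses common definitions: MAPE as the mean of |actual - predicted| / |actual| in percent, and direction accuracy as the share of steps where the predicted move from the previous actual price has the same sign as the actual move. These definitions are assumptions; the project's own implementation may differ, for example in how flat moves are counted.

```python
import numpy as np


def mape(actual, predicted) -> float:
    """Mean absolute percentage error, in percent."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs((actual - predicted) / actual)) * 100)


def direction_accuracy(actual, predicted) -> float:
    """Percent of steps where the predicted and actual moves share a sign.

    Both moves are measured from the previous actual price.
    """
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    actual_move = np.sign(actual[1:] - actual[:-1])
    predicted_move = np.sign(predicted[1:] - actual[:-1])
    return float(np.mean(actual_move == predicted_move) * 100)


actual = [100.0, 102.0, 101.0, 103.0]
predicted = [100.0, 101.0, 102.0, 104.0]
print(f"MAPE: {mape(actual, predicted):.2f}%")
print(f"Direction accuracy: {direction_accuracy(actual, predicted):.2f}%")
```

In this toy series the second step counts as a miss: the price fell, but the prediction implied no change.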

Best Practices

Check the ensemble weights:
predictions = predictor.predict_future(df, periods=48)

if 'weights' in predictions:
    w = predictions['weights']
    print(f"XGBoost: {w['xgboost']:.0%}, Prophet: {w['prophet']:.0%}")

Validate the prediction against the current price:
best = predictor.get_best_prediction(predictions)

# Check whether the current price falls inside the first-step interval
current_price = df['close'].iloc[-1]
upper = best['upper_bound'].iloc[0]
lower = best['lower_bound'].iloc[0]

if lower <= current_price <= upper:
    print("Prediction appears reasonable")
else:
    print("Current price is outside the predicted range; review the input data")

Retrain regularly:
# Retrain with latest data
new_df = fetch_latest_data()  # Your data fetching function
predictor.train(new_df)

# Recommended: Retrain daily for production systems
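To follow the daily-retraining recommendation in a long-running service, one option is to record when the model was last trained and retrain once that timestamp is more than a day old. This is a sketch: `needs_retrain` and its one-day default are illustrative, and the caller is assumed to store the training time itself (for example, right after `predictor.train(...)` returns).

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def needs_retrain(last_trained: Optional[datetime],
                  now: Optional[datetime] = None,
                  max_age: timedelta = timedelta(days=1)) -> bool:
    """Return True if the model was never trained or is older than max_age."""
    if last_trained is None:
        return True
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_trained >= max_age


trained_at = datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc)
check_time = datetime(2024, 1, 2, 6, 0, tzinfo=timezone.utc)
print(needs_retrain(trained_at, now=check_time))
```

Passing `now` explicitly keeps the check deterministic in tests; in production it defaults to the current UTC time.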

Advantages of Hybrid Approach

  1. One Model for All Horizons: No need to choose a model manually
  2. Smooth Transitions: Weighted ensemble avoids prediction jumps
  3. Best of Both Worlds: Combines XGBoost’s short-term accuracy with Prophet’s long-term trend modeling
  4. Confidence Intervals: Provides uncertainty estimates from both models
  5. Automatic Selection: Routes each forecast to XGBoost, Prophet, or a blend based on the horizon

Limitations

  1. Training Time: Must train both models (the total is roughly the sum of both, about 15-45 seconds for 1000 data points)
  2. Memory Usage: Stores two models in memory
  3. Complexity: More complex than single-model approach
  4. Ensemble Zone: May not always improve over best individual model
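Whether the ensemble beats the better single model is an empirical question for your data. One way to check is to compare errors on a holdout window for each model and for the blend. The sketch below uses mean absolute error and toy numbers purely to illustrate the comparison; `compare_on_holdout` is hypothetical.

```python
import numpy as np


def compare_on_holdout(actual, xgb_pred, prophet_pred, xgb_weight):
    """Mean absolute error of each model and of their weighted blend."""
    actual = np.asarray(actual, dtype=float)
    xgb_pred = np.asarray(xgb_pred, dtype=float)
    prophet_pred = np.asarray(prophet_pred, dtype=float)
    blend = xgb_weight * xgb_pred + (1 - xgb_weight) * prophet_pred
    return {
        "xgboost": float(np.mean(np.abs(actual - xgb_pred))),
        "prophet": float(np.mean(np.abs(actual - prophet_pred))),
        "hybrid": float(np.mean(np.abs(actual - blend))),
    }


# Toy holdout where XGBoost alone is closer than the blend
actual = [100.0, 101.0, 102.0]
errors = compare_on_holdout(actual,
                            xgb_pred=[100.5, 101.5, 102.5],
                            prophet_pred=[104.0, 105.0, 106.0],
                            xgb_weight=0.5)
print(errors, "best:", min(errors, key=errors.get))
```

Here the blend (MAE 2.25) is worse than XGBoost alone (MAE 0.5), which is exactly the situation this limitation describes.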

When to Use Hybrid vs Individual Models

Use Hybrid

  • Variable prediction horizons
  • Want automatic model selection
  • Need confidence intervals
  • Production systems
  • General-purpose forecasting

Use Individual Models

  • Fixed short-term horizon (use XGBoost)
  • Fixed long-term horizon (use Prophet)
  • Memory constraints
  • Faster training needed
  • Research and experimentation

Quick Start Template

from models.hybrid_model import HybridCryptoPredictor
import pandas as pd

# 1. Initialize
predictor = HybridCryptoPredictor()

# 2. Load data
df = pd.read_csv('data.csv', index_col='timestamp', parse_dates=True)

# 3. Train
print("Training models...")
metrics = predictor.train(df)
print("✓ Training complete")

# 4. Predict
periods = 48  # 2 days
predictions = predictor.predict_future(df, periods=periods)

# 5. Get best prediction
best = predictor.get_best_prediction(predictions)

# 6. Display results
print(f"\nForecast for next {periods} hours:")
print(f"Model used: {predictions['recommended']}")
print(f"Next price: ${best['predicted_price'].iloc[0]:,.2f}")
print(f"Range: ${best['lower_bound'].iloc[0]:,.2f} - ${best['upper_bound'].iloc[0]:,.2f}")

Next Steps

XGBoost Model

Deep dive into short-term predictions

Prophet Model

Deep dive into long-term forecasting

Model Comparison

Detailed comparison of all three models

API Reference

Complete API documentation
