
Overview

The XGBoost model is optimized for short-term predictions (1-72 hours) of cryptocurrency prices. It uses gradient boosting with extensive feature engineering to capture patterns in financial time series data.

Best for: hourly and daily predictions, capturing short-term momentum and volatility

Source: source/models/xgboost_model.py

When to Use XGBoost

Short-term Trading

Predictions from 1 hour to 3 days ahead

Technical Analysis

Leverages momentum, volatility, and technical indicators

High Accuracy

Best direction accuracy for near-term movements

Real-time Features

Uses recent price action and volume patterns

Model Parameters

The XGBoostCryptoPredictor accepts the following initialization parameters:
Parameter         Default  Range     Description
n_estimators      200      100-500   Number of boosting trees
learning_rate     0.07     0.01-0.3  Step size for gradient descent
max_depth         6        3-10      Maximum tree depth
subsample         0.8      0.5-1.0   Fraction of samples per tree
colsample_bytree  0.8      0.5-1.0   Fraction of features per tree
The default parameters are optimized for cryptocurrency volatility. Higher n_estimators improves accuracy but increases training time.

Feature Engineering

The model automatically creates 60+ engineered features from OHLCV data:

1. Returns and Momentum

  • Return periods: 1h, 4h, 24h, 7d
  • Momentum indicators over 7 and 14 periods
  • Price change velocity

2. Moving Averages

  • Simple MA: 7, 14, 30, 50 periods
  • Exponential MA: 12, 26, 50 periods
  • Price-to-MA ratios for each period

3. Volatility Indicators

  • Rolling volatility: 7, 14, 30 periods
  • Bollinger Bands (20-period)
  • Band position indicator

4. Volume Features

  • Volume moving average (7 periods)
  • Volume ratio and change
  • Volume-price correlation

5. Technical Indicators

  • RSI normalization and overbought/oversold signals
  • MACD difference and crossover signals
  • High/Low and Close/Open ratios

6. Temporal Features

  • Hour of day (0-23)
  • Day of week (0-6)
  • Day of month (1-31)
  • Month (1-12)

7. Lag Features

  • Previous close prices: 1, 2, 3, 7, 14 periods back
All features are automatically scaled using MinMaxScaler before training.
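A few of the feature groups above can be sketched with pandas. This is an illustrative simplification, not the model's actual implementation (which lives in source/models/xgboost_model.py); the column names are hypothetical:

```python
import pandas as pd

def make_basic_features(df: pd.DataFrame) -> pd.DataFrame:
    """Sketch of a handful of the engineered features, assuming hourly bars."""
    out = df.copy()
    # Returns over several horizons (1h, 4h, 24h, 7d for hourly data)
    for periods in (1, 4, 24, 168):
        out[f"return_{periods}h"] = out["close"].pct_change(periods)
    # Simple moving averages plus price-to-MA ratios
    for window in (7, 14, 30, 50):
        out[f"sma_{window}"] = out["close"].rolling(window).mean()
        out[f"close_to_sma_{window}"] = out["close"] / out[f"sma_{window}"]
    # Rolling volatility of 1-period returns
    for window in (7, 14, 30):
        out[f"volatility_{window}"] = out["return_1h"].rolling(window).std()
    # Temporal features from the DatetimeIndex
    out["hour"] = out.index.hour
    out["day_of_week"] = out.index.dayofweek
    # Lagged closes
    for lag in (1, 2, 3, 7, 14):
        out[f"close_lag_{lag}"] = out["close"].shift(lag)
    return out
```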

Usage Example

import pandas as pd
from models.xgboost_model import XGBoostCryptoPredictor

# Initialize with custom parameters
predictor = XGBoostCryptoPredictor(
    n_estimators=250,
    learning_rate=0.05,
    max_depth=7
)

# Load your historical data (must have OHLCV columns)
df = pd.read_csv('btc_hourly.csv', index_col='timestamp', parse_dates=True)

# Train the model (80% train, 20% test)
metrics = predictor.train(df, train_size=0.8)
print(f"Test MAPE: {metrics['test_mape']:.2f}%")
print(f"Direction Accuracy: {metrics['test_direction_accuracy']:.2f}%")

# Predict next 24 hours
predictions = predictor.predict_future(df, periods=24)

Training Process

Data Requirements

  • Minimum: 100 data points after feature creation
  • Recommended: 1000+ hourly data points (6+ weeks)
  • Format: DataFrame with open, high, low, close, volume columns
  • Index: DatetimeIndex for temporal features
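These requirements can be checked up front before calling train(). A minimal sketch (the model may perform its own validation; validate_ohlcv is a hypothetical helper, not part of the API):

```python
import pandas as pd

REQUIRED_COLUMNS = {"open", "high", "low", "close", "volume"}

def validate_ohlcv(df: pd.DataFrame, min_rows: int = 100) -> None:
    """Raise a descriptive error if df is unsuitable for training."""
    missing = REQUIRED_COLUMNS - set(df.columns)
    if missing:
        raise ValueError(f"Missing OHLCV columns: {sorted(missing)}")
    if not isinstance(df.index, pd.DatetimeIndex):
        raise TypeError("Index must be a DatetimeIndex for temporal features")
    if len(df) < min_rows:
        raise ValueError(f"Need at least {min_rows} rows, got {len(df)}")
```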

Training Steps

  1. Feature Creation (lines 50-128)
    Generates 60+ engineered features from raw OHLCV data
  2. Data Splitting (lines 130-171)
    Time-series split (no shuffling, to preserve temporal order)
  3. Scaling (lines 168-169)
    MinMaxScaler fitted on training data only
  4. Model Training (lines 188-194)
    XGBoost with early stopping on a validation set
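Steps 2 and 3 can be sketched as follows. This dependency-free version mirrors sklearn's MinMaxScaler for illustration only; the actual split logic is in source/models/xgboost_model.py:

```python
import numpy as np

def time_series_split_and_scale(X, y, train_size=0.8):
    """Chronological split (no shuffling) with min-max scaling fitted on train only."""
    split = int(len(X) * train_size)
    X_train, X_test = X[:split], X[split:]
    y_train, y_test = y[:split], y[split:]
    lo, hi = X_train.min(axis=0), X_train.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # guard against constant columns
    X_train_s = (X_train - lo) / span
    X_test_s = (X_test - lo) / span  # scaled with train statistics: no leakage
    return X_train_s, X_test_s, y_train, y_test
```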

Performance Metrics

The train() method returns:
{
    'train_mae': float,           # Mean Absolute Error (training)
    'test_mae': float,            # Mean Absolute Error (test)
    'train_rmse': float,          # Root Mean Square Error (training)
    'test_rmse': float,           # Root Mean Square Error (test)
    'train_mape': float,          # Mean Absolute Percentage Error (training)
    'test_mape': float,           # MAPE on test set
    'train_direction_accuracy': float,  # % correct price direction (up/down)
    'test_direction_accuracy': float    # Direction accuracy on test
}
Direction accuracy is often more important than exact price prediction for trading strategies.
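Direction accuracy compares the sign of predicted versus actual period-over-period changes. A minimal sketch of how such a metric can be computed (the model's own implementation may differ in detail):

```python
import numpy as np

def direction_accuracy(actual: np.ndarray, predicted: np.ndarray) -> float:
    """Percentage of periods where predicted and actual moves share a sign."""
    actual_dir = np.sign(np.diff(actual))
    predicted_dir = np.sign(np.diff(predicted))
    return float(np.mean(actual_dir == predicted_dir) * 100)
```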

Prediction Horizons

Horizon       Typical MAPE  Direction Accuracy  Use Case
1-6 hours     1-3%          60-70%              Intraday trading
12-24 hours   3-6%          55-65%              Daily positioning
48-72 hours   6-10%         50-60%              Short-term trends
Beyond 72 hours, prediction accuracy degrades significantly. Use the Prophet model for longer horizons.

Multi-step Predictions

The predict_future() method uses an iterative approach:
  1. Creates features from current data
  2. Predicts next time step
  3. Appends prediction to dataset
  4. Repeats for periods iterations
# Predict next 48 hours
predictions = predictor.predict_future(df, periods=48)

# Result DataFrame:
# timestamp | predicted_price
# 2026-03-08 01:00:00 | 48532.12
# 2026-03-08 02:00:00 | 48621.45
# ...
Iterative predictions accumulate error over time, which is why accuracy decreases for longer horizons.
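The iterative scheme amounts to a feedback loop: each prediction is appended to the history and treated as an observed value. A simplified sketch, where one_step_model is a hypothetical stand-in for the trained pipeline:

```python
import pandas as pd

def iterative_forecast(df: pd.DataFrame, one_step_model, periods: int) -> pd.Series:
    """Multi-step forecast by feeding each prediction back as input."""
    history = df.copy()
    freq = pd.infer_freq(history.index) or "h"
    preds = {}
    for _ in range(periods):
        next_price = one_step_model(history)
        next_ts = history.index[-1] + pd.tseries.frequencies.to_offset(freq)
        preds[next_ts] = next_price
        # Append the prediction as if it were an observed close
        new_row = history.iloc[[-1]].copy()
        new_row.index = [next_ts]
        new_row["close"] = next_price
        history = pd.concat([history, new_row])
    return pd.Series(preds, name="predicted_price")
```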

Feature Importance

Analyze which features drive predictions:
# After training
importance_df = predictor.get_feature_importance()
print(importance_df.head(10))

# Output:
#           feature  importance
# 0      close_lag_1      0.142
# 1      return_24h      0.098
# 2   volatility_14      0.085
# 3    bb_position       0.071
# ...

Advanced: Backtesting

Use the backtest_model() utility function:
from models.xgboost_model import backtest_model

predictor = XGBoostCryptoPredictor()
results = backtest_model(df, predictor, train_size=0.8)

print(results['metrics'])
print(results['feature_importance'])

# Access predictions for visualization
train_actual = results['train_actual']
train_predicted = results['train_predicted']
test_actual = results['test_actual']
test_predicted = results['test_predicted']

Prediction Intervals

Add confidence bounds to predictions:
from models.xgboost_model import create_prediction_intervals

predictions = predictor.predict_future(df, periods=24)
predictions_with_intervals = create_prediction_intervals(
    predictions, 
    confidence=0.95  # 95% confidence interval
)

# Adds columns:
# - lower_bound
# - upper_bound
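One common way to derive such bounds, shown here as an assumption rather than the actual method used by create_prediction_intervals, is to widen a band around each prediction using the normal quantile for the chosen confidence level and a residual standard deviation estimated from the test set:

```python
import pandas as pd
from statistics import NormalDist

def add_normal_intervals(predictions: pd.DataFrame, residual_std: float,
                         confidence: float = 0.95) -> pd.DataFrame:
    """Attach symmetric bounds assuming normally distributed forecast errors."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    out = predictions.copy()
    # Widen the band with the step number: iterative errors accumulate
    steps = pd.Series(range(1, len(out) + 1), index=out.index)
    half_width = z * residual_std * steps.pow(0.5)
    out["lower_bound"] = out["predicted_price"] - half_width
    out["upper_bound"] = out["predicted_price"] + half_width
    return out
```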

Configuration Examples

Conservative (Stable predictions)

predictor = XGBoostCryptoPredictor(
    n_estimators=150,
    learning_rate=0.05,
    max_depth=4,
    subsample=0.7,
    colsample_bytree=0.7
)

Aggressive (Higher variance, may overfit)

predictor = XGBoostCryptoPredictor(
    n_estimators=300,
    learning_rate=0.1,
    max_depth=8,
    subsample=0.9,
    colsample_bytree=0.9
)

Fast Training (Development)

predictor = XGBoostCryptoPredictor(
    n_estimators=100,
    learning_rate=0.1,
    max_depth=5
)

Performance Optimization

Training speed

  • Reduce n_estimators (100-150 for faster training)
  • Lower max_depth (4-5 instead of 6)
  • The model uses n_jobs=-1 to utilize all CPU cores

Memory

  • Feature creation happens in-memory
  • Each tree stores feature importance data
  • For large datasets (>100k rows), consider downsampling

Prediction latency

  • Single predictions: <10ms
  • Multi-step (24 periods): ~200-500ms due to the iterative process
  • Batch predictions are faster than individual calls

Limitations

  1. Horizon Limit: Accuracy degrades after 72 hours
  2. Data Requirements: Needs substantial history (1000+ points)
  3. Black Swan Events: Cannot predict unprecedented market shocks
  4. Regime Changes: May lag during major trend reversals

Next Steps

Prophet Model

Learn about long-term predictions (1 week - 1 month)

Hybrid Model

Combine XGBoost + Prophet for best results

Model Comparison

Compare all three models side-by-side

API Reference

Detailed API documentation
