Overview
The XGBoostCryptoPredictor class implements a gradient-boosting model optimized for cryptocurrency price prediction. It automatically creates 50+ engineered features, including returns, moving averages, volatility metrics, momentum indicators, and temporal features.
Best for: short- to medium-term predictions (1-168 hours)
Constructor
Parameters
n_estimators
Number of boosting trees to train. Higher values increase model complexity and training time.
Effect: more trees give a better fit, but raise the risk of overfitting.
Recommended range: 100-500

learning_rate
Step-size shrinkage that controls how much each tree contributes, helping prevent overfitting.
Effect: lower values require more trees but often generalize better.
Recommended range: 0.01-0.3

max_depth
Maximum depth of each tree. Deeper trees can model more complex patterns.
Effect: greater depth captures more complex interactions, but raises the risk of overfitting.
Recommended range: 3-10

subsample
Fraction of training samples used for each tree. Helps prevent overfitting.
Effect: lower values add randomness and reduce overfitting.
Recommended range: 0.6-1.0

colsample_bytree
Fraction of features used when building each tree.
Effect: lower values increase diversity between trees.
Recommended range: 0.5-1.0
Methods
create_features()
Creates 50+ engineered features from raw OHLCV data.

Input: DataFrame with columns open, high, low, close, volume (optional), and a datetime index.

Returns: DataFrame with the original columns plus:
- Returns: 1h, 4h, 24h, 7d percentage changes
- Moving Averages: MA(7,14,30,50) and ratios
- Exponential MA: EMA(12,26,50)
- Volatility: Rolling standard deviation (7,14,30 periods)
- Momentum: 7 and 14 period momentum
- Bollinger Bands: Upper, lower, middle, position
- Volume features: Ratios and moving averages (if volume provided)
- OHLC ratios: high/low, close/open
- Temporal: hour, day_of_week, day_of_month, month
- Technical indicators: RSI, MACD features (if present in input)
- Lags: Close price at t-1, t-2, t-3, t-7, t-14
- Target: Next period close price
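To illustrate how a few of the features listed above can be derived, here is a self-contained pandas sketch on synthetic data (this shows the general recipe, not the class's internal implementation):

```python
import numpy as np
import pandas as pd

# Toy hourly close-price series (synthetic data, for illustration only).
idx = pd.date_range("2024-01-01", periods=200, freq="h")
rng = np.random.default_rng(0)
df = pd.DataFrame({"close": 100 + rng.normal(0, 1, 200).cumsum()}, index=idx)

# A few of the engineered features described above:
df["return_1h"] = df["close"].pct_change(1)          # 1h percentage change
df["ma_7"] = df["close"].rolling(7).mean()           # 7-period moving average
df["ma_ratio_7"] = df["close"] / df["ma_7"]          # price-to-MA ratio
df["volatility_14"] = df["close"].rolling(14).std()  # rolling standard deviation
mid = df["close"].rolling(20).mean()
band = 2 * df["close"].rolling(20).std()
df["bb_position"] = (df["close"] - (mid - band)) / (2 * band)  # position within Bollinger Bands
df["close_lag_1"] = df["close"].shift(1)             # lagged close (t-1)
df["hour"] = df.index.hour                           # temporal feature
df["target"] = df["close"].shift(-1)                 # next-period close
```

Note that the target is the close shifted by -1, so the last row has no target and would be dropped before training.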
prepare_data()
Prepares data for training with feature engineering and a train/test split.

Input: DataFrame with OHLCV data and a datetime index.

Train fraction: fraction of data to use for training (0.0-1.0); the remainder is used for testing.
Note: uses a temporal split, not a random split, to preserve the time-series structure.

Returns a tuple of (X_train, X_test, y_train, y_test) with:
- Features scaled using MinMaxScaler
- NaN values removed
- A minimum of 100 samples required after feature creation
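The temporal split and scaling can be sketched as follows (a minimal NumPy stand-in; the class uses scikit-learn's MinMaxScaler, but the key idea is the same: fit the scaler on the training portion only, to avoid look-ahead leakage):

```python
import numpy as np

def temporal_split_and_scale(X, y, train_size=0.8):
    """Chronological (not random) split, with min-max scaling fit on
    the training portion only to avoid look-ahead leakage."""
    cut = int(len(X) * train_size)
    X_train, X_test = X[:cut], X[cut:]
    y_train, y_test = y[:cut], y[cut:]
    lo, hi = X_train.min(axis=0), X_train.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # guard against constant columns
    X_train = (X_train - lo) / span
    X_test = (X_test - lo) / span           # reuse the train statistics
    return X_train, X_test, y_train, y_test

X = np.arange(20.0).reshape(10, 2)
y = np.arange(10.0)
X_tr, X_te, y_tr, y_te = temporal_split_and_scale(X, y)
```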
train()
Trains the XGBoost model and returns performance metrics.

Input: DataFrame with historical OHLCV data.

Train fraction: fraction of data for training.

Returns a dictionary containing, among its key metrics:
- test_mape: lower is better (good: <5%, acceptable: <10%)
- test_direction_accuracy: higher is better (>50% = better than random)
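These two metrics follow standard definitions, sketched below (the class's exact implementation may differ in details such as how the first step is handled):

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error (lower is better)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))

def direction_accuracy(y_true, y_pred):
    """Share of steps where the predicted move (up/down relative to the
    previous actual price) matches the actual move."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    actual_move = np.sign(np.diff(y_true))
    pred_move = np.sign(y_pred[1:] - y_true[:-1])
    return 100.0 * np.mean(actual_move == pred_move)

y_true = [100.0, 102.0, 101.0, 103.0]
y_pred = [100.0, 101.0, 103.0, 104.0]
```

A direction accuracy of 50% corresponds to random guessing, which is why the documentation treats anything above 50% as signal.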
predict_future()
Generates recursive multi-step forecasts.

Input: DataFrame with historical data used to generate the initial features.

Periods: number of time periods to forecast into the future.
Note: prediction accuracy decreases over longer horizons due to error accumulation.

Returns a DataFrame with:
- Index: timestamp (datetime)
- predicted_price: predicted closing price
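The recursive scheme, and why errors accumulate, can be sketched with a stand-in one-step model (a naive drift rule here, not the actual XGBoost model):

```python
def recursive_forecast(history, one_step_model, periods):
    """Roll a one-step-ahead model forward: each prediction is appended
    to the history and used as input for the next step, which is why
    errors compound over longer horizons."""
    window = list(history)
    preds = []
    for _ in range(periods):
        nxt = one_step_model(window)
        preds.append(nxt)
        window.append(nxt)  # feed the prediction back in as input
    return preds

# Stand-in model: naive drift from the last two observations.
drift_model = lambda w: w[-1] + (w[-1] - w[-2])

print(recursive_forecast([100.0, 101.0], drift_model, 3))  # -> [102.0, 103.0, 104.0]
```

Any bias in the one-step model is baked into every subsequent input, so a small per-step error grows with the horizon.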
get_feature_importance()
Returns feature-importance scores to help interpret the model's decisions.

Returns a DataFrame with columns:
- feature: feature name
- importance: importance score (higher = more important)
Utility Functions
backtest_model()
Performs comprehensive backtesting with a train/test split.

Parameters:
- Historical OHLCV data
- An initialized predictor instance
- The fraction of data used for training
create_prediction_intervals()
Adds confidence intervals to predictions.

Input: DataFrame with a predicted_price column.

Confidence level: 0.0-1.0; common values are 0.90, 0.95, and 0.99.

Returns the original DataFrame with added columns:
- lower_bound: lower confidence bound
- upper_bound: upper confidence bound
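One common way to construct such intervals is a normal approximation over held-out residuals, sketched below (the function name and approach are illustrative assumptions, not necessarily what this library does internally):

```python
import numpy as np
import pandas as pd

def add_prediction_intervals(forecast, residuals, confidence=0.95):
    """Add symmetric bounds using a normal approximation:
    half-width = z * std(residuals), where z is the two-sided
    quantile for the requested confidence level."""
    z_table = {0.90: 1.645, 0.95: 1.96, 0.99: 2.576}  # two-sided z-scores
    half = z_table[confidence] * np.std(residuals)
    out = forecast.copy()
    out["lower_bound"] = out["predicted_price"] - half
    out["upper_bound"] = out["predicted_price"] + half
    return out

fc = pd.DataFrame({"predicted_price": [100.0, 101.0]})
resid = [-1.0, 1.0, -1.0, 1.0]  # toy residuals with std = 1.0
out = add_prediction_intervals(fc, resid)
```

Note the normal approximation assumes roughly symmetric, homoscedastic errors; crypto returns often violate this, so treat the bounds as indicative rather than exact.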
Complete Example
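An end-to-end sketch using the methods documented above. The module path, CSV file name, and exact constructor keyword names are assumptions; adjust them to match your project.

```python
# Assumed import path -- adjust to wherever the class lives in your project.
from xgboost_crypto_predictor import XGBoostCryptoPredictor, create_prediction_intervals
import pandas as pd

# Load historical OHLCV data with a datetime index (500+ rows recommended).
# "btc_hourly.csv" is a placeholder file name.
df = pd.read_csv("btc_hourly.csv", index_col="timestamp", parse_dates=True)

# Configure the model (values chosen within the recommended ranges above).
predictor = XGBoostCryptoPredictor(
    n_estimators=300,
    learning_rate=0.05,
    max_depth=6,
    subsample=0.8,
    colsample_bytree=0.8,
)

# Train and inspect performance.
metrics = predictor.train(df, 0.8)
print(f"MAPE: {metrics['test_mape']:.2f}%")
print(f"Direction accuracy: {metrics['test_direction_accuracy']:.1f}%")

# Forecast the next 24 hours and attach 95% confidence intervals.
forecast = predictor.predict_future(df, 24)
forecast = create_prediction_intervals(forecast, 0.95)
print(forecast[["predicted_price", "lower_bound", "upper_bound"]].head())

# Which features drive the model?
print(predictor.get_feature_importance().head(10))
```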
Key Characteristics
Strengths:
- Excellent short-term accuracy (1-72 hours)
- Captures complex non-linear patterns
- Automatic feature-importance ranking
- Robust to outliers
- Fast training and prediction

Limitations:
- Accuracy degrades over longer horizons
- Requires significant historical data (500+ points recommended)
- Recursive forecasting accumulates errors
- Less interpretable than linear models

Typical performance:
- MAPE: 2-5% for 24h predictions
- Direction accuracy: 55-65%
- Best for: hourly to 3-day forecasts