Overview
The XGBoost model is optimized for short-term predictions (1-72 hours) of cryptocurrency prices. It uses gradient boosting with advanced feature engineering to capture patterns in financial time series data.

Best for: hourly and daily predictions, capturing short-term momentum and volatility.

Source: `source/models/xgboost_model.py`
When to Use XGBoost
Short-term Trading
Predictions from 1 hour to 3 days ahead
Technical Analysis
Leverages momentum, volatility, and technical indicators
High Accuracy
Best direction accuracy for near-term movements
Real-time Features
Uses recent price action and volume patterns
Model Parameters
The `XGBoostCryptoPredictor` class accepts the following initialization parameters:
| Parameter | Default | Range | Description |
|---|---|---|---|
| `n_estimators` | 200 | 100-500 | Number of boosting trees |
| `learning_rate` | 0.07 | 0.01-0.3 | Step size for gradient descent |
| `max_depth` | 6 | 3-10 | Maximum tree depth |
| `subsample` | 0.8 | 0.5-1.0 | Fraction of samples per tree |
| `colsample_bytree` | 0.8 | 0.5-1.0 | Fraction of features per tree |
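For reference, the defaults above collected as keyword arguments (a sketch; the exact constructor signature lives in `source/models/xgboost_model.py`):

```python
# Default hyperparameters from the table above. These map to the model's
# constructor, e.g. XGBoostCryptoPredictor(**default_params) (assumed usage;
# see source/models/xgboost_model.py for the exact signature).
default_params = {
    "n_estimators": 200,      # number of boosting trees
    "learning_rate": 0.07,    # step size for gradient descent
    "max_depth": 6,           # maximum tree depth
    "subsample": 0.8,         # fraction of samples per tree
    "colsample_bytree": 0.8,  # fraction of features per tree
}
```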
The default parameters are optimized for cryptocurrency volatility. Higher `n_estimators` improves accuracy but increases training time.

Feature Engineering
The model automatically creates 60+ engineered features from OHLCV data:

1. Returns and Momentum
- Return periods: 1h, 4h, 24h, 7d
- Momentum indicators over 7 and 14 periods
- Price change velocity
2. Moving Averages
- Simple MA: 7, 14, 30, 50 periods
- Exponential MA: 12, 26, 50 periods
- Price-to-MA ratios for each period
3. Volatility Indicators
- Rolling volatility: 7, 14, 30 periods
- Bollinger Bands (20-period)
- Band position indicator
4. Volume Features
- Volume moving average (7 periods)
- Volume ratio and change
- Volume-price correlation
5. Technical Indicators
- RSI normalization and overbought/oversold signals
- MACD difference and crossover signals
- High/Low and Close/Open ratios
6. Temporal Features
- Hour of day (0-23)
- Day of week (0-6)
- Day of month (1-31)
- Month (1-12)
7. Lag Features
- Previous close prices: 1, 2, 3, 7, 14 periods back
All features are automatically scaled using MinMaxScaler before training.
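A condensed sketch of this recipe with a handful of representative features (not all 60+), built from a toy random-walk series:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Toy hourly close/volume frame (random walk) to illustrate the feature recipe.
rng = np.random.default_rng(0)
idx = pd.date_range("2024-01-01", periods=500, freq="h")
close = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.01, 500))), index=idx)
df = pd.DataFrame({"close": close,
                   "volume": rng.integers(1_000, 5_000, 500)}, index=idx)

feats = pd.DataFrame(index=df.index)
feats["return_1h"] = df["close"].pct_change(1)             # 1h return
feats["return_24h"] = df["close"].pct_change(24)           # 24h return
feats["sma_14"] = df["close"].rolling(14).mean()           # simple MA
feats["price_to_sma_14"] = df["close"] / feats["sma_14"]   # price-to-MA ratio
feats["volatility_14"] = feats["return_1h"].rolling(14).std()
feats["volume_ratio"] = df["volume"] / df["volume"].rolling(7).mean()
feats["hour"] = df.index.hour                              # temporal feature
feats["close_lag_1"] = df["close"].shift(1)                # lag feature
feats = feats.dropna()

# Scale to [0, 1], as the model does before training.
scaled = MinMaxScaler().fit_transform(feats)
```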
Usage Example
Training Process
Data Requirements
- Minimum: 100 data points after feature creation
- Recommended: 1000+ hourly data points (6+ weeks)
- Format: DataFrame with `open`, `high`, `low`, `close`, `volume` columns
- Index: DatetimeIndex for temporal features
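The expected input shape can be sketched with synthetic data:

```python
import numpy as np
import pandas as pd

# OHLCV frame with a DatetimeIndex, as the model expects.
rng = np.random.default_rng(7)
idx = pd.date_range("2024-01-01", periods=1200, freq="h")  # 1000+ points recommended
close = 100 * np.exp(np.cumsum(rng.normal(0, 0.01, len(idx))))
df = pd.DataFrame({
    "open": close * (1 + rng.normal(0, 0.001, len(idx))),
    "high": close * 1.005,
    "low": close * 0.995,
    "close": close,
    "volume": rng.integers(1_000, 10_000, len(idx)),
}, index=idx)
```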
Training Steps
-
Feature Creation (line 50-128)
Generates 60+ engineered features from raw OHLCV data -
Data Splitting (line 130-171)
Time-series split (no shuffling to preserve temporal order) -
Scaling (line 168-169)
MinMaxScaler fitted on training data only -
Model Training (line 188-194)
XGBoost with early stopping on validation set
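The splitting and scaling steps can be sketched as follows (a simplified outline; the real implementation is in `source/models/xgboost_model.py`):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Pretend feature matrix and target, already ordered in time.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 12))
y = rng.normal(size=500)

# Time-series split: no shuffling, so the validation set is
# strictly later in time than the training set.
cut = int(len(X) * 0.8)
X_train, X_val = X[:cut], X[cut:]
y_train, y_val = y[:cut], y[cut:]

# Scaler fitted on the training slice only, then applied to both,
# so no information from the validation period leaks into training.
scaler = MinMaxScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_val_s = scaler.transform(X_val)

# The final step fits XGBoost on (X_train_s, y_train) with early
# stopping monitored on (X_val_s, y_val).
```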
Performance Metrics
The `train()` method returns evaluation metrics for the held-out validation set.
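The exact return value is defined in `source/models/xgboost_model.py`; the two metrics quoted in the horizon table (MAPE and direction accuracy) can be computed like this (key names here are illustrative):

```python
import numpy as np

def evaluation_metrics(y_true, y_pred):
    """MAPE and direction accuracy. Illustrative helper; the real
    train() return value is defined in source/models/xgboost_model.py."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mape = np.mean(np.abs((y_true - y_pred) / y_true)) * 100
    # Direction accuracy: fraction of steps where the predicted move
    # has the same sign as the actual move.
    true_dir = np.sign(np.diff(y_true))
    pred_dir = np.sign(np.diff(y_pred))
    direction_accuracy = np.mean(true_dir == pred_dir) * 100
    return {"mape": mape, "direction_accuracy": direction_accuracy}

m = evaluation_metrics([100, 102, 101, 103], [100, 101.5, 101.8, 102.5])
```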
Prediction Horizons
| Horizon | Typical MAPE | Direction Accuracy | Use Case |
|---|---|---|---|
| 1-6 hours | 1-3% | 60-70% | Intraday trading |
| 12-24 hours | 3-6% | 55-65% | Daily positioning |
| 48-72 hours | 6-10% | 50-60% | Short-term trends |
Multi-step Predictions
The `predict_future()` method uses an iterative approach:
- Creates features from current data
- Predicts next time step
- Appends prediction to dataset
- Repeats for `periods` iterations
Iterative predictions accumulate error over time, which is why accuracy decreases for longer horizons.
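The loop can be illustrated with a stand-in one-step predictor (a naive moving-average forecaster here, not the actual XGBoost model):

```python
import numpy as np

def predict_one_step(history):
    # Stand-in for the XGBoost one-step model: forecast the mean
    # of the last 3 observations.
    return float(np.mean(history[-3:]))

def predict_future(history, periods):
    """Iterative multi-step forecast: each prediction is appended to
    the history and fed back in as input for the next step, which is
    why errors compound over longer horizons."""
    history = list(history)
    out = []
    for _ in range(periods):
        nxt = predict_one_step(history)
        history.append(nxt)  # prediction becomes an input for the next step
        out.append(nxt)
    return out

forecast = predict_future([100.0, 101.0, 103.0], periods=4)
```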
Feature Importance
A fitted model's importance scores show which of the engineered features drive predictions.

Advanced: Backtesting
Use the `backtest_model()` utility function.
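The exact signature of `backtest_model()` lives in the source tree; the underlying idea is a rolling-origin evaluation, sketched here with a hypothetical helper and a naive last-value predictor:

```python
import numpy as np

def rolling_backtest(series, train_size, horizon, predict_fn):
    """Rolling-origin backtest: repeatedly take the first `start` points
    as history and score the next `horizon` points. (Illustrative sketch;
    the real backtest_model() utility is in the source tree.)"""
    errors = []
    for start in range(train_size, len(series) - horizon):
        history = series[:start]
        actual = series[start:start + horizon]
        preds = predict_fn(history, horizon)
        errors.append(np.mean(np.abs((actual - preds) / actual)) * 100)
    return float(np.mean(errors))  # average MAPE across folds

prices = 100 + np.cumsum(np.random.default_rng(3).normal(0, 0.5, 200))
naive = lambda hist, h: np.full(h, hist[-1])  # stand-in predictor
avg_mape = rolling_backtest(prices, train_size=150, horizon=6, predict_fn=naive)
```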
Prediction Intervals
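One common approach (an illustration of conformal-style bounds, not necessarily what the source implements) derives the interval from residual quantiles observed on a validation set:

```python
import numpy as np

def prediction_interval(point_preds, val_residuals, coverage=0.9):
    """Attach empirical bounds: take the residual quantiles observed on
    a validation set and add them to each point prediction."""
    lo_q = np.quantile(val_residuals, (1 - coverage) / 2)
    hi_q = np.quantile(val_residuals, 1 - (1 - coverage) / 2)
    point_preds = np.asarray(point_preds, dtype=float)
    return point_preds + lo_q, point_preds + hi_q

# Residuals (actual - predicted) collected on a held-out validation set.
residuals = np.random.default_rng(5).normal(0, 2.0, 500)
lower, upper = prediction_interval([100.0, 101.0, 102.0], residuals, coverage=0.9)
```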
Confidence bounds can be added to point predictions.

Configuration Examples
Conservative (Stable predictions)
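Illustrative values staying within the ranges in the parameter table (an assumption, not the project's shipped preset):

```python
# Conservative: more, shallower trees with a lower learning rate and
# stronger subsampling, giving smoother and more stable predictions.
conservative = dict(
    n_estimators=300, learning_rate=0.03, max_depth=4,
    subsample=0.7, colsample_bytree=0.7,
)
```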
Aggressive (Higher variance, may overfit)
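Again illustrative values, not a shipped preset:

```python
# Aggressive: deeper trees and a higher learning rate capture more
# structure but raise the risk of overfitting noisy crypto data.
aggressive = dict(
    n_estimators=400, learning_rate=0.15, max_depth=9,
    subsample=0.9, colsample_bytree=0.9,
)
```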
Fast Training (Development)
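Illustrative values for quick iteration during development (an assumption, not a shipped preset):

```python
# Fast: fewer, shallow trees for quick train/evaluate cycles.
fast = dict(
    n_estimators=100, learning_rate=0.1, max_depth=4,
    subsample=0.8, colsample_bytree=0.8,
)
```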
Performance Optimization
Training Speed
- Reduce `n_estimators` (100-150 for faster training)
- Lower `max_depth` (4-5 instead of 6)
- Model uses `n_jobs=-1` to utilize all CPU cores
Memory Usage
- Feature creation happens in-memory
- For large datasets (>100k rows), consider downsampling
- Each tree stores feature importance data
Prediction Speed
- Single predictions: <10ms
- Multi-step (24 periods): ~200-500ms due to iterative process
- Batch predictions are faster than individual calls
Limitations
- Horizon Limit: Accuracy degrades after 72 hours
- Data Requirements: Needs substantial history (1000+ points)
- Black Swan Events: Cannot predict unprecedented market shocks
- Regime Changes: May lag during major trend reversals
Next Steps
Prophet Model
Learn about long-term predictions (1 week - 1 month)
Hybrid Model
Combine XGBoost + Prophet for best results
Model Comparison
Compare all three models side-by-side
API Reference
Detailed API documentation