
Overview

CryptoView Pro employs three specialized machine learning models, each optimized for different forecasting horizons. The system intelligently selects the best model based on the prediction timeframe, or combines them using a hybrid approach.

XGBoost

1-72 hours
Fast, accurate short-term predictions using gradient boosting

Prophet

1 week - 1 year
Trend and seasonality detection for long-term forecasts

Hybrid

Adaptive
Weighted ensemble combining both models

Model Selection Logic

The system recommends models based on forecast horizon:
# Recommendation logic
if forecast_hours <= 72:
    recommended_model = "XGBoost"  # Short-term precision
elif forecast_hours <= 720:  # 30 days
    recommended_model = "Hybrid"   # Balanced approach
else:
    recommended_model = "Prophet"  # Long-term trends
Users can override the recommendation and manually select any model through the sidebar.
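Wrapped as a standalone helper (the function name is illustrative; the thresholds come from the snippet above):

```python
def recommend_model(forecast_hours: int) -> str:
    """Map a forecast horizon in hours to the suggested model."""
    if forecast_hours <= 72:
        return "XGBoost"   # Short-term precision
    elif forecast_hours <= 720:  # 30 days
        return "Hybrid"    # Balanced approach
    return "Prophet"       # Long-term trends
```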

XGBoost Model

Location: models/xgboost_model.py

Architecture

XGBoost (eXtreme Gradient Boosting) uses an ensemble of decision trees trained sequentially, with each tree correcting errors from previous trees.
Best for: Intraday and swing trading (1-72 hours)
Strengths: Fast training, handles non-linear patterns, robust to outliers
Limitations: Performance degrades beyond 3 days

Model Configuration

models/xgboost_model.py
class XGBoostCryptoPredictor:
    def __init__(self, 
                 n_estimators: int = 200,      # Number of trees
                 learning_rate: float = 0.07,   # Step size shrinkage
                 max_depth: int = 6,            # Tree depth
                 subsample: float = 0.8,        # Row sampling
                 colsample_bytree: float = 0.8): # Column sampling
        
        self.model = xgb.XGBRegressor(
            n_estimators=n_estimators,
            learning_rate=learning_rate,
            max_depth=max_depth,
            subsample=subsample,
            colsample_bytree=colsample_bytree,
            objective='reg:squarederror',  # MSE loss
            random_state=42,
            n_jobs=-1  # Use all CPU cores
        )

Feature Engineering

XGBoost’s strength lies in its 60+ engineered features:
Multi-timeframe percentage changes:
df['return_1'] = df['close'].pct_change(1)      # 1-hour return
df['return_4'] = df['close'].pct_change(4)      # 4-hour return
df['return_24'] = df['close'].pct_change(24)    # 24-hour return
df['return_168'] = df['close'].pct_change(168)  # 7-day return
Simple moving averages and price ratios:
for window in [7, 14, 30, 50]:
    df[f'ma_{window}'] = df['close'].rolling(window=window).mean()
    df[f'price_to_ma_{window}'] = df['close'] / df[f'ma_{window}']
Weighted averages giving more importance to recent prices:
for span in [12, 26, 50]:
    df[f'ema_{span}'] = df['close'].ewm(span=span, adjust=False).mean()
Rolling standard deviation of returns:
for window in [7, 14, 30]:
    df[f'volatility_{window}'] = df['return_1'].rolling(window=window).std()
Price change over fixed periods:
df['momentum_7'] = df['close'] - df['close'].shift(7)
df['momentum_14'] = df['close'] - df['close'].shift(14)
Volatility bands and price position:
df['bb_middle'] = df['close'].rolling(window=20).mean()
bb_std = df['close'].rolling(window=20).std()
df['bb_upper'] = df['bb_middle'] + (bb_std * 2)
df['bb_lower'] = df['bb_middle'] - (bb_std * 2)
df['bb_position'] = (df['close'] - df['bb_lower']) / (df['bb_upper'] - df['bb_lower'])
Trading volume patterns:
df['volume_ma_7'] = df['volume'].rolling(window=7).mean()
df['volume_ratio'] = df['volume'] / df['volume_ma_7']
df['volume_change'] = df['volume'].pct_change(1)
Intrabar price relationships:
df['high_low_ratio'] = df['high'] / df['low']
df['close_open_ratio'] = df['close'] / df['open']
Time-based cyclical patterns:
df['hour'] = df.index.hour              # Hour of day (0-23)
df['day_of_week'] = df.index.dayofweek  # Day of week (0-6)
df['day_of_month'] = df.index.day       # Day of month (1-31)
df['month'] = df.index.month            # Month (1-12)
Momentum oscillator features:
df['rsi_normalized'] = df['rsi'] / 100
df['rsi_oversold'] = (df['rsi'] < 30).astype(int)
df['rsi_overbought'] = (df['rsi'] > 70).astype(int)
Trend following features:
df['macd_diff'] = df['macd'] - df['macd_signal']
df['macd_positive'] = (df['macd_diff'] > 0).astype(int)
Past price values:
for lag in [1, 2, 3, 7, 14]:
    df[f'close_lag_{lag}'] = df['close'].shift(lag)
Total: 60+ features capturing price dynamics, trends, volatility, and market microstructure.
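As a condensed, self-contained sketch, here are a few of the features above computed on synthetic hourly data (the real pipeline operates on full OHLCV exchange data):

```python
import numpy as np
import pandas as pd

# Toy hourly close/volume frame standing in for real exchange data
idx = pd.date_range("2024-01-01", periods=200, freq="h")
rng = np.random.default_rng(42)
close = pd.Series(100 + rng.normal(0, 1, 200).cumsum(), index=idx)
df = pd.DataFrame({"close": close,
                   "volume": rng.integers(1_000, 5_000, 200)}, index=idx)

# A few of the features described above
df["return_1"] = df["close"].pct_change(1)
df["ma_7"] = df["close"].rolling(window=7).mean()
df["price_to_ma_7"] = df["close"] / df["ma_7"]
df["volatility_7"] = df["return_1"].rolling(window=7).std()
df["hour"] = df.index.hour

# Drop warm-up rows where rolling windows are not yet filled
features = df.dropna()
```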

Training Process

1

Data Preparation

# Time series split (respects temporal order)
X_train, X_test, y_train, y_test = prepare_data(df, train_size=0.8)

# Feature scaling with MinMaxScaler
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
2

Model Training

model.fit(
    X_train_scaled, y_train,
    eval_set=[(X_test_scaled, y_test)],  # Validation during training
    verbose=False
)
3

Evaluation Metrics

Multiple metrics assess model quality:
  • MAE (Mean Absolute Error): Average prediction error in dollars
  • RMSE (Root Mean Squared Error): Penalizes large errors
  • MAPE (Mean Absolute Percentage Error): Error as percentage
  • Direction Accuracy: % of correct up/down predictions
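As a sketch, all four metrics can be computed from arrays of actual and predicted prices (the function name is illustrative):

```python
import numpy as np

def evaluate(actual: np.ndarray, predicted: np.ndarray) -> dict:
    """Compute the four metrics described above."""
    errors = actual - predicted
    mae = np.mean(np.abs(errors))
    rmse = np.sqrt(np.mean(errors ** 2))
    mape = np.mean(np.abs(errors / actual)) * 100
    # Direction accuracy: fraction of steps where the predicted
    # up/down move matches the actual move
    direction = np.mean(
        np.sign(np.diff(actual)) == np.sign(np.diff(predicted))
    ) * 100
    return {"mae": mae, "rmse": rmse, "mape": mape,
            "direction_accuracy": direction}

metrics = evaluate(np.array([100.0, 102.0, 101.0, 103.0]),
                   np.array([101.0, 101.5, 101.5, 102.0]))
```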

Prediction Generation

Recursive multi-step forecasting:
def predict_future(df, periods=24):
    predictions = []
    df_work = df.copy()
    
    for i in range(periods):
        # 1. Create features from current data
        df_features = create_features(df_work)
        
        # 2. Get last row and scale
        last_row = df_features[feature_columns].iloc[-1:]
        last_row_scaled = scaler.transform(last_row)
        
        # 3. Predict next price
        pred = model.predict(last_row_scaled)[0]
        predictions.append(pred)
        
        # 4. Add prediction to data for next iteration
        new_row = create_synthetic_row(pred)
        df_work = pd.concat([df_work, new_row])
    
    return predictions
Recursive forecasting can accumulate errors over time. This is why XGBoost is best for horizons ≤72 hours.

Confidence Intervals

Prediction uncertainty is estimated using statistical methods:
def create_prediction_intervals(predictions, confidence=0.95):
    std_estimate = predictions['predicted_price'].std()
    z_score = stats.norm.ppf((1 + confidence) / 2)  # 1.96 for 95%
    margin = z_score * std_estimate
    
    predictions['lower_bound'] = predictions['predicted_price'] - margin
    predictions['upper_bound'] = predictions['predicted_price'] + margin
    
    return predictions

Feature Importance

XGBoost tracks which features are most predictive:
importance_df = predictor.get_feature_importance()
# Returns: DataFrame sorted by importance
# Top features typically: close_lag_1, ema_12, volume_ratio, rsi
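The body of get_feature_importance is not shown in the source; a plausible sketch, assuming the model exposes the scikit-learn-style `feature_importances_` attribute (which `xgb.XGBRegressor` does), might look like:

```python
import numpy as np
import pandas as pd

def get_feature_importance(model, feature_columns):
    """Return a DataFrame of features sorted by importance (descending)."""
    return (pd.DataFrame({"feature": feature_columns,
                          "importance": model.feature_importances_})
            .sort_values("importance", ascending=False)
            .reset_index(drop=True))

# Stand-in object so the sketch runs without a trained model
class _DummyModel:
    feature_importances_ = np.array([0.1, 0.6, 0.3])

top = get_feature_importance(_DummyModel(), ["ma_7", "close_lag_1", "rsi"])
```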

Prophet Model

Location: models/prophet_model.py

Architecture

Prophet (developed by Meta/Facebook) decomposes time series into: y(t) = trend(t) + seasonality(t) + holidays(t) + error(t)
Best for: Medium to long-term forecasting (1 week - 1 year)
Strengths: Handles trends, seasonality, missing data, outliers
Limitations: Less accurate for short-term fluctuations

Model Configuration

models/prophet_model.py
class ProphetCryptoPredictor:
    def __init__(self, 
                 changepoint_prior_scale: float = 0.5,
                 seasonality_prior_scale: float = 10,
                 interval_width: float = 0.95):
        
        self.model = Prophet(
            changepoint_prior_scale=changepoint_prior_scale,  # Trend flexibility
            seasonality_prior_scale=seasonality_prior_scale,  # Seasonality strength
            interval_width=interval_width,    # Confidence intervals
            daily_seasonality=True,           # 24-hour patterns
            weekly_seasonality=True,          # 7-day patterns
            yearly_seasonality=False,         # Not relevant for crypto
            seasonality_mode='multiplicative' # Scales with trend
        )

Key Parameters Explained

changepoint_prior_scale (float, default: 0.5)
Controls trend flexibility. Higher values allow more dramatic trend changes, suitable for volatile crypto markets.
  • Low (0.001-0.05): Smooth, stable trends
  • Medium (0.05-0.5): Moderate flexibility
  • High (0.5-1.0): Highly flexible, follows volatility
seasonality_prior_scale (float, default: 10)
Strength of seasonal components. Higher values detect stronger daily/weekly patterns.
  • Crypto markets have weak seasonality compared to traditional markets
  • Value of 10 balances pattern detection with noise filtering
seasonality_mode (string, default: "multiplicative")
How seasonality scales with the trend:
  • Additive: Seasonality has constant amplitude
  • Multiplicative: Seasonality scales proportionally with price (better for crypto)
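A quick numeric illustration with a hypothetical seasonal effect of +$5 (additive) versus ×1.05 (multiplicative): the additive swing stays at $5 at every price level, while the multiplicative swing grows with the trend.

```python
trend = [100.0, 1_000.0, 10_000.0]  # hypothetical trend levels

additive = [t + 5 for t in trend]           # constant $5 seasonal swing
multiplicative = [t * 1.05 for t in trend]  # swing scales with price
```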

Data Preparation

Prophet requires specific DataFrame format:
def prepare_data(df):
    prophet_df = pd.DataFrame({
        'ds': df.index,        # Datetime column (required name)
        'y': df['close']       # Target variable (required name)
    })
    return prophet_df.dropna()

Training Process

1

Data Validation

Prophet requires a minimum of 100 data points and no missing values in the target.
2

Model Fitting

prophet_df = prepare_data(df)
model.fit(prophet_df)  # Automatically detects changepoints and seasonality
Prophet’s Stan-based backend performs MAP (Maximum A Posteriori) estimation.
3

In-Sample Evaluation

forecast = model.predict(prophet_df)

# Metrics calculated on training data
mae = np.mean(np.abs(actual - forecast['yhat']))
direction_accuracy = np.mean(
    np.sign(np.diff(actual)) == np.sign(np.diff(forecast['yhat'].values))
)

Prediction Generation

Prophet makes predictions all at once (not recursively):
def predict_future(periods, freq='H'):
    # 1. Create future datetime index
    future = model.make_future_dataframe(periods=periods, freq=freq)
    
    # 2. Generate forecast
    forecast = model.predict(future)
    
    # 3. Extract future predictions only
    future_forecast = forecast[forecast['ds'] > last_train_date]
    
    return pd.DataFrame({
        'timestamp': future_forecast['ds'],
        'predicted_price': future_forecast['yhat'],
        'lower_bound': future_forecast['yhat_lower'],  # Built-in CI
        'upper_bound': future_forecast['yhat_upper'],
        'trend': future_forecast['trend']              # Isolated trend
    })

Forecast Components

Prophet provides interpretable decomposition:
  • yhat: Final prediction (trend + seasonality)
  • trend: Long-term direction
  • daily: Daily seasonal component
  • weekly: Weekly seasonal component
  • yhat_lower/upper: Uncertainty intervals

Backtesting

Time series cross-validation:
def backtest_prophet(df, predictor, test_periods=168):
    # 1. Split data
    train_df = df.iloc[:-test_periods]
    test_df = df.iloc[-test_periods:]
    
    # 2. Train on historical data
    predictor.train(train_df)
    
    # 3. Predict test period
    predictions = predictor.predict_future(periods=test_periods)
    
    # 4. Compare with actual
    actual = test_df['close'].values
    predicted = predictions['predicted_price'].values
    
    # 5. Calculate metrics
    return {
        'mae': np.mean(np.abs(actual - predicted)),
        'direction_accuracy': np.mean(
            np.sign(np.diff(actual)) == np.sign(np.diff(predicted))
        )
    }

Hybrid Model

Location: models/hybrid_model.py

Concept

The Hybrid model intelligently combines XGBoost and Prophet predictions using dynamic weighting based on forecast horizon.
Best for: All time horizons
Strengths: Adaptive, leverages both models’ strengths
Approach: Ensemble learning with time-based weights

Architecture

models/hybrid_model.py
class HybridCryptoPredictor:
    def __init__(self):
        self.xgboost = XGBoostCryptoPredictor()
        self.prophet = ProphetCryptoPredictor()
        self.trained = False

Training Process

Both models train independently:
def train(self, df):
    # Train XGBoost
    xgb_metrics = self.xgboost.train(df, train_size=0.8)
    
    # Train Prophet
    prophet_metrics = self.prophet.train(df)
    
    return {
        'xgboost': xgb_metrics,
        'prophet': prophet_metrics,
        'data_points': len(df)
    }

Dynamic Weighting Algorithm

Weight calculation based on forecast horizon:
def predict_future(self, df, periods):
    predictions = {}
    
    # XGBoost for short-term (≤72h)
    if periods <= 72:
        predictions['xgboost'] = self.xgboost.predict_future(df, periods)
        predictions['recommended'] = 'xgboost'
    
    # Prophet for medium/long-term (>24h)
    if periods > 24:
        predictions['prophet'] = self.prophet.predict_future(periods, freq='H')
        if periods > 72:
            predictions['recommended'] = 'prophet'
    
    # Create weighted ensemble if both available
    if 'xgboost' in predictions and 'prophet' in predictions:
        # Weight calculation: XGBoost fades out linearly by 168h
        xgb_weight = max(0, 1 - (periods / 168))
        prophet_weight = 1 - xgb_weight
        
        # Weighted average of both price series
        combined = predictions['xgboost'].copy()
        combined['predicted_price'] = (
            predictions['xgboost']['predicted_price'].values * xgb_weight +
            predictions['prophet']['predicted_price'].values * prophet_weight
        )
        
        predictions['hybrid'] = combined
        predictions['weights'] = {'xgboost': xgb_weight, 'prophet': prophet_weight}
        predictions['recommended'] = 'hybrid'
    
    return predictions

Weighting Examples

24 Hours

XGBoost: 86%
Prophet: 14%
Short-term, favor XGBoost

72 Hours

XGBoost: 57%
Prophet: 43%
Balanced ensemble

7 Days (168h)

XGBoost: 0%
Prophet: 100%
Transition complete
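These percentages follow directly from the weight formula used in the hybrid model; a tiny standalone helper (name illustrative) reproduces them:

```python
def ensemble_weights(periods: int) -> dict:
    """Linear horizon-based weights: XGBoost fades out by 168h (7 days)."""
    xgb_weight = max(0.0, 1 - periods / 168)
    return {"xgboost": xgb_weight, "prophet": 1 - xgb_weight}
```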

Confidence Interval Blending

Uncertainty bounds are also weighted:
combined['lower_bound'] = (
    xgb['lower_bound'] * xgb_weight + 
    prophet['lower_bound'] * prophet_weight
)

combined['upper_bound'] = (
    xgb['upper_bound'] * xgb_weight + 
    prophet['upper_bound'] * prophet_weight
)

Advantages

Seamless Transition: Smoothly transitions from XGBoost to Prophet as horizon increases
Best of Both: Captures short-term patterns AND long-term trends
Reduced Error: Ensemble typically outperforms individual models
Robust: If one model fails, the other provides backup

Model Comparison

XGBoost
Training Time: Fast (1-2 seconds)
Prediction Time: Fast (milliseconds)
Memory Usage: Moderate
Accuracy: High for short-term
Interpretability: Medium (feature importance)
Prophet
Training Time: Moderate (5-10 seconds)
Prediction Time: Fast (milliseconds)
Memory Usage: Low
Accuracy: High for long-term
Interpretability: High (decomposable components)
Hybrid
Training Time: Moderate (sum of both)
Prediction Time: Fast (minimal overhead)
Memory Usage: High (both models loaded)
Accuracy: Best overall
Interpretability: Medium (blended)

Performance Metrics

All models track these metrics:

MAE

Mean Absolute Error
Average prediction error in dollars
Lower is better

RMSE

Root Mean Squared Error
Emphasizes large errors
Lower is better

MAPE

Mean Absolute Percentage Error
Error as percentage of actual price
Lower is better

Direction Accuracy

Directional Prediction Accuracy
% of correct up/down predictions
Higher is better

Next Steps

Technical Indicators

Learn about the indicators that feed into the models

Architecture

Understand how models fit into the overall system
