
Overview

The SGIVU ML service implements a robust model management system that handles versioning, persistence, metadata tracking, and model lifecycle operations. This ensures reproducibility, traceability, and seamless model updates.

Model Registry

The Model Registry is the central component for managing ML artifacts, implemented through the ModelRegistryPort interface with concrete implementations for file system and database storage.

Architecture

The service can operate in file-only mode (no database) or database-backed mode for enterprise deployments with centralized storage.

Model Versioning

Version Format

Models are versioned using timestamp-based identifiers:
YYYYMMDD_HHMMSS
Example: 20260306_143022 represents a model trained on March 6, 2026 at 14:30:22.

Why Timestamp Versioning?

Chronological Ordering

Versions are naturally sorted by training time

No Conflicts

Concurrent training jobs won’t collide unless two jobs save within the same second (second-level precision)

Reproducibility

Easy to identify when a model was created

Simplicity

No need for separate version number management

Version Generation

Versions are automatically generated during model save:
from datetime import datetime, timezone

def generate_version() -> str:
    """Generate timestamp-based version identifier"""
    return datetime.now(timezone.utc).strftime("%Y%m%d_%H%M%S")

# Example usage
version = generate_version()  # "20260306_143022"

Model Persistence

File-Based Storage

When using file system storage (configured via MODEL_DIR):
MODEL_DIR/
├── 20260301_093045/
│   ├── model.joblib      # Serialized sklearn pipeline
│   └── metadata.json     # Training metadata
├── 20260305_141530/
│   ├── model.joblib
│   └── metadata.json
└── 20260306_143022/      # Latest version
    ├── model.joblib
    └── metadata.json

Model Artifact (model.joblib)

The trained sklearn Pipeline object serialized with joblib:
import joblib

# Save
joblib.dump(pipeline, "model.joblib")

# Load
pipeline = joblib.load("model.joblib")
The pipeline includes:
  • Preprocessor: ColumnTransformer with encoders and scalers
  • Model: Best estimator (LinearRegression, RandomForest, or XGBoost)

Metadata File (metadata.json)

{
  "version": "20260306_143022",
  "trained_at": "2026-03-06T14:30:22.123456+00:00",
  "target": "sales_count",
  "features": [
    "vehicle_type",
    "brand",
    "model",
    "line",
    "purchases_count",
    "avg_margin",
    "avg_sale_price",
    "avg_purchase_price",
    "avg_days_inventory",
    "inventory_rotation",
    "lag_1",
    "lag_3",
    "lag_6",
    "rolling_mean_3",
    "rolling_mean_6",
    "month",
    "year",
    "month_sin",
    "month_cos"
  ],
  "metrics": {
    "rmse": 3.24,
    "mae": 2.15,
    "mape": 0.087,
    "r2": 0.89,
    "residual_std": 2.8
  },
  "candidates": [
    {
      "model": "linear_regression",
      "rmse": 4.12,
      "mae": 3.05,
      "mape": 0.124,
      "r2": 0.76,
      "samples": 462
    },
    {
      "model": "random_forest",
      "rmse": 3.45,
      "mae": 2.31,
      "mape": 0.095,
      "r2": 0.85,
      "samples": 462
    },
    {
      "model": "xgboost",
      "rmse": 3.24,
      "mae": 2.15,
      "mape": 0.087,
      "r2": 0.89,
      "samples": 462
    }
  ],
  "train_samples": 1847,
  "test_samples": 462,
  "total_samples": 2309
}
Field reference:
  • version (string): Unique model identifier (timestamp-based)
  • trained_at (string): ISO 8601 timestamp with timezone
  • target (string): Name of the target variable (usually sales_count)
  • features (array): List of feature names in the order expected by the model
  • metrics (object): Performance metrics from test set evaluation
      • rmse: Root Mean Squared Error
      • mae: Mean Absolute Error
      • mape: Mean Absolute Percentage Error
      • r2: R-squared score
      • residual_std: Used for prediction intervals
  • candidates (array): Comparison of all evaluated models with their metrics
  • train_samples (integer): Number of samples used for training
  • test_samples (integer): Number of samples used for evaluation
  • total_samples (integer): Total dataset size

Database Storage

For production deployments, models can be stored in PostgreSQL:

Schema

CREATE TABLE ml_model_artifacts (
    id SERIAL PRIMARY KEY,
    version VARCHAR(50) UNIQUE NOT NULL,
    trained_at TIMESTAMP WITH TIME ZONE NOT NULL,
    metadata JSONB NOT NULL,
    artifact BYTEA NOT NULL,  -- Serialized model pipeline
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

CREATE INDEX idx_ml_model_version ON ml_model_artifacts(version);
CREATE INDEX idx_ml_model_trained_at ON ml_model_artifacts(trained_at DESC);
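The save and load paths follow the same shape regardless of driver: serialize the pipeline to bytes, dump the metadata to JSON, and write both in one row. The sketch below illustrates that pattern using the stdlib sqlite3 module as a stand-in; the real deployment would use an async PostgreSQL driver with the JSONB/BYTEA columns shown above.

```python
import json
import sqlite3

def save_artifact(conn: sqlite3.Connection, version: str,
                  metadata: dict, artifact: bytes) -> None:
    """Insert a serialized model pipeline and its metadata in one row."""
    conn.execute(
        "INSERT INTO ml_model_artifacts (version, trained_at, metadata, artifact) "
        "VALUES (?, ?, ?, ?)",
        (version, metadata["trained_at"], json.dumps(metadata), artifact),
    )
    conn.commit()

def load_latest(conn: sqlite3.Connection) -> tuple[bytes, dict]:
    """Return the artifact and metadata of the most recently trained model."""
    row = conn.execute(
        "SELECT artifact, metadata FROM ml_model_artifacts "
        "ORDER BY trained_at DESC LIMIT 1"
    ).fetchone()
    if row is None:
        raise FileNotFoundError("No model artifacts stored")
    return row[0], json.loads(row[1])
```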

Benefits of Database Storage

Centralized Storage

Single source of truth for all model versions

Easy Querying

SQL queries for model comparison and analysis

Automatic Backups

Models included in database backup strategy

Scalability

Handle large model collections without filesystem concerns

Model Lifecycle

Training and Registration

When a new model is trained, the training service assembles the metadata and persists it alongside the pipeline (from app/application/services/training_service.py:88-94):
metadata_dict = {
    "trained_at": datetime.now(timezone.utc).isoformat(),
    "target": self._settings.target_column,
    "features": [
        *self._feature_engineering.category_cols,
        *self._feature_engineering.numeric_cols
    ],
    "metrics": {
        **evaluation.metrics,
        "residual_std": evaluation.residual_std,
    },
    "candidates": evaluation.candidates,
    "train_samples": evaluation.train_samples,
    "test_samples": evaluation.test_samples,
    "total_samples": len(dataset),
}

saved = await self._registry.save(evaluation.pipeline, metadata_dict)
logger.info("Model trained and versioned: %s", saved.version)

Loading for Prediction

When making predictions, the service loads the latest model on demand (from app/application/services/prediction_service.py:177-181):
async def _load_model(self) -> tuple[Any, ModelMetadata]:
    try:
        return await self._registry.load_latest()
    except FileNotFoundError as exc:
        raise ModelNotTrainedError("Aún no existe un modelo entrenado.") from exc

Model Replacement

The service always uses the latest model by version. Older versions are retained for auditing but not used for predictions unless explicitly loaded.
When a new model is trained:
  1. Old model: Remains in storage with its version
  2. New model: Saved with newer version timestamp
  3. Predictions: Automatically switch to new model on next request
No downtime or service restart required. The next prediction request will load the new model.
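Because the timestamp format sorts lexicographically in chronological order, resolving “latest” can be as simple as sorting directory names. This sketch (not the codebase’s actual implementation) shows how a file-based registry might pick the newest version on each request:

```python
import re
from pathlib import Path

VERSION_RE = re.compile(r"^\d{8}_\d{6}$")  # YYYYMMDD_HHMMSS

def resolve_latest_version(model_dir: Path) -> str:
    """Return the newest version directory under MODEL_DIR."""
    versions = sorted(
        p.name for p in model_dir.iterdir()
        if p.is_dir() and VERSION_RE.match(p.name)
    )
    if not versions:
        raise FileNotFoundError(f"No model versions in {model_dir}")
    # Lexicographic max == chronological max for this version format
    return versions[-1]
```

Directories that don’t match the version pattern (e.g. a `.backup` suffix added during a rollback) are ignored, which is what makes the filesystem rollback described later work.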

Feature Snapshots

For reproducibility, the service can persist feature datasets alongside models.

Purpose

Feature snapshots enable:
  • Prediction without raw data: Use pre-computed features
  • Faster inference: No need to rebuild features from transactions
  • Reproducibility: Ensure predictions use exact training feature distributions
  • Debugging: Compare features across model versions

Database Schema

CREATE TABLE ml_training_features (
    id SERIAL PRIMARY KEY,
    model_version VARCHAR(50) NOT NULL,
    features JSONB NOT NULL,  -- Serialized feature DataFrame
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

CREATE INDEX idx_ml_features_version ON ml_training_features(model_version);

Usage

Features are automatically saved during training (if feature_repository is configured):
# From app/application/services/training_service.py:90-91

if self._feature_repository:
    await self._feature_repository.save_snapshot(saved.version, dataset)
And loaded during prediction if available:
# From app/application/services/prediction_service.py:210-216

if self._feature_repository:
    history = await self._feature_repository.load_segment_history(
        model_version, segment.model_dump()
    )
    if not history.empty:
        return history
Feature snapshots can consume significant database space for large datasets. Consider retention policies or compression for long-term storage.
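One mitigation is to compress the serialized features before writing the snapshot. The sketch below uses stdlib zlib over a JSON payload; the actual snapshot format is an implementation detail of the feature repository, so this is illustrative only.

```python
import json
import zlib

def compress_snapshot(features: list[dict]) -> bytes:
    """Serialize feature rows to compact JSON and compress for storage."""
    raw = json.dumps(features, separators=(",", ":")).encode("utf-8")
    return zlib.compress(raw, level=9)

def decompress_snapshot(blob: bytes) -> list[dict]:
    """Restore feature rows from a compressed snapshot."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))
```

Tabular feature data is highly repetitive (repeated keys, similar values), so compression ratios are typically large.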

Prediction Logging

The service can log all prediction requests and responses for:
  • Auditing: Track who requested what predictions
  • Monitoring: Detect usage patterns and anomalies
  • Model evaluation: Compare predictions to actual outcomes
  • Debugging: Investigate prediction issues

Database Schema

CREATE TABLE ml_predictions (
    id SERIAL PRIMARY KEY,
    model_version VARCHAR(50) NOT NULL,
    request_payload JSONB NOT NULL,
    response_payload JSONB NOT NULL,
    segment JSONB NOT NULL,  -- vehicle_type, brand, model, line
    horizon INTEGER NOT NULL,
    confidence FLOAT NOT NULL,
    with_history BOOLEAN NOT NULL,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

CREATE INDEX idx_ml_predictions_version ON ml_predictions(model_version);
CREATE INDEX idx_ml_predictions_segment ON ml_predictions USING gin(segment);
CREATE INDEX idx_ml_predictions_created_at ON ml_predictions(created_at DESC);

Logged Information

{
  "model_version": "20260306_143022",
  "request_payload": {
    "vehicle_type": "CAR",
    "brand": "TOYOTA",
    "model": "COROLLA",
    "line": "XEI 2.0",
    "horizon_months": 6,
    "confidence": 0.95
  },
  "response_payload": {
    "predictions": [
      {"month": "2026-04-01", "demand": 45.3, "lower_ci": 38.1, "upper_ci": 52.5},
      {"month": "2026-05-01", "demand": 47.8, "lower_ci": 40.2, "upper_ci": 55.4}
    ],
    "model_version": "20260306_143022",
    "metrics": {...}
  },
  "segment": {
    "vehicle_type": "CAR",
    "brand": "TOYOTA",
    "model": "COROLLA",
    "line": "XEI 2.0"
  },
  "horizon": 6,
  "confidence": 0.95,
  "with_history": false,
  "created_at": "2026-03-07T10:15:30.123456+00:00"
}

Querying Prediction Logs

-- Get last 100 predictions
SELECT 
    created_at,
    model_version,
    segment->>'brand' as brand,
    segment->>'model' as model,
    horizon,
    response_payload->'predictions'->0->>'demand' as first_month_demand
FROM ml_predictions
ORDER BY created_at DESC
LIMIT 100;

Model Comparison

Compare performance across model versions to track improvements:

Via API

curl https://api.sgivu.com/v1/ml/models/latest \
  -H "Authorization: Bearer YOUR_TOKEN"
Response includes candidates field showing all evaluated models:
{
  "version": "20260306_143022",
  "metrics": {
    "rmse": 3.24,
    "mae": 2.15,
    "r2": 0.89
  },
  "candidates": [
    {"model": "linear_regression", "rmse": 4.12, "r2": 0.76},
    {"model": "random_forest", "rmse": 3.45, "r2": 0.85},
    {"model": "xgboost", "rmse": 3.24, "r2": 0.89}
  ]
}

Via Database

-- Compare latest 5 model versions
SELECT 
    version,
    trained_at,
    metadata->'metrics'->>'rmse' as rmse,
    metadata->'metrics'->>'r2' as r2,
    metadata->>'total_samples' as samples
FROM ml_model_artifacts
ORDER BY trained_at DESC
LIMIT 5;

Visualization Example

import pandas as pd
import matplotlib.pyplot as plt

# Fetch model history
versions = [
    {"version": "20260301_093045", "rmse": 4.2, "r2": 0.82},
    {"version": "20260305_141530", "rmse": 3.8, "r2": 0.85},
    {"version": "20260306_143022", "rmse": 3.24, "r2": 0.89},
]

df = pd.DataFrame(versions)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))

# RMSE over time
ax1.plot(df["version"], df["rmse"], marker="o")
ax1.set_title("RMSE Trend")
ax1.set_ylabel("RMSE")
ax1.tick_params(axis="x", rotation=45)

# R² over time
ax2.plot(df["version"], df["r2"], marker="o", color="green")
ax2.set_title("R² Trend")
ax2.set_ylabel("R² Score")
ax2.tick_params(axis="x", rotation=45)

plt.tight_layout()
plt.show()

Model Rollback

If a new model performs poorly in production, you can rollback by:

Option 1: Filesystem Rollback

Rename directory to make an older version “latest”:
# Temporarily move problematic version
mv MODEL_DIR/20260306_143022 MODEL_DIR/20260306_143022.backup

# Predictions will now use 20260305_141530
This is a manual process. Test thoroughly and consider implementing a proper rollback mechanism for production.

Option 2: Explicit Version Loading

Modify the registry to load a specific version instead of latest:
# Custom implementation (not in current codebase);
# assumes a load_version() method on the wrapped registry

class VersionedModelRegistry:
    """Wraps an existing registry and pins a specific version when set."""

    def __init__(self, inner: ModelRegistryPort, preferred_version: str | None = None):
        self._inner = inner
        self._preferred_version = preferred_version

    async def load_latest(self) -> tuple[Any, ModelMetadata]:
        if self._preferred_version:
            return await self._inner.load_version(self._preferred_version)
        # Otherwise delegate to the actual latest
        return await self._inner.load_latest()
Set via environment variable:
PREFERRED_MODEL_VERSION=20260305_141530

Option 3: Retrain with Better Data

The best solution is usually to retrain with corrected data:
curl -X POST https://api.sgivu.com/v1/ml/retrain \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -d '{}'

Monitoring and Observability

Health Checks

Verify model availability:
curl https://api.sgivu.com/v1/ml/models/latest \
  -H "Authorization: Bearer YOUR_TOKEN"
Expected response:
  • 200 OK: Model is available
  • 500 Error with "No hay modelos disponibles" ("No models available"): No trained model

Metrics to Track

  • RMSE: Track over time, alert if > threshold
  • R²: Should be > 0.70 for good models
  • MAPE: Percentage error, aim for < 15%
  • Training frequency: How often are models retrained?
  • Training duration: Is it increasing over time?
  • Model size: Disk/memory usage per version
  • Prediction latency: Response time for forecasts
  • Prediction accuracy: Compare forecasts to actuals
  • Coverage: % of segments with sufficient training data
  • Usage: Predictions per segment/day
  • Confidence: Are predictions consistently within CI bounds?
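The last point can be quantified as empirical interval coverage: the fraction of realized outcomes that landed inside the predicted bounds, which should be close to the nominal confidence level. A sketch over hypothetical (actual, lower_ci, upper_ci) triples, e.g. joined from the prediction log and realized sales:

```python
def interval_coverage(observations: list[tuple[float, float, float]]) -> float:
    """Fraction of (actual, lower_ci, upper_ci) triples where the actual
    value fell inside the predicted interval."""
    if not observations:
        raise ValueError("No observations to score")
    hits = sum(1 for actual, lo, hi in observations if lo <= actual <= hi)
    return hits / len(observations)
```

With confidence 0.95, coverage well below 0.95 suggests the intervals are too narrow (residual_std underestimated); coverage near 1.0 suggests they are wider than necessary.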

Alerting

Set up alerts for:
alerts:
  - name: NoModelAvailable
    condition: latest_model_age > 7 days
    action: Trigger retraining
  
  - name: ModelPerformanceDegraded
    condition: rmse > 5.0 OR r2 < 0.70
    action: Review data quality, retrain
  
  - name: PredictionErrors
    condition: error_rate > 5%
    action: Check for missing segments, data issues

Best Practices

Version Retention Policy

  • Keep: Last 10 versions or 90 days of models
  • Archive: Older versions to cold storage
  • Delete: Models older than 1 year (after compliance review)
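A retention policy like this can be enforced with a small pruning script. The sketch below handles the file-based layout and the “keep last N” rule only; archiving to cold storage and the compliance review are left out.

```python
import re
import shutil
from pathlib import Path

VERSION_RE = re.compile(r"^\d{8}_\d{6}$")  # YYYYMMDD_HHMMSS

def prune_versions(model_dir: Path, keep_last: int = 10) -> list[str]:
    """Delete all but the newest keep_last version directories.
    Returns the versions that were removed (oldest first)."""
    versions = sorted(
        p.name for p in model_dir.iterdir()
        if p.is_dir() and VERSION_RE.match(p.name)
    )
    to_remove = versions[:-keep_last] if keep_last else versions
    for version in to_remove:
        shutil.rmtree(model_dir / version)
    return to_remove
```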

Model Documentation

Store additional documentation with each version:
  • Training notebook/script
  • Data quality report
  • Feature importance analysis
  • Business context (e.g., “trained after holiday season”)

A/B Testing

For major model changes, run A/B tests:
  • Route 10% of traffic to new model
  • Compare predictions and user feedback
  • Gradually increase traffic if successful
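Deterministic hash-based routing keeps each segment pinned to one model for the duration of the test, so its forecasts stay comparable. A sketch (the segment key format and the 10% split are illustrative, not part of the codebase):

```python
import hashlib

def route_to_candidate(segment_key: str, traffic_fraction: float = 0.10) -> bool:
    """Return True if this segment should be served by the candidate model.
    The same key always routes the same way."""
    digest = hashlib.sha256(segment_key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    return bucket < traffic_fraction

# Example: build the key from the prediction segment
key = "CAR|TOYOTA|COROLLA|XEI 2.0"
use_candidate = route_to_candidate(key)
```

Raising traffic_fraction gradually widens the candidate’s share without reshuffling segments that were already routed to it.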

Reproducibility

Ensure models can be recreated:
  • Pin dependency versions (requirements.txt)
  • Store feature engineering code version
  • Save random seeds in metadata
  • Document hyperparameters
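These details can be captured automatically at save time and merged into the metadata dict. A sketch using only the stdlib; the package list and the environment block’s shape are examples, not the service’s actual metadata schema:

```python
import platform
import sys
from importlib import metadata as importlib_metadata

def environment_snapshot(seed: int,
                         packages: tuple[str, ...] = ("scikit-learn", "joblib")) -> dict:
    """Collect interpreter version, platform, dependency versions, and the
    random seed used for the train/test split."""
    versions = {}
    for pkg in packages:
        try:
            versions[pkg] = importlib_metadata.version(pkg)
        except importlib_metadata.PackageNotFoundError:
            versions[pkg] = "not installed"
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "dependencies": versions,
        "random_seed": seed,
    }
```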

API Reference

For API operations related to model management, see:

Get Latest Model

Retrieve current model metadata

Retrain Model

Trigger new model training

Troubleshooting

Error: ModelNotTrainedError: Aún no existe un modelo entrenado. ("No trained model exists yet.")
Cause: No model versions exist in MODEL_DIR or database.
Solution:
  1. Run initial training via /v1/ml/retrain
  2. Check MODEL_DIR path is correct
  3. Verify database connectivity if using DB storage
Error: ValueError: unsupported pickle protocol or module import errors
Cause: Model was trained with different Python/library versions.
Solution:
  • Ensure consistent environment (use Docker)
  • Pin dependency versions
  • Retrain model in current environment
Issue: Metrics differ between model versions.
Cause: Different training data, features, or model selection.
Expected behavior: Models evolve as data changes.
To investigate:
  • Compare candidates field in metadata
  • Check if different algorithm was selected
  • Review training data date ranges
  • Compare feature distributions
Issue: Model loading is slow.
Cause: Large model files or network latency (DB storage).
Solutions:
  • Cache loaded model in memory (current implementation loads on each prediction)
  • Use file storage instead of DB for faster access
  • Implement model preloading during service startup

Next Steps

Training Process

Learn how models are trained

Prediction API

Use models for forecasting

Deployment Guide

Deploy SGIVU to production

Monitoring Guide

Set up model monitoring
