
Data Monitoring

While system monitoring tracks infrastructure health, ML-specific monitoring focuses on model behavior, data quality, and prediction reliability. This includes drift detection, outlier identification, and performance degradation tracking.

Why ML Monitoring Matters

Machine learning models face unique challenges in production:

Distribution Shift

Input data distributions change over time, causing models to perform poorly on new data

Concept Drift

The relationship between inputs and outputs changes, invalidating learned patterns

Data Quality

Missing values, outliers, or schema changes can cause silent failures

Model Degradation

Performance slowly declines as the world changes, often going unnoticed

Unlike traditional software bugs, ML model failures are often gradual and subtle. Without proper monitoring, you won’t know your model is failing until users complain.

Types of Drift

Covariate Shift (Data Drift)

The distribution of input features changes: P(X) changes, but P(Y|X) remains the same. Example: A credit scoring model trained on pre-pandemic data sees different income distributions post-pandemic.
# Training data
X_train ~ N(μ=50000, σ=15000)  # Income distribution

# Production data after economic shift
X_prod ~ N(μ=45000, σ=20000)   # Different distribution!
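As a minimal sketch (not part of any tool introduced below), this kind of shift can be flagged with a two-sample Kolmogorov-Smirnov test. The samples mirror the pseudocode distributions above, and the 0.05 threshold is an illustrative choice:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)

# Synthetic income samples matching the distributions sketched above
x_train = rng.normal(50_000, 15_000, size=1_000)  # training-era incomes
x_prod = rng.normal(45_000, 20_000, size=1_000)   # post-shift incomes

# Two-sample KS test: a small p-value means the distributions differ
stat, p_value = ks_2samp(x_train, x_prod)
drift_detected = p_value < 0.05
print(f"KS statistic={stat:.3f}, p={p_value:.4f}, drift={drift_detected}")
```

With a mean shift of 5,000 on samples this large, the test rejects easily; in production, the threshold and window size are tuning decisions.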

Prior Probability Shift (Label Drift)

The distribution of the target variable changes: P(Y) changes, but P(X|Y) remains the same. Example: Fraud detection during a holiday shopping season sees more fraud attempts.

Concept Drift

The relationship between inputs and outputs changes: P(Y|X) changes. Example: User preferences change, making an old recommendation model obsolete.
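A toy numpy illustration of the same idea (entirely synthetic: the 0.5 and 0.7 decision boundaries are made up): a model that memorized the old input-output relationship degrades once P(Y|X) changes, even though the inputs themselves look the same.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=2_000)

# Old concept: users respond when x > 0.5; new concept: only when x > 0.7
y_old = (x > 0.5).astype(int)
y_new = (x > 0.7).astype(int)

# A "model" that memorized the old decision boundary
predictions = (x > 0.5).astype(int)

acc_old = (predictions == y_old).mean()  # perfect on the old concept
acc_new = (predictions == y_new).mean()  # degraded once P(Y|X) changed
print(f"accuracy before drift: {acc_old:.2f}, after drift: {acc_new:.2f}")
```

Note that input-only drift detection would see nothing here: P(X) is unchanged, which is why performance monitoring with ground truth matters.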

Monitoring Tools

Evidently

Evidently is an open-source library for ML monitoring:
  • Generate interactive HTML reports
  • Calculate drift metrics
  • Profile data quality
  • Track model performance
  • No infrastructure required (can run as a Python script)

Seldon Core

Seldon Core is a model serving platform with built-in analytics:
  • Outlier detection using Alibi Detect
  • Drift detection in production
  • Explainability with Alibi Explain
  • Integration with Kubernetes and MLServer

Alibi Detect

Alibi Detect provides algorithms for:
  • Drift detection (KS test, MMD, Chi-squared)
  • Outlier detection (Isolation Forest, Mahalanobis distance)
  • Online and offline detection modes

WhyLogs

WhyLogs offers lightweight data logging:
  • Efficient statistical profiling
  • Minimal storage overhead
  • Streaming-friendly

Seldon Core Setup

Seldon Core v2 provides a complete platform for model serving with monitoring capabilities.

Architecture

Prerequisites

Seldon Core v2 requires:
  • Ansible and Python packages
  • Kubernetes cluster (kind recommended)
  • CLI tools (kubectl, seldon)

Installation

1. Install Ansible and Dependencies

pip install ansible openshift docker passlib
ansible-galaxy collection install \
  git+https://github.com/SeldonIO/ansible-k8s-collection.git
2. Clone Seldon Core Repository

git clone https://github.com/SeldonIO/seldon-core --branch=v2
cd seldon-core
3. Run Ansible Playbooks

# Create kind cluster
ansible-playbook playbooks/kind-cluster.yaml

# Setup ecosystem (cert-manager, etc.)
ansible-playbook playbooks/setup-ecosystem.yaml

# Install Seldon Core
ansible-playbook playbooks/setup-seldon.yaml
4. Install Seldon CLI

# Download CLI
wget https://github.com/SeldonIO/seldon-core/releases/download/v2.7.0-rc1/seldon-linux-amd64

# Make executable and move to PATH
chmod +x seldon-linux-amd64
sudo mv seldon-linux-amd64 /usr/local/bin/seldon

# Verify installation
seldon --help
5. Port Forward Services

# Inference endpoint
kubectl port-forward --address 0.0.0.0 \
  svc/seldon-mesh -n seldon-mesh 9000:80

# Scheduler (for loading models)
kubectl port-forward --address 0.0.0.0 \
  svc/seldon-scheduler -n seldon-mesh 9004:9004
The Ansible playbooks handle all the complexity of setting up Seldon Core, including namespaces, RBAC, and dependencies.

Basic Example: Iris Classification

Test the installation with a simple model:
# Load the model
seldon model load -f seldon-examples/model-iris.yaml \
  --scheduler-host 0.0.0.0:9004

# Wait for model to be ready
seldon model list

# Run inference
seldon model infer iris \
  '{"inputs": [{"name": "predict", "shape": [1, 4], "datatype": "FP32", "data": [[1, 2, 3, 4]]}]}' \
  --inference-host 0.0.0.0:9000
The model YAML:
# model-iris.yaml
apiVersion: mlops.seldon.io/v1alpha1
kind: Model
metadata:
  name: iris
spec:
  storageUri: "gs://seldon-models/scv2/samples/mlserver_1.3.5/iris-sklearn"
  requirements:
  - sklearn
  memory: 100Ki

Drift Detection Example

Seldon’s income classification example demonstrates drift and outlier detection.

Load Models and Detectors

# Load preprocessing model
seldon model load -f seldon-examples/pipeline/income-preprocess.yaml \
  --scheduler-host 0.0.0.0:9004

# Load classification model
seldon model load -f seldon-examples/pipeline/income.yaml \
  --scheduler-host 0.0.0.0:9004

# Load drift detector
seldon model load -f seldon-examples/pipeline/income-drift.yaml \
  --scheduler-host 0.0.0.0:9004

# Load outlier detector
seldon model load -f seldon-examples/pipeline/income-outlier.yaml \
  --scheduler-host 0.0.0.0:9004

Drift Detector Configuration

# income-drift.yaml
apiVersion: mlops.seldon.io/v1alpha1
kind: Model
metadata:
  name: income-drift
spec:
  storageUri: "gs://seldon-models/scv2/examples/mlserver_1.3.5/income/drift-detector"
  requirements:
    - mlserver
    - alibi-detect
The drift detector:
  • Uses the Kolmogorov-Smirnov (KS) test for continuous features
  • Uses the Chi-squared test for categorical features
  • Compares production data to a reference distribution
  • Returns drift scores and p-values
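For intuition, the chi-squared side of the detector can be sketched with scipy. The category counts below are invented for illustration; the packaged detector computes this per categorical feature over each batch:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Category counts for one categorical feature (e.g. an education level),
# reference (training) window vs. a production window -- illustrative numbers
reference_counts = np.array([500, 300, 200])
production_counts = np.array([250, 250, 500])

# Chi-squared test of homogeneity on the 2 x k contingency table:
# a small p-value means the category proportions have shifted
table = np.stack([reference_counts, production_counts])
chi2, p_value, dof, _ = chi2_contingency(table)
print(f"chi2={chi2:.1f}, p={p_value:.4f}, drift={p_value < 0.05}")
```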

Outlier Detector Configuration

# income-outlier.yaml
apiVersion: mlops.seldon.io/v1alpha1
kind: Model
metadata:
  name: income-outlier
spec:
  storageUri: "gs://seldon-models/scv2/examples/mlserver_1.3.5/income/outlier-detector"
  requirements:
    - mlserver
    - alibi-detect
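The packaged detector is built with Alibi Detect; the Mahalanobis-distance idea behind it can be sketched with plain numpy (synthetic reference data and an illustrative outlier, not the income dataset):

```python
import numpy as np

rng = np.random.default_rng(1)

# Reference data: two correlated features
cov = [[1.0, 0.8], [0.8, 1.0]]
x_ref = rng.multivariate_normal([0, 0], cov, size=1_000)

mean = x_ref.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(x_ref, rowvar=False))

def mahalanobis(x: np.ndarray) -> np.ndarray:
    """Squared Mahalanobis distance of each row from the reference mean."""
    d = x - mean
    return np.einsum("ij,jk,ik->i", d, cov_inv, d)

inliers = rng.multivariate_normal([0, 0], cov, size=5)
outlier = np.array([[4.0, -4.0]])  # violates the learned correlation

print(mahalanobis(inliers))
print(mahalanobis(outlier))
```

Unlike per-feature z-scores, the covariance-aware distance flags points that break the correlation structure even when each feature alone looks plausible.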

Create Pipeline

Combine models into a pipeline:
# income-pipeline.yaml
apiVersion: mlops.seldon.io/v1alpha1
kind: Pipeline
metadata:
  name: income-production
spec:
  steps:
    - name: income
    - name: income-preprocess
    - name: income-outlier
      inputs:
      - income-preprocess
    - name: income-drift
      batch:
        size: 20  # Check drift every 20 requests
  output:
    steps:
    - income
    - income-outlier.outputs.is_outlier
Load the pipeline:
seldon pipeline load -f seldon-examples/pipeline/income-pipeline.yaml \
  --scheduler-host 0.0.0.0:9004

# Verify
seldon pipeline list

Send Test Data

Use the test client to send normal and anomalous data:
# seldon-examples/pipeline/client.py
import numpy as np
import json
import requests

# Load test data
with open("./test.npy", "rb") as f:
    x_ref = np.load(f)      # Reference data
    x_h1 = np.load(f)       # Drifted data
    y_ref = np.load(f)      # Labels
    x_outlier = np.load(f)  # Outliers

def infer(resourceName: str, batchSz: int, requestType: str):
    """Send inference request."""
    # Select data based on type
    if requestType == "outlier":
        rows = x_outlier[0:batchSz]
    elif requestType == "drift":
        rows = x_h1[0:batchSz]
    else:
        rows = x_ref[0:batchSz]
    
    # Build request
    reqJson = {
        "inputs": [{
            "name": "input_1",
            "data": rows.flatten().tolist(),
            "datatype": "FP32",
            "shape": [batchSz, rows.shape[1]]
        }]
    }
    
    headers = {
        "Content-Type": "application/json",
        "seldon-model": resourceName
    }
    
    response = requests.post(
        "http://0.0.0.0:9000/v2/models/model/infer",
        json=reqJson,
        headers=headers
    )
    
    print(response.json())

# Test normal data
infer("income-production", 10, "normal")

# Test drift
infer("income-production", 10, "drift")

# Test outliers
infer("income-production", 10, "outlier")
The response includes:
  • Model predictions
  • Outlier detection results (is_outlier scores)
  • Drift detection metrics (after batch size is reached)

Evidently for Drift Detection

Evidently provides an easy way to generate drift reports:
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset, DataQualityPreset
import pandas as pd

# Load reference and current data
reference_data = pd.read_csv("reference.csv")
current_data = pd.read_csv("production.csv")

# Create report
report = Report(metrics=[
    DataDriftPreset(),
    DataQualityPreset()
])

# Run report
report.run(
    reference_data=reference_data,
    current_data=current_data
)

# Save as HTML
report.save_html("drift_report.html")
The report includes:
  • Feature-by-feature drift scores
  • Statistical tests (KS, Chi-squared, etc.)
  • Distribution visualizations
  • Data quality metrics (missing values, duplicates, etc.)

Drift Detection in Pipelines

Integrate Evidently into your ML pipeline:
import pandas as pd
from evidently.test_suite import TestSuite
from evidently.tests import TestColumnDrift

def check_drift(reference_df: pd.DataFrame, production_df: pd.DataFrame) -> bool:
    """Check if drift is detected."""
    tests = TestSuite(tests=[
        TestColumnDrift(column_name="age"),
        TestColumnDrift(column_name="income"),
        TestColumnDrift(column_name="education"),
    ])
    
    tests.run(reference_data=reference_df, current_data=production_df)
    
    # Return True if any test failed
    return not tests.as_dict()["summary"]["all_passed"]

# Use in your pipeline
if check_drift(reference_data, new_data):
    print("Drift detected! Triggering retraining...")
    trigger_retraining_pipeline()

Monitoring Strategy

Design a comprehensive monitoring plan:

1. Define Metrics

Data quality:
  • Feature distributions
  • Missing value rates
  • Outlier frequency
  • Data schema compliance

Predictions:
  • Prediction distributions
  • Confidence scores
  • Class balance (for classification)
  • Output range (for regression)

Model performance:
  • Accuracy, precision, recall (when ground truth available)
  • Prediction-outcome correlation
  • Business metrics (conversion rate, revenue, etc.)

System:
  • Latency (p50, p95, p99)
  • Throughput (requests per second)
  • Error rates
  • Resource usage
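A few of these metrics can be computed with a short snapshot function. This is a sketch only: the column names and the 3-sigma outlier rule are assumptions for illustration, not part of the Seldon or Evidently examples.

```python
import numpy as np
import pandas as pd

def metrics_snapshot(df: pd.DataFrame, predictions: np.ndarray) -> dict:
    """Compute a small subset of the monitoring metrics listed above."""
    z = (df - df.mean()) / df.std()
    return {
        "missing_rate": df.isna().mean().to_dict(),    # per-feature missing values
        "outlier_rate": (z.abs() > 3).mean().mean(),   # crude 3-sigma outlier frequency
        "class_balance": pd.Series(predictions).value_counts(normalize=True).to_dict(),
    }

# Tiny illustrative batch
df = pd.DataFrame({"age": [25, 30, None, 45],
                   "income": [40_000, 52_000, 48_000, 61_000]})
preds = np.array([0, 1, 1, 1])
snap = metrics_snapshot(df, preds)
print(snap)
```

In practice you would log such snapshots per batch and compare them against a reference window rather than inspect them by hand.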

2. Ground Truth Collection

Ground truth is essential for measuring actual performance:
  • Delayed feedback: Collect outcomes days or weeks later
  • User feedback: Thumbs up/down, corrections
  • A/B testing: Compare model variants
  • Manual labeling: Sample and label production data
  • Proxy metrics: Use correlated signals when direct labels unavailable

3. Alerting Thresholds

Define thresholds for alerts:
alerts:
  - name: HighDriftScore
    condition: drift_score > 0.1
    severity: warning
    
  - name: PerformanceDegradation
    condition: accuracy < 0.85
    severity: critical
    
  - name: HighOutlierRate
    condition: outlier_rate > 0.05
    severity: warning
    
  - name: HighLatency
    condition: p95_latency > 2000ms
    severity: critical
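Thresholds like these can be evaluated in application code. The sketch below is hypothetical (the metric names and evaluator are assumptions, and only greater-than conditions are modeled, so a rule like accuracy < 0.85 would need the inverse comparison):

```python
# Illustrative thresholds mirroring the YAML above: metric -> (limit, severity)
THRESHOLDS = {
    "drift_score": (0.1, "warning"),
    "outlier_rate": (0.05, "warning"),
    "p95_latency_ms": (2000, "critical"),
}

def evaluate_alerts(metrics: dict) -> list[tuple[str, str]]:
    """Return a (metric, severity) pair for every threshold that is exceeded."""
    fired = []
    for name, (limit, severity) in THRESHOLDS.items():
        if metrics.get(name, 0) > limit:
            fired.append((name, severity))
    return fired

alerts = evaluate_alerts(
    {"drift_score": 0.3, "outlier_rate": 0.01, "p95_latency_ms": 2500}
)
print(alerts)
```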

4. Remediation Actions

1. Alert fires: the monitoring system detects drift or performance degradation
2. Investigate: review dashboards, logs, and recent changes
3. Diagnose: identify the root cause (data quality issue, drift, bug, etc.)
4. Remediate:
  • Retrain the model on recent data
  • Roll back to a previous version
  • Apply a hotfix or feature engineering
  • Adjust thresholds or business logic
5. Validate: verify the fix resolves the issue
6. Document: record the incident and learnings for future reference

Best Practices

Monitor Continuously

Don’t wait for complaints. Set up automated monitoring to catch issues early.

Start Simple

Begin with basic metrics (input distributions, latency, errors) before adding complex drift detection.

Use Multiple Methods

Combine statistical tests, business metrics, and manual review for comprehensive monitoring.

Close the Loop

Feed production data back into training to keep models up-to-date.


Next Steps

Practice Tasks

Complete the homework assignments to apply these monitoring concepts
