Pipeline Overview
The lifecycle implements a systematic approach to building, deploying, and maintaining machine learning models:

- Data - Load, validate, and version datasets
- Training - Feature engineering, model selection, and calibration
- Optimization - Benchmark performance and quantize for deployment
- Deployment - Export to ONNX and serve predictions via API
- Monitoring - Track drift and trigger retraining
Stage 1: Data
Data ingestion loads raw datasets and applies feature engineering while tracking provenance.

Data Loading
src/data.py
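The loading code in src/data.py is project-specific and not reproduced here. As a rough sketch, a loader of this kind validates the schema and hashes the raw bytes for lineage; the names load_dataset and REQUIRED_COLUMNS below are illustrative placeholders, not the actual API.

```python
import hashlib
from pathlib import Path

import pandas as pd

# Placeholder schema; the real required columns live in the project config.
REQUIRED_COLUMNS = ["feature_a", "feature_b", "label"]


def load_dataset(path: str) -> tuple[pd.DataFrame, str]:
    """Load a raw dataset, validate its schema, and return a SHA256 provenance hash."""
    raw = Path(path).read_bytes()
    digest = hashlib.sha256(raw).hexdigest()

    df = pd.read_csv(path)
    missing = set(REQUIRED_COLUMNS) - set(df.columns)
    if missing:
        raise ValueError(f"Missing required columns: {sorted(missing)}")
    return df, digest
```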
Feature Engineering
Engineered features combine raw inputs with domain knowledge:

src/features.py
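The actual feature set in src/features.py is domain-specific; the derived columns below (a ratio and an interaction term) are purely illustrative.

```python
import numpy as np
import pandas as pd


def engineer_features(df: pd.DataFrame) -> pd.DataFrame:
    """Combine raw inputs into derived features (hypothetical examples)."""
    out = df.copy()
    # Ratio of two raw measurements, guarding against division by zero.
    out["a_to_b_ratio"] = out["feature_a"] / out["feature_b"].replace(0, np.nan)
    # Simple interaction term combining both inputs.
    out["a_times_b"] = out["feature_a"] * out["feature_b"]
    return out
```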
Dataset Metadata
Dataset versions and schemas are tracked in config/datasets.yaml:
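The exact keys are not reproduced here; a plausible layout, assuming one entry per dataset version with a content hash and column types, might look like:

```yaml
# Hypothetical layout; consult config/datasets.yaml for the actual keys.
datasets:
  training_v1:
    path: data/raw/train.csv
    sha256: "<content hash recorded at ingestion>"
    schema:
      feature_a: float
      feature_b: float
      label: int
```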
Stage 2: Training
Training orchestrates preprocessing, cross-validation, model selection, and threshold calibration.

Model Training Pipeline
The training script (src/train.py) executes the full pipeline:
src/train.py
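The full src/train.py is longer than what fits here; the condensed sketch below shows the general shape (synthetic placeholder data, a scikit-learn Pipeline, cross-validated scoring, a serialized artifact), not the project's actual code.

```python
from pathlib import Path

import joblib
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Placeholder data; the real script loads engineered features from Stage 1.
rng = np.random.default_rng(0)
X = pd.DataFrame({"feature_a": rng.normal(size=500), "feature_b": rng.normal(size=500)})
y = (X["feature_a"] + rng.normal(scale=0.5, size=500) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Preprocessing and model bundled into one serializable pipeline.
model = Pipeline([("scale", StandardScaler()), ("clf", LogisticRegression(max_iter=1000))])

# Cross-validated selection metric, then a final fit on the training split.
cv_scores = cross_val_score(model, X_train, y_train, cv=5, scoring="average_precision")
model.fit(X_train, y_train)

Path("artifacts").mkdir(exist_ok=True)
joblib.dump(model, "artifacts/best_model.joblib")
```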
Threshold Calibration
Thresholds are calibrated to meet business precision targets:

src/train.py
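The calibration logic in src/train.py is not reproduced here; one common approach, sketched below, picks the lowest threshold whose precision meets the target (the function name and the 0.90 target are illustrative).

```python
import numpy as np
from sklearn.metrics import precision_recall_curve


def calibrate_threshold(y_true, y_scores, precision_target: float = 0.90) -> float:
    """Return the lowest score threshold whose precision meets the business target."""
    precision, recall, thresholds = precision_recall_curve(y_true, y_scores)
    # precision/recall have one more entry than thresholds; drop the final point to align.
    meets_target = precision[:-1] >= precision_target
    if not np.any(meets_target):
        return 0.5  # fall back to the default decision threshold
    return float(thresholds[meets_target].min())
```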
Training Outputs
Training produces versioned artifacts:

- artifacts/best_model.joblib - Serialized scikit-learn pipeline
- artifacts/threshold.txt - Calibrated decision threshold
- artifacts/metrics.json - Test set performance metrics
- artifacts/lineage.json - SHA256 hashes for reproducibility
- artifacts/drift_baseline.json - Training distribution statistics
Stage 3: Optimization
Optimization measures performance characteristics and prepares models for deployment constraints.

Statistical Benchmarking
The benchmark script (benchmarking/statistical_benchmark.py) measures latency distributions (a minimal measurement sketch follows the list):
- Repeated-run latency percentiles (p50, p95, p99)
- Quality metrics stability across runs
- Statistical confidence intervals
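As a minimal illustration of the repeated-run measurement, the helper below times a prediction callable and reports percentile latencies; the real script additionally tracks quality metrics and confidence intervals.

```python
import time

import numpy as np


def latency_percentiles(predict_fn, batch, runs: int = 200) -> dict:
    """Time repeated predictions and summarize p50/p95/p99 latency in milliseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        predict_fn(batch)
        samples.append((time.perf_counter() - start) * 1000.0)
    p50, p95, p99 = np.percentile(samples, [50, 95, 99])
    return {"p50_ms": float(p50), "p95_ms": float(p95), "p99_ms": float(p99)}
```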
Hardware-Aware Trade-offs
The trade-off experiments (hardware_aware_ml/tradeoff_experiments.py) quantify deployment options (see the sketch after this list):
- Model size vs. inference latency
- Quantization impact on accuracy
- Memory footprint constraints
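As one hedged example of how such a trade-off can be measured, the helper below reports on-disk size and mean latency for a single ONNX model so the original and quantized artifacts can be compared; it is a simplification of what tradeoff_experiments.py covers.

```python
import os
import time

import numpy as np
import onnxruntime as ort


def profile_model(path: str, sample: np.ndarray, runs: int = 100) -> dict:
    """Report on-disk size and mean inference latency for one ONNX model."""
    session = ort.InferenceSession(path, providers=["CPUExecutionProvider"])
    input_name = session.get_inputs()[0].name
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        session.run(None, {input_name: sample})
        timings.append(time.perf_counter() - start)
    return {
        "size_mb": os.path.getsize(path) / 1e6,
        "mean_latency_ms": 1000.0 * float(np.mean(timings)),
    }


# Compare the exported artifacts described below, e.g.:
# profile_model("artifacts/model.onnx", sample)
# profile_model("artifacts/model_quantized.onnx", sample)
```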
ONNX Quantization
Models are quantized for deployment efficiency:
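One common route is dynamic weight quantization with onnxruntime, as sketched below; the project's actual quantization settings may differ.

```python
from onnxruntime.quantization import QuantType, quantize_dynamic

quantize_dynamic(
    model_input="artifacts/model.onnx",
    model_output="artifacts/model_quantized.onnx",
    weight_type=QuantType.QInt8,  # store weights as 8-bit integers
)
```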
Stage 4: Deployment

Deployment exports trained models to portable formats and serves predictions via REST API.

ONNX Export
Models are exported for cross-platform inference:

- artifacts/model.onnx - Portable model format
- artifacts/model_quantized.onnx - Quantized variant
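A hedged sketch of the export step using skl2onnx is shown below; the initial_types declaration depends on the project's feature schema, and the feature count here is a placeholder.

```python
import joblib
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType

pipeline = joblib.load("artifacts/best_model.joblib")
n_features = 2  # placeholder; match the trained feature count

onnx_model = convert_sklearn(
    pipeline, initial_types=[("input", FloatTensorType([None, n_features]))]
)
with open("artifacts/model.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())
```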
API Serving
The FastAPI service (src/api.py) provides prediction endpoints:
src/api.py
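The actual src/api.py surface is not reproduced here; the simplified sketch below shows the general pattern (a pydantic request model, the calibrated threshold, and a /predict route whose field names are illustrative).

```python
import joblib
import pandas as pd
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
model = joblib.load("artifacts/best_model.joblib")
with open("artifacts/threshold.txt") as f:
    threshold = float(f.read())


class PredictionRequest(BaseModel):
    feature_a: float
    feature_b: float


@app.post("/predict")
def predict(request: PredictionRequest) -> dict:
    """Score one record and apply the calibrated decision threshold."""
    frame = pd.DataFrame([request.model_dump()])
    score = float(model.predict_proba(frame)[0, 1])
    return {"score": score, "label": int(score >= threshold)}
```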
Stage 5: Monitoring
Monitoring tracks production inference patterns and detects distribution drift.Drift Detection
The API automatically tracks feature distributions:

src/api.py
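The drift logic in src/api.py is not shown here; a rough sketch of the idea, assuming artifacts/drift_baseline.json stores per-feature mean and standard deviation, compares running feature means against that baseline with z-scores.

```python
import json

import numpy as np


def feature_z_scores(
    recent: dict[str, list[float]],
    baseline_path: str = "artifacts/drift_baseline.json",
) -> dict[str, float]:
    """Compute |z| of recent feature means against training statistics."""
    with open(baseline_path) as f:
        baseline = json.load(f)  # assumed shape: {feature: {"mean": ..., "std": ...}}

    scores = {}
    for name, values in recent.items():
        stats = baseline[name]
        std = stats["std"] or 1e-9  # avoid division by zero
        scores[name] = abs((float(np.mean(values)) - stats["mean"]) / std)
    return scores
```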
Drift Status Endpoint
Check for distribution drift:

- samples_observed - Number of predictions
- drift_score_max_abs_z - Maximum z-score across features
- drifted_features - Features exceeding thresholds
- should_retrain - Retraining recommendation
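A client might poll the drift status like this; the /drift route shown is an assumption and may not match the actual endpoint path.

```python
import requests

# Query the drift status endpoint (path assumed to be /drift).
response = requests.get("http://localhost:8000/drift", timeout=5)
status = response.json()

if status["should_retrain"]:
    print("Drift detected in:", status["drifted_features"])
    print("Max |z|:", status["drift_score_max_abs_z"])
```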
Drift Thresholds
Configured in config.yaml:
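The key names below are hypothetical; consult config.yaml for the actual drift settings.

```yaml
# Hypothetical keys; the real config.yaml names may differ.
drift:
  z_score_threshold: 3.0   # |z| above this flags a feature as drifted
  min_samples: 100         # observations required before drift is scored
```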
Prediction Logging
All predictions are logged to artifacts/prediction_log.jsonl:
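Each line is a standalone JSON record, so the log can be replayed for offline analysis; the exact fields depend on what src/api.py writes.

```python
import json

# Load the prediction log for offline analysis (one JSON object per line).
with open("artifacts/prediction_log.jsonl") as f:
    records = [json.loads(line) for line in f if line.strip()]

print(f"{len(records)} logged predictions")
```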
Data Flow
Data flows through the system with versioning at each stage.

Trade-offs and Failure Modes
Latency vs Accuracy
Quantized artifacts reduce latency by 40-60% but may shift accuracy by 1-2%. Parity checks enforce maximum degradation thresholds.
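A parity check of this kind can be sketched as follows, assuming the first ONNX output holds predicted labels and using an illustrative 2% tolerance; the project's actual gate may be implemented differently.

```python
import numpy as np
import onnxruntime as ort


def accuracy_drop(original_path: str, quantized_path: str, X: np.ndarray, y: np.ndarray) -> float:
    """Return the accuracy degradation of the quantized model on a holdout set."""

    def predict(path: str) -> np.ndarray:
        session = ort.InferenceSession(path, providers=["CPUExecutionProvider"])
        name = session.get_inputs()[0].name
        return np.asarray(session.run(None, {name: X.astype(np.float32)})[0]).ravel()

    acc_original = float(np.mean(predict(original_path) == y))
    acc_quantized = float(np.mean(predict(quantized_path) == y))
    return acc_original - acc_quantized


# Release gate (illustrative tolerance):
# assert accuracy_drop("artifacts/model.onnx", "artifacts/model_quantized.onnx", X, y) <= 0.02
```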
Throughput vs Queue Delay
Streaming worker scaling improves throughput but can increase request contention. Load testing quantifies the trade-off space.
Portability vs Feature Completeness
ONNX export improves cross-platform portability but may not support all scikit-learn operators. Unsupported ops require custom conversion.
Failure Modes
Release blockers include:
- Parity drift exceeding tolerance
- Schema mismatch between training and serving
- Queue saturation under load
- Missing lineage artifacts
Assumptions and Limitations
Assumptions and limitations include:

- Feature names must be stable across training and serving
- Statistical confidence intervals depend on the number of benchmark runs
- Hardware counters may be unavailable on restricted hosts