Overview
The CPU inference module provides a lightweight wrapper for running model predictions on CPU hardware with built-in performance monitoring. It tracks inference latency and computes probability distribution statistics for model outputs.
Function Signature
deployment/cpu_inference.py:7
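The actual definition lives at the path above. The sketch below is a hypothetical reconstruction of the signature inferred from the metrics this document describes; the real function name and parameter names may differ:

```python
from typing import Any, Dict

# Hypothetical signature sketch (name and parameters are assumptions);
# the real definition sits at deployment/cpu_inference.py:7.
def run_cpu_inference(model: Any, X: Any) -> Dict[str, float]:
    """Run model.predict_proba on CPU and return timing/probability metrics.

    Returns a dict with keys: inference_latency_ms,
    output_mean_probability, output_std_probability.
    """
    ...
```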
Usage Example
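A minimal end-to-end sketch, assuming the module exposes a wrapper like the hypothetical `run_cpu_inference` below (the name, signature, and inline implementation are assumptions based on the metrics described in this document, not the actual module code):

```python
import time

import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical sketch of the wrapper; the real implementation lives in
# deployment/cpu_inference.py and its name/signature may differ.
def run_cpu_inference(model, X):
    start = time.perf_counter()
    probs = model.predict_proba(X)[:, 1]  # positive-class probabilities
    latency_ms = (time.perf_counter() - start) * 1000.0
    return {
        "inference_latency_ms": float(latency_ms),
        "output_mean_probability": float(probs.mean()),
        "output_std_probability": float(probs.std()),
    }

# Fit a small binary classifier on synthetic data.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 4))
y_train = (X_train[:, 0] > 0).astype(int)
model = RandomForestClassifier(n_estimators=10, random_state=0).fit(X_train, y_train)

# Score a batch and inspect the returned metrics.
X_batch = rng.normal(size=(32, 4))
metrics = run_cpu_inference(model, X_batch)
print(metrics)
```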
Returned Metrics
inference_latency_ms
Measures the wall-clock time from prediction start to completion using the high-precision time.perf_counter() clock. This metric is critical for:
- SLA compliance in production deployments
- Identifying performance regressions
- Capacity planning for concurrent requests
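The timing pattern itself is simple; the sketch below illustrates it with a sleep standing in for the model call (the sleep duration is arbitrary):

```python
import time

# perf_counter() is a monotonic, high-resolution clock, so the difference
# of two readings gives elapsed wall-clock time unaffected by system
# clock adjustments.
start = time.perf_counter()
time.sleep(0.01)  # stand-in for model.predict_proba(X)
latency_ms = (time.perf_counter() - start) * 1000.0
print(latency_ms)
```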
output_mean_probability
The mean of predicted probabilities for the positive class (index 1). Computed as probs.mean(), where probs = model.predict_proba(X)[:, 1].
Useful for:
- Detecting distribution shift in production data
- Monitoring model calibration over time
- Identifying batch-level anomalies
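To make the column indexing concrete: predict_proba returns one column per class, and index 1 is the positive class for a binary classifier. A minimal sketch, with a hard-coded array standing in for a real model's output:

```python
import numpy as np

# Stand-in for model.predict_proba(X): each row sums to 1,
# column 0 = negative class, column 1 = positive class.
proba = np.array([[0.9, 0.1],
                  [0.2, 0.8],
                  [0.4, 0.6]])
probs = proba[:, 1]          # positive-class probabilities
batch_mean = float(probs.mean())  # batch-level mean probability
print(batch_mean)
```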
output_std_probability
Standard deviation of predicted probabilities. High variance may indicate:
- Diverse risk profiles in the input batch
- Model uncertainty on out-of-distribution samples
- Potential calibration issues
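As a hedged illustration of how these batch statistics might feed a monitoring check, the sketch below flags drift in the batch mean. The baseline value, tolerance, and function name are illustrative assumptions, not part of the module:

```python
import numpy as np

# Hypothetical monitoring check: compare a batch's mean probability against
# a baseline recorded at deployment time. The numbers are illustrative.
BASELINE_MEAN = 0.35
MEAN_TOLERANCE = 0.10

def flag_distribution_shift(batch_probs: np.ndarray) -> bool:
    """Return True when the batch mean drifts beyond the tolerance."""
    return abs(float(batch_probs.mean()) - BASELINE_MEAN) > MEAN_TOLERANCE

ok = flag_distribution_shift(np.array([0.3, 0.4, 0.35]))    # near baseline
shifted = flag_distribution_shift(np.array([0.9, 0.85, 0.95]))  # drifted up
print(ok, shifted)
```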
Performance Considerations
Batch Size: Larger batches may improve throughput but increase latency per request. Profile with your expected workload.
Model Type: Tree-based models (Random Forest, XGBoost) typically have different CPU utilization patterns than linear models.
Feature Count: Inference latency scales roughly linearly with the number of input features for most model types.
Implementation Details
The function uses:
- time.perf_counter() for high-resolution timing
- Probability extraction for binary classification (class index 1)
- Type-safe float conversion for JSON serialization
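The float conversion matters because NumPy scalar types such as numpy.float32 are not plain Python floats and are rejected by json.dumps. A minimal sketch of the failure mode and the fix:

```python
import json

import numpy as np

probs = np.array([0.2, 0.8], dtype=np.float32)
mean = probs.mean()  # numpy.float32 scalar, not a built-in float

# json.dumps(mean) would raise TypeError for a float32 scalar; casting
# with float() yields a plain Python float that serializes cleanly.
payload = {"output_mean_probability": float(mean)}
print(json.dumps(payload))
```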
See deployment/cpu_inference.py:7-15 for the full implementation.