Overview
The system uses a dual-model architecture with Isolation Forest algorithms to detect anomalies in industrial sensor data. The models are trained on healthy baseline data and score deviations using inverted semantics: 0.0 = Normal, 1.0 = Highly Anomalous.
Dual-Model Architecture
Two Isolation Forest models run in parallel, each optimized for a different temporal resolution:
| Model | Features | Input Frequency | F1 Score | AUC-ROC | Best For |
|---|---|---|---|---|---|
| Legacy (v2) | 6 | 1 Hz (1-second avg) | 78.1% | 1.000 | Drift detection |
| Batch (v3) | 16 | 100 Hz windows | 99.6% | 1.000 | Spike + Jitter detection |
Primary Model: The batch model (v3) is used for inference during real-time monitoring due to superior performance on high-frequency fault patterns. The legacy model is retained for backward compatibility.
Batch Model (Primary)
16-Dimensional Feature Vector
Each 1-second window of 100 raw samples is reduced to 16 statistical features:
| Signal | mean | std | peak_to_peak | rms | Total |
|---|---|---|---|---|---|
| voltage_v | ✓ | ✓ | ✓ | ✓ | 4 |
| current_a | ✓ | ✓ | ✓ | ✓ | 4 |
| power_factor | ✓ | ✓ | ✓ | ✓ | 4 |
| vibration_g | ✓ | ✓ | ✓ | ✓ | 4 |
| Total | | | | | 16 |
Feature Calculations
For each signal, the following statistics are computed: mean, standard deviation (std), peak-to-peak, and RMS.
Why RMS? RMS captures the “energy” of the signal and is more sensitive to outliers than the mean. For vibration analysis, RMS is the industry standard (ISO 10816).
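The four per-signal statistics can be sketched with NumPy as follows. This is a minimal illustration, not the code in backend/ml/batch_features.py: the function name `window_features`, the feature ordering, and the use of the population standard deviation (ddof=0) are assumptions.

```python
import numpy as np

SIGNALS = ("voltage_v", "current_a", "power_factor", "vibration_g")

def window_features(window: dict[str, np.ndarray]) -> np.ndarray:
    """Reduce one window of raw samples to a 16-D feature vector (hypothetical helper)."""
    features = []
    for name in SIGNALS:
        x = np.asarray(window[name], dtype=float)
        features.extend([
            x.mean(),                   # mean
            x.std(),                    # std (population form, ddof=0; an assumption)
            x.max() - x.min(),          # peak_to_peak
            np.sqrt(np.mean(x ** 2)),   # rms
        ])
    return np.array(features)

# Synthetic 1-second window of 100 samples per signal (illustrative values only)
rng = np.random.default_rng(0)
window = {
    "voltage_v": 230.0 + rng.normal(0.0, 1.0, 100),
    "current_a": 10.0 + rng.normal(0.0, 0.2, 100),
    "power_factor": 0.95 + rng.normal(0.0, 0.01, 100),
    "vibration_g": 0.15 + rng.normal(0.0, 0.03, 100),
}
vec = window_features(window)
print(vec.shape)
```

Since RMS² = mean² + std², the RMS grows with both a shift in level and an increase in spread, which is why a single RMS value responds to either kind of change.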
Legacy Model (Fallback)
6-Dimensional Feature Set
Derived features computed at 1 Hz:
| Feature | Formula | Purpose |
|---|---|---|
| voltage_rolling_mean_1h | Mean(voltage, 1 hour window) | Long-term drift |
| current_spike_count | Count(points > 3σ, 10-point window) | Transient spikes |
| power_factor_efficiency_score | (PF - 0.8) / 0.2 * 100 | Efficiency degradation |
| vibration_intensity_rms | RMS(vibration, past-only) | Mechanical health |
| voltage_stability | \|voltage - 230.0\| | Grid deviation |
| power_vibration_ratio | vibration / (PF + 0.01) | Cross-signal interaction |
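The three closed-form formulas in the table translate directly into code. The sketch below is illustrative only; the function and constant names are hypothetical and are not taken from backend/ml/detector.py.

```python
NOMINAL_VOLTAGE = 230.0   # V, nominal value used in the voltage_stability formula
EPSILON = 0.01            # guards against division by zero when PF is 0.0

def power_factor_efficiency_score(pf: float) -> float:
    """(PF - 0.8) / 0.2 * 100: 0 at PF = 0.8, about 100 at PF = 1.0."""
    return (pf - 0.8) / 0.2 * 100

def voltage_stability(voltage: float) -> float:
    """Absolute deviation from the nominal grid voltage, in volts."""
    return abs(voltage - NOMINAL_VOLTAGE)

def power_vibration_ratio(vibration: float, pf: float) -> float:
    """vibration / (PF + 0.01): rises with high vibration or low power factor."""
    return vibration / (pf + EPSILON)

for v in (230.0, 225.0, 210.0):
    print(v, voltage_stability(v))
```

Note that the efficiency score is negative for any power factor below 0.8, so a consumer of this feature should not assume a 0 to 100 range.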
Legacy Model Limitations
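The 1 Hz model cannot detect jitter faults, where the average looks normal but the spread does not. For example:
- Average vibration = 0.15g (normal)
- Standard deviation = 0.17g (5× healthy baseline)
Averaging to 1 Hz discards this within-window variance; the batch model retains it through its vibration_std feature.
Fault Type Detection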
Supported Fault Patterns
SPIKE
Pattern: Sharp transient surges in voltage/current
Detection: peak_to_peak and std features exceed thresholds
Example: Inrush current during motor start
DRIFT
Pattern: Gradual degradation over time
Detection: mean values deviate from baseline
Example: Bearing wear increasing vibration
JITTER
Pattern: Normal mean, abnormal variance
Detection: High std and peak_to_peak with normal mean
Example: Loose connection causing erratic readings
Model: Batch model only (legacy model blind to jitter)
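To illustrate why a 1 Hz average is blind to jitter, the sketch below builds two synthetic 100-sample vibration windows with the same mean (0.15 g) but standard deviations of 0.034 g and 0.17 g, matching the 5× figure quoted for the legacy model. The helper `make_window` is hypothetical and only serves this demonstration.

```python
import numpy as np

def make_window(mean: float, std: float, n: int = 100, seed: int = 0) -> np.ndarray:
    """Return n samples rescaled to have exactly the requested mean and std."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    x = (x - x.mean()) / x.std()   # force zero mean and unit std
    return mean + std * x

healthy = make_window(mean=0.15, std=0.034)          # 0.17 / 5
jitter = make_window(mean=0.15, std=0.17, seed=1)

# The 1-second average is identical for both windows...
print(healthy.mean(), jitter.mean())
# ...while the per-window std and peak-to-peak separate them clearly.
print(healthy.std(), jitter.std())
print(np.ptp(healthy), np.ptp(jitter))
```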
DEFAULT
Pattern: General anomaly not matching specific types
Detection: Overall feature deviation
Example: Combined electrical and mechanical issues
Fault Injection Example
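The following is a minimal sketch of how the SPIKE, DRIFT, and JITTER patterns could be injected into a healthy signal for testing. It is not the project's fault simulator: the function `inject_fault`, the fault magnitudes, and the synthetic baseline are all assumptions chosen for illustration.

```python
import numpy as np

def inject_fault(signal: np.ndarray, fault: str, rng: np.random.Generator) -> np.ndarray:
    """Return a copy of `signal` with a synthetic fault applied (illustrative magnitudes)."""
    out = signal.astype(float).copy()
    n = out.size
    sigma = signal.std()
    if fault == "SPIKE":
        # Sharp transients on a small number of samples
        idx = rng.choice(n, size=max(1, n // 50), replace=False)
        out[idx] += 5.0 * sigma
    elif fault == "DRIFT":
        # Gradual ramp that shifts the mean over the window
        out += np.linspace(0.0, 3.0 * sigma, n)
    elif fault == "JITTER":
        # Extra variance with the mean left unchanged
        noise = rng.normal(0.0, 4.0 * sigma, n)
        out += noise - noise.mean()
    else:
        raise ValueError(f"unknown fault type: {fault}")
    return out

rng = np.random.default_rng(42)
baseline = 0.15 + rng.normal(0.0, 0.03, 100)   # hypothetical healthy vibration window (g)
for fault in ("SPIKE", "DRIFT", "JITTER"):
    faulty = inject_fault(baseline, fault, rng)
    print(fault, faulty.mean() - baseline.mean(), faulty.std() / baseline.std())
```

Comparing the mean shift and the std ratio per fault type shows which window statistics each pattern disturbs: DRIFT moves the mean, JITTER inflates the std without moving the mean, and SPIKE changes only a few samples.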
Anomaly Scoring
Score Semantics
The system uses inverted scoring for intuitive interpretation:
| Score Range | Meaning | Health Impact |
|---|---|---|
| 0.00 - 0.15 | Perfectly normal | Health 100-80 |
| 0.15 - 0.35 | Minor deviation | Health 80-50 |
| 0.35 - 0.65 | Moderate anomaly | Health 50-0 |
| 0.65 - 1.00 | Severe anomaly | Health 0 (critical) |
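One way to read the table is as a piecewise-linear mapping between the band edges. The sketch below assumes linear interpolation inside each band, which the table itself does not specify; the function name `score_to_health` is hypothetical, and the actual conversion is covered in Health Assessment.

```python
import numpy as np

# Band edges and the health values the table assigns at each edge
SCORE_BREAKS = [0.00, 0.15, 0.35, 0.65, 1.00]
HEALTH_AT_BREAKS = [100.0, 80.0, 50.0, 0.0, 0.0]

def score_to_health(score: float) -> float:
    """Interpolate health linearly between band edges (an assumption, see above)."""
    # np.interp clamps inputs outside [0, 1] to the end values
    return float(np.interp(score, SCORE_BREAKS, HEALTH_AT_BREAKS))

for s in (0.0, 0.10, 0.25, 0.50, 0.80):
    print(s, score_to_health(s))
```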
Calibration Process
Scores are calibrated using quantile-based thresholding.
Training on Healthy Baseline
Models are trained only on healthy data to establish normal behavior.
Contamination Parameter: Set to 5% (0.05) to allow for natural sensor noise in the healthy baseline. This prevents the model from being overly sensitive to minor fluctuations.
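Training on healthy data and quantile-based calibration can be sketched together as below, using scikit-learn's IsolationForest. The 99th-percentile threshold comes from the persisted model fields listed under Model Persistence; everything else about the mapping (anchoring the healthy median at 0.0 and the 99th percentile at 0.5, then clipping to [0, 1]) is an assumption made for illustration, as is the synthetic training data.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)
healthy = rng.normal(0.0, 1.0, size=(2000, 16))   # stand-in for healthy 16-D feature vectors

# Fit on healthy data only; contamination=0.05 tolerates natural sensor noise
model = IsolationForest(contamination=0.05, random_state=42).fit(healthy)

# score_samples is higher for normal points, so negate it: higher = more anomalous
raw_healthy = -model.score_samples(healthy)
lo = np.percentile(raw_healthy, 50)   # assumed anchor for a calibrated score of 0.0
hi = np.percentile(raw_healthy, 99)   # calibration threshold (99th percentile)

def calibrated_score(features: np.ndarray) -> np.ndarray:
    """Map raw scores so the healthy median -> 0.0 and the 99th percentile -> 0.5."""
    raw = -model.score_samples(features)
    return np.clip(0.5 * (raw - lo) / (hi - lo), 0.0, 1.0)

outlier = np.full((1, 16), 6.0)       # far outside the healthy range in every feature
print(calibrated_score(healthy[:3]), calibrated_score(outlier))
```

With this mapping, roughly 1% of healthy samples reach a score of 0.5 or more by construction, which gives the 0.5 decision threshold a known false-alarm rate on the baseline.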
Model Hyperparameters
Isolation Forest Configuration
| Parameter | Value | Purpose |
|---|---|---|
contamination | 0.05 | Expected proportion of outliers in training data |
n_estimators | 100 | Number of isolation trees (higher = more stable) |
random_state | 42 | Seed for reproducibility |
n_jobs | -1 | Use all CPU cores for parallel training |
Feature Scaling
All features are standardized using StandardScaler.
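Why Scaling? Standardization puts signals with very different magnitudes, such as voltage (around 230 V) and vibration (around 0.15 g), on a common scale. Note that a standard Isolation Forest splits on one feature at a time and is largely invariant to per-feature rescaling, so scaling here keeps features comparable rather than stopping voltage from dominating the anomaly calculation.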
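The configuration table and the scaling step can be combined in a scikit-learn pipeline, sketched below. The hyperparameter values are the ones listed above; the two-feature synthetic dataset and the variable names are assumptions for illustration, and the project code may wire the scaler and model together differently.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
# Hypothetical healthy features on very different scales: voltage (V) and vibration (g)
healthy = np.column_stack([
    rng.normal(230.0, 2.0, 1000),
    rng.normal(0.15, 0.03, 1000),
])

detector = make_pipeline(
    StandardScaler(),
    IsolationForest(contamination=0.05, n_estimators=100, random_state=42, n_jobs=-1),
)
detector.fit(healthy)

# After scaling, each feature has zero mean and unit variance on the training data
scaled = detector.named_steps["standardscaler"].transform(healthy)
print(scaled.mean(axis=0), scaled.std(axis=0))

# predict returns 1 for inliers and -1 for outliers
print(detector.predict([[230.0, 0.15], [260.0, 0.60]]))
```

Keeping the scaler inside the pipeline means the statistics learned from the healthy baseline are reused unchanged at inference time.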
Derived Features (Legacy Model)
The legacy model adds two interaction features:
1. Voltage Stability
Measures deviation from the Indian grid nominal voltage (230.0 V):
- Healthy: 230.0V → stability = 0.0
- Degraded: 225.0V → stability = 5.0
- Critical: 210.0V → stability = 20.0
2. Power-Vibration Ratio
Cross-signal interaction term for detecting mechanical-electrical coupling:
- High ratio: High vibration with low power factor → bearing failure + electrical inefficiency
- Low ratio: Normal vibration with good power factor → healthy operation
The + 0.01 epsilon prevents division by zero when the power factor is exactly 0.0 (rare but possible during shutdown).
Performance Benchmarks
Model Comparison (Phase 15 Validation)
Tested on a 1000-sample dataset with known fault labels:
| Metric | Legacy Model | Batch Model | Improvement |
|---|---|---|---|
| F1 Score @ 0.5 | 78.1% | 99.6% | +27.5% |
| AUC-ROC | 1.000 | 1.000 | - |
| Jitter Detection | ❌ 0% | ✅ 100% | +100% |
| False Positives | 12 | 2 | -83% |
| False Negatives | 9 | 2 | -78% |
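AUC-ROC measures only how well scores rank anomalies above normal samples, while F1 @ 0.5 also depends on where the 0.5 threshold falls. The toy example below, with made-up labels and scores rather than the Phase 15 data, shows one way a model can have a perfect AUC-ROC and still a lower F1 at a fixed threshold.

```python
import numpy as np
from sklearn.metrics import f1_score, roc_auc_score

# Made-up example: every anomaly scores higher than every normal sample,
# but two anomalies fall below the fixed 0.5 decision threshold.
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
scores = np.array([0.05, 0.10, 0.12, 0.20, 0.25, 0.30, 0.40, 0.45, 0.70, 0.90])

auc = roc_auc_score(y_true, scores)            # ranking quality, threshold-free
y_pred = (scores >= 0.5).astype(int)
f1 = f1_score(y_true, y_pred)                  # depends on the 0.5 cut-off

print(auc, f1)
```

Here the ranking is perfect, but recall at 0.5 is only 2 of 4, so F1 drops to about 0.67. Moving the threshold or recalibrating the scores changes F1 without affecting AUC-ROC.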
Inference Latency
- Legacy Model: ~5ms per sample (1 Hz)
- Batch Model: ~15ms per batch (100 samples aggregated)
- End-to-end: ~1 second from sensor → dashboard update
Dead-Zone Filtering
To prevent “phantom damage” from healthy sensor noise, the system applies a dead-zone that ignores small deviations.
Model Persistence
Trained models are saved to disk for reuse. Each saved model includes:
- Asset ID
- Isolation Forest model
- StandardScaler parameters
- Training timestamp
- Training sample count
- Calibration threshold (99th percentile)
- Model version (v2)
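A bundle with the fields listed above could be saved and restored with joblib, as in the sketch below. The dictionary keys, the asset ID, and the file name are hypothetical; the project's on-disk format may differ.

```python
import tempfile
from datetime import datetime, timezone
from pathlib import Path

import joblib
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
healthy = rng.normal(size=(500, 6))            # stand-in for healthy 6-D feature vectors

scaler = StandardScaler().fit(healthy)
model = IsolationForest(contamination=0.05, random_state=42).fit(scaler.transform(healthy))
raw = -model.score_samples(scaler.transform(healthy))   # higher = more anomalous

bundle = {
    "asset_id": "motor-01",                    # hypothetical asset ID
    "model": model,
    "scaler": scaler,
    "trained_at": datetime.now(timezone.utc).isoformat(),
    "n_samples": len(healthy),
    "threshold_p99": float(np.percentile(raw, 99)),
    "version": "v2",
}

path = Path(tempfile.mkdtemp()) / "motor-01_v2.joblib"  # hypothetical file name
joblib.dump(bundle, path)
restored = joblib.load(path)
print(restored["asset_id"], restored["n_samples"], restored["version"])
```

Storing the scaler and threshold next to the model keeps inference consistent with training. joblib files are pickle-based, so they should only be loaded from trusted locations.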
Source Code Reference
Key implementation files:
- Batch Model: backend/ml/batch_detector.py (16-D feature Isolation Forest)
- Legacy Model: backend/ml/detector.py:1-467 (6-D feature model with derived features)
- Feature Engineering: backend/ml/batch_features.py (statistical aggregation: mean, std, p2p, RMS)
- Baseline Training: backend/ml/baseline.py (healthy data profiling)
Next Steps
Health Assessment
Learn how anomaly scores are converted to health metrics and risk levels
Fault Simulation
Explore fault injection for testing detection capabilities