
Overview

The fault simulation system allows you to inject controlled anomalies into sensor data streams for testing and validation. This is crucial for:
  • Testing ML models without physical equipment damage
  • Validating alert thresholds across different severity levels
  • Training operators on anomaly response
  • Demonstrating system capabilities to stakeholders
Fault injection is non-destructive: it modifies the data stream in memory, and the modified readings reach InfluxDB only as normal sensor events. Stop injection to return to live data.

Fault Types

The system supports four distinct fault patterns based on real-world industrial failures:

SPIKE

Pattern: Sharp voltage/current surges
Mechanism: Adds random transient spikes (±20-50%) to signals
Real-World: Inrush current, grid instability, motor start
Detection: Both models detect via peak_to_peak and spike count

DRIFT

Pattern: Gradual baseline shift
Mechanism: Linearly increases mean values over time
Real-World: Bearing wear, insulation degradation
Detection: Both models via rolling mean features

JITTER

Pattern: Normal mean, high variance
Mechanism: Adds zero-mean Gaussian noise at 3-8× the normal std deviation, depending on severity (5× at MEDIUM)
Real-World: Loose connections, mechanical resonance
Detection: Batch model only (legacy model blind)

DEFAULT

Pattern: General combined fault
Mechanism: Mix of drift + moderate noise
Real-World: Multiple concurrent issues
Detection: Both models via general feature deviation

Severity Levels

Each fault type can be injected at three severity levels, targeting specific risk classifications:
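| Severity | Target Health | Target Risk | Damage Rate | Time to Critical |
|----------|---------------|-------------|-------------|------------------|
| MILD     | 50-74         | MODERATE    | ~0.001/s    | ~15 minutes      |
| MEDIUM   | 25-49         | HIGH        | ~0.003/s    | ~5 minutes       |
| SEVERE   | 0-24          | CRITICAL    | ~0.005/s    | ~3 minutes       |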
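As a quick sanity check on the table, multiplying each damage rate by its time to critical gives roughly the same accumulated damage for all three severities. The sketch below only does that arithmetic. Reading the product as the damage level at which risk becomes CRITICAL is an assumption, not something the table states, and the names `SEVERITY_TABLE` and `accumulated_damage` are illustrative.

```python
# Damage rate (per second) and time to critical (minutes), copied from the table.
SEVERITY_TABLE = {
    "MILD": (0.001, 15),
    "MEDIUM": (0.003, 5),
    "SEVERE": (0.005, 3),
}


def accumulated_damage(rate_per_s: float, minutes: float) -> float:
    """Damage accumulated at a constant rate over the given number of minutes."""
    return rate_per_s * minutes * 60


for name, (rate, minutes) in SEVERITY_TABLE.items():
    # Each severity comes out at roughly 0.9.
    print(f"{name}: {accumulated_damage(rate, minutes):.2f}")
```

The rates and times are approximate in the table, so treat the result as an order-of-magnitude check rather than an exact threshold.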
MILD (Score Target: 0.70-0.80)
  • Vibration: +15% above baseline
  • Voltage: ±5% fluctuation
  • Power factor: -0.03 (0.92 → 0.89)
  • Current: +10% above baseline
MEDIUM (Score Target: 0.85-0.92)
  • Vibration: +35% above baseline
  • Voltage: ±10% fluctuation
  • Power factor: -0.08 (0.92 → 0.84)
  • Current: +25% above baseline
SEVERE (Score Target: 0.95-1.00)
  • Vibration: +60% above baseline
  • Voltage: ±20% fluctuation
  • Power factor: -0.15 (0.92 → 0.77)
  • Current: +50% above baseline
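The per-severity offsets above can be captured in a small lookup structure, which is handy when writing test assertions. This is a minimal sketch: `SeverityProfile`, `PROFILES`, and `worst_case` are hypothetical names, not part of the project, and the 230 V and 10 A baselines are placeholder values (only the 0.12 g vibration and 0.92 power factor appear in this page).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SeverityProfile:
    """Per-signal offsets for one severity level (hypothetical structure)."""
    vibration_gain: float       # fractional increase above baseline
    voltage_fluctuation: float  # +/- fractional swing
    power_factor_drop: float    # absolute reduction
    current_gain: float         # fractional increase above baseline


# Values copied from the severity lists above.
PROFILES = {
    "MILD": SeverityProfile(0.15, 0.05, 0.03, 0.10),
    "MEDIUM": SeverityProfile(0.35, 0.10, 0.08, 0.25),
    "SEVERE": SeverityProfile(0.60, 0.20, 0.15, 0.50),
}


def worst_case(baseline: dict, severity: str) -> dict:
    """Return the upper-bound signal values implied by a severity profile."""
    p = PROFILES[severity]
    return {
        "vibration_g": baseline["vibration_g"] * (1 + p.vibration_gain),
        "voltage_v": baseline["voltage_v"] * (1 + p.voltage_fluctuation),
        "power_factor": baseline["power_factor"] - p.power_factor_drop,
        "current_a": baseline["current_a"] * (1 + p.current_gain),
    }


# Placeholder baseline; voltage and current are assumed values.
baseline = {"vibration_g": 0.12, "voltage_v": 230.0, "power_factor": 0.92, "current_a": 10.0}
print(worst_case(baseline, "MEDIUM"))
```

Voltage is listed as a ± fluctuation, so `worst_case` reports only the upper edge of that range.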

Fault Injection API

Inject Fault

Start injecting a specific fault pattern:
POST /system/inject-fault
Content-Type: application/json

{
  "fault_type": "JITTER",
  "severity": "MEDIUM"
}
Parameters:
  • fault_type (required): SPIKE, DRIFT, JITTER, or DEFAULT
  • severity (required): MILD, MEDIUM, or SEVERE
Response (200 OK):
{
  "status": "injecting",
  "fault_type": "JITTER",
  "severity": "MEDIUM",
  "message": "Fault injection active. Stop via /system/stop-fault."
}
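A client can validate the two required parameters before sending the request. The sketch below uses only the Python standard library; `build_fault_payload` and `inject_fault` are hypothetical helper names, and upper-casing the inputs is a client-side convenience rather than documented server behavior.

```python
import json
import urllib.request

FAULT_TYPES = {"SPIKE", "DRIFT", "JITTER", "DEFAULT"}
SEVERITIES = {"MILD", "MEDIUM", "SEVERE"}


def build_fault_payload(fault_type: str, severity: str) -> dict:
    """Validate the parameters and return the JSON body for /system/inject-fault."""
    fault_type, severity = fault_type.upper(), severity.upper()
    if fault_type not in FAULT_TYPES:
        raise ValueError(f"fault_type must be one of {sorted(FAULT_TYPES)}")
    if severity not in SEVERITIES:
        raise ValueError(f"severity must be one of {sorted(SEVERITIES)}")
    return {"fault_type": fault_type, "severity": severity}


def inject_fault(base_url: str, fault_type: str, severity: str, timeout: float = 5.0) -> dict:
    """POST the validated payload and return the decoded JSON response."""
    body = json.dumps(build_fault_payload(fault_type, severity)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/system/inject-fault",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `inject_fault("http://localhost:8000", "jitter", "medium")` would send the request body shown above, assuming the backend is running locally.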

Stop Fault

Return to live sensor data:
POST /system/stop-fault
Response (200 OK):
{
  "status": "stopped",
  "message": "Fault injection stopped. Returning to live data."
}

Check Injection Status

Query current fault state:
GET /system/fault-status
Response (Active):
{
  "active": true,
  "fault_type": "SPIKE",
  "severity": "SEVERE",
  "duration_seconds": 127
}
Response (Inactive):
{
  "active": false,
  "fault_type": null,
  "severity": null,
  "duration_seconds": 0
}
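Both response shapes share the same four fields, so a client can parse them into a single type. The following is a sketch only; `FaultStatus` and its methods are hypothetical names, and the field names are taken from the responses above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FaultStatus:
    """Parsed form of the /system/fault-status response (hypothetical helper)."""
    active: bool
    fault_type: Optional[str]
    severity: Optional[str]
    duration_seconds: int

    @classmethod
    def from_json(cls, payload: dict) -> "FaultStatus":
        """Build a FaultStatus from the decoded JSON body."""
        return cls(
            active=bool(payload["active"]),
            fault_type=payload.get("fault_type"),
            severity=payload.get("severity"),
            duration_seconds=int(payload.get("duration_seconds", 0)),
        )

    def describe(self) -> str:
        """Return a short human-readable summary."""
        if not self.active:
            return "no fault active"
        return f"{self.severity} {self.fault_type} for {self.duration_seconds}s"


active = FaultStatus.from_json(
    {"active": True, "fault_type": "SPIKE", "severity": "SEVERE", "duration_seconds": 127}
)
print(active.describe())
```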

Fault Pattern Implementation

SPIKE Fault

Adds random transient surges to voltage and current:
# From generator/generator.py
import random

def apply_spike_fault(signals, severity):
    spike_magnitude = {
        'MILD': 0.20,    # ±20%
        'MEDIUM': 0.35,  # ±35%
        'SEVERE': 0.50   # ±50%
    }[severity]
    
    # Random spike probability (10% of samples)
    if random.random() < 0.10:
        signals['voltage_v'] *= (1 + random.uniform(-spike_magnitude, spike_magnitude))
        signals['current_a'] *= (1 + random.uniform(-spike_magnitude, spike_magnitude))
    
    return signals
Detection: Batch model sees high voltage_std, current_p2p, and current_std features.

DRIFT Fault

Gradually increases baseline values:
def apply_drift_fault(signals, severity, elapsed_seconds):
    drift_rate = {
        'MILD': 0.0005,   # 0.05% per second
        'MEDIUM': 0.0015, # 0.15% per second
        'SEVERE': 0.003   # 0.30% per second
    }[severity]
    
    drift_factor = 1 + (drift_rate * elapsed_seconds)
    
    signals['vibration_g'] *= drift_factor
    signals['current_a'] *= drift_factor
    signals['power_factor'] *= (1 / drift_factor)  # Inverse drift
    
    return signals
Detection: Both models via vibration_mean, current_mean, and efficiency score degradation.
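To get a feel for the drift rates, it helps to evaluate the factor at a fixed point in time. The sketch below reuses the rates from `apply_drift_fault` and applies them to a 0.92 power factor after 60 seconds; `DRIFT_RATES` and `drift_factor` are illustrative names.

```python
# Drift rates per second, copied from apply_drift_fault above.
DRIFT_RATES = {"MILD": 0.0005, "MEDIUM": 0.0015, "SEVERE": 0.003}


def drift_factor(severity: str, elapsed_seconds: float) -> float:
    """Linear drift multiplier applied to vibration and current."""
    return 1 + DRIFT_RATES[severity] * elapsed_seconds


for severity in DRIFT_RATES:
    factor = drift_factor(severity, 60)
    # Power factor drifts inversely, so it is divided by the factor.
    print(f"{severity}: factor={factor:.3f}, power_factor={0.92 / factor:.3f}")
```

After 60 seconds the factors are 1.03, 1.09, and 1.18, which takes a 0.92 power factor to about 0.89, 0.84, and 0.78. The factor has no upper bound in this form, so the deviation keeps growing for as long as injection runs.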

JITTER Fault

Adds high-variance noise while maintaining normal mean:
import numpy as np

def apply_jitter_fault(signals, severity):
    noise_scale = {
        'MILD': 3.0,    # 3× normal std
        'MEDIUM': 5.0,  # 5× normal std
        'SEVERE': 8.0   # 8× normal std
    }[severity]
    
    # Add zero-mean Gaussian noise
    signals['vibration_g'] += np.random.normal(0, 0.03 * noise_scale)
    signals['voltage_v'] += np.random.normal(0, 2.0 * noise_scale)
    signals['current_a'] += np.random.normal(0, 0.5 * noise_scale)
    
    # Clamp to physical limits (clamping at zero can bias the mean upward when noise is large)
    signals['vibration_g'] = max(0, signals['vibration_g'])
    signals['voltage_v'] = max(0, signals['voltage_v'])
    signals['current_a'] = max(0, signals['current_a'])
    
    return signals
Legacy Model Limitation: The 1Hz model computes rolling means and RMS, which smooth out jitter. Only the batch model’s vibration_std and vibration_p2p features capture this fault.
Example Jitter Detection:
  • Healthy: vibration_mean = 0.12g, vibration_std = 0.015g
  • Jitter (MEDIUM): vibration_mean = 0.12g, vibration_std = 0.075g (5× increase)

DEFAULT Fault

General fault combining drift + noise:
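def apply_default_fault(signals, severity):
    # Combine drift and mild noise.
    # elapsed_seconds is fixed at 60, so the drift component is a constant
    # offset here rather than one that grows over time.
    signals = apply_drift_fault(signals, severity, elapsed_seconds=60)
    signals = apply_jitter_fault(signals, 'MILD')  # Lowest jitter level
    return signals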

Testing Workflow

End-to-End Validation

  1. Start Calibration to train on healthy data
  2. Verify LOW risk state (health 75-100)
  3. Inject MILD DRIFT → expect MODERATE risk (health 50-74) within 30 seconds
  4. Inject MEDIUM SPIKE → expect HIGH risk (health 25-49) within 20 seconds
  5. Inject SEVERE JITTER → expect CRITICAL risk (health 0-24) within 15 seconds
  6. Stop Fault → health should stabilize (DI remains, but damage rate → 0)
  7. Purge System → DI resets to 0.0, health → 100
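The health bands used in these steps can be expressed as a small lookup, which is convenient for asserting on expected risk levels in tests. This is a sketch under stated assumptions: `risk_from_health` is a hypothetical helper, and treating fractional scores such as 74.5 as belonging to the lower band is an assumption, since the bands above are given as whole numbers.

```python
def risk_from_health(health: float) -> str:
    """Map a 0-100 health score to the risk level used in the workflow above."""
    if not 0 <= health <= 100:
        raise ValueError("health must be between 0 and 100")
    if health >= 75:
        return "LOW"
    if health >= 50:
        return "MODERATE"
    if health >= 25:
        return "HIGH"
    return "CRITICAL"


for score in (100, 62, 38, 12):
    print(score, risk_from_health(score))
```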

Automated Testing Script

import requests
import time

API_URL = "http://localhost:8000"
TIMEOUT = 5  # Seconds to wait for each HTTP request

def post(path, **kwargs):
    """POST to the API and raise on HTTP errors."""
    response = requests.post(f"{API_URL}{path}", timeout=TIMEOUT, **kwargs)
    response.raise_for_status()
    return response

def get_state():
    """Fetch the current system state as a dict."""
    response = requests.get(f"{API_URL}/system/state", timeout=TIMEOUT)
    response.raise_for_status()
    return response.json()

def test_fault_injection():
    # 1. Start with healthy baseline
    post("/system/start-calibration")
    time.sleep(10)  # Collect 10s healthy data
    post("/system/finish-calibration")

    # 2. Verify LOW risk
    state = get_state()
    assert state['health_score'] >= 75, "Baseline should be healthy"

    # 3. Inject MEDIUM JITTER
    post("/system/inject-fault", json={
        "fault_type": "JITTER",
        "severity": "MEDIUM"
    })

    try:
        # 4. Wait for HIGH risk
        for _ in range(30):  # 30 seconds max
            time.sleep(1)
            state = get_state()
            if state['risk_level'] == 'HIGH':
                break

        assert state['risk_level'] == 'HIGH', f"Expected HIGH, got {state['risk_level']}"
        print(f"✅ JITTER fault detected: Health={state['health_score']}, Risk=HIGH")
    finally:
        # 5. Always stop the fault, even if the assertion above fails
        post("/system/stop-fault")

    print("✅ Fault injection test passed")

if __name__ == "__main__":
    test_fault_injection()
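The polling loop in the script above can be factored into a reusable helper so that other fault and severity combinations share the same timeout logic. This is a sketch, not part of the project: `wait_for_risk` is a hypothetical name, and it takes the state-fetching function as an argument so it can be exercised without a running backend.

```python
import time
from typing import Callable


def wait_for_risk(
    get_state: Callable[[], dict],
    target: str,
    timeout_s: float = 30.0,
    interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll get_state() until risk_level equals target, or raise TimeoutError."""
    deadline = time.monotonic() + timeout_s
    while True:
        state = get_state()
        if state.get("risk_level") == target:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"Expected {target}, last saw {state.get('risk_level')}"
            )
        sleep(interval_s)


# Usage with canned states standing in for GET /system/state responses.
canned = iter([
    {"risk_level": "LOW", "health_score": 90},
    {"risk_level": "MODERATE", "health_score": 60},
    {"risk_level": "HIGH", "health_score": 40},
])
final_state = wait_for_risk(lambda: next(canned), "HIGH", sleep=lambda _: None)
print(final_state)
```

Against a live backend, `get_state` would be a function that calls `GET /system/state` and returns the decoded JSON.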

Dashboard Integration

Fault Control Panel

The React dashboard provides a visual fault injection interface:
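// SystemControlPanel.jsx
const injectFault = async () => {
  try {
    const response = await fetch(`${API_URL}/system/inject-fault`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        fault_type: selectedFaultType,  // SPIKE/DRIFT/JITTER/DEFAULT
        severity: selectedSeverity      // MILD/MEDIUM/SEVERE
      })
    });
    // fetch() only rejects on network failure, so check the HTTP status too
    if (!response.ok) {
      throw new Error(`Fault injection failed: HTTP ${response.status}`);
    }
    const data = await response.json();
    console.log('Fault injection started:', data);
  } catch (error) {
    console.error(error);
  }
};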
UI Features:
  • Dropdown for fault type selection
  • Slider or buttons for severity (🟡 MILD, 🟠 MEDIUM, 🔴 SEVERE)
  • “Inject Fault” button (disabled during calibration)
  • “Stop Fault” button (red, enabled only when injection active)
  • Live status indicator showing active fault + duration

Benchmark Results

Jitter Detection Comparison

Tested with MEDIUM JITTER (vibration_std = 5× baseline):
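| Model       | Detection Time | Peak Anomaly Score | Risk Level Reached        |
|-------------|----------------|--------------------|---------------------------|
| Legacy (v2) | ❌ Never       | 0.32 (MODERATE)    | MODERATE (false negative) |
| Batch (v3)  | ✅ 3 seconds   | 0.89 (HIGH)        | HIGH (correct)            |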
Why Legacy Fails: The legacy model uses vibration_intensity_rms which is:
RMS = sqrt(mean(vibration^2))
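For zero-mean jitter the mean is unchanged and RMS rises only modestly (RMS² = mean² + std², about 17% for the MEDIUM example above, against a 5× jump in std), so the model registers little change.
Why Batch Succeeds: The batch model explicitly includes vibration_std and vibration_p2p: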
vibration_std = 0.075  # 5× baseline → high anomaly score
vibration_p2p = 0.30   # 10× baseline → high anomaly score
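The gap between the two models follows from the identity RMS² = mean² + std² (with std taken as the population standard deviation). The sketch below plugs in the healthy and MEDIUM jitter figures quoted earlier on this page; `rms_from_moments` is an illustrative helper name.

```python
import math


def rms_from_moments(mean: float, std: float) -> float:
    """RMS of a signal with the given mean and population std deviation."""
    return math.sqrt(mean ** 2 + std ** 2)


healthy_rms = rms_from_moments(0.12, 0.015)  # healthy example values
jitter_rms = rms_from_moments(0.12, 0.075)   # MEDIUM jitter example values

print(f"std ratio: {0.075 / 0.015:.1f}x")
print(f"RMS ratio: {jitter_rms / healthy_rms:.2f}x")
```

A fivefold increase in standard deviation moves RMS by only about 17%, which is why a feature set built on means and RMS responds weakly to jitter while std and peak-to-peak features respond strongly.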

Severity Calibration Accuracy

Tested with 100 injection cycles (30s each):
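| Fault Type | Severity | Target Health | Actual Health | Accuracy |
|------------|----------|---------------|---------------|----------|
| DRIFT      | MILD     | 50-74         | 62 ± 8        | ✅ 92%   |
| SPIKE      | MEDIUM   | 25-49         | 38 ± 11       | ✅ 88%   |
| JITTER     | SEVERE   | 0-24          | 12 ± 6        | ✅ 95%   |
| DEFAULT    | MILD     | 50-74         | 58 ± 10       | ✅ 90%   |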
Variance is expected due to:
  • Random noise in fault patterns
  • Cumulative DI from previous tests
  • Real-time damage rate fluctuations
For consistent results, run POST /system/purge between tests.

Safety & Limitations

Fault Injection Warnings:
  1. Do NOT use in production: Fault injection is for testing only. It intentionally corrupts data.
  2. State Contamination: If injection runs too long, DI will increase permanently (until purge).
  3. CRITICAL Risk: SEVERE faults can drive health to 0 in under 5 minutes. Monitor closely.
  4. No Automatic Stop: Faults continue indefinitely until manual stop or backend restart.

Best Practices

DO:
  • Purge before each test cycle
  • Start with MILD severity
  • Monitor DI and damage rate
  • Stop faults before switching types
  • Document fault injection in operator logs
DON’T:
  • Inject multiple faults simultaneously (undefined behavior)
  • Run SEVERE faults for > 60 seconds (health → 0)
  • Inject during calibration (will corrupt baseline)
  • Forget to stop faults (backend persists state across restarts)

Source Code Reference

Key implementation files:
  • Fault Patterns: backend/generator/generator.py:200-350 - SPIKE/DRIFT/JITTER/DEFAULT logic
  • Injection API: backend/api/system_routes.py:inject_fault() - HTTP endpoint
  • Fault Config: backend/generator/config.py - Severity mappings and NASA/IMS patterns
  • Dashboard Controls: frontend/src/components/SystemControlPanel/ - UI integration

Next Steps

Health Assessment

Understand how fault injection affects DI, health, and RUL

Reporting

Generate reports documenting fault injection test results
