System Architecture

Understand how the Predictive Maintenance System processes sensor data through a dual-model ML pipeline to predict equipment failures.

Deployment Stack

The system is built on a modern, cloud-native stack optimized for real-time data processing:
Component | Technology       | Hosting        | URL / Region
Frontend  | React 18 + Vite  | Vercel         | predictive-maintenance-ten.vercel.app
Backend   | FastAPI + Docker | Render         | predictive-maintenance-uhlb.onrender.com
Database  | InfluxDB 2.x     | InfluxDB Cloud | AWS us-east-1

High-Level Architecture

┌────────────────────────────────────────────────────────────────┐
│                   Frontend (React + Vite)                      │
│                      🌐 Vercel                                 │
│  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────────────┐  │
│  │ Metrics  │ │  Chart   │ │  Health  │ │  Explanations    │  │
│  │  Cards   │ │ Recharts │ │  Summary │ │     Panel        │  │
│  └──────────┘ └──────────┘ └──────────┘ └──────────────────┘  │
└────────────────────────────┬───────────────────────────────────┘
                             │ HTTPS/JSON (Vercel Rewrites)
┌────────────────────────────▼───────────────────────────────────┐
│                   Backend (FastAPI + Docker)                   │
│                      🚀 Render                                 │
│  ┌──────────────┐ ┌──────────────┐ ┌──────────────────────┐   │
│  │   Ingest     │ │   Features   │ │    ML Pipeline       │   │
│  │   /ingest    │ │   Engine     │ │  Baseline → Detector │   │
│  └──────────────┘ └──────────────┘ └──────────────────────┘   │
│  ┌──────────────┐ ┌──────────────┐ ┌──────────────────────┐   │
│  │   Health     │ │  Explainer   │ │    Report            │   │
│  │   Assessor   │ │   Engine     │ │    Generator         │   │
│  └──────────────┘ └──────────────┘ └──────────────────────┘   │
└────────────────────────────┬───────────────────────────────────┘

┌────────────────────────────▼───────────────────────────────────┐
│                 InfluxDB Cloud (Time-Series)                   │
│              sensor_data • features • anomalies                │
└────────────────────────────────────────────────────────────────┘

Frontend (React + Vercel)

Technology Stack

React 18

Component-based UI with hooks for state management

Recharts

Real-time data visualization with 60s sliding windows

Vite

Lightning-fast build tool and dev server

Vercel

Global CDN deployment with automatic HTTPS

Key Features

  • Real-time Charts: Multi-signal streaming with Voltage (V), Current (A), Vibration (g)
  • Fixed Y-Axis Domains: Fixed value ranges per signal, plotted over a 60s right-anchored sliding window for visual stability
  • Anomaly Visualization: Red shaded regions when risk ≠ LOW
  • Health Score Ring: Color-coded 0-100 gauge (Green → Yellow → Orange → Red)
  • Glassmorphism UI: Dark theme with translucent cards and backdrop blur
  • Keep-Alive Heartbeat: 10-minute /ping to prevent Render free-tier cold starts

Component Architecture

┌─────────────────────────────────────────────────────────────┐
│  App.jsx                                                     │
│  ├── Header                   (System status badge)          │
│  ├── SystemControlPanel       (Calibrate, Fault Injection)   │
│  ├── MetricCard × 4           (Voltage, Current, PF, Vib)    │
│  ├── SignalChart              (Recharts multi-line)          │
│  ├── HealthSummary            (Score ring, RUL, Risk badge)  │
│  ├── InsightPanel             (Explainability text)          │
│  ├── OperatorLog              (Maintenance event logging)    │
│  └── LogWatcher               (Real-time event feed)         │
└─────────────────────────────────────────────────────────────┘

Backend (FastAPI + Render)

Technology Stack

Python 3.11+

Core runtime with type hints

FastAPI

Async REST API with OpenAPI docs

Pydantic

Schema validation and settings

scikit-learn

Isolation Forest ML models

ReportLab

PDF report generation

Docker

Containerized deployment

Data Processing Pipeline

The backend processes sensor data through six stages:
1. Ingestion & Validation

Endpoint: POST /ingest
# backend/api/routes.py
@router.post("/ingest")
async def ingest_sensor_data(event: SensorEvent):
    # Pydantic schema enforcement
    # UTC timestamp normalization
    # Derived signal: power_kw = V × I × PF / 1000
All sensor data is validated against strict Pydantic schemas before processing.
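The three steps named in the comments above can be sketched without any dependencies. This is a minimal illustration that uses a standard-library dataclass as a stand-in for the Pydantic SensorEvent schema; the field names follow the sensor_data measurement documented on this page, while the class name SensorReading, the helpers normalize_utc and derive_power_kw, and the validation bounds are assumptions made for the example, not the project's actual code.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class SensorReading:
    """Hypothetical stand-in for the Pydantic SensorEvent schema."""
    asset_id: str
    timestamp: datetime
    voltage_v: float
    current_a: float
    power_factor: float
    vibration_g: float

    def __post_init__(self):
        # Assumed sanity bounds; the real schema's limits are not documented here.
        if not 0.0 <= self.power_factor <= 1.0:
            raise ValueError("power_factor must be within [0, 1]")
        if self.voltage_v < 0 or self.current_a < 0:
            raise ValueError("voltage and current must be non-negative")


def normalize_utc(ts: datetime) -> datetime:
    """Return ts as an aware UTC datetime (naive inputs are assumed to be UTC)."""
    if ts.tzinfo is None:
        return ts.replace(tzinfo=timezone.utc)
    return ts.astimezone(timezone.utc)


def derive_power_kw(reading: SensorReading) -> float:
    """Derived signal: power_kw = V × I × PF / 1000."""
    return reading.voltage_v * reading.current_a * reading.power_factor / 1000.0


reading = SensorReading(
    asset_id="Motor-01",
    timestamp=normalize_utc(datetime(2026, 3, 2, 12, 34, 56)),
    voltage_v=230.5,
    current_a=12.3,
    power_factor=0.92,
    vibration_g=0.15,
)
power_kw = derive_power_kw(reading)
print(round(power_kw, 2))
```

With the sample values from the data model (230.5 V, 12.3 A, PF 0.92), the derived power rounds to 2.61 kW.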
2. Feature Engineering

Module: backend/features/calculator.py
The system computes two feature sets:
Legacy Features (1Hz, 6 dimensions):
  • voltage_rolling_mean_1h: Mean voltage over 1 hour
  • current_spike_count: Points > 3σ from local mean
  • power_factor_efficiency_score: (PF - 0.8) / 0.2 × 100
  • vibration_intensity_rms: √(mean(vibration²))
  • voltage_stability: |V - 230.0|
  • power_vibration_ratio: vibration / (PF + 0.01)
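The formulas in this list translate almost directly into code. The sketch below covers five of the six features (the rolling mean is a plain average over the hour). The function names mirror the feature names, but the window handling, the use of the population standard deviation, and treating the whole window as the "local" mean for spike counting are assumptions for illustration.

```python
import math
from statistics import mean, pstdev

NOMINAL_VOLTAGE = 230.0  # reference value taken from the voltage_stability formula


def current_spike_count(current_window):
    """Count points more than 3 standard deviations from the window mean."""
    mu = mean(current_window)
    sigma = pstdev(current_window)
    if sigma == 0:
        return 0  # a perfectly flat window has no spikes
    return sum(1 for x in current_window if abs(x - mu) > 3 * sigma)


def power_factor_efficiency_score(power_factor):
    """(PF - 0.8) / 0.2 × 100, so PF 0.8 maps to 0 and PF 1.0 maps to 100."""
    return (power_factor - 0.8) / 0.2 * 100


def vibration_intensity_rms(vibration_window):
    """Root mean square of the vibration samples."""
    return math.sqrt(mean(v * v for v in vibration_window))


def voltage_stability(voltage):
    """Absolute deviation from the nominal 230 V."""
    return abs(voltage - NOMINAL_VOLTAGE)


def power_vibration_ratio(vibration, power_factor):
    """Vibration per unit power factor; the 0.01 guards against division by zero."""
    return vibration / (power_factor + 0.01)


# One surge in an otherwise flat current window counts as a single spike.
spikes = current_spike_count([12.0] * 99 + [45.0])
score = power_factor_efficiency_score(0.92)
```

For a power factor of 0.92 the efficiency score works out to 60.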
Batch Features (100Hz windows, 16 dimensions):
  • For each signal (voltage, current, power_factor, vibration):
    • mean, std, peak_to_peak, rms
  • 4 signals × 4 stats = 16 features
The batch model achieves a 99.6% F1-score by explicitly capturing variance, which is critical for detecting “Jitter” faults, where averages look normal but the standard deviation spikes.
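As a rough sketch of the 4 signals × 4 stats layout, the following reduces one 100-point window to a 16-element vector. It uses only the standard library to stay dependency-free (the project itself reports using NumPy for this step); the function names, the feature ordering, and the dict-based window format are assumptions.

```python
import math
from statistics import mean, pstdev

SIGNALS = ("voltage", "current", "power_factor", "vibration")
STATS = ("mean", "std", "peak_to_peak", "rms")


def window_stats(values):
    """Return (mean, std, peak_to_peak, rms) for one signal window."""
    return (
        mean(values),
        pstdev(values),  # population std, matching NumPy's default ddof=0
        max(values) - min(values),
        math.sqrt(mean(v * v for v in values)),
    )


def extract_batch_features(window):
    """Flatten a dict of signal name -> samples into a 16-element vector."""
    features = []
    for signal in SIGNALS:
        features.extend(window_stats(window[signal]))
    return features


def feature_names():
    """Names in the same order as extract_batch_features emits values."""
    return [f"{signal}_{stat}" for signal in SIGNALS for stat in STATS]


# A 100-point window (one second at 100Hz) with made-up healthy values.
window = {
    "voltage": [230.0] * 100,
    "current": [12.0] * 100,
    "power_factor": [0.92] * 100,
    "vibration": [0.10, 0.20] * 50,
}
features = extract_batch_features(window)
named = dict(zip(feature_names(), features))
```

Keeping std and peak_to_peak alongside the mean is what lets a model see a window whose average is normal but whose spread is not.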
3. ML Inference (Dual Models)

Modules:
  • backend/ml/detector.py (Legacy)
  • backend/ml/batch_detector.py (Batch)
Both models are Isolation Forest anomaly detectors (unsupervised, fitted on healthy data only):
from sklearn.ensemble import IsolationForest

# Trained during calibration on healthy data
model = IsolationForest(
    contamination=0.05,  # 5% expected anomaly rate
    random_state=42,
    n_estimators=100
)
Output: Anomaly score on a 0.0 to 1.0 scale (0.0 = healthy, 1.0 = critical)
4. Health Assessment & Degradation Tracking

Module: backend/rules/assessor.py
The system maintains a Cumulative Degradation Index (DI):
# Dead-zone: healthy noise produces zero damage
HEALTHY_FLOOR = 0.65
if batch_score < HEALTHY_FLOOR:
    effective_severity = 0.0
else:
    effective_severity = (batch_score - HEALTHY_FLOOR) / (1.0 - HEALTHY_FLOOR)

# Cumulative damage increment
SENSITIVITY_CONSTANT = 0.005
DI_increment = (effective_severity ** 2) * SENSITIVITY_CONSTANT * dt
DI = min(DI + DI_increment, 1.0)  # monotonic, capped at 1.0

# Health & RUL derived from DI
health_score = round(100 * (1.0 - DI))
RUL_hours = (1.0 - DI) / max(damage_rate, 1e-9)
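The excerpt above can be wrapped into small functions to see how the dead-zone and the quadratic damage term behave. This is a sketch built from the constants shown; the function names are hypothetical, and the units of dt and damage_rate are not stated on this page, so the example treats them as opaque numbers.

```python
HEALTHY_FLOOR = 0.65
SENSITIVITY_CONSTANT = 0.005


def effective_severity(batch_score):
    """Map a batch score to 0..1, ignoring everything below the healthy floor."""
    if batch_score < HEALTHY_FLOOR:
        return 0.0
    return (batch_score - HEALTHY_FLOOR) / (1.0 - HEALTHY_FLOOR)


def update_di(di, batch_score, dt):
    """Add one damage increment; DI never decreases and is capped at 1.0."""
    increment = effective_severity(batch_score) ** 2 * SENSITIVITY_CONSTANT * dt
    return min(di + increment, 1.0)


def health_score(di):
    """Health on a 0-100 scale derived from the degradation index."""
    return round(100 * (1.0 - di))


def rul_hours(di, damage_rate):
    """Remaining useful life: remaining health budget divided by damage rate."""
    return (1.0 - di) / max(damage_rate, 1e-9)


di = 0.0
di = update_di(di, batch_score=0.50, dt=1.0)  # below the floor: no damage
di = update_di(di, batch_score=1.00, dt=1.0)  # full severity: adds 0.005
print(di, health_score(0.32), round(rul_hours(0.32, 0.0012), 2))
```

A score of 0.50 leaves DI unchanged, while a score of 1.0 adds the full SENSITIVITY_CONSTANT per unit of dt. With DI = 0.32 and a damage rate of 0.0012, the formulas give a health score of 68 and an RUL of about 566.67.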
Risk Classification:
Health Score | Risk Level | Color  | Typical RUL
75-100       | LOW        | Green  | 30-60 days
50-74        | MODERATE   | Yellow | 10-29 days
25-49        | HIGH       | Orange | 1-9 days
0-24         | CRITICAL   | Red    | < 1 day
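The thresholds in this table map onto a simple lookup. The sketch below assumes integer health scores as produced by the rounding in the health formula; the function name classify_risk and the RISK_COLORS mapping are illustrative names.

```python
RISK_COLORS = {
    "LOW": "Green",
    "MODERATE": "Yellow",
    "HIGH": "Orange",
    "CRITICAL": "Red",
}


def classify_risk(health_score):
    """Return the risk level for a health score in the range 0-100."""
    if not 0 <= health_score <= 100:
        raise ValueError("health_score must be between 0 and 100")
    if health_score >= 75:
        return "LOW"
    if health_score >= 50:
        return "MODERATE"
    if health_score >= 25:
        return "HIGH"
    return "CRITICAL"


level = classify_risk(68)
print(level, RISK_COLORS[level])
```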
5. Explainability Engine

Module: backend/rules/explainer.py
Generates human-readable explanations:
# Example outputs:
"High vibration variance: σ=0.17g (5x normal baseline)"
"Voltage spike detected: 3.2σ above rolling mean"
"Power factor degradation: 0.78 (target: 0.92)"
Every alert includes natural language explanations so operators understand why the system flagged an issue.
6. Persistence & Reporting

Storage: InfluxDB Cloud
  • sensor_data: Raw 100Hz measurements
  • features: Computed 1Hz and batch features
  • health_reports: DI, health scores, risk levels
Reports:
  • Executive PDF (1-page): Health grade, DI%, RUL for plant managers
  • Multi-sheet Excel: Summary, operator logs, raw sensor data for analysts
  • Industrial Certificate (5-page): Feature contributions, ROI analysis, audit trail for engineers

Database (InfluxDB Cloud)

Why InfluxDB?

Time-Series Optimized

Purpose-built for sensor data with millisecond precision

Flux Query Language

Powerful aggregation and windowing functions

Data Retention Policies

Automatic downsampling and archival

Cloud-Native

Managed service with automatic backups

Data Model

┌─────────────────────────────────────────────────────────────┐
│  MEASUREMENT: sensor_data                                    │
│  ├── TAGS                                                    │
│  │   ├── asset_id: "Motor-01"                               │
│  │   └── location: "Plant-A"                                │
│  ├── FIELDS                                                  │
│  │   ├── voltage_v: 230.5                                    │
│  │   ├── current_a: 12.3                                     │
│  │   ├── power_factor: 0.92                                  │
│  │   ├── vibration_g: 0.15                                   │
│  │   └── power_kw: 2.61 (derived)                            │
│  └── TIMESTAMP: 2026-03-02T12:34:56.789Z                     │
└─────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────┐
│  MEASUREMENT: health_reports                                 │
│  ├── TAGS                                                    │
│  │   ├── asset_id: "Motor-01"                               │
│  │   └── risk_level: "MODERATE"                             │
│  ├── FIELDS                                                  │
│  │   ├── health_score: 68.0                                  │
│  │   ├── degradation_index: 0.32                             │
│  │   ├── damage_rate: 0.0012                                 │
│  │   ├── rul_hours: 566.67                                   │
│  │   ├── batch_score: 0.78                                   │
│  │   └── legacy_score: 0.45                                  │
│  └── TIMESTAMP: 2026-03-02T12:34:56.789Z                     │
└─────────────────────────────────────────────────────────────┘

ML Pipeline Deep Dive

Dual-Model Architecture

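The system runs two Isolation Forest models in parallel. The legacy model is summarized below; the batch model consumes the 16-dimensional window features described under Feature Engineering.
Legacy model
Input: 1Hz aggregated features (6 dimensions)
Features: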
  1. voltage_rolling_mean_1h
  2. current_spike_count
  3. power_factor_efficiency_score
  4. vibration_intensity_rms
  5. voltage_stability
  6. power_vibration_ratio
Performance:
  • Precision: 64.1%
  • Recall: 100.0%
  • F1-Score: 78.1%
  • Limitation: Cannot detect variance-only faults (Jitter)
Use Case: Backward compatibility; inference takes about 50ms (the batch model takes about 1ms)
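The reported F1-score follows from the precision and recall figures, since F1 is their harmonic mean. A quick check, with f1_score as an illustrative helper name:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


legacy_f1 = f1_score(precision=0.641, recall=1.0)
print(f"{legacy_f1:.1%}")  # 78.1%
```

Perfect recall with 64.1% precision means the legacy model misses no labelled faults but raises a false alarm on roughly one in three alerts.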

Training Workflow

1. Calibration Request

POST /system/calibrate
{
  "asset_id": "Motor-01",
  "duration_seconds": 60,
  "sampling_rate_hz": 100
}
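A client-side sketch of this request using only the standard library. The host is the backend listed in the Deployment Stack table; the https scheme, the absence of a path prefix, and the helper name build_calibration_request are assumptions. The request is built but deliberately not sent.

```python
import json
from urllib.request import Request

# Backend host from the Deployment Stack table; scheme and path layout are assumed.
BASE_URL = "https://predictive-maintenance-uhlb.onrender.com"


def build_calibration_request(asset_id, duration_seconds=60, sampling_rate_hz=100):
    """Build (but do not send) the POST /system/calibrate request."""
    payload = {
        "asset_id": asset_id,
        "duration_seconds": duration_seconds,
        "sampling_rate_hz": sampling_rate_hz,
    }
    return Request(
        url=f"{BASE_URL}/system/calibrate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_calibration_request("Motor-01")
expected_samples = 60 * 100  # duration_seconds × sampling_rate_hz raw points

# To actually send it:
# from urllib.request import urlopen
# with urlopen(request, timeout=30) as response:
#     print(response.status)
```

At 100Hz for 60 seconds, one calibration run covers 6,000 raw points, or 60 windows of 100 points for the batch model.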
2. Healthy Data Generation

The system generates synthetic sensor data matching real-world patterns:
  • Voltage: 230V ± 5% (Indian grid)
  • Current: 10-15A with power factor coupling
  • Vibration: 0.05-0.20g with white noise
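One way such a generator could look is sketched below. The ranges come from the list above; everything else is an assumption made for illustration, in particular the shared load variable that couples current to power factor, the 0.90-0.94 power factor range, and the Gaussian noise parameters for vibration.

```python
import random


def generate_healthy_samples(n, seed=42):
    """Generate n synthetic healthy readings within the documented ranges."""
    rng = random.Random(seed)  # seeded for reproducible calibration data
    samples = []
    for _ in range(n):
        load = rng.random()  # assumed latent load level in [0, 1)
        # White noise around an assumed 0.125g centre, clipped to 0.05-0.20g.
        vibration = min(0.20, max(0.05, rng.gauss(0.125, 0.02)))
        samples.append({
            "voltage_v": rng.uniform(230.0 * 0.95, 230.0 * 1.05),  # 230V ± 5%
            "current_a": 10.0 + 5.0 * load,                        # 10-15A
            "power_factor": 0.90 + 0.04 * load,                    # rises with load
            "vibration_g": vibration,
        })
    return samples


samples = generate_healthy_samples(60 * 100)  # 60 seconds at 100Hz
```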
3. Feature Extraction

  • 1Hz features: Computed from rolling windows
  • Batch features: 100-point windows reduced to 16-D vectors
4. Model Training

# Both models use IsolationForest
from sklearn.ensemble import IsolationForest

model = IsolationForest(
    contamination=0.05,
    random_state=42,
    n_estimators=100
)
model.fit(healthy_features)
5. Baseline Persistence

Trained models and baseline statistics are saved:
  • backend/models/isolation_forest_model.pkl (Legacy)
  • backend/models/batch_isolation_forest.pkl (Batch)
  • Baseline targets written to InfluxDB for dashboard display

Fault Detection Types

The system detects four fault types:
Pattern: Sharp transients in electrical signals
Example:
  • Voltage: 230V → 280V (about a 22% increase)
  • Current: 12A → 45A (3.75x, a 275% increase)
Detection: Both models detect this: the batch model via its peak_to_peak features, the legacy model via current_spike_count
Real-World Causes: Grid instability, inrush current, capacitor switching
Pattern: Slow trend away from baseline
Example:
  • Power factor: 0.92 → 0.78 over 10 minutes
  • Vibration: 0.15g → 0.35g gradual increase
Detection: Both models detect via rolling mean features
Real-World Causes: Bearing wear, insulation degradation, misalignment
Pattern: Stable average, high standard deviation (Jitter)
Example:
  • Vibration mean: 0.15g (normal)
  • Vibration σ: 0.17g (more than 5x the healthy baseline of 0.03g)
Detection: Batch model only (has explicit std features)
Real-World Causes: Loose mounting bolts, rotor imbalance, electrical noise
The legacy model cannot detect Jitter faults because it only sees 1Hz averages. This is why the batch model achieves a 99.6% F1-score versus 78.1% for the legacy model.
Pattern: Multiple simultaneous anomalies
Example:
  • Voltage drift + current spikes + vibration jitter
Detection: Both models combine evidence from all features
Real-World Causes: Cascading failures, mechanical + electrical faults
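The Jitter case is the least intuitive of the four, so here is a small numeric illustration of why an average-only view misses it. The two windows are synthetic and constructed to match the figures quoted for that pattern (mean 0.15g, σ of 0.03g versus 0.17g); the alternating waveform is an assumption chosen to make the statistics exact.

```python
from statistics import mean, pstdev

# Two 100-point vibration windows with the same mean but different spread.
healthy = [0.15 + (0.03 if i % 2 else -0.03) for i in range(100)]
jitter = [0.15 + (0.17 if i % 2 else -0.17) for i in range(100)]

mean_gap = abs(mean(jitter) - mean(healthy))   # what a 1Hz average sees
std_ratio = pstdev(jitter) / pstdev(healthy)   # what a std feature sees

print(f"difference in means: {mean_gap:.6f} g")
print(f"std ratio: {std_ratio:.2f}x")
```

The means are indistinguishable, while the standard deviation differs by a factor of about 5.7, which is the signal the batch model's std features expose.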

Data Flow

Performance Specifications

Operation                | Latency / Rate | Notes
Batch Feature Extraction | 0.1ms          | 100-point window → 16-D vector (NumPy)
ML Inference (Batch)     | 1ms            | IsolationForest on 16-D scaled input
ML Inference (Legacy)    | 50ms           | 6-feature Isolation Forest
Data Ingestion           | 100 Hz         | 100 raw points/second to InfluxDB
Server-Side Aggregation  | 5ms            | aggregateWindow(1s, mean) Flux query
PDF Generation           | ~1.2s          | 5-page Industrial Certificate
Dashboard Update         | 3s poll        | Aggregated data delivery
API Response (p99)       | 100ms          | All endpoints

Resilience Features

DI Hydration

The Degradation Index is recovered from InfluxDB on restart, so state survives process crashes

Keep-Alive Heartbeat

Frontend pings /ping every 10 minutes to prevent Render cold starts

Docker Restart Policy

restart: unless-stopped ensures automatic recovery

Health Checks

All containers have health probes for orchestrator monitoring

Next Steps

API Reference

Explore REST endpoints for sensor ingestion and reporting

Testing Guide

Run the 182-test suite and benchmark models

Deployment Guide

Deploy to production on Render + Vercel + InfluxDB Cloud

Feature Engineering

Deep dive into the 16-D batch feature extraction
