Overview
This guide covers common issues encountered when operating the Hospital Data Analysis Platform, along with diagnostic strategies and solutions.
Common Failure Modes
1. Missing or Malformed Input Columns
Symptom: KeyError or AttributeError during feature extraction.
Root Cause: Input data does not contain the columns listed in CONFIG.feature_columns.
Diagnosis:
- Update configuration to match your data.
- Preprocess data to add missing columns.
- Validate schema before processing.
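Schema validation before processing can be sketched as follows. This is a minimal illustration, assuming CONFIG.feature_columns is a list of column-name strings and that the input exposes its column names as an iterable (for a pandas DataFrame, `df.columns`). The function names and the sample column names are hypothetical.

```python
from typing import Iterable, List


def find_missing_columns(available: Iterable[str], required: Iterable[str]) -> List[str]:
    """Return the required column names that are absent, in their original order."""
    available_set = set(available)
    return [name for name in required if name not in available_set]


def validate_schema(available: Iterable[str], required: Iterable[str]) -> None:
    """Raise a descriptive error before any feature extraction starts."""
    missing = find_missing_columns(available, required)
    if missing:
        raise ValueError(f"Input data is missing required columns: {missing}")


# Hypothetical column names, for illustration only.
required_columns = ["age", "length_of_stay", "diagnosis_code"]
input_columns = ["patient_id", "age", "diagnosis_code"]
print(find_missing_columns(input_columns, required_columns))
```

Failing early with the full list of missing names is usually easier to act on than a KeyError raised deep inside feature extraction for the first missing column only.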
2. Memory Limit Exceeded
Symptom: MemoryError, or the system becomes unresponsive.
Root Cause: The working dataset exceeds CONFIG.hardware_memory_limit_mb or available system memory.
Diagnosis:
- Reduce chunk size for streaming.
- Use chunked processing.
- Optimize data types.
- Increase memory limit (if hardware allows).
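The chunked-processing item above can be illustrated with a small, library-independent sketch. It assumes the data can be consumed as a stream of numeric values; `iter_chunks` and `chunked_mean` are hypothetical helper names, not platform functions.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def iter_chunks(rows: Iterable[float], chunk_size: int) -> Iterator[List[float]]:
    """Yield successive lists of at most chunk_size items from any iterable."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


def chunked_mean(rows: Iterable[float], chunk_size: int) -> float:
    """Compute a mean while holding only one chunk in memory at a time."""
    total = 0.0
    count = 0
    for chunk in iter_chunks(rows, chunk_size):
        total += sum(chunk)
        count += len(chunk)
    if count == 0:
        raise ValueError("no rows to aggregate")
    return total / count


# A generator stands in for a large file or stream that should not be fully loaded.
readings = (float(i) for i in range(1, 1_000_001))
print(chunked_mean(readings, chunk_size=10_000))
```

Peak memory is bounded by `chunk_size` rather than by the dataset size, so reducing the chunk size trades some throughput for a smaller footprint. When reading CSV files with pandas, `pandas.read_csv(path, chunksize=n)` yields DataFrames in the same chunk-at-a-time fashion.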
3. ONNX Export Failures
Symptom: Error during model export to ONNX format.
- Use supported estimators: stick to scikit-learn models with good ONNX support.
- Check model attributes before export.
- Simplify custom estimators.
- Fall back to pickle if ONNX export fails.
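The pickle fallback might look like the sketch below. It does not assume any particular converter: `export_onnx` is a hypothetical callable supplied by the caller that writes the model to the given path and raises on failure. The file names `model.onnx` and `model.pkl` are likewise assumptions.

```python
import pickle
from pathlib import Path
from typing import Any, Callable


def export_with_fallback(
    model: Any,
    output_dir: Path,
    export_onnx: Callable[[Any, Path], None],
) -> Path:
    """Try the ONNX exporter first; if it raises, save a pickle instead."""
    output_dir.mkdir(parents=True, exist_ok=True)
    onnx_path = output_dir / "model.onnx"
    try:
        export_onnx(model, onnx_path)
        return onnx_path
    except Exception as exc:  # converters can raise many different error types
        # Remove a partially written ONNX file so it is not mistaken for a valid export.
        onnx_path.unlink(missing_ok=True)
        pickle_path = output_dir / "model.pkl"
        with pickle_path.open("wb") as handle:
            pickle.dump(model, handle)
        print(f"ONNX export failed ({exc!r}); saved pickle to {pickle_path}")
        return pickle_path
```

The returned path tells the caller which format was actually written. A pickle is only a stopgap: it is tied to Python and to compatible library versions, and it should never be loaded from an untrusted source.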
4. Non-Reproducible Results
Symptom: Different results on repeated runs despite setting a seed.
Root Cause: Random seed not propagated correctly, or non-deterministic operations.
Diagnosis:
- Set seed early.
- Check environment variables.
- Verify scikit-learn random_state.
- Disable parallel processing in scikit-learn.
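For the first item above, setting the seed early, a minimal sketch might be the following. `set_global_seed` is a hypothetical helper; the platform may provide its own mechanism. NumPy is treated as optional here.

```python
import os
import random


def set_global_seed(seed: int) -> None:
    """Seed the random number generators this process is likely to use."""
    # This only affects subprocesses started later; the current interpreter's
    # hash seed is fixed at startup.
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    try:
        import numpy as np
    except ImportError:
        return  # NumPy is optional in this sketch
    np.random.seed(seed)  # seeds NumPy's legacy global generator


set_global_seed(42)
first = [random.random() for _ in range(3)]
set_global_seed(42)
second = [random.random() for _ in range(3)]
print(first == second)
```

Call this before any data is shuffled, split, or sampled. Global seeding is not a substitute for passing an explicit `random_state` to each scikit-learn estimator and splitter, which keeps results independent of call order.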
5. Benchmark Variance Too High
Symptom: Wide confidence intervals, unstable benchmark results.
Root Cause: System load, insufficient iterations, or inherent algorithm variance.
Diagnosis:
- Increase benchmark runs.
- Run on idle system.
- Use higher confidence level for critical measurements.
- Profile system resources.
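The effect of the run count and the confidence level can be seen with a small timing harness. This is a sketch using only the standard library and a normal approximation for the interval; the `benchmark` function and its parameter names are hypothetical and are not the platform's benchmarking API.

```python
import statistics
import time
from typing import Callable, Tuple


def benchmark(
    func: Callable[[], object],
    runs: int = 30,
    warmup: int = 3,
    confidence: float = 0.95,
) -> Tuple[float, float]:
    """Return (mean_seconds, half_width) of a normal-approximation confidence interval."""
    if runs < 2:
        raise ValueError("at least two runs are needed to estimate variance")
    for _ in range(warmup):
        func()  # warm caches before timing
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    mean = statistics.fmean(timings)
    std_error = statistics.stdev(timings) / runs ** 0.5
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean, z * std_error


mean_s, half_width = benchmark(lambda: sum(range(100_000)), runs=50)
print(f"{mean_s * 1e3:.3f} ms +/- {half_width * 1e3:.3f} ms")
```

Because the standard error scales with one over the square root of the run count, quadrupling `runs` roughly halves the interval width. Raising `confidence` widens the interval for the same data, so a higher confidence level generally needs more runs to stay usable.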
6. Alert Threshold Drift
Symptom: Excessive false positives or missed anomalies.
Root Cause: Alert thresholds not calibrated for current data distribution.
Diagnosis:
- Recalibrate thresholds based on recent data.
- Use adaptive thresholds.
- Monitor threshold effectiveness.
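Recalibration from recent data, together with a simple effectiveness check, can be sketched as below. The mean-plus-k-standard-deviations rule is one common choice and is an assumption here, as are the function names and the sample values; the platform's own alerting logic may differ.

```python
import statistics
from typing import Sequence


def recalibrate_threshold(recent_values: Sequence[float], k: float = 3.0) -> float:
    """Place the alert threshold k standard deviations above the recent mean."""
    if len(recent_values) < 2:
        raise ValueError("need at least two recent values")
    return statistics.fmean(recent_values) + k * statistics.stdev(recent_values)


def alert_rate(values: Sequence[float], threshold: float) -> float:
    """Fraction of values that would trigger an alert at this threshold."""
    if not values:
        raise ValueError("values must not be empty")
    return sum(v > threshold for v in values) / len(values)


# Hypothetical recent measurements and a stale threshold, for illustration only.
recent = [72.0, 75.0, 71.0, 78.0, 74.0, 76.0, 73.0, 77.0]
stale_threshold = 74.0
new_threshold = recalibrate_threshold(recent)
print(alert_rate(recent, stale_threshold), alert_rate(recent, new_threshold))
```

Tracking the alert rate over time gives an early signal of drift: a rate that climbs steadily on data known to be normal suggests the threshold no longer matches the distribution. Re-running `recalibrate_threshold` on a sliding window of recent values turns the same rule into an adaptive threshold.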
Diagnostic Strategies
Enable Verbose Logging
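One way to turn on verbose logging with Python's standard `logging` module is sketched here. It assumes log output should go to a `debug.log` file under the configured output directory, matching the path mentioned under Getting Help; the logger name `platform` and the helper name are hypothetical.

```python
import logging
from pathlib import Path


def enable_verbose_logging(output_dir: Path, name: str = "platform") -> logging.Logger:
    """Send DEBUG-level records to both the console and output_dir/debug.log."""
    output_dir.mkdir(parents=True, exist_ok=True)
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    if logger.handlers:
        return logger  # already configured; avoid duplicate handlers
    formatter = logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    handlers = (logging.StreamHandler(), logging.FileHandler(output_dir / "debug.log"))
    for handler in handlers:
        handler.setLevel(logging.DEBUG)
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger
```

A call such as `log = enable_verbose_logging(Path("output"))` followed by `log.debug("loaded %d rows", n)` then records each stage's progress with a timestamp, which makes it easier to see where a run stalled.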
Checkpoint Intermediate Results
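Checkpointing can be as simple as pickling each stage's output under a stage name. The sketch below assumes intermediate results are picklable Python objects; the helper names and the `<stage>.pkl` naming scheme are hypothetical.

```python
import os
import pickle
from pathlib import Path
from typing import Any


def save_checkpoint(obj: Any, checkpoint_dir: Path, stage: str) -> Path:
    """Write obj to <checkpoint_dir>/<stage>.pkl, replacing any older copy."""
    checkpoint_dir.mkdir(parents=True, exist_ok=True)
    final_path = checkpoint_dir / f"{stage}.pkl"
    temp_path = checkpoint_dir / f"{stage}.pkl.tmp"
    with temp_path.open("wb") as handle:
        pickle.dump(obj, handle)
    # Renaming after a complete write means a crash never leaves a truncated checkpoint.
    os.replace(temp_path, final_path)
    return final_path


def load_checkpoint(checkpoint_dir: Path, stage: str) -> Any:
    """Read back the object saved for the given stage."""
    with (checkpoint_dir / f"{stage}.pkl").open("rb") as handle:
        return pickle.load(handle)
```

Saving after every expensive stage lets a failed run be inspected, or resumed, from the last good result instead of starting over.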
Validate Data at Each Stage
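Per-stage validation can be sketched by running a check after every stage and naming the stage in any error. The example assumes rows are plain dictionaries of numeric values; the stage structure, helper names, and checks are hypothetical and stand in for whatever invariants the real pipeline requires.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Row = Dict[str, float]
Stage = Tuple[str, Callable[[List[Row]], List[Row]]]


def check_rows(rows: Sequence[Row], required_keys: Sequence[str], stage: str) -> None:
    """Raise with the stage name if the data is empty, lacks a key, or holds NaN."""
    if not rows:
        raise ValueError(f"[{stage}] produced no rows")
    for index, row in enumerate(rows):
        for key in required_keys:
            if key not in row:
                raise ValueError(f"[{stage}] row {index} is missing '{key}'")
            value = row[key]
            if value != value:  # NaN is the only value not equal to itself
                raise ValueError(f"[{stage}] row {index} has NaN in '{key}'")


def run_validated(rows: List[Row], stages: Sequence[Stage], required_keys: Sequence[str]) -> List[Row]:
    """Run each stage in order and validate its output before continuing."""
    for name, stage_fn in stages:
        rows = stage_fn(rows)
        check_rows(rows, required_keys, name)
    return rows
```

Because the error message carries the stage name and row index, a bad value is attributed to the stage that introduced it rather than to the later stage that happened to crash on it.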
Profile Performance Bottlenecks
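A quick way to locate bottlenecks is to run the suspect function under the standard-library `cProfile` module and list the most expensive calls. The wrapper name `profile_call` and the sample workload are hypothetical.

```python
import cProfile
import io
import pstats
from typing import Any, Callable, Tuple


def profile_call(func: Callable[..., Any], *args: Any, top: int = 10, **kwargs: Any) -> Tuple[Any, str]:
    """Run func under cProfile and return (result, report of the costliest calls)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


def sum_of_squares(n: int) -> int:
    """Stand-in workload for a slow pipeline stage."""
    return sum(i * i for i in range(n))


result, report = profile_call(sum_of_squares, 200_000)
print(report)
```

Sorting by cumulative time surfaces the call paths where the run spends most of its time, including time in callees. Profiling adds overhead, so use it to find where time goes, not to measure absolute durations.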
Emergency Recovery
Restore from Checkpoints
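Restoring can be sketched as a search for the furthest pipeline stage that has a readable checkpoint. This assumes checkpoints are pickle files named `<stage>.pkl` in a single directory; both the naming scheme and the stage names in the usage note are hypothetical.

```python
import pickle
from pathlib import Path
from typing import Any, Optional, Sequence, Tuple


def latest_checkpoint(checkpoint_dir: Path, stage_order: Sequence[str]) -> Optional[Tuple[str, Any]]:
    """Return (stage, data) for the furthest stage with a readable checkpoint, or None."""
    for stage in reversed(stage_order):
        path = checkpoint_dir / f"{stage}.pkl"
        if not path.exists():
            continue
        try:
            with path.open("rb") as handle:
                return stage, pickle.load(handle)
        except (pickle.UnpicklingError, EOFError):
            continue  # truncated or empty file: fall back to an earlier stage
    return None
```

For example, `latest_checkpoint(Path("checkpoints"), ["load", "clean", "train"])` returns the `train` result if it can be read, otherwise `clean`, then `load`, and `None` when nothing usable exists, in which case the pipeline has to start from the beginning.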
Safe Mode Execution
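What safe mode involves is not specified in this guide. One possible reading, shown here purely as an assumption, is running the pipeline with error isolation so that a failing stage is recorded and the last good result is preserved instead of the whole run crashing. `run_safe` and the stage structure are hypothetical.

```python
import traceback
from typing import Any, Callable, Dict, List, Sequence, Tuple


def run_safe(
    stages: Sequence[Tuple[str, Callable[[Any], Any]]],
    data: Any,
) -> Tuple[Any, List[Dict[str, str]]]:
    """Run stages in order; stop at the first failure and return the last good data."""
    failures: List[Dict[str, str]] = []
    for name, stage_fn in stages:
        try:
            data = stage_fn(data)
        except Exception as exc:
            failures.append(
                {"stage": name, "error": repr(exc), "traceback": traceback.format_exc()}
            )
            break  # later stages depend on this one, so do not continue
    return data, failures
```

The caller receives both the output of the last successful stage and a record of what failed, which can be logged or saved alongside a checkpoint for later diagnosis.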
Getting Help
If you encounter issues not covered in this guide:
- Check logs: Review CONFIG.output_dir / "debug.log"
- Validate environment: Use reproducibility_context(CONFIG) to capture system state
- Isolate the issue: Use checkpoints and validation to identify the failing stage
- Compare with baseline: Run against known-good data and configuration
See Also
- Configuration - Understanding system parameters
- Reproducibility - Ensuring consistent execution
- Benchmarking - Performance measurement and analysis