Overview
The evaluation module provides metrics to assess both prediction accuracy and computational efficiency of models in the Hospital Data Analysis Platform.

Core Evaluation Metrics
When training predictive models, the system automatically computes multiple performance metrics.

Accuracy
Proportion of correct predictions:
- > 0.80: Good performance for balanced datasets
- 0.70 - 0.80: Acceptable, may need improvement
- < 0.70: Poor performance, investigate feature engineering or model selection
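Accuracy is straightforward to compute directly. A minimal pure-Python sketch (the platform's actual implementation lives in modeling/predictive.py:44-45, which this does not reproduce):

```python
def accuracy(y_true, y_pred):
    """Proportion of predictions that match the true labels."""
    if not y_true:
        raise ValueError("y_true must not be empty")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

print(accuracy([1, 0, 1, 1, 0], [1, 0, 0, 1, 0]))  # 0.8
```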
F1 Score
Harmonic mean of precision and recall:
- Balances false positives and false negatives
- More robust than accuracy for imbalanced datasets
- Critical when both types of errors have clinical consequences
- > 0.70: Strong performance
- 0.50 - 0.70: Moderate performance
- < 0.50: Poor discrimination
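For binary labels, F1 follows directly from the confusion-matrix counts. A minimal sketch (illustrative only; the platform's implementation is in modeling/predictive.py:48-56):

```python
def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

print(f1_score([1, 1, 1, 0, 0, 0, 1, 0], [1, 0, 1, 0, 1, 0, 1, 0]))  # 0.75
```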
AUC (Area Under ROC Curve)
Measures the model’s ability to discriminate between classes across all thresholds:
- 1.0: Perfect classifier
- 0.90 - 1.0: Excellent
- 0.80 - 0.90: Good
- 0.70 - 0.80: Fair
- 0.50 - 0.70: Poor
- 0.5: Random guessing (no better than a coin flip)
- Threshold-independent (evaluates model across all possible thresholds)
- Robust to class imbalance
- Clinically meaningful: probability that a random high-risk patient scores higher than a random low-risk patient
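The clinical interpretation in the last bullet gives a direct way to compute AUC: over all positive/negative pairs, count how often the positive case scores higher, with ties counting half. A minimal sketch (the platform's implementation in modeling/predictive.py:59-74 may differ):

```python
def auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random
    negative; ties count as half a win."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

print(auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.2]))  # 0.75
```

This pairwise form is O(n²); production implementations typically sort once and work from ranks, but the result is identical.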
Complete Evaluation Example
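Below is a self-contained sketch that evaluates a toy set of readmission-risk predictions with all three metrics. It computes the metrics inline rather than calling the platform's evaluate_predictive_models(), so treat it as illustrative:

```python
# Toy example: true labels, hard predictions, and continuous risk scores.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
scores = [0.8, 0.2, 0.4, 0.9, 0.3, 0.6, 0.7, 0.1]

# Accuracy: proportion of matching labels.
acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# F1: harmonic mean of precision and recall.
tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

# AUC: pairwise positive-vs-negative comparison of risk scores.
pos = [s for t, s in zip(y_true, scores) if t == 1]
neg = [s for t, s in zip(y_true, scores) if t == 0]
roc_auc = sum(1.0 if p > n else 0.5 if p == n else 0.0
              for p in pos for n in neg) / (len(pos) * len(neg))

results = {"accuracy": acc, "f1_score": f1, "auc": roc_auc}
print(results)  # accuracy 0.75, f1_score 0.75, auc 0.9375
```

Note how AUC (0.9375) exceeds accuracy and F1 (both 0.75) here: the risk scores rank patients well even where the hard 0/1 predictions at the default threshold miss.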
Latency-Accuracy Tradeoff
For real-time systems, balance prediction accuracy against computational speed. The tradeoff is parameterized by:
- accuracy (float): Model accuracy (0.0 to 1.0)
- latency_ms (float): Prediction latency in milliseconds
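The platform's actual tradeoff formula lives in evaluation/metrics.py. Purely as an illustration, one plausible combined score blends accuracy with a normalized latency term; the max_latency_ms and weight parameters below are assumptions, not part of the platform:

```python
def latency_accuracy_score(accuracy: float, latency_ms: float,
                           max_latency_ms: float = 100.0,
                           weight: float = 0.7) -> float:
    """Hypothetical combined score: weighted blend of accuracy and a
    speed term that is 1.0 at 0 ms and 0.0 at max_latency_ms or above."""
    speed = max(0.0, 1.0 - latency_ms / max_latency_ms)
    return weight * accuracy + (1 - weight) * speed

# A 90%-accurate model answering in 20 ms scores 0.7*0.9 + 0.3*0.8 = 0.87.
print(latency_accuracy_score(0.9, 20.0))  # 0.87
```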
Evaluation Results Dictionary
The evaluate_predictive_models() function returns a comprehensive dictionary of computed metrics.
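As a hedged illustration of how such a dictionary might be consumed (the keys and values below are assumed for the example, not taken from modeling/predictive.py), results can be checked against the thresholds documented above:

```python
# Hypothetical results dictionary; the real structure is defined by
# evaluate_predictive_models() in modeling/predictive.py.
results = {
    "accuracy": 0.84,
    "f1_score": 0.71,
    "auc": 0.88,
}

# Flag any metric that falls below its documented "poor" threshold.
floors = {"accuracy": 0.70, "f1_score": 0.50, "auc": 0.70}
warnings = [name for name, floor in floors.items()
            if results.get(name, 0.0) < floor]
print(warnings)  # [] — all metrics clear their floors
```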
Choosing the Right Metric
Use Accuracy When:
- Classes are balanced (roughly equal positive/negative cases)
- False positives and false negatives have equal cost
- Simple interpretation is needed
Use F1 Score When:
- Classes are imbalanced
- Both precision and recall matter
- Need to balance false alarms vs. missed cases
Use AUC When:
- Evaluating overall discrimination ability
- Comparing models with different thresholds
- Class imbalance is present
- Need threshold-independent metric
Use Latency Tradeoff When:
- Deploying to production systems
- Real-time predictions are required
- Computational resources are constrained
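The guidance above can be condensed into a small decision helper; primary_metric and its boolean flags are hypothetical, not part of the platform:

```python
def primary_metric(balanced: bool, realtime: bool,
                   threshold_chosen: bool) -> str:
    """Pick a primary metric per the selection guidance above."""
    if realtime:
        return "latency_tradeoff"   # production/real-time constraints dominate
    if not threshold_chosen:
        return "auc"                # threshold-independent comparison
    return "accuracy" if balanced else "f1_score"

print(primary_metric(balanced=False, realtime=False, threshold_chosen=True))
# f1_score
```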
Source References
- Accuracy implementation: modeling/predictive.py:44-45
- F1 score implementation: modeling/predictive.py:48-56
- AUC implementation: modeling/predictive.py:59-74
- Evaluation function: modeling/predictive.py:118-135
- Latency tradeoff: evaluation/metrics.py:4-5