Overview
STGNN uses a comprehensive set of evaluation metrics to assess model performance on the binary classification task of predicting Alzheimer’s disease progression (stable vs. converter). All metrics are calculated by the evaluate_detailed function in main.py:319-380.
Primary Metrics
Accuracy
Overall classification accuracy across all test subjects: the fraction of subjects whose predicted class matches the true class.
Balanced Accuracy
Average of per-class recall, which keeps the majority class from dominating the score under class imbalance.
Area Under ROC Curve (AUC)
Discriminative ability between stable and converter classes:
- 1.0 = perfect discrimination
- 0.5 = random guessing
- Values above 0.8 are conventionally read as strong predictive power
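The three metrics above can be sketched in plain Python. This is an illustration of the definitions, not the code in main.py; the function names, the 0.5 decision threshold, and the label encoding (0 = stable, 1 = converter) are assumptions made for the example.

```python
from collections import defaultdict


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of per-class recall."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    return sum(correct[c] / total[c] for c in total) / len(total)


def roc_auc(y_true, y_score):
    """Probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC is undefined when only one class is present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data; label encoding 0 = stable, 1 = converter is an assumption.
y_true = [0, 0, 0, 0, 1, 1]
y_prob = [0.1, 0.4, 0.35, 0.8, 0.7, 0.9]
y_pred = [int(p >= 0.5) for p in y_prob]  # assumed 0.5 threshold

print(accuracy(y_true, y_pred))
print(balanced_accuracy(y_true, y_pred))
print(roc_auc(y_true, y_prob))
```

On this toy set, plain accuracy (5/6) is lower than balanced accuracy (0.875) because the single error falls in the larger class. AUC is computed from the probabilities rather than the thresholded predictions, so it does not depend on the chosen threshold.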
F1 Score
Harmonic mean of precision and recall, calculated separately for each class: F1 = 2 × precision × recall / (precision + recall).
Per-Class Metrics
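Precision, recall, and the per-class F1 score all derive from the same three counts per class: true positives (TP), false positives (FP), and false negatives (FN). The sketch below computes them by treating each class in turn as the positive class. It is not taken from main.py; the function name, the label encoding, and the convention of returning 0.0 when a denominator is zero are assumptions.

```python
def per_class_metrics(y_true, y_pred, labels=(0, 1)):
    """Return precision, recall and F1 for each label, treating it as 'positive'."""
    metrics = {}
    for c in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        # Assumed convention: a metric with an empty denominator is reported as 0.0.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics[c] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics


# Toy data; label encoding 0 = stable, 1 = converter is an assumption.
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1]
result = per_class_metrics(y_true, y_pred)
for label, values in result.items():
    print(label, values)
```

Here class 1 has perfect recall but a precision of 2/3, because one class 0 subject was predicted as class 1. A single averaged score would hide that asymmetry.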
Precision
Proportion of positive predictions that are correct: TP / (TP + FP).
Recall (Sensitivity)
Proportion of actual positives correctly identified: TP / (TP + FN).
Loss Function
Focal Loss
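As a point of reference, the standard binary focal loss is FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t), where p_t is the predicted probability of the true class. The sketch below implements that textbook form in plain Python. It is not the loss code from main.py; the function name and the alpha and gamma defaults are illustrative assumptions.

```python
import math


def focal_loss(probs, targets, alpha=0.25, gamma=2.0, eps=1e-7):
    """Mean binary focal loss.

    probs:   predicted probability of class 1 for each sample
    targets: true labels (0 or 1)
    alpha, gamma: illustrative defaults, not values taken from main.py
    """
    total = 0.0
    for p, y in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)       # clamp to avoid log(0)
        p_t = p if y == 1 else 1.0 - p        # probability of the true class
        alpha_t = alpha if y == 1 else 1.0 - alpha
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)


easy = focal_loss([0.95], [1])  # confident and correct
hard = focal_loss([0.30], [1])  # wrong side of 0.5
print(easy, hard)
```

With gamma = 0 the modulating factor disappears and the expression reduces to alpha-weighted cross-entropy. Larger gamma shrinks the contribution of examples the model already classifies confidently.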
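The model uses focal loss to address class imbalance. In its standard form, focal loss scales the cross-entropy term by (1 - p_t)^γ, where p_t is the predicted probability of the true class, so confidently classified examples contribute less to training.
Cross-Validation Metrics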
Metrics are aggregated across all folds in 5-fold stratified cross-validation.
Model Selection Criteria
The best model is selected based on validation AUC. A model must predict more than one class (unique_preds > 1) to be considered valid.
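The selection rule above, together with the fold-level aggregation described under Cross-Validation Metrics, can be sketched as follows. This is a hedged illustration, not the logic in main.py: the helper names, the per-epoch history records, the dictionary keys, and the choice of mean and sample standard deviation as the aggregate are all assumptions.

```python
from statistics import mean, stdev


def select_best_epoch(history):
    """Pick the record with the highest validation AUC among valid ones.

    history: list of dicts with hypothetical keys
             'epoch', 'val_auc' and 'unique_preds'.
    A record is valid only if the model predicted more than one class.
    Returns None when no record is valid.
    """
    valid = [h for h in history if h["unique_preds"] > 1]
    if not valid:
        return None
    return max(valid, key=lambda h: h["val_auc"])


def aggregate_folds(fold_metrics):
    """Return {metric: (mean, sample std)} across folds (assumed aggregate)."""
    keys = fold_metrics[0].keys()
    return {
        k: (mean([m[k] for m in fold_metrics]),
            stdev([m[k] for m in fold_metrics]))
        for k in keys
    }


# Made-up numbers for illustration only.
history = [
    {"epoch": 1, "val_auc": 0.91, "unique_preds": 1},  # predicts a single class
    {"epoch": 2, "val_auc": 0.78, "unique_preds": 2},
    {"epoch": 3, "val_auc": 0.84, "unique_preds": 2},
]
best = select_best_epoch(history)

fold_metrics = [
    {"auc": 0.80, "balanced_accuracy": 0.70},
    {"auc": 0.84, "balanced_accuracy": 0.74},
    {"auc": 0.82, "balanced_accuracy": 0.72},
    {"auc": 0.78, "balanced_accuracy": 0.69},
    {"auc": 0.86, "balanced_accuracy": 0.75},
]
summary = aggregate_folds(fold_metrics)
print(best["epoch"], summary)
```

In this example epoch 1 has the highest validation AUC but is rejected because its thresholded predictions collapse to one class, so epoch 3 is selected.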
Evaluation Return Structure
The evaluate_detailed function returns a comprehensive dictionary.
When return_probs=True is specified, predicted probabilities are also included.