Overview
The STGNN model requires two main data components:
- Functional Connectivity (FC) Matrices - Brain connectivity graphs from rs-fMRI scans
- Temporal Label File - Subject labels and visit information with temporal metadata
Directory Structure
FC Matrix Requirements
File Format
- Location: `data/FC_Matrices/`
- Naming Convention: `sub-{SUBJECT_ID}_run-{RUN_NUMBER}_fc_matrix.npz`
- File Type: NumPy compressed array (`.npz`)
Filename Pattern
The dataset loader expects files matching the pattern `sub-XXXXXX_run-XX_fc_matrix.npz`, where:
- `XXXXXX` is the subject ID (6 digits, no underscores)
- `XX` is the run number (2 digits, zero-padded)
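As a quick way to check filenames before training, the pattern can be written as a regular expression. This is a minimal sketch: the regex and the helper name `parse_fc_filename` are assumptions derived from the naming convention above, not the loader's own code.

```python
import re

# Assumed pattern, derived from the documented naming convention
# (6-digit subject ID, 2-digit zero-padded run number).
FC_FILENAME = re.compile(r"^sub-(?P<subject>\d{6})_run-(?P<run>\d{2})_fc_matrix\.npz$")


def parse_fc_filename(name):
    """Return (subject_id, run_number) as strings, or None if the name does not match."""
    match = FC_FILENAME.match(name)
    if match is None:
        return None
    return match.group("subject"), match.group("run")


print(parse_fc_filename("sub-123456_run-01_fc_matrix.npz"))  # ('123456', '01')
print(parse_fc_filename("sub-123_456_run-1_fc_matrix.npz"))  # None
```

Names that return `None` are the ones most likely to be skipped or mismatched at load time.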
NPZ File Contents
Each `.npz` file must contain:
- Default array key: `arr_0` (configurable via the `var_name` parameter)
- Matrix must be square (N × N where N = number of ROIs)
- Values represent correlation coefficients between brain regions
- Matrix is automatically symmetrized: `(matrix + matrix.T) / 2`
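The requirements above can be sketched as a small loading routine. The function `load_fc_matrix` is hypothetical and only mirrors the documented behavior (key lookup, square check, symmetrization); the actual loader in `FC_ADNIDataset.py` may differ in its details.

```python
import os
import tempfile

import numpy as np


def load_fc_matrix(path, var_name="arr_0"):
    """Load one FC matrix, check that it is square, and symmetrize it."""
    with np.load(path) as data:
        if var_name not in data.files:
            raise KeyError(f"'{var_name}' missing in {path}")
        matrix = data[var_name]
    if matrix.ndim != 2 or matrix.shape[0] != matrix.shape[1]:
        raise ValueError(f"expected a square matrix, got shape {matrix.shape}")
    return (matrix + matrix.T) / 2


# Round trip with a small, deliberately asymmetric toy matrix.
toy = np.array([[1.0, 0.4, 0.0],
                [0.2, 1.0, -0.6],
                [0.0, -0.2, 1.0]])
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sub-123456_run-01_fc_matrix.npz")
    np.savez_compressed(path, toy)  # a positional array is stored under "arr_0"
    fc = load_fc_matrix(path)

print(fc)
```

After symmetrization, each off-diagonal pair holds the mean of the two original entries, so `fc[0, 1]` and `fc[1, 0]` are both 0.3 here.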
Matrix Properties
From `FC_ADNIDataset.py:106`:
- Dimensions: Typically 116×116 (AAL atlas) or other ROI parcellation
- Values: Correlation coefficients (typically -1 to 1)
- Symmetry: Enforced automatically during loading
- Thresholding: Edges with |value| < threshold are removed (default: 0.2)
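To illustrate the thresholding rule, the sketch below turns a symmetric matrix into an edge list and keeps only entries with |value| at or above the threshold. The helper `threshold_edges` is hypothetical, and dropping self-loops (the diagonal) is an assumption rather than documented behavior.

```python
import numpy as np


def threshold_edges(matrix, threshold=0.2):
    """Return (i, j, weight) for each off-diagonal entry with |weight| >= threshold."""
    rows, cols = np.nonzero(np.abs(matrix) >= threshold)
    keep = rows != cols  # assumption: the diagonal (self-loops) is ignored
    return [(int(i), int(j), float(matrix[i, j]))
            for i, j in zip(rows[keep], cols[keep])]


fc = np.array([[1.0, 0.3, 0.1],
               [0.3, 1.0, -0.4],
               [0.1, -0.4, 1.0]])
edges = threshold_edges(fc)
for edge in edges:
    print(edge)
```

With the default threshold of 0.2, the 0.1 connection between regions 0 and 2 is removed, while the negative correlation of -0.4 is kept because the rule compares absolute values.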
Label File Requirements
File Format
- Filename: `TADPOLE_TEMPORAL.csv`
- Location: `data/`
- Format: CSV with header row
Required Columns
From `FC_ADNIDataset.py:123-158`, the following columns are required:
| Column | Type | Description | Example |
|---|---|---|---|
| Subject | string | Subject ID (no underscores) | 123456 |
| Visit | string | Visit code | bl, m06, m12 |
| Label_CS_Num | int | Cognitive stage label | 0 (stable) or 1 (converter) |
| Visit_Order | int | Chronological visit number | 1, 2, 3, … |
| Months_From_Baseline | float | Time from first visit | 0.0, 6.2, 12.5 |
| Months_To_Next_Original | float | Time to next visit | 6.2, 6.3, -1 |
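A header check along these lines can catch a missing column before the dataset is built. It is a sketch using only the standard library; `missing_columns` is a hypothetical helper, and the sample CSV text is illustrative.

```python
import csv
import io

REQUIRED_COLUMNS = [
    "Subject", "Visit", "Label_CS_Num", "Visit_Order",
    "Months_From_Baseline", "Months_To_Next_Original",
]


def missing_columns(csv_file):
    """Return the required columns that are absent from the CSV header."""
    reader = csv.DictReader(csv_file)
    header = reader.fieldnames or []
    return [col for col in REQUIRED_COLUMNS if col not in header]


# Illustrative header that lacks the last required column.
sample = io.StringIO(
    "Subject,Visit,Label_CS_Num,Visit_Order,Months_From_Baseline\n"
    "123456,bl,0,1,0.0\n"
)
print(missing_columns(sample))  # ['Months_To_Next_Original']
```

For a real file, pass an open handle instead, for example `open("data/TADPOLE_TEMPORAL.csv", newline="")`.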
Optional Columns
| Column | Type | Description |
|---|---|---|
| Acq_Date | string | Acquisition date |
| Age | float | Subject age at visit |
| Sex | string | Subject sex |
| Group | string | Diagnostic group |
| Total_Visits | int | Total visits for subject |
Subject ID Formatting
Data Mapping
Subject to Graph Mapping
The dataset maps FC matrices to labels using:
- Extract subject ID from filename: `sub-123456_run-01_fc_matrix.npz` → subject `123456`, run `01`
- Create full ID: `123456_run01` (the full ID contains an underscore; the run is zero-padded)
- Match to CSV: Use subject `123456` for label lookup
- Assign visit info: Map run number to chronological visit order
Source: `FC_ADNIDataset.py:144-152`.
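The four steps can be sketched as follows. This is not the listing from `FC_ADNIDataset.py`; the function `map_file_to_visit`, the toy rows, and the choice to map run k to the k-th visit after sorting by `Visit_Order` are assumptions made for illustration.

```python
import re

# Toy stand-in for rows of TADPOLE_TEMPORAL.csv (values are illustrative only).
rows = [
    {"Subject": "123456", "Visit": "m06", "Visit_Order": 2, "Label_CS_Num": 0},
    {"Subject": "123456", "Visit": "bl", "Visit_Order": 1, "Label_CS_Num": 0},
]


def map_file_to_visit(filename, rows):
    """Return (full_id, visit_row) for one FC matrix file, or None if unmatched."""
    # Step 1: extract subject ID and run number from the filename.
    match = re.match(r"^sub-(\d+)_run-(\d+)_fc_matrix\.npz$", filename)
    if match is None:
        return None
    subject, run = match.group(1), int(match.group(2))
    # Step 2: build the full ID with a zero-padded run.
    full_id = f"{subject}_run{run:02d}"
    # Step 3: look up the subject's rows; Step 4: order them chronologically.
    visits = sorted((r for r in rows if r["Subject"] == subject),
                    key=lambda r: r["Visit_Order"])
    if not 1 <= run <= len(visits):
        return None  # no visit available for this run number
    return full_id, visits[run - 1]


full_id, visit = map_file_to_visit("sub-123456_run-02_fc_matrix.npz", rows)
print(full_id, visit["Visit"])  # 123456_run02 m06
```

Sorting inside the function means the CSV rows do not need to be in chronological order on disk.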
Chronological Visit Ordering
Run numbers correspond to chronologically ordered visits:
- `run-01` → First visit (earliest `Visit_Order`)
- `run-02` → Second visit
- `run-03` → Third visit, etc.
The dataset sorts visits by `Visit_Order` to ensure proper chronological mapping between FC matrices and temporal information.
Label Encoding
Classification Labels
- 0: Stable (no cognitive decline)
- 1: Converter (progressed to worse cognitive stage)
Label Assignment
From `FC_ADNIDataset.py:130-135`: the subject’s overall label is taken from their last chronological visit. This represents the final observed cognitive stage.
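The last-visit rule can be illustrated with a few lines of plain Python. The helper `subject_labels` and the toy rows are hypothetical; the only behavior taken from the text is that the label comes from the row with the highest `Visit_Order`.

```python
def subject_labels(rows):
    """Map each subject to the Label_CS_Num of its last chronological visit."""
    last = {}
    for row in rows:
        subject = row["Subject"]
        if subject not in last or row["Visit_Order"] > last[subject]["Visit_Order"]:
            last[subject] = row
    return {subject: row["Label_CS_Num"] for subject, row in last.items()}


# Illustrative rows, intentionally out of chronological order.
rows = [
    {"Subject": "123456", "Visit_Order": 1, "Label_CS_Num": 0},
    {"Subject": "123456", "Visit_Order": 3, "Label_CS_Num": 1},
    {"Subject": "123456", "Visit_Order": 2, "Label_CS_Num": 0},
    {"Subject": "654321", "Visit_Order": 1, "Label_CS_Num": 0},
]
print(subject_labels(rows))  # {'123456': 1, '654321': 0}
```

Subject `123456` is labeled 1 because the visit with `Visit_Order` 3 carries that label, even though the earlier visits are labeled 0.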
Graph Construction
Edge Thresholding
See `FC_ADNIDataset.py:69`.
Node Features
See `FC_ADNIDataset.py:74-78`.
Validation
Common Issues
Missing 'arr_0' key in NPZ file
Error: `'arr_0' missing in {filepath}, skipping.`
Solution: Save each matrix under the `arr_0` key, or pass the key you used through the `var_name` parameter.
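The sketch below shows how the key ends up in the file when saving with NumPy. The key name `fc` and the temporary path are illustrative; `np.savez_compressed` stores keyword arguments under their keyword name and positional arrays under `arr_0`, `arr_1`, and so on.

```python
import os
import tempfile

import numpy as np

matrix = np.eye(3)  # placeholder for a real FC matrix

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sub-123456_run-01_fc_matrix.npz")

    # A keyword argument becomes the key, so this file has no "arr_0".
    np.savez_compressed(path, fc=matrix)
    with np.load(path) as data:
        keys_before = list(data.files)

    # Re-save under the default key; passing the array positionally
    # (np.savez_compressed(path, matrix)) also produces "arr_0".
    np.savez_compressed(path, arr_0=matrix)
    with np.load(path) as data:
        keys_after = list(data.files)

print(keys_before, keys_after)  # ['fc'] ['arr_0']
```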
Subject ID mismatch
Error: No label found for subject
Solution: Remove underscores from subject IDs in CSV:
- Wrong: `123_456`
- Correct: `123456`
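One possible way to apply this fix in bulk is to rewrite the subject column of the CSV. The function `strip_underscores` is a hypothetical helper that uses only the standard library; it assumes the ID column is named `Subject`, as in the required-columns table.

```python
import csv
import io


def strip_underscores(csv_in, csv_out, column="Subject"):
    """Copy a CSV, removing underscores from the values in one column."""
    reader = csv.DictReader(csv_in)
    writer = csv.DictWriter(csv_out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row[column] = row[column].replace("_", "")
        writer.writerow(row)


# Illustrative input with an underscore in the subject ID.
source = io.StringIO("Subject,Visit\n123_456,bl\n")
cleaned = io.StringIO()
strip_underscores(source, cleaned)
print(cleaned.getvalue())
```

Writing to a new file rather than overwriting the original keeps the raw label file available for comparison.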
Run number mismatch
Error: Visit information not found
Solution: Ensure run numbers start at 01 and match the chronological visit order in the CSV.
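A check like the following can flag run-numbering problems before loading. It is a sketch under two assumptions that the text does not state outright: that run numbers for a subject should be contiguous (1, 2, 3, …) and that a subject should not have more runs than CSV visits. The helper `check_runs` and its inputs are hypothetical.

```python
import re
from collections import defaultdict


def check_runs(filenames, visits_per_subject):
    """Return one message per subject whose run numbering looks inconsistent."""
    runs = defaultdict(list)
    for name in filenames:
        match = re.match(r"^sub-(\d+)_run-(\d+)_fc_matrix\.npz$", name)
        if match:
            runs[match.group(1)].append(int(match.group(2)))

    problems = []
    for subject, found in sorted(runs.items()):
        found = sorted(found)
        n_visits = visits_per_subject.get(subject, 0)
        if found != list(range(1, len(found) + 1)):
            # Assumption: runs must start at 1 and have no gaps.
            problems.append(f"{subject}: runs {found} are not 1..{len(found)}")
        elif len(found) > n_visits:
            problems.append(f"{subject}: {len(found)} runs but {n_visits} visits in the CSV")
    return problems


files = [
    "sub-123456_run-01_fc_matrix.npz",
    "sub-123456_run-03_fc_matrix.npz",  # run-02 is missing
    "sub-654321_run-01_fc_matrix.npz",
]
visits = {"123456": 3, "654321": 1}  # visit counts taken from the label CSV
problems = check_runs(files, visits)
print(problems)
```

Here only subject `123456` is reported, because its runs skip from 01 to 03.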