Overview
The setup_temporal_data.py script processes raw ADNI data to create the temporal dataset required for STGNN training. It combines acquisition dates from TADPOLE_COMPLETE.csv with labels from TADPOLE_Simplified.csv and writes TADPOLE_TEMPORAL.csv, which adds per-visit temporal features.
Prerequisites
Required Input Files
TADPOLE_COMPLETE.csv
Complete ADNI dataset with all visits and acquisition dates
TADPOLE_Simplified.csv
Simplified ADNI dataset with subject labels
Both files must be placed in the data/ directory.
File Locations
Running the Script
Script Workflow
1. Data Loading
Source: setup_temporal_data.py, lines 45-52.
Subject IDs are automatically cleaned by removing underscores to ensure consistency with FC matrix filenames.
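The ID cleaning step described above can be illustrated with a minimal sketch. The function name `clean_subject_id` and the sample IDs are hypothetical; only the underscore removal comes from the description, and the script itself may apply it differently (for example, on a whole column at once).

```python
def clean_subject_id(raw_id: str) -> str:
    """Remove underscores so the ID matches the FC matrix filename convention."""
    return str(raw_id).replace("_", "")


# Hypothetical raw IDs, for illustration only.
raw_ids = ["123_456", "002_S_0295"]
cleaned = [clean_subject_id(s) for s in raw_ids]
print(cleaned)
```

Applying the same function to the IDs from both input files keeps them comparable when dates and labels are combined.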
2. Date Parsing
Source: setup_temporal_data.py, lines 13-27.
Supported date formats:
- MM/DD/YYYY (e.g., 03/15/2018)
- MM/DD/YY (e.g., 03/15/18)
- YYYY-MM-DD (e.g., 2018-03-15)
- DD/MM/YYYY (e.g., 15/03/2018)
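One common way to support several date formats is to try each one in turn and keep the first that parses. The sketch below takes that approach with `datetime.strptime`. The name `parse_date`, the `DATE_FORMATS` tuple, and the order in which formats are tried are assumptions, not taken from the script.

```python
from datetime import datetime
from typing import Optional

# Assumption: formats are tried in the order they are documented.
DATE_FORMATS = ("%m/%d/%Y", "%m/%d/%y", "%Y-%m-%d", "%d/%m/%Y")


def parse_date(text: str) -> Optional[datetime]:
    """Return the first successful parse, or None if no format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue  # this format did not match; try the next one
    return None
```

With this ordering, an ambiguous value such as 03/04/2018 is read month-first (March 4). The day-first format is only reached when the month-first formats fail, as with 15/03/2018.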
3. Temporal Sequence Creation
Source: setup_temporal_data.py, lines 72-115.
Calculate temporal features
For each visit:
- Months from baseline: Time elapsed since first visit
- Months to next: Time gap until next visit
- Visit order: Chronological position (1, 2, 3, …)
- Total visits: Number of visits for this subject
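The ordering features in the list above (visit order and total visits) can be sketched as a group-and-sort over visit records. This is an illustration under stated assumptions: the function name `add_visit_order` and the `date` key are hypothetical, and the script may well use pandas rather than plain dictionaries.

```python
from collections import defaultdict
from datetime import date


def add_visit_order(visits):
    """Attach Visit_Order and Total_Visits to each visit record.

    `visits` is a list of dicts with a 'Subject' key and a 'date' key
    holding a datetime.date (hypothetical record layout).
    """
    by_subject = defaultdict(list)
    for visit in visits:
        by_subject[visit["Subject"]].append(visit)

    rows = []
    for group in by_subject.values():
        ordered = sorted(group, key=lambda v: v["date"])  # chronological order
        for position, visit in enumerate(ordered, start=1):
            rows.append({**visit,
                         "Visit_Order": position,
                         "Total_Visits": len(ordered)})
    return rows
```

Sorting by the parsed date rather than by the visit code keeps the order chronological even when visit codes are missing or inconsistent.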
4. Temporal Feature Calculation
Months From Baseline
Source: setup_temporal_data.py, lines 88-95.
Months To Next Visit
Source: setup_temporal_data.py, lines 97-102.
The last visit for each subject has months_to_next = NaN since there is no subsequent visit.
Date Difference Calculation
Source: setup_temporal_data.py, lines 29-37.
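The temporal features in this section can be sketched as follows. The conversion from days to months is an assumption: this example divides by an average month length of 30.44 days, and the script may use a different constant or method. The names `months_between` and `temporal_features` are hypothetical.

```python
import math
from datetime import date

# Assumption: average month length in days; not confirmed by the script.
DAYS_PER_MONTH = 30.44


def months_between(start: date, end: date) -> float:
    """Elapsed time from start to end, expressed in (approximate) months."""
    return (end - start).days / DAYS_PER_MONTH


def temporal_features(sorted_dates):
    """Return (months_from_baseline, months_to_next) for each visit date.

    `sorted_dates` must already be in chronological order. The last visit
    gets NaN for months_to_next because no later visit exists.
    """
    if not sorted_dates:
        return []
    baseline = sorted_dates[0]
    features = []
    for i, current in enumerate(sorted_dates):
        from_baseline = months_between(baseline, current)
        if i + 1 < len(sorted_dates):
            to_next = months_between(current, sorted_dates[i + 1])
        else:
            to_next = math.nan
        features.append((from_baseline, to_next))
    return features
```

For three visits, the first tuple has a baseline offset of 0.0, and the gap to the next visit in one tuple equals the difference between consecutive baseline offsets.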
Output Format
Generated File
Filename: data/TADPOLE_TEMPORAL.csv
Columns
Source: setup_temporal_data.py, lines 105-120.
| Column | Type | Description | Example |
|---|---|---|---|
| Subject | string | Subject ID (no underscores) | 123456 |
| Visit | string | Visit code from original data | bl, m06, m12 |
| Acq_Date | string | Original acquisition date | 03/15/2018 |
| Months_From_Baseline | float | Months since first visit | 0.0, 6.2, 12.5 |
| Months_To_Next | float | Months until next visit | 6.2, 6.3, NaN |
| Label_CS_Num | int | Cognitive stage label (0=stable, 1=converter) | 0 or 1 |
| Visit_Order | int | Chronological visit number | 1, 2, 3, … |
| Total_Visits | int | Total visits for this subject | 3, 5, 7, … |
| Age | float | Subject age at visit (if available) | 72.5 |
| Sex | string | Subject sex (if available) | M, F |
| Group | string | Diagnostic group (if available) | AD, MCI, CN |
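A quick way to sanity-check a generated file against the column table is to read it back and convert each field to its documented type. The sketch below uses only the standard library. The function name `load_temporal_rows`, the sample rows, and the treatment of an empty cell as NaN are assumptions (how missing values are written depends on how the script saves the CSV).

```python
import csv
import io
import math

# Columns documented without "(if available)"; Age, Sex and Group are optional.
REQUIRED_COLUMNS = [
    "Subject", "Visit", "Acq_Date", "Months_From_Baseline",
    "Months_To_Next", "Label_CS_Num", "Visit_Order", "Total_Visits",
]


def load_temporal_rows(handle):
    """Read rows from an open CSV handle and convert the typed columns."""
    reader = csv.DictReader(handle)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    rows = []
    for row in reader:
        row["Months_From_Baseline"] = float(row["Months_From_Baseline"])
        # Assumption: an empty cell means NaN (last visit of a subject).
        raw_next = row["Months_To_Next"]
        row["Months_To_Next"] = float(raw_next) if raw_next else math.nan
        for name in ("Label_CS_Num", "Visit_Order", "Total_Visits"):
            row[name] = int(row[name])
        rows.append(row)
    return rows


# Hypothetical two-visit sample, for illustration only.
SAMPLE = """Subject,Visit,Acq_Date,Months_From_Baseline,Months_To_Next,Label_CS_Num,Visit_Order,Total_Visits
123456,bl,03/15/2018,0.0,6.2,0,1,2
123456,m06,09/20/2018,6.2,,0,2,2
"""
rows = load_temporal_rows(io.StringIO(SAMPLE))
```

For the real file, pass an open handle to data/TADPOLE_TEMPORAL.csv instead of the in-memory sample.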
Output Statistics
The script prints summary statistics for the temporal dataset.
Subject Counts
Source: setup_temporal_data.py, lines 138-141.
Prediction Horizons
Source: setup_temporal_data.py, lines 144-153.
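The exact statistics the script prints are not reproduced here. As a rough sketch of what subject counts and prediction-horizon summaries could look like, the example below counts unique subjects and summarizes the Months_To_Next values, skipping the NaN entries of final visits. The function name `summarize` and the choice of statistics are assumptions.

```python
import math
from statistics import mean


def summarize(rows):
    """Summarize rows given as dicts with 'Subject' and 'Months_To_Next'."""
    subjects = {row["Subject"] for row in rows}
    # Final visits carry NaN and define no prediction horizon.
    horizons = [row["Months_To_Next"] for row in rows
                if not math.isnan(row["Months_To_Next"])]
    return {
        "n_subjects": len(subjects),
        "n_visits": len(rows),
        "n_horizons": len(horizons),
        "mean_horizon": mean(horizons) if horizons else math.nan,
        "min_horizon": min(horizons) if horizons else math.nan,
        "max_horizon": max(horizons) if horizons else math.nan,
    }
```

Because each subject's last visit has no horizon, the number of horizons equals the number of visits minus the number of subjects when every subject has at least one dated visit.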
Complete Example Run
Troubleshooting
Input file not found
Error: data/TADPOLE_COMPLETE.csv not found!
Solution: Ensure both input CSV files are in the data/ directory.
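A small pre-flight check can confirm that both inputs are in place before running the script. This is a sketch: the function name `missing_inputs` is hypothetical, and the default `data` path assumes the check runs from the project root.

```python
from pathlib import Path

REQUIRED_INPUTS = ("TADPOLE_COMPLETE.csv", "TADPOLE_Simplified.csv")


def missing_inputs(data_dir="data"):
    """Return the required input filenames that are absent from data_dir."""
    base = Path(data_dir)
    return [name for name in REQUIRED_INPUTS if not (base / name).is_file()]


if __name__ == "__main__":
    absent = missing_inputs()
    if absent:
        print("Missing input files:", ", ".join(absent))
    else:
        print("All input files found.")
```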
Date parsing warnings
Warning: Could not parse date '...': ...
Impact: Rows with unparseable dates are excluded from the temporal dataset.
Solution: Check the date format in TADPOLE_COMPLETE.csv. Supported formats are listed under Date Parsing.
Missing subjects in output
Issue: Fewer subjects in output than expected.
Possible causes:
- Subject not in TADPOLE_Simplified.csv (no label available)
- No valid acquisition dates for subject
- Subject ID format mismatch
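To narrow down which of these causes applies, it can help to compare the subject IDs found in each input file. The sketch below assumes the IDs have already been read into two lists (the column that holds them is not specified here). The function name `diagnose_missing` is hypothetical.

```python
def diagnose_missing(complete_ids, labeled_ids):
    """Compare subject IDs from the two inputs after removing underscores.

    complete_ids: IDs read from TADPOLE_COMPLETE.csv (visits and dates)
    labeled_ids:  IDs read from TADPOLE_Simplified.csv (labels)
    """
    complete = {str(s).replace("_", "") for s in complete_ids}
    labeled = {str(s).replace("_", "") for s in labeled_ids}
    return {
        # Has visits but no label: dropped because no label is available.
        "no_label": sorted(complete - labeled),
        # Has a label but no visits: dropped because no dates exist.
        "no_visits": sorted(labeled - complete),
    }
```

If both lists come back large while the two files should describe the same subjects, an ID format mismatch beyond underscores is the more likely cause.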
Column name mismatch
Issue: The dataset loader expects Months_To_Next_Original, but the script creates Months_To_Next.
Solution: After generation, rename the Months_To_Next column to Months_To_Next_Original.
Next Steps
After generating TADPOLE_TEMPORAL.csv:
Prepare FC Matrices
Organize functional connectivity matrices in the required format
Start Training
Begin training STGNN model with temporal data