Time-Aware Prediction

Overview

STGNN supports time-aware prediction through temporal gap features. These features capture the time interval between consecutive patient visits and the interval from the last observed visit to the prediction target. They let the model account for varying prediction horizons, with the aim of improving accuracy for both short-term and long-term forecasts.

Temporal Gap Processing

The temporal_gap_processor.py module provides utilities for parsing visit codes, calculating temporal gaps, and normalizing time features.

VISCODE Parsing

The system automatically parses ADNI visit codes (VISCODEs) into months since baseline:

```python
from temporal_gap_processor import parse_viscode_to_months

# Baseline visits (0 months)
parse_viscode_to_months('bl')      # → 0
parse_viscode_to_months('scmri')   # → 0
parse_viscode_to_months('v02')     # → 0

# Follow-up visits (months since baseline)
parse_viscode_to_months('v04')     # → 3
parse_viscode_to_months('m06')     # → 6
parse_viscode_to_months('v11')     # → 12
parse_viscode_to_months('y2')      # → 24
```

Supported VISCODE formats:

- Baseline codes: sc, scmri, bl, init, v02, v03, v06
- ADNI-1/GO: m03, m06, m12, m18, m24, m36, m48, etc.
- ADNI-2: v04 (3 months), v05 (6 months), v11 (12 months), v21 (24 months), v31 (36 months), etc.
- ADNI-3: y1, y2, y3, y4, y5

See temporal_gap_processor.py:6-56 for the complete mapping.
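The mapping rules above can be illustrated with a small standard-library parser. This is a hypothetical sketch, not the code in temporal_gap_processor.py: `parse_viscode_sketch`, `BASELINE_CODES`, and `ADNI2_CODES` are illustrative names, and only the codes explicitly listed above are covered. ADNI-2 codes hidden behind "etc." are deliberately left out, so the sketch returns None for them.

```python
import re
from typing import Optional

# Hypothetical tables built only from the codes listed in this section.
BASELINE_CODES = {"sc", "scmri", "bl", "init", "v02", "v03", "v06"}
ADNI2_CODES = {"v04": 3, "v05": 6, "v11": 12, "v21": 24, "v31": 36}


def parse_viscode_sketch(code: str) -> Optional[int]:
    """Map an ADNI VISCODE to months since baseline; None if unrecognized."""
    c = code.strip().lower()
    if c in BASELINE_CODES:
        return 0
    if c in ADNI2_CODES:
        return ADNI2_CODES[c]
    m = re.fullmatch(r"m(\d+)", c)   # ADNI-1/GO style: m06 -> 6
    if m:
        return int(m.group(1))
    y = re.fullmatch(r"y(\d+)", c)   # ADNI-3 style: y2 -> 24
    if y:
        return 12 * int(y.group(1))
    return None
```

Returning None for unknown codes (rather than raising) makes it easy to drop unparseable rows before computing gaps; the real function may handle unknown codes differently.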

Calculating Temporal Gaps

Use calculate_temporal_gaps() to compute time intervals between consecutive visits:

```python
from temporal_gap_processor import calculate_temporal_gaps
import pandas as pd

df = pd.read_csv('TADPOLE_Simplified.csv')
df_with_gaps = calculate_temporal_gaps(
    df,
    subject_col='Subject',
    visit_col='Visit'
)

# New columns added:
# - visit_months: parsed month value
# - months_to_next: gap to next visit
# - visit_order: temporal ordering (1, 2, 3, ...)
```

The function handles:

- Out-of-order visit records (automatically sorted by time)
- Multiple subjects in a single DataFrame
- Missing or irregular visit patterns

Implementation: temporal_gap_processor.py:59-91
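The per-subject logic described above (sort each subject's visits by time, number them, and take the difference to the next visit) can be sketched without pandas. This is a hypothetical illustration: `compute_gaps_sketch` works on a list of dicts, expects `visit_months` to be parsed already, skips rows whose month could not be parsed, and sets `months_to_next` to None for a subject's last visit. The actual function operates on a DataFrame and may represent the last gap differently (for example as NaN).

```python
from collections import defaultdict
from typing import Dict, List, Optional


def compute_gaps_sketch(rows: List[dict]) -> List[dict]:
    """Add visit_order and months_to_next to each row, per subject.

    Each row needs 'Subject' and an already-parsed 'visit_months'.
    Rows with visit_months=None are skipped.
    """
    by_subject: Dict[str, List[dict]] = defaultdict(list)
    for row in rows:
        if row.get("visit_months") is not None:
            by_subject[row["Subject"]].append(dict(row))  # copy; don't mutate input

    out: List[dict] = []
    for visits in by_subject.values():
        visits.sort(key=lambda r: r["visit_months"])  # fixes out-of-order records
        for i, visit in enumerate(visits):
            visit["visit_order"] = i + 1
            # The last visit has no successor, so its gap is left as None.
            nxt: Optional[float] = (
                visits[i + 1]["visit_months"] - visit["visit_months"]
                if i + 1 < len(visits) else None
            )
            visit["months_to_next"] = nxt
            out.append(visit)
    return out
```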

Time Normalization Methods

Four normalization strategies are available via normalize_time_gaps():

1. Logarithmic (Default)

```python
normalize_time_gaps(gaps, method='log')
```

Formula: ln(1 + months/12), the natural logarithm of one plus the gap in years
Range: Unbounded above (about 0 to 1.79 for 0-60 months)
Best for: Capturing both short-term and long-term patterns
Examples:
- 3 months → 0.22
- 6 months → 0.41
- 12 months → 0.69
- 24 months → 1.10
- 48 months → 1.61

2. Min-Max Scaling

```python
normalize_time_gaps(gaps, method='minmax', max_months=60.0)
```

Formula: clip(months / max_months, 0, 1)
Range: [0, 1]
Best for: Linear time relationships
Examples (max_months=60):
- 6 months → 0.10
- 12 months → 0.20
- 30 months → 0.50
- 60+ months → 1.00

3. Bucket Categorization

```python
normalize_time_gaps(gaps, method='buckets')
```

Buckets:
- 0-6 months → 0.25
- 6-12 months → 0.50
- 12-24 months → 0.75
- 24+ months → 1.00
Best for: Discrete time horizon modeling
Use case: When prediction difficulty varies by time range

4. Raw Years

```python
normalize_time_gaps(gaps, method='raw')
```

Formula: months / 12
Range: Unbounded (in years)
Best for: Interpretable units
Examples:
- 6 months → 0.5 years
- 12 months → 1.0 years
- 36 months → 3.0 years

Implementation: temporal_gap_processor.py:94-121
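For reference, here is a scalar sketch of the four strategies that reproduces the formulas and example values listed above. `normalize_gap_sketch` is a hypothetical name; the real normalize_time_gaps() takes a collection of gaps. The bucket boundaries are an assumption: the section does not say which bucket an exact 6, 12, or 24 months falls into, so the sketch uses half-open intervals [0, 6), [6, 12), [12, 24), and [24, inf).

```python
import math


def normalize_gap_sketch(months: float, method: str = "log",
                         max_months: float = 60.0) -> float:
    """Normalize a single gap (in months) with one of the four documented methods."""
    if method == "log":
        # ln(1 + months/12); log1p is accurate for small arguments.
        return math.log1p(months / 12.0)
    if method == "minmax":
        # Clip months / max_months into [0, 1].
        return min(max(months / max_months, 0.0), 1.0)
    if method == "buckets":
        # Assumed half-open intervals; the real boundaries may differ.
        if months < 6:
            return 0.25
        if months < 12:
            return 0.50
        if months < 24:
            return 0.75
        return 1.00
    if method == "raw":
        return months / 12.0  # gap in years
    raise ValueError(f"unknown method: {method!r}")
```

Rounded to two decimals, the log method gives 0.22, 0.41, 0.69, 1.1, and 1.61 for 3, 6, 12, 24, and 48 months, matching the examples above.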

Usage in Training

Enable time-aware prediction with command-line arguments:

```bash
python main.py \
  --use_time_features \
  --time_normalization log \
  --single_visit_horizon 6 \
  --exclude_target_visit
```

Key Arguments

- --use_time_features: Enable temporal gap features (default: False)
- --time_normalization: Normalization method, one of log, minmax, buckets, or raw (default: log)
- --single_visit_horizon: Default prediction horizon in months for subjects with only one visit (default: 6)
- --exclude_target_visit: Exclude the target visit from input sequences to prevent data leakage (recommended)

See main.py:47-50 and dfc_main.py:50-53 for argument definitions.
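A minimal argparse sketch consistent with the flags and defaults documented above. It is an assumption about how the options could be declared, not a copy of main.py:47-50: in particular, it treats --use_time_features and --exclude_target_visit as store_true flags (so both default to False and must be passed explicitly) and parses --single_visit_horizon as an integer number of months. `build_time_arg_parser` is a hypothetical name.

```python
import argparse


def build_time_arg_parser() -> argparse.ArgumentParser:
    """Hypothetical declaration of the documented time-feature options."""
    parser = argparse.ArgumentParser(description="Time-aware STGNN options (sketch)")
    parser.add_argument("--use_time_features", action="store_true",
                        help="Enable temporal gap features")
    parser.add_argument("--time_normalization", default="log",
                        choices=["log", "minmax", "buckets", "raw"],
                        help="Normalization method for time gaps")
    parser.add_argument("--single_visit_horizon", type=int, default=6,
                        help="Prediction horizon (months) for single-visit subjects")
    parser.add_argument("--exclude_target_visit", action="store_true",
                        help="Drop the target visit from input sequences")
    return parser
```

Restricting --time_normalization with `choices` makes argparse reject a misspelled method at startup instead of failing later inside the data loader.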

Temporal Data Loading

The TemporalDataLoader automatically handles time feature extraction:

```python
from TemporalDataLoader import TemporalDataLoader

loader = TemporalDataLoader(
    dataset=dataset,
    indices=train_indices,
    encoder=encoder,
    device=device,
    batch_size=16,
    exclude_target_visit=True,
    time_normalization='log',
    single_visit_horizon=6
)

for batch in loader:
    graph_seq = batch['graph_seq']      # (batch, seq_len, emb_dim)
    lengths = batch['lengths']          # actual sequence lengths
    labels = batch['labels']            # target labels
    time_gaps = batch['time_gaps']      # normalized temporal gaps
```

The loader:

- Extracts visit sequences per subject
- Parses VISCODE to months
- Computes gaps to the next visit (or to the prediction target)
- Applies the selected normalization
- Pads sequences to the batch's maximum length

Implementation reference: main.py:236-249 and dfc_main.py:340-355
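The per-subject steps listed above can be sketched as follows, under stated assumptions: visits are already parsed to months, exclude_target_visit=True (the latest visit is the target and is removed from the inputs), a single-visit subject gets single_visit_horizon as its only gap, and padding uses 0.0. `subject_time_gaps` and `pad_batch` are hypothetical helpers, not TemporalDataLoader methods.

```python
import math
from typing import List, Sequence, Tuple


def subject_time_gaps(visit_months: Sequence[float],
                      single_visit_horizon: float = 6.0) -> List[float]:
    """Raw month gaps for one subject, assuming exclude_target_visit=True.

    The latest visit is the target; each remaining visit gets the gap to the
    next visit, and the last input visit gets the gap to the target.
    """
    months = sorted(visit_months)
    if len(months) == 1:
        # No later visit exists, so fall back to the default horizon.
        return [float(single_visit_horizon)]
    return [float(nxt - cur) for cur, nxt in zip(months[:-1], months[1:])]


def pad_batch(seqs: Sequence[Sequence[float]],
              pad_value: float = 0.0) -> Tuple[List[List[float]], List[int]]:
    """Right-pad every sequence to the longest one and return the true lengths."""
    max_len = max(len(s) for s in seqs)
    padded = [list(s) + [pad_value] * (max_len - len(s)) for s in seqs]
    return padded, [len(s) for s in seqs]


# Usage: two subjects, log-normalized gaps, padded into one batch.
subjects = [[0, 6, 12, 24], [0]]
gaps = [[math.log1p(g / 12.0) for g in subject_time_gaps(v)] for v in subjects]
time_gaps, lengths = pad_batch(gaps)  # lengths == [3, 1]
```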

Horizon-Based Evaluation

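When time features are enabled, the system also reports performance separately for each prediction horizon. The horizon thresholds are expressed in normalized gap units, not in months:

```python
def evaluate_by_horizon(loader):
    # Excerpt: only the horizon definitions are shown here.
    horizons = {
        '0-6m': {'range': (0, 0.5)},
        '6-12m': {'range': (0.5, 0.8)},
        '12-24m': {'range': (0.8, 1.2)},
        '24m+': {'range': (1.2, float('inf'))}
    }
    # Returns accuracy, F1, and AUC per horizon
    ...
```

If the gaps are log-normalized (the default), the cut points 0.5, 0.8, and 1.2 correspond to about 7.8, 14.7, and 27.8 months, so the bucket labels are approximate.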

Example output:

```text
By horizon:
0-6m: n=45, Acc=0.867, F1=0.750
6-12m: n=38, Acc=0.816, F1=0.667
12-24m: n=52, Acc=0.769, F1=0.600
24m+: n=28, Acc=0.714, F1=0.571
```

See main.py:382-463 and dfc_main.py:462-543 for implementation.
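To make the per-horizon breakdown concrete, here is a standard-library sketch. It is hypothetical: it reuses the thresholds from the excerpt above, assumes binary labels and half-open intervals [lo, hi), and omits AUC because that needs predicted scores rather than hard labels. `log_gap_to_months` inverts the default log normalization, which is a quick way to check which month range each threshold actually covers.

```python
import math
from typing import Dict, Sequence

# Thresholds copied from the evaluate_by_horizon excerpt (normalized units).
HORIZONS = {
    "0-6m": (0.0, 0.5),
    "6-12m": (0.5, 0.8),
    "12-24m": (0.8, 1.2),
    "24m+": (1.2, float("inf")),
}


def log_gap_to_months(x: float) -> float:
    """Invert x = ln(1 + months/12)."""
    return 12.0 * math.expm1(x)


def binary_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 for the positive class (label 1); 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)


def evaluate_by_horizon_sketch(gaps: Sequence[float], y_true: Sequence[int],
                               y_pred: Sequence[int]) -> Dict[str, dict]:
    """Accuracy and F1 per horizon bucket; empty buckets are skipped."""
    results = {}
    for name, (lo, hi) in HORIZONS.items():
        idx = [i for i, g in enumerate(gaps) if lo <= g < hi]
        if not idx:
            continue
        t = [y_true[i] for i in idx]
        p = [y_pred[i] for i in idx]
        acc = sum(a == b for a, b in zip(t, p)) / len(idx)
        results[name] = {"n": len(idx), "acc": acc, "f1": binary_f1(t, p)}
    return results
```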

Temporal Distribution Analysis

Analyze the temporal distribution of your dataset:

```python
from temporal_gap_processor import analyze_temporal_distribution

stats = analyze_temporal_distribution(
    df,
    visit_col='Visit',
    output_stats=True
)

# Output:
# === Temporal Distribution Analysis ===
# Total visits: 1247
# Valid temporal visits: 1247
# Unique timepoints: 18
# Time range: 0 - 96 months
# Mean: 28.4 months, Median: 24 months
#
# Visit distribution by time horizon:
#   Baseline: 312 visits
#   0-6 months: 89 visits
#   6-12 months: 187 visits
#   12-24 months: 345 visits
#   24+ months: 314 visits
```

Implementation: temporal_gap_processor.py:149-185
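A rough standard-library equivalent of the summary above, useful for a quick check without pandas. It is a sketch with assumed bucket boundaries: month 0 counts as Baseline, and the remaining buckets are taken as (0, 6], (6, 12], (12, 24], and above 24 months; the real function may draw these boundaries differently. `temporal_summary_sketch` is a hypothetical name, and it expects visit months that have already been parsed (None for unparseable visits).

```python
import statistics
from collections import Counter
from typing import Dict, Iterable, Optional


def temporal_summary_sketch(visit_months: Iterable[Optional[float]]) -> Dict[str, object]:
    """Summarize parsed visit months; None entries count as invalid visits."""
    all_visits = list(visit_months)
    valid = [m for m in all_visits if m is not None]
    buckets: Counter = Counter()
    for m in valid:
        if m == 0:
            buckets["Baseline"] += 1
        elif m <= 6:
            buckets["0-6 months"] += 1
        elif m <= 12:
            buckets["6-12 months"] += 1
        elif m <= 24:
            buckets["12-24 months"] += 1
        else:
            buckets["24+ months"] += 1
    return {
        "total": len(all_visits),
        "valid": len(valid),
        "unique_timepoints": len(set(valid)),
        "range": (min(valid), max(valid)) if valid else None,
        "mean": statistics.mean(valid) if valid else None,
        "median": statistics.median(valid) if valid else None,
        "buckets": dict(buckets),
    }
```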

Best Practices

- Always use exclude_target_visit=True during training to prevent data leakage.
- Use log normalization in most cases; it compresses long gaps while keeping short gaps well separated, which balances short-term and long-term predictions.
- Set single_visit_horizon to your dataset's typical follow-up interval (6-12 months for ADNI).
- Analyze the temporal distribution before training to understand time-range coverage.
- Compare horizon-specific metrics to identify where the model struggles.

Time Feature Integration

The classifier call below does not take the time gaps as a separate argument. Temporal information reaches the sequence model (LSTM/GRU/RNN) through the input sequence itself:

```python
logits = classifier(
    graph_seq=graph_embeddings,
    tab_features=None,
    lengths=sequence_lengths
)
# Time gaps are implicitly encoded in the sequence ordering
```

The temporal model is expected to learn how to weight recent versus distant observations according to the prediction horizon.
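Because the classifier call passes no explicit time argument, one common way to make the gaps directly visible to a recurrent model is to append each step's normalized gap to that step's embedding, growing the per-step input size from emb_dim to emb_dim + 1. The sketch below illustrates only that idea with plain lists; `append_time_channel` is hypothetical and is not how STGNN's classifier is documented to consume time.

```python
from typing import List, Sequence


def append_time_channel(graph_seq: Sequence[Sequence[float]],
                        time_gaps: Sequence[float]) -> List[List[float]]:
    """Return a copy of each step's embedding with its normalized gap appended."""
    if len(graph_seq) != len(time_gaps):
        raise ValueError("need exactly one time gap per sequence step")
    return [list(step) + [float(gap)] for step, gap in zip(graph_seq, time_gaps)]
```

If this approach were used, the recurrent layer's input size would have to be increased by one to match.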

Implementation Files

- temporal_gap_processor.py: Core time processing utilities
- TemporalDataLoader.py: Batch creation with time features
- main.py: Static FC training with time features
- dfc_main.py: Dynamic FC training with time features
