Overview
FC_ADNIDataset is a PyTorch Geometric dataset class for loading and processing functional connectivity matrices from ADNI brain imaging data. It extends InMemoryDataset and converts FC matrices into graph structures with temporal metadata.
Class Definition
Constructor
Parameters
Root directory containing the dataset files. Expected structure:
- FC_Matrices/ subdirectory containing the .npz files
- Label CSV file (default: TADPOLE_TEMPORAL.csv)
Threshold for edge filtering. Edges with absolute correlation values below this threshold are removed from the graph.
Filename of the CSV containing subject labels and visit information. Must be located in the root directory.
Name of the variable in the .npz files that holds the FC matrix data.
Optional transform to apply to data objects on the fly.
Optional transform to apply during data processing.
Attributes
Collated data object containing all graphs.
Dictionary mapping attributes to their slice indices for efficient indexing.
The correlation threshold used for edge filtering.
Dictionary mapping subject IDs to their graph data. This attribute is initialized to None.
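The collated data object and its slice dictionary follow the usual InMemoryDataset pattern. All graphs are concatenated into one large object, and cumulative offsets record where each graph's entries start and end. The NumPy sketch below shows the idea for a single per-edge attribute. It is a simplified illustration, not PyTorch Geometric's actual collate code, and collate_edges and get_graph_edges are hypothetical names.

```python
import numpy as np


def collate_edges(per_graph_edges):
    """Concatenate per-graph edge arrays and record cumulative offsets."""
    sizes = [len(edges) for edges in per_graph_edges]
    # slices[i]:slices[i + 1] is the span of the collated array owned by graph i
    slices = np.concatenate([[0], np.cumsum(sizes)])
    return np.concatenate(per_graph_edges), slices


def get_graph_edges(collated, slices, idx):
    """Recover the edges of graph idx with a single slice of the collated array."""
    return collated[slices[idx]:slices[idx + 1]]


# Toy edge weights for two graphs (illustrative values only)
edge_weights_a = np.array([0.8, -0.5])
edge_weights_b = np.array([0.9, 0.4, -0.7])
collated, slices = collate_edges([edge_weights_a, edge_weights_b])
# slices -> [0, 2, 5]
```

Indexing a single graph therefore costs one slice per attribute, with no per-graph Python objects kept in memory.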
Methods
fc_to_graph
Converts a functional connectivity matrix into a PyTorch Geometric graph object.
- matrix (numpy.ndarray): FC matrix of shape (N, N)
- node_features (numpy.ndarray, optional): Node feature matrix. If None, a normalized identity matrix is used
- subj_id (str, optional): Subject identifier to attach to the graph
Returns: Data, a PyTorch Geometric Data object with x, edge_index, edge_attr, and optionally subj_id
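The conversion performed by fc_to_graph, together with the symmetrization and thresholding rules listed under Notes, can be sketched in plain NumPy. This is an illustration under stated assumptions, not the class's actual code. fc_to_graph_sketch is a hypothetical name, and it returns a plain dict instead of a torch_geometric.data.Data object. It also assumes two behaviors the documentation does not state: the diagonal is dropped, so no self-loops are created, and an edge is kept when its absolute weight is greater than or equal to the threshold.

```python
import numpy as np


def fc_to_graph_sketch(matrix, threshold, node_features=None, subj_id=None):
    """Hypothetical NumPy-only sketch of the FC-matrix-to-graph conversion.

    Assumptions (not confirmed by the docs): no self-loops, and edges are
    kept when |weight| >= threshold. Returns a dict, not a PyG Data object.
    """
    matrix = np.asarray(matrix, dtype=np.float32)
    # Symmetrize as described in Notes: (matrix + matrix.T) / 2
    sym = (matrix + matrix.T) / 2
    # Keep edges whose absolute correlation reaches the threshold
    mask = np.abs(sym) >= threshold
    np.fill_diagonal(mask, False)  # assumption: drop self-loops
    rows, cols = np.nonzero(mask)
    edge_index = np.stack([rows, cols])  # COO format, shape (2, E)
    edge_attr = sym[rows, cols]  # correlation values as edge weights
    if node_features is None:
        # Default node features: identity matrix (scale factor 1.0)
        node_features = np.eye(matrix.shape[0], dtype=np.float32)
    graph = {"x": node_features, "edge_index": edge_index, "edge_attr": edge_attr}
    if subj_id is not None:
        graph["subj_id"] = subj_id
    return graph


# Toy 3-region matrix; the (0, 1) and (1, 0) entries differ before symmetrization
fc = np.array([[1.0, 0.8, 0.1],
               [0.6, 1.0, -0.5],
               [0.1, -0.5, 1.0]])
g = fc_to_graph_sketch(fc, threshold=0.3, subj_id="002S0413_run01")
```

With a threshold of 0.3, the weak 0.1 connection is removed. The symmetrized 0.7 and -0.5 connections survive in both directions. In the real class, these arrays would presumably be converted to tensors before being wrapped in a Data object.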
load_fc_graphs
Loads all FC matrices from the specified directory and converts them to graphs.
- base_path (str): Path to the directory containing the FC matrix .npz files
Returns: dict mapping subject IDs (format: {subj_id}_run{run_num}) to graph Data objects
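The directory scan and key construction performed by load_fc_graphs can be sketched as follows, using the filename pattern given under File Format Requirements. The sketch stops at the raw arrays so that it depends only on NumPy. load_fc_matrices_sketch is a hypothetical name. Two behaviors are assumptions rather than documented facts: the run number is kept exactly as written in the filename (for example "01"), and files that do not match the pattern are skipped.

```python
import glob
import os
import re
import tempfile

import numpy as np

# Filename pattern from File Format Requirements: sub-{SUBJECT_ID}_run-{RUN_NUM}_fc_matrix.npz
FC_NAME = re.compile(r"^sub-(?P<subj>[^_]+)_run-(?P<run>[^_]+)_fc_matrix\.npz$")


def load_fc_matrices_sketch(base_path, var_name="arr_0"):
    """Hypothetical sketch: map '{subj_id}_run{run_num}' keys to raw FC matrices."""
    matrices = {}
    for path in sorted(glob.glob(os.path.join(base_path, "*.npz"))):
        match = FC_NAME.match(os.path.basename(path))
        if match is None:
            continue  # assumption: non-matching files are ignored
        key = f"{match['subj']}_run{match['run']}"  # run number kept as written
        with np.load(path) as npz:
            matrices[key] = npz[var_name]
    return matrices


# Build a throwaway directory with one valid file and one unrelated file
with tempfile.TemporaryDirectory() as tmp:
    np.savez(os.path.join(tmp, "sub-002S0413_run-01_fc_matrix.npz"), np.eye(3))
    np.savez(os.path.join(tmp, "notes.npz"), np.eye(2))
    fc_by_key = load_fc_matrices_sketch(tmp)
```

In the real method, each loaded matrix would then go through fc_to_graph.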
load_subject_labels_and_visits
Loads subject labels and visit timing information from a CSV file.
- label_csv_path (str): Path to the CSV file with label and visit data
- label_col (str): Name of the column containing the label values
Returns: tuple of two dictionaries:
- label_dict: Maps subject IDs to labels (the last visit's label is used)
- visit_dict: Maps subject IDs and run keys to visit metadata, including visit_code, visit_months, and months_to_next
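The label and visit loading step can be sketched with the standard csv module. The column names come from Label CSV Format, and the behavior follows the rules above: underscores are stripped from subject IDs, visits are ordered by Visit_Order, and the last visit supplies the label. The rest is hypothetical. load_labels_and_visits_sketch is an illustrative name. It also assumes that the k-th visit corresponds to run k (zero-padded to two digits), and that months_to_next falls back to the Months_From_Baseline difference when Months_To_Next_Original is absent.

```python
import csv
import io


def load_labels_and_visits_sketch(csv_file, label_col="Label_CS_Num"):
    """Hypothetical sketch returning (label_dict, visit_dict).

    Assumptions: visit k (by Visit_Order) maps to run k, zero-padded to two
    digits; months_to_next is -1 for the last visit.
    """
    rows_by_subject = {}
    for row in csv.DictReader(csv_file):
        subj = row["Subject"].replace("_", "")  # underscores removed for matching
        rows_by_subject.setdefault(subj, []).append(row)

    label_dict, visit_dict = {}, {}
    for subj, rows in rows_by_subject.items():
        rows.sort(key=lambda r: int(r["Visit_Order"]))
        label_dict[subj] = int(float(rows[-1][label_col]))  # last visit's label
        for k, row in enumerate(rows, start=1):
            months = float(row["Months_From_Baseline"])
            if k == len(rows):
                to_next = -1  # no next visit
            else:
                original = (row.get("Months_To_Next_Original") or "").strip()
                next_months = float(rows[k]["Months_From_Baseline"])
                to_next = float(original) if original else next_months - months
            visit_dict[f"{subj}_run{k:02d}"] = {
                "visit_code": row["Visit"],
                "visit_months": months,
                "months_to_next": to_next,
            }
    return label_dict, visit_dict


# Made-up two-visit CSV for illustration
sample = io.StringIO(
    "Subject,Label_CS_Num,Visit,Visit_Order,Months_From_Baseline\n"
    "002_S_0413,0,bl,1,0\n"
    "002_S_0413,1,m06,2,6\n"
)
labels, visits = load_labels_and_visits_sketch(sample)
```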
Data Format
Each graph in the dataset has the following attributes:
- x: Node feature matrix (default: identity matrix scaled by 1.0)
- edge_index: Graph connectivity in COO format
- edge_attr: Edge weights (correlation values)
- y: Subject label (diagnosis)
- subj_id: Subject identifier (format: {subject_id}_run{run_num})
- visit_code: Visit identifier (e.g., 'bl', 'm06', 'm12')
- visit_months: Months from baseline for this visit
- months_to_next: Months until the next visit (-1 if there is no next visit)
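One plausible way these attributes come together is a join on subj_id. The `_run` suffix is stripped to find the subject's label, and the full key is used to look up visit metadata. The sketch below works on plain dicts and defines its own toy inputs. attach_metadata_sketch is a hypothetical name. Skipping unlabeled subjects (by returning None) and defaulting months_to_next to -1 are assumptions, not documented behavior.

```python
def attach_metadata_sketch(graph, label_dict, visit_dict):
    """Hypothetical sketch: add y and visit metadata to one graph record (a dict)."""
    subject_id = graph["subj_id"].rsplit("_run", 1)[0]  # '{subject_id}_run{run_num}'
    if subject_id not in label_dict:
        return None  # assumption: graphs without a label are skipped
    visit = visit_dict.get(graph["subj_id"], {})
    return {
        **graph,
        "y": label_dict[subject_id],
        "visit_code": visit.get("visit_code"),
        "visit_months": visit.get("visit_months"),
        "months_to_next": visit.get("months_to_next", -1),
    }


# Toy inputs for illustration
record = attach_metadata_sketch(
    {"subj_id": "002S0413_run01"},
    {"002S0413": 1},
    {"002S0413_run01": {"visit_code": "bl", "visit_months": 0.0, "months_to_next": 6.0}},
)
missing = attach_metadata_sketch({"subj_id": "999S0001_run01"}, {"002S0413": 1}, {})
```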
Usage Example
File Format Requirements
FC Matrix Files
Expected filename format: sub-{SUBJECT_ID}_run-{RUN_NUM}_fc_matrix.npz
Example: sub-002S0413_run-01_fc_matrix.npz
Each .npz file should contain:
- A NumPy array stored under the key given by the var_name parameter (default: 'arr_0')
- Array shape: (N, N), where N is the number of brain regions
- Values represent correlation coefficients between regions
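A file that meets these requirements can be produced with NumPy alone. When an array is passed positionally to np.savez, it is stored under the key 'arr_0', which matches the default var_name. The sketch below builds a toy correlation matrix with np.corrcoef from random signals and writes it using the documented filename pattern. save_fc_matrix_sketch is a hypothetical helper, and the random signals are placeholders rather than real imaging data.

```python
import os
import tempfile

import numpy as np


def save_fc_matrix_sketch(out_dir, subject_id, run_num, matrix):
    """Hypothetical helper: write an (N, N) FC matrix using the documented conventions."""
    matrix = np.asarray(matrix)
    if matrix.ndim != 2 or matrix.shape[0] != matrix.shape[1]:
        raise ValueError(f"expected an (N, N) matrix, got shape {matrix.shape}")
    path = os.path.join(out_dir, f"sub-{subject_id}_run-{run_num}_fc_matrix.npz")
    # A positional array passed to np.savez is stored under the key 'arr_0'
    np.savez(path, matrix)
    return path


rng = np.random.default_rng(0)
signals = rng.standard_normal((120, 5))  # toy data: 120 time points x 5 regions
fc = np.corrcoef(signals, rowvar=False)  # (5, 5) region-by-region correlations
with tempfile.TemporaryDirectory() as tmp:
    path = save_fc_matrix_sketch(tmp, "002S0413", "01", fc)
    with np.load(path) as npz:
        stored = npz["arr_0"]
    name = os.path.basename(path)
```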
Label CSV Format
Columns (all required unless marked optional):
- Subject: Subject identifier
- Label_CS_Num: Numeric label (diagnosis category)
- Visit: Visit code (e.g., 'bl', 'm06', 'm12')
- Visit_Order: Chronological order of visits
- Months_From_Baseline: Time in months from the baseline visit
- Months_To_Next_Original: Months until the next visit (optional)
Notes
- FC matrices are automatically symmetrized: (matrix + matrix.T) / 2
- Edges whose absolute weight is below the threshold are filtered out, producing sparse graphs
- Subject IDs have underscores removed for label matching
- For subjects with multiple visits, the last visit’s label is used as the overall subject label
- Node features default to a scaled identity matrix (scale factor: 1.0) to maintain appropriate scale for GNN processing