Overview
DFC_ADNIDataset is a PyTorch Geometric dataset class for loading and processing dynamic functional connectivity (DFC) matrices derived from ADNI brain imaging data. It extends InMemoryDataset and converts each time-varying DFC matrix into a sequence of graphs, one per time window.
Class Definition
Constructor
Parameters
Root directory containing the dataset files. Expected structure:
- DFC_Matrices/ subdirectory containing the dynamic FC .npz files
- Label CSV file (default: TADPOLE_Simplified.csv)
Threshold for edge filtering. Edges with absolute correlation values below this threshold are removed from the graph.
Filename of the CSV containing subject labels. Must be located in the root directory.
Name of the variable in the .npz files that holds the dynamic FC matrix data (default: 'dynamic_fc').
Optional transform applied to each data object on the fly, every time it is accessed.
Optional transform applied once during data processing, before the processed data are saved to disk.
Attributes
Collated data object containing all graphs.
Dictionary mapping attributes to their slice indices for efficient indexing.
List of subject IDs corresponding to each graph in the dataset. Preserved across save/load operations.
The correlation threshold used for edge filtering.
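Because each subject contributes several graphs (one per time window, possibly across several runs), a random graph-level split can place the same subject in both the training and test sets. The sketch below shows one way to build a subject-level split from subj_id_list. The helper names are hypothetical, and it assumes every entry follows the sub-{SUBJECT_ID}_run-{RUN_NUM} format described under Data Format, so runs of the same subject are merged into one group.

```python
from collections import defaultdict


def group_indices_by_subject(subj_id_list):
    """Map each subject to the dataset indices of all of its graphs.

    Assumes IDs look like 'sub-{SUBJECT_ID}_run-{RUN_NUM}'; the run suffix is
    dropped so that all runs of one subject end up in the same group.
    """
    groups = defaultdict(list)
    for idx, subj_id in enumerate(subj_id_list):
        subject = subj_id.split("_run-")[0]  # 'sub-A_run-01' -> 'sub-A'
        groups[subject].append(idx)
    return dict(groups)


def subject_level_split(subj_id_list, test_subjects):
    """Split graph indices so that no subject appears in both train and test."""
    train, test = [], []
    for subject, indices in group_indices_by_subject(subj_id_list).items():
        (test if subject in test_subjects else train).extend(indices)
    return sorted(train), sorted(test)


# Toy subj_id_list: subject A has two windows in one run,
# subject B has one window in each of two runs.
ids = ["sub-A_run-01", "sub-A_run-01", "sub-B_run-01", "sub-B_run-02"]
train_idx, test_idx = subject_level_split(ids, test_subjects={"sub-B"})
```

PyTorch Geometric datasets accept a list of indices, so the resulting index lists could be passed as dataset[train_idx] and dataset[test_idx].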
Methods
fc_to_graph
Converts a single functional connectivity matrix (one time point) into a PyTorch Geometric graph object.
Parameters:
- matrix (numpy.ndarray): FC matrix of shape (N, N) for a single time point
- node_features (numpy.ndarray, optional): Node feature matrix. If None, an identity matrix is used
Returns:
- Data: PyTorch Geometric Data object with x, edge_index, and edge_attr
- Uses upper triangle without self-loops to avoid duplicate edges
- Creates undirected graph by adding both (i, j) and (j, i) edges
- Edge attributes represent correlation strengths
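The edge-construction rules above (upper triangle without self-loops, absolute-correlation threshold, both edge directions, correlations as edge weights) can be sketched with NumPy alone. This is a hypothetical helper, not the class's actual method: it returns plain arrays instead of a Data object so it runs without PyTorch, and it assumes edges with |correlation| equal to the threshold are kept, since the documentation only says values below the threshold are removed.

```python
import numpy as np


def fc_to_graph_arrays(matrix, threshold, node_features=None):
    """Hypothetical NumPy sketch of the fc_to_graph edge logic."""
    matrix = np.asarray(matrix, dtype=np.float32)
    n = matrix.shape[0]
    # Default node features: identity matrix (one one-hot row per region)
    if node_features is None:
        x = np.eye(n, dtype=np.float32)
    else:
        x = np.asarray(node_features, dtype=np.float32)
    # Upper triangle only; k=1 excludes the diagonal, so there are no self-loops
    rows, cols = np.triu_indices(n, k=1)
    weights = matrix[rows, cols]
    # Assumption: keep edges whose absolute correlation is not below the threshold
    keep = np.abs(weights) >= threshold
    rows, cols, weights = rows[keep], cols[keep], weights[keep]
    # Add both (i, j) and (j, i) to make the graph undirected (COO format)
    edge_index = np.stack([np.concatenate([rows, cols]),
                           np.concatenate([cols, rows])])
    edge_attr = np.concatenate([weights, weights])
    return x, edge_index, edge_attr


matrix = np.array([[1.0, 0.8, 0.1],
                   [0.8, 1.0, -0.6],
                   [0.1, -0.6, 1.0]])
x, edge_index, edge_attr = fc_to_graph_arrays(matrix, threshold=0.5)
```

Inside a PyG class these arrays would typically be converted with torch.from_numpy (casting edge_index to int64) and passed to torch_geometric.data.Data(x=..., edge_index=..., edge_attr=...).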
get
Overrides the default get method to attach subject IDs to data objects.
Parameters:
- idx (int): Index of the graph to retrieve
Returns:
- Data: Graph data object with the subj_id attribute attached
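The override pattern is: call the parent get, then attach the entry of subj_id_list at the same index. To keep the sketch runnable without PyTorch Geometric, it uses a hypothetical stand-in base class in place of InMemoryDataset and SimpleNamespace in place of Data. Skipping the attribute when subj_id_list is None is an assumption about how legacy datasets (see Backward Compatibility) are handled.

```python
from types import SimpleNamespace


class StandInInMemoryDataset:
    """Hypothetical stand-in for InMemoryDataset so the sketch runs without PyG."""

    def __init__(self, graphs):
        self._graphs = graphs  # list of attribute dicts, one per graph

    def get(self, idx):
        # Stand-in for InMemoryDataset.get: return the idx-th graph as a new object
        return SimpleNamespace(**self._graphs[idx])


class SubjectTaggedDataset(StandInInMemoryDataset):
    def __init__(self, graphs, subj_id_list=None):
        super().__init__(graphs)
        # None mimics datasets processed before subj_id_list existed
        self.subj_id_list = subj_id_list

    def get(self, idx):
        data = super().get(idx)
        # Attach the subject ID only when the list is available
        if self.subj_id_list is not None:
            data.subj_id = self.subj_id_list[idx]
        return data


graphs = [{"time_index": 0}, {"time_index": 1}]
tagged = SubjectTaggedDataset(graphs, ["sub-002S0413_run-01"] * 2)
legacy = SubjectTaggedDataset(graphs)  # no subject IDs stored
```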
load_subject_labels
Loads subject labels from a CSV file.
Parameters:
- label_csv_path (str): Path to the CSV file with subject labels
- label_col (str): Name of the column containing the label values
Returns:
- dict: Dictionary mapping subject IDs to their labels
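A minimal sketch of the label loading, using only the standard library. It follows the Label CSV Format section (a Subject column with underscores removed, plus a numeric label column), but several details are assumptions: rows with an empty label are skipped, labels written as "1.0" are accepted, and when a subject appears in several rows the last row wins.

```python
import csv
import os
import tempfile


def load_subject_labels(label_csv_path, label_col):
    """Hypothetical sketch: map underscore-free subject IDs to integer labels."""
    labels = {}
    with open(label_csv_path, newline="") as f:
        for row in csv.DictReader(f):
            subject = row["Subject"].replace("_", "")  # '002_S_0413' -> '002S0413'
            value = row[label_col].strip()
            if value:  # assumption: skip rows with a missing label
                # int(float(...)) accepts labels written as '1' or '1.0';
                # later rows for the same subject overwrite earlier ones
                labels[subject] = int(float(value))
    return labels


# Demo with a throwaway CSV file
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, newline="") as tmp:
    tmp.write("Subject,Label_CS_Num\n002_S_0413,0\n011_S_0002,1.0\n")
    csv_path = tmp.name
labels = load_subject_labels(csv_path, label_col="Label_CS_Num")
os.remove(csv_path)
```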
Data Format
Each graph in the dataset represents one time window from a dynamic FC analysis and has the following attributes:
- x: Node feature matrix (default: identity matrix)
- edge_index: Graph connectivity in COO format (undirected)
- edge_attr: Edge weights (correlation values)
- y: Subject label (diagnosis category)
- time_index: Time window index within the scan
- subj_id: Subject identifier (format: sub-{SUBJECT_ID}_run-{RUN_NUM})
Usage Example
File Format Requirements
Dynamic FC Matrix Files
Expected filename format: sub-{SUBJECT_ID}_run-{RUN_NUM}_dynamic_fc_matrix.npz
Example: sub-002S0413_run-01_dynamic_fc_matrix.npz
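One way to recover the subject ID, run number, and subj_id string from a filename is a regular expression built from the documented pattern. The helper and the regex are hypothetical; the regex assumes the subject part contains no underscores, which is consistent with the underscore removal described under Label CSV Format.

```python
import re

# Assumed pattern, derived from the documented filename format
FILENAME_RE = re.compile(
    r"^sub-(?P<subject>[^_]+)_run-(?P<run>\d+)_dynamic_fc_matrix\.npz$"
)


def parse_dfc_filename(filename):
    """Return (subject, run, subj_id), or None if the name does not match."""
    match = FILENAME_RE.match(filename)
    if match is None:
        return None
    subject, run = match.group("subject"), match.group("run")
    return subject, run, f"sub-{subject}_run-{run}"


parsed = parse_dfc_filename("sub-002S0413_run-01_dynamic_fc_matrix.npz")
```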
Each .npz file should contain:
- A NumPy array stored under the key given by the var_name parameter (default: 'dynamic_fc')
- Array shape: (T, N, N), where:
  - T = number of time windows
  - N = number of brain regions
- Values represent correlation coefficients between regions at each time window
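To check that a file meets these requirements, it can be written and read back with NumPy. The sketch below creates a synthetic file (random values, T = 4 windows, N = 5 regions; not real ADNI data) and loads it through a hypothetical load_dfc helper that validates the (T, N, N) shape.

```python
import os
import tempfile

import numpy as np


def load_dfc(path, var_name="dynamic_fc"):
    """Hypothetical helper: load a (T, N, N) dynamic FC array and check its shape."""
    with np.load(path) as npz:
        dfc = npz[var_name]
    if dfc.ndim != 3 or dfc.shape[1] != dfc.shape[2]:
        raise ValueError(f"expected shape (T, N, N), got {dfc.shape}")
    return dfc


# Synthetic file: T=4 time windows, N=5 regions, stored under 'dynamic_fc'
rng = np.random.default_rng(0)
path = os.path.join(tempfile.mkdtemp(), "sub-002S0413_run-01_dynamic_fc_matrix.npz")
np.savez(path, dynamic_fc=rng.uniform(-1.0, 1.0, size=(4, 5, 5)))
dfc = load_dfc(path)
```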
Label CSV Format
Required columns:
- Subject: Subject identifier (underscores are removed for matching)
- Label_CS_Num: Numeric label (diagnosis category)
Notes
- Each time window from a DFC matrix becomes a separate graph in the dataset
- If a subject has T time windows, they contribute T graphs to the dataset
- DFC matrices are automatically symmetrized at each time point: (matrix + matrix.T) / 2
- Edges whose absolute correlation falls below the threshold are filtered out to create sparse graphs
- The dataset uses upper triangle indexing to avoid duplicate edges in undirected graphs
- Subject IDs are preserved in subj_id_list for efficient subject-based indexing
- When no label is found for a subject, it defaults to 0
- Processed data is saved to data_dfc.pt in the processed/ directory
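The per-window behavior described in these notes (one graph per time window, symmetrization at each time point, label defaulting to 0) can be sketched as a loop over the T axis. The function is hypothetical and stops short of edge construction: it collects the symmetrized matrix and metadata for each window, where the real class would pass each matrix to fc_to_graph.

```python
import numpy as np


def windows_to_records(dfc, subj_id, subject, labels):
    """Hypothetical sketch: one record per time window of a (T, N, N) array."""
    y = labels.get(subject, 0)  # a missing label defaults to 0
    records = []
    for t in range(dfc.shape[0]):
        m = dfc[t]
        sym = (m + m.T) / 2  # symmetrize this window
        # In the class, `sym` would be converted by fc_to_graph at this point
        records.append({"matrix": sym, "y": y, "time_index": t, "subj_id": subj_id})
    return records


dfc = np.arange(2 * 3 * 3, dtype=float).reshape(2, 3, 3)  # T=2 windows, N=3 regions
recs = windows_to_records(dfc, "sub-002S0413_run-01", "002S0413", labels={})
```

With T = 2, this subject contributes two records, each labeled 0 because the label dictionary is empty.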
Backward Compatibility
The class supports loading datasets created before the subj_id_list feature was added. If the loaded data does not include the subject ID list, subj_id_list is set to None.