build_loaders
Creates PyTorch DataLoader objects for training and validation with configurable subsampling and reproducible shuffling.
- Dataset name: Name of the dataset to load. Supported values: "mnist", "fashion-mnist".
- Batch size: Number of samples per batch for both training and validation loaders.
- train_subset: Maximum number of training samples to use. Pass None to use the full training set. If specified, takes the first N samples.
- val_subset: Maximum number of validation samples to use. Pass None to use the full validation set. If specified, takes the first N samples.
- seed: Random seed for the data loader’s generator. Ensures reproducible shuffling of training batches.
- num_workers: Number of worker processes for parallel data loading. Set to 0 for single-process loading.
Returns a training data loader with shuffling enabled and a seeded random generator, and a validation data loader with shuffling disabled for consistent evaluation.
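The behavior described above can be approximated with a minimal sketch. This is not the actual build_loaders implementation: the names build_loaders_sketch and make_fake_split are hypothetical, the batch_size argument name and the (train, val) tuple return are assumptions, and random tensors stand in for the real datasets so that nothing is downloaded.

```python
import torch
from torch.utils.data import DataLoader, Subset, TensorDataset


def make_fake_split(n):
    # Hypothetical stand-in for a real dataset: n random 1x28x28 "images"
    # with values in [-1, 1) and labels cycling through 0..9.
    images = torch.rand(n, 1, 28, 28) * 2 - 1
    labels = torch.arange(n) % 10
    return TensorDataset(images, labels)


def build_loaders_sketch(batch_size=64, train_subset=None, val_subset=None,
                         seed=0, num_workers=0):
    # Assumed signature; the real function also takes a dataset name.
    train_ds = make_fake_split(200)
    val_ds = make_fake_split(100)

    # "Takes the first N samples": keep indices 0..N-1 only.
    if train_subset is not None:
        train_ds = Subset(train_ds, range(min(train_subset, len(train_ds))))
    if val_subset is not None:
        val_ds = Subset(val_ds, range(min(val_subset, len(val_ds))))

    # A dedicated, seeded generator drives the training shuffle.
    generator = torch.Generator()
    generator.manual_seed(seed)

    train_loader = DataLoader(train_ds, batch_size=batch_size, shuffle=True,
                              generator=generator, num_workers=num_workers)
    # Validation keeps dataset order, so no generator is needed.
    val_loader = DataLoader(val_ds, batch_size=batch_size, shuffle=False,
                            num_workers=num_workers)
    return train_loader, val_loader


train_loader, val_loader = build_loaders_sketch(
    batch_size=32, train_subset=96, val_subset=40)
images, labels = next(iter(train_loader))
```

Using Subset with a plain range keeps the subsampling deterministic, which matches the "first N samples" wording; a random subset would need its own seed.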
Data Preprocessing
All datasets are preprocessed with the following transforms:
- ToTensor(): Converts PIL images to PyTorch tensors with values scaled to [0, 1]
- Normalize((0.5,), (0.5,)): Normalizes grayscale images to the [-1, 1] range
Each image becomes a tensor of shape (1, 28, 28) with values in the range [-1, 1].
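The [-1, 1] range follows from simple arithmetic: ToTensor() scales 8-bit pixels to [0, 1], and Normalize with mean 0.5 and standard deviation 0.5 computes (x - 0.5) / 0.5. The plain-Python sketch below reproduces that calculation for single pixel values; the helper names are illustrative and not part of torchvision.

```python
def to_tensor_scale(pixel):
    # ToTensor() maps 8-bit pixel values 0..255 to floats in [0.0, 1.0].
    return pixel / 255.0


def normalize(value, mean=0.5, std=0.5):
    # Normalize((0.5,), (0.5,)) applies (value - mean) / std per channel.
    return (value - mean) / std


darkest = normalize(to_tensor_scale(0))      # -1.0
brightest = normalize(to_tensor_scale(255))  # 1.0
mid_gray = normalize(0.5)                    # 0.0
```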
Supported Datasets
- "mnist": Handwritten digits dataset (60,000 training + 10,000 test images, 28×28 grayscale, 10 classes).
- "fashion-mnist": Fashion items dataset (60,000 training + 10,000 test images, 28×28 grayscale, 10 classes).
Datasets are automatically downloaded to the data/ directory if not already present. The first run may take a few moments to download.
Usage Example
Error Handling
Performance Considerations
- num_workers: Set to 0 for debugging (single-process). Use 2-4 for training on CPU, or 4-8 on systems with many cores.
- Subsampling: Use train_subset and val_subset for rapid prototyping and hyperparameter search.
- Batch size: Larger batches improve throughput but require more memory. Typical values: 64-256.
Reproducibility
The seed parameter ensures reproducible shuffling of training data:
- Same seed → same batch order across runs
- Different seeds → different batch orders
- Validation loader is never shuffled (no randomness)
For full reproducibility, also call set_deterministic(seed) from the Model module before creating data loaders.
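The seed behavior listed in this section can be checked with a small experiment. The sketch below does not call build_loaders; it assumes the seed is passed to a torch.Generator that is handed to the DataLoader, and uses a hypothetical helper, batch_order, over a ten-element dataset whose values are their own indices.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def batch_order(seed, shuffle=True):
    # Ten samples whose value is their own index, so each batch
    # directly shows the order in which samples were drawn.
    data = TensorDataset(torch.arange(10))
    generator = torch.Generator()
    generator.manual_seed(seed)
    loader = DataLoader(data, batch_size=5, shuffle=shuffle,
                        generator=generator)
    return [batch[0].tolist() for batch in loader]


same_a = batch_order(seed=42)
same_b = batch_order(seed=42)   # identical to same_a
other = batch_order(seed=7)     # a different seed typically gives a different order
unshuffled = batch_order(seed=42, shuffle=False)  # always dataset order
```

Each call builds a fresh generator, which is why the two seed-42 runs match. Reusing one generator across epochs advances its state, so later epochs see different orders while the whole run stays repeatable.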