Overview
The checkpointing system saves and loads model weights in NumPy's `.npz` format, enabling training resumption, model sharing, and deployment.
Quick Start
Save Weights
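A minimal sketch of saving, assuming the underlying mechanism the docs describe (NumPy's compressed `.npz` format); the array names `weights1`/`bias1` follow the convention listed below:

```python
import numpy as np

# Hedged sketch: the model's actual save method lives in model.py; under the
# hood, weights and biases are written as named arrays in a compressed .npz.
weights1 = np.random.randn(4, 8)
bias1 = np.zeros(8)
np.savez_compressed("checkpoint.npz", weights1=weights1, bias1=bias1)
```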
Load Weights
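And the matching load, reading arrays back by key (a demo checkpoint is created first so the snippet runs standalone):

```python
import numpy as np

# A checkpoint produced earlier (created here so the snippet is self-contained).
np.savez_compressed("checkpoint.npz",
                    weights1=np.random.randn(4, 8), bias1=np.zeros(8))

# Loading reads the named arrays back by key.
data = np.load("checkpoint.npz")
weights1, bias1 = data["weights1"], data["bias1"]
```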
Implementation
Saving Weights
Implemented in model.py:157-162:
- All layer weights as `weights1`, `weights2`, etc.
- All layer biases as `bias1`, `bias2`, etc.
- Uses NumPy's compressed `.npz` format
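The steps above can be sketched as follows; the class and method names here are stand-ins, not the exact code in model.py:

```python
import numpy as np

class Model:
    # Minimal stand-in for the class in model.py; names are illustrative.
    def __init__(self, weights, biases):
        self.weights = weights  # per-layer weight matrices
        self.biases = biases    # per-layer bias vectors

    def save(self, path):
        # One named entry per layer: weights1/bias1, weights2/bias2, ...
        arrays = {}
        for i, (w, b) in enumerate(zip(self.weights, self.biases), start=1):
            arrays[f"weights{i}"] = w
            arrays[f"bias{i}"] = b
        np.savez_compressed(path, **arrays)  # compressed .npz on disk

model = Model([np.ones((3, 4)), np.ones((4, 2))], [np.zeros(4), np.zeros(2)])
model.save("demo_save.npz")
```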
Loading Weights
Implemented in model.py:164-168:
- Reads all weights and biases from file
- Converts to model’s training dtype
- Restores exact parameter state
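A hedged sketch of the loading side, including the dtype conversion; the helper name and model attributes are assumptions:

```python
import numpy as np
from types import SimpleNamespace

def load_weights(model, path, dtype=np.float32):
    # Read every weights{i}/bias{i} pair and cast to the model's training dtype.
    data = np.load(path)
    n_layers = sum(1 for key in data.files if key.startswith("weights"))
    model.weights = [data[f"weights{i}"].astype(dtype) for i in range(1, n_layers + 1)]
    model.biases = [data[f"bias{i}"].astype(dtype) for i in range(1, n_layers + 1)]

# Self-contained demo: float64 arrays on disk, cast to float32 on load.
np.savez_compressed("demo_load.npz", weights1=np.ones((3, 4)), bias1=np.zeros(4))
model = SimpleNamespace()
load_weights(model, "demo_load.npz")
```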
Automatic Checkpointing
During Experiments
Experiments automatically save checkpoints (train.py:126-130):
- Format: `{experiment_id}_v{version}.npz`
- Example: `baseline_v1.npz`, `baseline_v2.npz`
- Tracked in experiment history JSON
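The naming pattern is a straightforward template:

```python
# Checkpoint filenames follow the {experiment_id}_v{version}.npz pattern.
experiment_id, version = "baseline", 2
checkpoint_name = f"{experiment_id}_v{version}.npz"
```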
During Training
Optional checkpointing during `fit()` (model.py:229-230):
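A sketch of what the optional hook might look like; the keyword name `checkpoint_path` is an assumption, not the confirmed signature in model.py:

```python
import numpy as np

class Model:
    # Minimal stand-in; the real fit() lives in model.py.
    def __init__(self):
        self.weights = [np.zeros((2, 2))]
        self.biases = [np.zeros(2)]

    def fit(self, X, y, epochs=1, checkpoint_path=None):
        for _ in range(epochs):
            pass  # training step elided
        if checkpoint_path is not None:
            # Optional checkpoint: persist current parameters when a path is given.
            np.savez_compressed(checkpoint_path,
                                weights1=self.weights[0], bias1=self.biases[0])

Model().fit(np.zeros((4, 2)), np.zeros(4), epochs=2, checkpoint_path="run.npz")
```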
Checkpoint File Format
NPZ Structure
A checkpoint file contains one named array per parameter: `weights1`, `bias1`, `weights2`, `bias2`, and so on.
Multi-Layer Models
For deeper architectures, the same naming pattern extends with one `weights{i}`/`bias{i}` pair per layer.
Best Practices
Architecture Consistency
Ensure the model architecture matches the checkpoint: layer count and shapes must agree with the saved arrays.
Versioning
Use version numbers for experiment tracking: each new checkpoint gets an incremented `_v{version}` suffix.
Precision Handling
Checkpoints preserve the original precision of the saved arrays.
Early Stopping with Checkpoints
The `fit()` method supports restoring best weights (model.py:186-227):
- Tracks best validation loss
- Stores best weights in memory
- Restores on early stop or end of training
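The three steps above can be sketched as follows; this is an illustrative reimplementation, not the code in model.py, and the function name and callbacks are hypothetical:

```python
import numpy as np

def fit_with_restore(train_step, val_loss, weights, epochs, patience=3):
    # Track best validation loss, snapshot weights in memory, restore at the end.
    best_loss, best_weights, wait = float("inf"), None, 0
    for _ in range(epochs):
        train_step(weights)
        loss = val_loss(weights)
        if loss < best_loss:
            best_loss, wait = loss, 0
            best_weights = [w.copy() for w in weights]  # in-memory snapshot
        else:
            wait += 1
            if wait >= patience:
                break  # early stop
    if best_weights is not None:
        for w, snap in zip(weights, best_weights):
            w[...] = snap  # restore best parameters in place
    return best_loss

# Toy run: the parameter drifts past its optimum at 2.0, then is restored.
weights = [np.array([0.0])]
best = fit_with_restore(
    train_step=lambda ws: ws[0].__iadd__(1.0),
    val_loss=lambda ws: float((ws[0][0] - 2.0) ** 2),
    weights=weights, epochs=10, patience=3)
```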
Checkpoint Management
ExperimentManager Integration
Checkpoints are tracked in experiment history (experiment_manager.py:88-92):
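A hedged sketch of what recording a checkpoint in the history might look like; the JSON field names here are assumptions, not the actual schema in experiment_manager.py:

```python
import json

# Hypothetical history record: append each checkpoint path as it is saved.
history = {"experiment_id": "baseline", "checkpoints": []}
history["checkpoints"].append("baseline_v1.npz")
with open("history.json", "w") as f:
    json.dump(history, f, indent=2)
```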
Checkpoint History
Experiment JSON includes checkpoint paths.
Loading for Inference
Direct Loading
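Direct loading needs only NumPy; arrays are read back by key (the demo checkpoint is created first so the snippet runs standalone):

```python
import numpy as np

# Create a demo checkpoint, then load it directly for inference.
np.savez_compressed("baseline_v1.npz", weights1=np.ones((3, 2)), bias1=np.zeros(2))

data = np.load("baseline_v1.npz")
weights1, bias1 = data["weights1"], data["bias1"]
```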
From Experiment History
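A sketch of resolving the latest checkpoint from the history JSON and loading it; the field names are assumptions:

```python
import json
import numpy as np

# Set up a demo checkpoint and history file so the snippet runs standalone.
np.savez_compressed("baseline_v2.npz", weights1=np.eye(2), bias1=np.zeros(2))
with open("history.json", "w") as f:
    json.dump({"experiment_id": "baseline",
               "checkpoints": ["baseline_v1.npz", "baseline_v2.npz"]}, f)

# Look up the most recent checkpoint path, then load it.
with open("history.json") as f:
    latest = json.load(f)["checkpoints"][-1]
data = np.load(latest)
```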
Error Handling
Missing Files
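`np.load` raises `FileNotFoundError` for a missing path, so callers can catch it and fall back:

```python
import numpy as np

# A missing checkpoint path raises FileNotFoundError.
try:
    np.load("missing_checkpoint.npz")
    loaded = True
except FileNotFoundError:
    loaded = False  # e.g. start training from scratch instead
```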
Architecture Mismatch
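Validating shapes before assigning loaded arrays surfaces mismatches early; the shapes below are illustrative:

```python
import numpy as np

# Create a checkpoint whose first layer deliberately differs from the model.
np.savez_compressed("mismatch.npz", weights1=np.ones((8, 4)), bias1=np.zeros(4))
data = np.load("mismatch.npz")

expected_shape = (16, 4)  # the current model's first layer
mismatch = data["weights1"].shape != expected_shape
if mismatch:
    print(f"Checkpoint shape {data['weights1'].shape} "
          f"does not match model shape {expected_shape}")
```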
Precision Compatibility
Checkpoints work across precisions: arrays are converted to the model's training dtype on load.
Related
- Running Experiments - Automatic checkpoint creation
- Model Architecture - Understanding layer structure
- Deployment Inference - Using checkpoints for production