Overview
Reproducibility is a core design principle of this project. All experiments are designed to produce identical results when run with the same configuration, enabling reliable comparisons, debugging, and scientific validation. This implementation prioritizes reproducibility over performance, accepting some overhead to guarantee deterministic behavior.
Reproducibility Module
The reproducibility.py module provides utilities for deterministic execution:
reproducibility.py:12-33
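The module's source is referenced above rather than reproduced here; as an illustration only, a utility like this might look as follows (set_global_seed is the name referenced later on this page, but the body below is a minimal sketch, not the actual reproducibility.py implementation):

```python
import os
import random

import numpy as np

def set_global_seed(seed=42):
    """Seed every RNG source and return a modern Generator.

    Note: PYTHONHASHSEED only affects hash randomization if set before
    the interpreter starts; setting it here documents intent and covers
    any subprocesses the run launches.
    """
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)             # Python stdlib RNG
    np.random.seed(seed)          # legacy NumPy global state, for third-party code
    return np.random.default_rng(seed)  # preferred Generator instance
```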
Seeded Components
PYTHONHASHSEED
Environment variable controlling hash randomization for deterministic dict/set ordering
PyTorch seeding is optional since the core implementation only requires NumPy. PyTorch is used for optional comparison features.
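One common pattern for seeding an optional dependency is a guarded import, sketched here (seed_torch_if_available is a hypothetical helper, not necessarily the project's API):

```python
def seed_torch_if_available(seed):
    """Seed PyTorch only when it is installed; report whether it was."""
    try:
        import torch
    except ImportError:
        return False  # core NumPy-only path: nothing to seed
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)
    return True
```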
Modern NumPy RNG
The model uses NumPy's new Generator API for better reproducibility: model.py:34

The Generator API is preferred over the legacy np.random functions because:
Advantages
- Independent RNG instances (no global state)
- Better statistical properties
- Deterministic across NumPy versions
- Thread-safe by design
vs Legacy
- Legacy np.random uses global state
- Can have version-dependent behavior
- Not thread-safe
- Harder to reason about in complex code
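The independence of Generator instances is easy to demonstrate with standard NumPy alone (this snippet is generic, not taken from the project's code):

```python
import numpy as np

# Two independent Generator instances, seeded identically, produce the
# same stream without touching any global state.
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)
draws_a = rng_a.standard_normal(4)
draws_b = rng_b.standard_normal(4)
assert np.array_equal(draws_a, draws_b)

# Consuming rng_a further does not disturb rng_b: no shared state.
rng_a.standard_normal(100)
rng_c = np.random.default_rng(42)
rng_c.standard_normal(4)  # skip the first four draws
assert np.array_equal(rng_b.standard_normal(4), rng_c.standard_normal(4))
```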
Seeded Operations
All randomness in the model is seeded.

Weight initialization: layers.py:4-8
model.py:104-107
model.py:191-194
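As an illustration of the pattern, Generator-based weight initialization can look like this (init_dense_layer and the He-style scaling are assumptions for the sketch, not the actual layers.py code):

```python
import numpy as np

def init_dense_layer(rng, fan_in, fan_out):
    """He-style initialization drawn from an explicit Generator, no globals."""
    scale = np.sqrt(2.0 / fan_in)
    weights = rng.standard_normal((fan_in, fan_out)) * scale
    biases = np.zeros(fan_out)
    return weights, biases

# Same seed, same Generator construction -> bit-identical weights.
w1, b1 = init_dense_layer(np.random.default_rng(7), 784, 128)
w2, b2 = init_dense_layer(np.random.default_rng(7), 784, 128)
assert np.array_equal(w1, w2)
```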
Seed Hierarchy
The project uses a cascading seed strategy.

Configuration-Level Seeding
config.py:11
Benchmark-Level Seeding
benchmark.py:99-103
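One standard way to implement a seed cascade is NumPy's SeedSequence.spawn, shown here purely as an illustration (the actual mechanism in benchmark.py may differ):

```python
import numpy as np

# A single parent seed spawns independent child seeds, one per run:
# runs are decorrelated from each other yet fully determined by the parent.
parent = np.random.SeedSequence(2024)
run_rngs = [np.random.default_rng(child) for child in parent.spawn(3)]

# Respawning from the same parent seed reproduces every per-run stream.
again = [np.random.default_rng(c) for c in np.random.SeedSequence(2024).spawn(3)]
for rng1, rng2 in zip(run_rngs, again):
    assert rng1.random() == rng2.random()
```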
Deterministic Data Generation
Synthetic datasets are generated deterministically: benchmark.py:31-36
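A minimal sketch of deterministic synthetic data generation (make_synthetic_dataset and its defaults are illustrative assumptions; the real code lives at benchmark.py:31-36):

```python
import numpy as np

def make_synthetic_dataset(seed, n_samples=1000, n_features=784, n_classes=10):
    """Generate a classification dataset fully determined by the seed."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n_samples, n_features))
    y = rng.integers(0, n_classes, size=n_samples)
    return X, y

# Same seed -> identical dataset, every time.
Xa, ya = make_synthetic_dataset(0)
Xb, yb = make_synthetic_dataset(0)
assert np.array_equal(Xa, Xb) and np.array_equal(ya, yb)
```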
Reproducibility Checklist
The project includes a comprehensive checklist (docs/reproducibility_checklist.md):
Environment Capture
Run python scripts/verify_environment.py to validate that your environment matches project requirements.

Dataset Controls
- Expected shape (784 features, 10 classes)
- Label range (0-9)
- Minimum sample count
- Optional SHA256 hash verification
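The checks above could be implemented along these lines (validate_dataset is a hypothetical helper illustrating the checklist, and the minimum of 100 samples is a made-up threshold):

```python
import hashlib
import numpy as np

def validate_dataset(X, y, expected_sha256=None):
    """Checklist-style validation: shape, label range, count, optional hash."""
    assert X.shape[1] == 784, "expected 784 features"
    assert 0 <= y.min() and y.max() <= 9, "labels must be in 0-9"
    assert len(X) >= 100, "too few samples (100 is an illustrative minimum)"
    if expected_sha256 is not None:
        digest = hashlib.sha256(X.tobytes() + y.tobytes()).hexdigest()
        assert digest == expected_sha256, "dataset hash mismatch"
```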
Determinism Controls
Execution Artifacts
Reporting Quality
Experiment Configuration
Experiments are defined in config.py with explicit parameters:
config.py:39-83
Verifying Reproducibility
Same-Machine Reproducibility
Run the same experiment twice and verify:
- Same final loss (to floating-point precision)
- Same final accuracy
- Same model weights (checksum)
- Same training history
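A weight-checksum comparison between two runs can be sketched as follows (weights_checksum and the toy training stand-in are illustrative, not the project's code):

```python
import hashlib
import numpy as np

def weights_checksum(weights):
    """Order-sensitive SHA256 over a list of weight arrays."""
    h = hashlib.sha256()
    for w in weights:
        h.update(w.tobytes())
    return h.hexdigest()

def train_toy(seed):
    # Stand-in for a real training run: seeded "weights" after updates.
    rng = np.random.default_rng(seed)
    return [rng.standard_normal((4, 4)) for _ in range(3)]

# Two runs with the same seed must yield identical checksums.
assert weights_checksum(train_toy(1)) == weights_checksum(train_toy(1))
```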
Cross-Machine Reproducibility
For results to match across different machines, account for the limitations below.

Limitations
Floating-point non-associativity
Floating-point arithmetic is not associative: (a + b) + c may differ from a + (b + c). This means:
- Different batch sizes can produce slightly different results
- Parallel reductions may introduce variance
- Order of operations matters

The project mitigates this with:
- Single-threaded execution (no race conditions)
- Fixed batch sizes per experiment
- Deterministic data ordering
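The non-associativity is visible even in a one-line example:

```python
# Regrouping a float sum changes the rounding at each step, so the two
# groupings differ in the last bit of the result.
left = (0.1 + 0.2) + 0.3   # 0.6000000000000001
right = 0.1 + (0.2 + 0.3)  # 0.6
assert left != right
```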
Hardware-specific optimizations
NumPy links to BLAS libraries (OpenBLAS, Intel MKL, Apple Accelerate) that:
- Use different algorithms
- Have different rounding behavior
- May use CPU-specific instructions
To mitigate:
- Document the BLAS library used (numpy.show_config())
- Consider using a consistent BLAS (e.g., OpenBLAS) across machines
- Accept small differences (~1e-6) as acceptable variance
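Tolerance-based comparison for cross-machine results can look like this (the reference values and the injected drift are made up for illustration):

```python
import numpy as np

# Compare cross-machine results with a tolerance rather than exact
# equality; different BLAS backends round reductions differently.
reference = np.array([0.123456789, 1.0, -2.5])  # made-up reference metrics
other_run = reference + 1e-9                    # simulated BLAS-induced drift
assert np.allclose(reference, other_run, atol=1e-6)
assert not np.array_equal(reference, other_run)
```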
Operating system differences
Some sources of variance:
- Hash randomization: Controlled by PYTHONHASHSEED (set by set_global_seed)
- Threading libraries: Single-threaded execution avoids this
- System load: Can affect timing measurements (not functional results)
Optional dependencies
PyTorch comparison features are optional:
- Results without PyTorch should be reproducible
- Results with PyTorch require matching PyTorch version
- ONNX export may vary across ONNX versions
Best Practices
Always set seed explicitly
Document environment
Use named configs
Save all artifacts
Statistical Repeats
For benchmarking, multiple runs with the same seed verify reproducibility, reporting:
- Mean and standard deviation of metrics
- Confidence intervals
- Variance analysis
If standard deviation is non-zero with identical seeds, there’s a reproducibility bug!
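A sketch of that check (run_experiment is a stand-in returning a single metric, not the project's benchmark code):

```python
import numpy as np

def run_experiment(seed):
    # Stand-in for one benchmark run returning a metric (e.g. accuracy).
    rng = np.random.default_rng(seed)
    return float(rng.random())

# Identical seeds: the standard deviation must be exactly zero,
# otherwise there is a reproducibility bug.
same_seed = [run_experiment(42) for _ in range(5)]
assert np.std(same_seed) == 0.0

# Different seeds: mean and std summarize genuine run-to-run variance.
varied = [run_experiment(s) for s in range(5)]
mean, std = float(np.mean(varied)), float(np.std(varied))
assert std > 0.0
```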
Debugging Non-Reproducibility
If results don't match:

Reproducibility in CI
The project is designed for reproducible CI testing.

Next Steps
Architecture
Understand how reproducibility is built into the architecture
Hardware Constraints
Learn how constraints affect reproducibility