This guide covers environment setup, dependency installation, and verification for reproducible experiments.

System requirements

Before you begin, ensure your system meets these requirements:
  • Python: 3.8 or higher (3.10+ recommended)
  • Operating system: Linux, macOS, or Windows with Python support
  • RAM: 4GB minimum (8GB+ recommended for larger experiments)
  • CPU: Any modern x86_64 or ARM processor (CPU-focused, no GPU required)
This framework is designed for CPU execution and reproducible research. It does not require specialized hardware like GPUs or TPUs.

Installation

1. Clone the repository

Clone the project to your local machine:
git clone <repository-url>
cd source
2. Create a virtual environment (recommended)

Isolate your dependencies using a virtual environment:
python -m venv venv
source venv/bin/activate
You should see (venv) in your terminal prompt after activation.
3. Install required dependencies

Install the core packages from requirements.txt:
pip install -r requirements.txt
This installs:
  • numpy (1.26.4): Core array operations and tensor math
  • pandas (2.2.2): Dataset loading and CSV handling
  • matplotlib (3.8.4): Visualization for training curves and benchmarks
  • psutil (5.9.8): Memory profiling and system resource monitoring
  • requests (2.32.3): Dataset download utilities
  • tqdm (4.66.4): Progress bars for training and data loading
4. Install optional dependencies

For framework comparison and ONNX export, install the optional packages:
pip install torch==2.3.1 onnx==1.16.1 onnxruntime==1.18.1
These dependencies are large (PyTorch is ~800MB). Only install them if you need framework comparison features or ONNX export.
The framework works fully without these packages; the optional imports are guarded by runtime checks.
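The runtime-check pattern mentioned above can be sketched like this. This is an illustrative example of the optional-import guard idiom, not the framework's actual code; `export_to_onnx` is a hypothetical function used only to show the pattern:

```python
# Sketch of an optional-import guard (illustrative, not the framework's
# actual code): torch is imported lazily, and features that need it
# raise a clear error when it is absent.
try:
    import torch  # optional: only needed for framework comparison / ONNX export
    HAS_TORCH = True
except ImportError:
    torch = None
    HAS_TORCH = False

def export_to_onnx(model, path):
    """Hypothetical export helper that fails gracefully without torch."""
    if not HAS_TORCH:
        raise RuntimeError(
            "ONNX export requires the optional dependencies: "
            "pip install torch onnx onnxruntime"
        )
    # ... actual export would go here ...
```

This way, importing the package never fails just because an optional dependency is missing; only the features that need it do.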
5. Install development dependencies (optional)

For testing and development, install the dev requirements:
pip install -r requirements-dev.txt
This includes pytest and other tools for running the test suite.

Verify your installation

After installing dependencies, verify your environment is configured correctly.

Run the verification script

Execute the environment verification script:
python scripts/verify_environment.py

Expected output

You should see output similar to this:
Python: 3.11.0
Platform: Linux-5.15.0-x86_64-with-glibc2.35

Required packages:
  - numpy: OK (1.26.4)
  - matplotlib: OK (3.8.4)
  - psutil: OK (5.9.8)
  - requests: OK (2.32.3)
  - tqdm: OK (4.66.4)

Optional packages:
  - torch: OK (2.3.1)
  - pytest: OK (7.4.0)

Dataset status:
  - fashion-mnist: train_exists=False size=0, test_exists=False size=0

Interpreting the output

  • Required packages: All must show OK with version numbers
  • Optional packages: Can show MISSING if you didn’t install them
  • Dataset status: Shows False until you download Fashion-MNIST (see below)
If any required package shows MISSING, re-run pip install -r requirements.txt and check for installation errors.

Understanding the verification script

The verification script checks your environment systematically:
scripts/verify_environment.py
import importlib

REQUIRED = ["numpy", "matplotlib", "psutil", "requests", "tqdm"]
OPTIONAL = ["torch", "pytest"]

def check_module(name: str) -> str:
    try:
        mod = importlib.import_module(name)
        version = getattr(mod, "__version__", "unknown")
        return f"OK ({version})"
    except Exception as exc:
        return f"MISSING ({exc})"
It verifies:
  1. Python version and platform information
  2. All required packages are importable with correct versions
  3. Optional packages (if installed)
  4. Dataset files (if downloaded)
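The dataset check (item 4) can be approximated as follows. This is a sketch, not the script's actual implementation; the data directory and filenames are taken from the download step in this guide and may differ in your layout:

```python
from pathlib import Path

# Illustrative sketch of the dataset check: report existence and size
# for each expected CSV file, matching the "Dataset status" output format.
DATA_DIR = Path("Neural Network from Scratch/task/Data")

def dataset_status(name, train_file, test_file):
    train = DATA_DIR / train_file
    test = DATA_DIR / test_file
    train_size = train.stat().st_size if train.exists() else 0
    test_size = test.stat().st_size if test.exists() else 0
    return (f"{name}: train_exists={train.exists()} size={train_size}, "
            f"test_exists={test.exists()} size={test_size}")

print(dataset_status("fashion-mnist",
                     "fashion-mnist_train.csv",
                     "fashion-mnist_test.csv"))
```

Before the download step below, this reports `train_exists=False size=0` for both files.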

Dataset preparation

The framework supports two data modes:

Synthetic mode (default)

No setup required. The framework generates random data for fast iteration:
python scripts/run_workflow.py --mode train --experiment baseline
Synthetic mode uses deterministic random generation with fixed seeds for reproducibility.
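Seeded generation can be sketched as follows. This is a minimal illustration of the idea (a seeded NumPy generator yields identical data on every run), not the framework's actual data pipeline; the shapes and class count are chosen to mirror Fashion-MNIST:

```python
import numpy as np

# Sketch of deterministic synthetic data: the same seed always
# produces the same arrays, so experiments are reproducible.
def make_synthetic(n_samples, n_features, seed=42):
    rng = np.random.default_rng(seed)           # independent, seeded generator
    X = rng.standard_normal((n_samples, n_features)).astype(np.float32)
    y = rng.integers(0, 10, size=n_samples)     # 10 classes, like Fashion-MNIST
    return X, y

X1, y1 = make_synthetic(100, 784, seed=42)
X2, y2 = make_synthetic(100, 784, seed=42)
assert (X1 == X2).all() and (y1 == y2).all()    # identical across runs
```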

Real data mode (Fashion-MNIST)

For production-like experiments, download the Fashion-MNIST dataset:
1. Download Fashion-MNIST

Run the download script:
python scripts/download_fashion_mnist.py --out-dir "Neural Network from Scratch/task/Data"
This downloads two CSV files:
  • fashion-mnist_train.csv (~120MB): 60,000 training samples
  • fashion-mnist_test.csv (~20MB): 10,000 test samples
2. Verify the download

Re-run the verification script to confirm the dataset is ready:
python scripts/verify_environment.py
You should see:
Dataset status:
  - fashion-mnist: train_exists=True size=123456789, test_exists=True size=12345678
3. Use the dataset in experiments

Run experiments with the real_fashion_mnist configuration:
python scripts/run_workflow.py --mode full --experiment real_fashion_mnist
The download script validates file integrity using SHA256 checksums and includes retry logic for network failures.
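The checksum-and-retry approach can be sketched like this. It is illustrative only: the URL, destination, and expected digest used by `download_fashion_mnist.py` are not shown here, and the helpers below are hypothetical:

```python
import hashlib
import time
import urllib.request

# Illustrative sketch of SHA256 validation with retry logic; the real
# script's URLs, filenames, and digests are not reproduced here.
def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large CSVs don't need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def download_with_retry(url, dest, expected_sha256, retries=3, backoff=2.0):
    for attempt in range(1, retries + 1):
        try:
            urllib.request.urlretrieve(url, dest)
            if sha256_of(dest) == expected_sha256:
                return                      # verified, done
            raise IOError("checksum mismatch")
        except Exception:
            if attempt == retries:
                raise                       # out of retries, surface the error
            time.sleep(backoff * attempt)   # simple linear backoff
```

Chunked hashing matters here: the training CSV is ~120MB, so reading it whole just to hash it would be wasteful.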

Dependency versions and reproducibility

The framework uses pinned dependency versions for reproducible experiments:
requirements.txt
numpy==1.26.4
pandas==2.2.2
matplotlib==3.8.4
psutil==5.9.8
requests==2.32.3
tqdm==4.66.4
Using different versions may produce different numerical results due to changes in random number generation, floating-point operations, or algorithm implementations.

Why pinned versions matter

For reproducible research:
  1. Numerical stability: NumPy versions can differ in floating-point precision
  2. API compatibility: Avoid breaking changes in dependencies
  3. Deterministic results: Same code + same versions + same seed = same output
  4. Experiment comparison: Compare results across time and machines reliably
If you need to upgrade dependencies, update requirements.txt and re-run all baseline experiments to establish new reference results.
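A quick way to confirm your environment matches the pins is to compare installed versions against the list above. This sketch mirrors the pins from requirements.txt but is not part of the framework's tooling:

```python
from importlib import metadata

# Pins copied from requirements.txt in this guide.
PINS = {
    "numpy": "1.26.4",
    "pandas": "2.2.2",
    "matplotlib": "3.8.4",
    "psutil": "5.9.8",
    "requests": "2.32.3",
    "tqdm": "4.66.4",
}

def check_pins(pins):
    """Return mismatch messages; an empty list means every pin matches."""
    problems = []
    for pkg, wanted in pins.items():
        try:
            installed = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            problems.append(f"{pkg}: not installed (want {wanted})")
            continue
        if installed != wanted:
            problems.append(f"{pkg}: installed {installed}, want {wanted}")
    return problems

for line in check_pins(PINS):
    print(line)
```

An empty output means your environment exactly matches the pinned versions.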

Project structure after installation

After installation, your directory structure looks like this:
source/
├── Neural Network from Scratch/
│   └── task/
│       ├── train.py           # Training entrypoint
│       ├── benchmark.py       # Performance benchmarking
│       ├── student.py         # Neural network implementation
│       ├── config.py          # Experiment configurations
│       ├── dataset_config.py  # Dataset specifications
│       └── Data/              # Dataset directory (after download)
├── scripts/
│   ├── verify_environment.py  # Environment checker
│   ├── run_workflow.py        # Workflow orchestration
│   └── download_fashion_mnist.py  # Dataset downloader
├── experiments/               # Generated logs and checkpoints
├── artifacts/                 # Generated reports
├── requirements.txt
└── requirements-dev.txt

Troubleshooting

Dependency installation fails

Upgrade pip and try again:
pip install --upgrade pip
pip install -r requirements.txt
If conflicts persist, create a fresh virtual environment.

Errors on Apple Silicon

Ensure you’re using an ARM-native Python installation:
python -c "import platform; print(platform.machine())"
Should output arm64. If it shows x86_64, reinstall Python for Apple Silicon.

Packages installed but imports still fail

Check whether you activated your virtual environment:
which python  # Should point to venv/bin/python
If not, activate it:
source venv/bin/activate  # Linux/macOS
venv\Scripts\activate      # Windows

Fashion-MNIST download fails

Manually download the files from the mirrors and place them in the correct directory:
mkdir -p "Neural Network from Scratch/task/Data"
cd "Neural Network from Scratch/task/Data"
# Download fashion-mnist_train.csv and fashion-mnist_test.csv manually
Then verify the files are recognized:
python scripts/verify_environment.py

Python version is too old

You need Python 3.8 or higher. Check your version:
python --version
If it’s too old, install Python 3.8 or newer before continuing.

Next steps

Now that your environment is set up:
  • Run your first experiment: train a model and see benchmark results in under 5 minutes
  • Understand the architecture: learn how the framework is structured
  • Configure experiments: customize layer sizes, precision, and constraints
  • Explore the API: dive into the module-level documentation

Best practices

For the best experience:
1. Use virtual environments

Always isolate project dependencies to avoid conflicts:
python -m venv venv
source venv/bin/activate
2. Pin dependency versions

Never use pip install package without version pins in production experiments. Always use requirements.txt.
3. Verify after installation

Run scripts/verify_environment.py after any environment changes to catch issues early.
4. Use fixed seeds

All experiments accept a --seed parameter. Use it for reproducible results:
python scripts/run_workflow.py --seed 42 --experiment baseline
Keep a copy of your requirements.txt with each experiment log for full reproducibility.
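One way to keep that snapshot automatically is to freeze the environment into the experiment's log directory. This is an illustrative sketch; the directory layout is hypothetical and this helper is not part of the framework:

```python
import subprocess
import sys
from pathlib import Path

# Sketch (illustrative paths): snapshot the exact installed versions
# next to an experiment's logs so the run can be reproduced later.
def snapshot_environment(experiment_dir):
    exp = Path(experiment_dir)
    exp.mkdir(parents=True, exist_ok=True)
    frozen = subprocess.run(
        [sys.executable, "-m", "pip", "freeze"],
        capture_output=True, text=True, check=True,
    ).stdout
    lockfile = exp / "requirements.lock.txt"
    lockfile.write_text(frozen)
    return lockfile
```

Calling `snapshot_environment("experiments/baseline_seed42")` at the start of a run leaves a `requirements.lock.txt` beside the logs, so the exact versions behind any result are always recoverable.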
