
Overview

The inference.py module provides a command-line interface for running inference on trained models with configurable precision. It supports loading checkpoints, generating inference reports, and exporting models to ONNX format.

main

Entry point for the inference CLI that loads weights, runs inference, and optionally exports to ONNX.
def main() -> None

Behavior

  1. Model Initialization: Creates NeuralNetwork with fixed architecture [784, 64, 10] and activations ["relu", "softmax"]
  2. Weight Loading: Loads trained weights from .npz checkpoint file
  3. Inference Execution: Generates synthetic test data and runs forward pass
  4. Report Generation: Creates detailed inference report with metrics
  5. Optional ONNX Export: Exports model to ONNX format if requested
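The five steps above can be sketched roughly as follows. The real `NeuralNetwork` class and its forward-pass API are not shown in this page, so the snippet approximates the [784, 64, 10] relu/softmax pipeline with plain NumPy; treat it as an illustration of the flow, not the module's actual implementation:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def softmax(x):
    # Subtract the row max for numerical stability before exponentiating.
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# 1. Model initialization: fixed [784, 64, 10] architecture
#    (random arrays stand in for a real NeuralNetwork instance).
rng = np.random.default_rng(0)
weights = {
    "W0": rng.normal(0, 0.05, (784, 64)), "b0": np.zeros(64),
    "W1": rng.normal(0, 0.05, (64, 10)),  "b1": np.zeros(10),
}

# 2. Weight loading would replace the arrays above from a .npz checkpoint.

# 3. Inference execution: synthetic test batch, then a forward pass.
batch = rng.random((64, 784)).astype(np.float32)
hidden = relu(batch @ weights["W0"] + weights["b0"])
preds = softmax(hidden @ weights["W1"] + weights["b1"])

# 4. Report generation: the real module also records timing and memory.
print(preds.shape)  # (64, 10)
```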

CLI Usage

Run inference from the command line:
python inference.py --weights experiments/checkpoints/model_v1.npz
python inference.py --weights model.npz --precision float16 --batch-size 128
python inference.py --weights model.npz --export-onnx

CLI Arguments

--weights
string
required
Path to .npz checkpoint file containing trained model weights
--precision
string
default:"float32"
Inference precision mode. Options: float32, float16, int8
--batch-size
int
default:"64"
Number of samples in synthetic test batch
--export-onnx
flag
Export model to ONNX format at exports/model.onnx
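Given the argument table above, the CLI surface can be reconstructed with `argparse`. This parser is a hypothetical sketch of what `inference.py` likely defines, not a copy of its source:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Reconstruction of the documented CLI arguments.
    parser = argparse.ArgumentParser(
        description="Run inference on a trained model.")
    parser.add_argument("--weights", required=True,
                        help="Path to .npz checkpoint with trained weights")
    parser.add_argument("--precision", default="float32",
                        choices=["float32", "float16", "int8"],
                        help="Inference precision mode")
    parser.add_argument("--batch-size", type=int, default=64,
                        help="Number of samples in the synthetic test batch")
    parser.add_argument("--export-onnx", action="store_true",
                        help="Export the model to exports/model.onnx")
    return parser

args = build_parser().parse_args(
    ["--weights", "model.npz", "--precision", "float16"])
print(args.precision, args.batch_size)  # float16 64
```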

Inference Report Format

The inference_report function (from the deployment module) generates a JSON report with the following structure:
{
  "batch_size": 64,
  "precision": "float32",
  "inference_time_ms": 2.34,
  "throughput_samples_per_sec": 27350.4,
  "memory_usage_mb": 1.23,
  "model_size_bytes": 51200,
  "predictions_shape": [64, 10]
}

Report Fields

batch_size
int
Number of samples processed
precision
string
Precision mode used for inference
inference_time_ms
float
Time taken for forward pass in milliseconds
throughput_samples_per_sec
float
Number of samples processed per second
memory_usage_mb
float
Memory footprint during inference
model_size_bytes
int
Size of model parameters in bytes
predictions_shape
list[int]
Shape of output predictions array
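The timing and throughput fields are related: dividing batch_size by the forward-pass time in seconds reproduces throughput_samples_per_sec in the sample report above. Assuming the report derives throughput this way:

```python
# Values taken from the sample report above.
batch_size = 64
inference_time_ms = 2.34

# throughput = samples / seconds
throughput = batch_size / (inference_time_ms / 1000.0)
print(round(throughput, 1))  # 27350.4
```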

Example Output

$ python inference.py --weights model.npz --precision float16
{
  "batch_size": 64,
  "precision": "float16",
  "inference_time_ms": 1.87,
  "throughput_samples_per_sec": 34224.6,
  "memory_usage_mb": 0.62,
  "model_size_bytes": 25600,
  "predictions_shape": [64, 10]
}
$ python inference.py --weights model.npz --export-onnx
Exported ONNX model to exports/model.onnx

Helper Functions

_load_npz_weights

Loads weights from NumPy .npz archive into model.
def _load_npz_weights(model: NeuralNetwork, weights_path: str) -> None
model
NeuralNetwork
required
Model instance to load weights into
weights_path
string
required
Path to .npz checkpoint file
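A minimal sketch of the loader's likely mechanics, assuming the checkpoint stores one array per parameter under string keys (the key names and the dict-based model stand-in here are hypothetical; the real function assigns into a NeuralNetwork instance):

```python
import os
import tempfile
import numpy as np

def _load_npz_weights_sketch(params: dict, weights_path: str) -> None:
    # Copy each array in the .npz archive into the matching model parameter.
    with np.load(weights_path) as archive:
        for key in archive.files:
            params[key] = archive[key]

# Round-trip demo: save weights, then load them back into an empty dict.
saved = {"W0": np.ones((784, 64)), "b0": np.zeros(64)}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.npz")
    np.savez(path, **saved)
    restored = {}
    _load_npz_weights_sketch(restored, path)
print(sorted(restored))  # ['W0', 'b0']
```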
Related

  • train - Train models and create checkpoints
  • benchmark - Measure inference performance metrics
