
Overview

The inference.py module provides a command-line interface for running inference on trained models with configurable precision. It supports loading checkpoints, generating inference reports, and exporting models to ONNX format.

main

Entry point for the inference CLI that loads weights, runs inference, and optionally exports to ONNX.
def main() -> None

Behavior

  1. Model Initialization: Creates NeuralNetwork with fixed architecture [784, 64, 10] and activations ["relu", "softmax"]
  2. Weight Loading: Loads trained weights from .npz checkpoint file
  3. Inference Execution: Generates synthetic test data and runs forward pass
  4. Report Generation: Creates detailed inference report with metrics
  5. Optional ONNX Export: Exports model to ONNX format if requested
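The five steps above can be sketched roughly as follows. The real `NeuralNetwork` class and its forward-pass API are not shown in this page, so the snippet approximates the [784, 64, 10] relu/softmax pipeline with plain NumPy; treat it as an illustration of the flow, not the module's actual implementation:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def softmax(x):
    # Subtract the row max for numerical stability before exponentiating.
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# 1. Model initialization: fixed [784, 64, 10] architecture
#    (random arrays stand in for a real NeuralNetwork instance).
rng = np.random.default_rng(0)
weights = {
    "W0": rng.normal(0, 0.05, (784, 64)), "b0": np.zeros(64),
    "W1": rng.normal(0, 0.05, (64, 10)),  "b1": np.zeros(10),
}

# 2. Weight loading would replace the arrays above from a .npz checkpoint.

# 3. Inference execution: synthetic test batch, then a forward pass.
batch = rng.random((64, 784)).astype(np.float32)
hidden = relu(batch @ weights["W0"] + weights["b0"])
preds = softmax(hidden @ weights["W1"] + weights["b1"])

# 4. Report generation: the real module also records timing and memory.
print(preds.shape)  # (64, 10)
```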

CLI Usage

Run inference from the command line:
python inference.py --weights experiments/checkpoints/model_v1.npz
python inference.py --weights model.npz --precision float16 --batch-size 128
python inference.py --weights model.npz --export-onnx

CLI Arguments

--weights
string
required
Path to .npz checkpoint file containing trained model weights
--precision
string
default:"float32"
Inference precision mode. Options: float32, float16, int8
--batch-size
int
default:"64"
Number of samples in synthetic test batch
--export-onnx
flag
Export model to ONNX format at exports/model.onnx
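Given the argument table above, the CLI surface can be reconstructed with `argparse`. This parser is a hypothetical sketch of what `inference.py` likely defines, not a copy of its source:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Reconstruction of the documented CLI arguments.
    parser = argparse.ArgumentParser(
        description="Run inference on a trained model.")
    parser.add_argument("--weights", required=True,
                        help="Path to .npz checkpoint with trained weights")
    parser.add_argument("--precision", default="float32",
                        choices=["float32", "float16", "int8"],
                        help="Inference precision mode")
    parser.add_argument("--batch-size", type=int, default=64,
                        help="Number of samples in the synthetic test batch")
    parser.add_argument("--export-onnx", action="store_true",
                        help="Export the model to exports/model.onnx")
    return parser

args = build_parser().parse_args(
    ["--weights", "model.npz", "--precision", "float16"])
print(args.precision, args.batch_size)  # float16 64
```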

Inference Report Format

The inference_report function (from the deployment module) generates a JSON report with the following structure:
{
  "batch_size": 64,
  "precision": "float32",
  "inference_time_ms": 2.34,
  "throughput_samples_per_sec": 27350.4,
  "memory_usage_mb": 1.23,
  "model_size_bytes": 51200,
  "predictions_shape": [64, 10]
}

Report Fields

batch_size
int
Number of samples processed
precision
string
Precision mode used for inference
inference_time_ms
float
Time taken for forward pass in milliseconds
throughput_samples_per_sec
float
Number of samples processed per second
memory_usage_mb
float
Memory footprint during inference
model_size_bytes
int
Size of model parameters in bytes
predictions_shape
list[int]
Shape of output predictions array
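The timing and throughput fields are related: dividing batch_size by the forward-pass time in seconds reproduces throughput_samples_per_sec in the sample report above. Assuming the report derives throughput this way:

```python
# Values taken from the sample report above.
batch_size = 64
inference_time_ms = 2.34

# throughput = samples / seconds
throughput = batch_size / (inference_time_ms / 1000.0)
print(round(throughput, 1))  # 27350.4
```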

Example Output

$ python inference.py --weights model.npz --precision float16
{
  "batch_size": 64,
  "precision": "float16",
  "inference_time_ms": 1.87,
  "throughput_samples_per_sec": 34224.6,
  "memory_usage_mb": 0.62,
  "model_size_bytes": 25600,
  "predictions_shape": [64, 10]
}
$ python inference.py --weights model.npz --export-onnx
Exported ONNX model to exports/model.onnx

Helper Functions

_load_npz_weights

Loads weights from NumPy .npz archive into model.
def _load_npz_weights(model: NeuralNetwork, weights_path: str) -> None
model
NeuralNetwork
required
Model instance to load weights into
weights_path
string
required
Path to .npz checkpoint file
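A minimal sketch of the loader's likely mechanics, assuming the checkpoint stores one array per parameter under string keys (the key names and the dict-based model stand-in here are hypothetical; the real function assigns into a NeuralNetwork instance):

```python
import os
import tempfile
import numpy as np

def _load_npz_weights_sketch(params: dict, weights_path: str) -> None:
    # Copy each array in the .npz archive into the matching model parameter.
    with np.load(weights_path) as archive:
        for key in archive.files:
            params[key] = archive[key]

# Round-trip demo: save weights, then load them back into an empty dict.
saved = {"W0": np.ones((784, 64)), "b0": np.zeros(64)}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.npz")
    np.savez(path, **saved)
    restored = {}
    _load_npz_weights_sketch(restored, path)
print(sorted(restored))  # ['W0', 'b0']
```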
Related

  • train - Train models and create checkpoints
  • benchmark - Measure inference performance metrics
