Overview
The `inference.py` module provides a command-line interface for running inference on trained models with configurable precision. It supports loading checkpoints, generating inference reports, and exporting models to ONNX format.
main
Entry point for the inference CLI that loads weights, runs inference, and optionally exports to ONNX.
Behavior
- Model Initialization: Creates `NeuralNetwork` with fixed architecture `[784, 64, 10]` and activations `["relu", "softmax"]`
- Weight Loading: Loads trained weights from an `.npz` checkpoint file
- Inference Execution: Generates synthetic test data and runs a forward pass
- Report Generation: Creates detailed inference report with metrics
- Optional ONNX Export: Exports model to ONNX format if requested
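The steps above can be sketched as follows. `NeuralNetwork` here is a minimal stand-in with an assumed API (the real class lives elsewhere in the project), and the weight-loading and export steps are omitted:

```python
import numpy as np

class NeuralNetwork:
    """Minimal stand-in for the project's model class (assumed API)."""
    def __init__(self, layers, activations):
        rng = np.random.default_rng(0)
        self.weights = [rng.standard_normal((i, o)) * 0.01
                        for i, o in zip(layers[:-1], layers[1:])]
        self.biases = [np.zeros(o) for o in layers[1:]]

    def forward(self, x):
        for i, (w, b) in enumerate(zip(self.weights, self.biases)):
            x = x @ w + b
            if i < len(self.weights) - 1:
                x = np.maximum(x, 0)                       # relu
        e = np.exp(x - x.max(axis=1, keepdims=True))       # stable softmax
        return e / e.sum(axis=1, keepdims=True)

def run_inference(batch_size=4):
    # Model Initialization: fixed [784, 64, 10] architecture
    model = NeuralNetwork([784, 64, 10], ["relu", "softmax"])
    # Inference Execution: synthetic test data, forward pass
    x = np.random.default_rng(1).standard_normal((batch_size, 784))
    return model.forward(x)

preds = run_inference()
print(preds.shape)  # (4, 10)
```

Each output row is a softmax distribution over the 10 classes, so rows sum to 1.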
CLI Usage
Run inference from the command line:
CLI Arguments
- Path to `.npz` checkpoint file containing trained model weights
- Inference precision mode. Options: `float32`, `float16`, `int8`
- Number of samples in the synthetic test batch
- Export model to ONNX format at `exports/model.onnx`
Inference Report Format
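The actual flag names are not preserved above, so the following `argparse` layout is illustrative only; every flag name (`--weights`, `--precision`, `--batch-size`, `--export-onnx`) and every default is an assumption, not the module's confirmed interface:

```python
import argparse

def build_parser():
    parser = argparse.ArgumentParser(
        description="Run inference on a trained model")
    # Hypothetical flag names; the real CLI may differ.
    parser.add_argument("--weights", required=True,
                        help="Path to .npz checkpoint file with trained weights")
    parser.add_argument("--precision",
                        choices=["float32", "float16", "int8"],
                        default="float32",
                        help="Inference precision mode")
    parser.add_argument("--batch-size", type=int, default=32,
                        help="Number of samples in the synthetic test batch")
    parser.add_argument("--export-onnx", action="store_true",
                        help="Export the model to ONNX at exports/model.onnx")
    return parser

args = build_parser().parse_args(
    ["--weights", "model.npz", "--precision", "float16"])
print(args.precision)  # float16
```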
The `inference_report` function (from the deployment module) generates a JSON report with the following structure:
Report Fields
- Number of samples processed
- Precision mode used for inference
- Time taken for the forward pass, in milliseconds
- Number of samples processed per second
- Memory footprint during inference
- Size of model parameters in bytes
- Shape of the output predictions array
Example Output
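The report's actual key names are not preserved above; the following illustrative report uses hypothetical keys matching the field descriptions (the model-size figure assumes the `[784, 64, 10]` architecture at 4 bytes per float32 parameter):

```json
{
  "num_samples": 32,
  "precision": "float32",
  "inference_time_ms": 4.8,
  "throughput_samples_per_sec": 6666.7,
  "memory_footprint_bytes": 412096,
  "model_size_bytes": 203560,
  "output_shape": [32, 10]
}
```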
Helper Functions
_load_npz_weights
Loads weights from a NumPy `.npz` archive into the model.
- Model instance to load weights into
- Path to `.npz` checkpoint file
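A minimal sketch of what such a loader might do, assuming the model stores parameters as `weights`/`biases` lists and the archive uses keys like `W0`, `b0` (both the attribute names and the key layout are assumptions):

```python
import os
import tempfile
import numpy as np

def _load_npz_weights(model, path):
    """Load weights from a NumPy .npz archive into the model.

    Assumes archive keys "W0", "b0", "W1", "b1", ... matching the
    model's layer order; the real checkpoint layout may differ.
    """
    with np.load(path) as archive:
        for i in range(len(model.weights)):
            model.weights[i] = archive[f"W{i}"]
            model.biases[i] = archive[f"b{i}"]

# Usage with a tiny stand-in model object:
class _Stub:
    weights = [np.zeros((2, 3)), np.zeros((3, 1))]
    biases = [np.zeros(3), np.zeros(1)]

model = _Stub()
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "ckpt.npz")
    np.savez(path, W0=np.ones((2, 3)), b0=np.ones(3),
             W1=np.ones((3, 1)), b1=np.ones(1))
    _load_npz_weights(model, path)
print(model.weights[0].sum())  # 6.0
```

Using a context manager around `np.load` ensures the archive's file handle is closed after the arrays are copied out.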