Overview

The benchmark.py module provides comprehensive performance profiling for neural network models, measuring training time, inference latency, throughput, memory usage, CPU utilization, and energy consumption.

run_benchmarks

Runs a complete benchmark suite across multiple configurations and saves results to CSV.
def run_benchmarks(
    batch_sizes: list[int],
    precision_modes: list[str],
    model_sizes: list[list[int]],
    activations: list[str] | None = None,
    output_csv: str = "benchmark_results.csv",
    n_samples: int = 512,
    epochs: int = 2,
    seed: int = 42,
    enable_profiling: bool = False,
) -> tuple[Path, list[dict]]

Parameters

batch_sizes
list[int]
required
List of batch sizes to benchmark, e.g., [16, 32, 64]
precision_modes
list[string]
required
List of precision modes to test, e.g., ["float32", "float16", "int8"]
model_sizes
list[list[int]]
required
List of model architectures to benchmark, e.g., [[784, 64, 10], [784, 128, 10]]
activations
list[string]
default:"['relu', 'softmax']"
Activation functions for each layer transition; if None, the default ['relu', 'softmax'] is used
output_csv
string
default:"benchmark_results.csv"
Output CSV filename (saved to benchmarks/ directory)
n_samples
int
default:"512"
Number of synthetic samples to generate for benchmarking
epochs
int
default:"2"
Number of training epochs per benchmark
seed
int
default:"42"
Random seed for reproducible benchmarks
enable_profiling
bool
default:"false"
Enable detailed profiling reports (requires profiler module)

Returns

output_path
Path
Path to the generated CSV file
results
list[dict]
List of benchmark result dictionaries

Example Usage

from benchmark import run_benchmarks

path, results = run_benchmarks(
    batch_sizes=[16, 32, 64],
    precision_modes=["float32", "float16"],
    model_sizes=[[784, 64, 10], [784, 128, 64, 10]],
    n_samples=1000,
    epochs=3,
    output_csv="my_benchmark.csv"
)

print(f"Saved {len(results)} results to {path}")

Output Format

Benchmark results are saved as CSV with the following columns:

Result Fields

seed
int
Random seed used for this benchmark run
layer_sizes
string
Model architecture in "AxBxC" format (e.g., "784x64x10")
precision_mode
string
Precision used: "float32", "float16", or "int8"
batch_size
int
Training batch size
epochs
int
Number of training epochs
n_samples
int
Total number of training samples
train_time_per_epoch_s
float
Average training time per epoch in seconds (6 decimal places)
inference_latency_per_sample_s
float
Average inference time per sample in seconds (8 decimal places)
batch_throughput_samples_per_s
float
Number of samples processed per second during inference (3 decimal places)
peak_memory_mb
float
Peak memory usage during training in megabytes (3 decimal places)
cpu_utilization_percent
float
Average CPU utilization percentage during training (3 decimal places)
final_train_accuracy
float
Final training accuracy after all epochs (6 decimal places)
energy_per_epoch_j
float
Estimated energy consumption per epoch in joules (6 decimal places)
profiling_report
string
Path to detailed profiling report (only present if enable_profiling=True)

Example CSV Output

seed,layer_sizes,precision_mode,batch_size,epochs,n_samples,train_time_per_epoch_s,inference_latency_per_sample_s,batch_throughput_samples_per_s,peak_memory_mb,cpu_utilization_percent,final_train_accuracy,energy_per_epoch_j
42,784x64x10,float32,32,2,512,0.125000,0.00000234,42735.043,12.456,78.234,0.856000,0.015000
42,784x64x10,float16,32,2,512,0.098000,0.00000187,53475.936,8.234,76.890,0.854000,0.011760
42,784x128x10,float32,64,2,512,0.156000,0.00000289,34602.076,18.567,82.145,0.892000,0.018720
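The CSV can be consumed with the standard library alone. The sketch below embeds the three example rows above and picks the configuration with the highest inference throughput; column names follow the Result Fields listing:

```python
import csv
import io

# Example rows from the CSV output above, embedded here for illustration.
CSV_TEXT = """\
seed,layer_sizes,precision_mode,batch_size,epochs,n_samples,train_time_per_epoch_s,inference_latency_per_sample_s,batch_throughput_samples_per_s,peak_memory_mb,cpu_utilization_percent,final_train_accuracy,energy_per_epoch_j
42,784x64x10,float32,32,2,512,0.125000,0.00000234,42735.043,12.456,78.234,0.856000,0.015000
42,784x64x10,float16,32,2,512,0.098000,0.00000187,53475.936,8.234,76.890,0.854000,0.011760
42,784x128x10,float32,64,2,512,0.156000,0.00000289,34602.076,18.567,82.145,0.892000,0.018720
"""

rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
# Pick the run with the highest inference throughput.
best = max(rows, key=lambda r: float(r["batch_throughput_samples_per_s"]))
print(best["precision_mode"])  # float16
```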

CLI Usage

Run benchmarks from the command line:
python benchmark.py
python benchmark.py --seed 123 --output-csv custom_results.csv

CLI Arguments

--seed
int
default:"42"
Global seed for reproducible benchmark generation
--output-csv
string
default:"benchmark_results.csv"
Output CSV filename (saved under benchmarks/ directory)

Default Configuration

When run from CLI, the benchmark uses these defaults:
  • Batch sizes: [16, 32]
  • Precision modes: ["float32", "float16", "int8"]
  • Model sizes: [[16, 32, 4], [16, 64, 4]]
  • Samples: 256
  • Epochs: 1
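Reproducing the CLI defaults programmatically means passing the same lists to run_benchmarks. The sketch below only builds the keyword dictionary (the call itself is commented out, since it requires benchmark.py on the path), and assumes one result row is produced per (batch size, precision, model) combination:

```python
# Defaults used by `python benchmark.py`, taken from the list above.
cli_defaults = {
    "batch_sizes": [16, 32],
    "precision_modes": ["float32", "float16", "int8"],
    "model_sizes": [[16, 32, 4], [16, 64, 4]],
    "n_samples": 256,
    "epochs": 1,
}

# from benchmark import run_benchmarks
# path, results = run_benchmarks(**cli_defaults, seed=42)

# Assumption: one result row per (batch size, precision, model) combination.
expected_rows = (
    len(cli_defaults["batch_sizes"])
    * len(cli_defaults["precision_modes"])
    * len(cli_defaults["model_sizes"])
)
print(expected_rows)  # 12
```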

Helper Functions

benchmark_one_setup

Runs a single benchmark configuration and returns metrics.
def benchmark_one_setup(
    layer_sizes: list[int],
    activations: list[str],
    precision_mode: str,
    batch_size: int,
    n_samples: int = 512,
    epochs: int = 2,
    seed: int = 42,
    enable_profiling: bool = False,
) -> dict
Returns a single result dictionary with all benchmark metrics.

make_synthetic_data

Generates synthetic training data for benchmarking.
def make_synthetic_data(
    n_samples: int,
    n_features: int,
    n_classes: int,
    seed: int = 42
) -> tuple[np.ndarray, np.ndarray]
n_samples
int
required
Number of samples to generate
n_features
int
required
Number of input features
n_classes
int
required
Number of output classes
seed
int
default:"42"
Random seed
Returns (X, y) where X is features array and y is labels array.
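For intuition, a hypothetical re-implementation of the generator might look like the following. This is a sketch only; the real make_synthetic_data may use a different distribution or label encoding:

```python
import numpy as np

def make_synthetic_data_sketch(n_samples, n_features, n_classes, seed=42):
    # Illustrative only: draw standard-normal features and uniform integer
    # labels. The actual module may generate structured, learnable data.
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n_samples, n_features)).astype(np.float32)
    y = rng.integers(0, n_classes, size=n_samples)
    return X, y

X, y = make_synthetic_data_sketch(512, 784, 10)
print(X.shape, y.shape)  # (512, 784) (512,)
```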

measure_inference_latency_per_sample

Measures average inference time per sample.
def measure_inference_latency_per_sample(
    model: NeuralNetwork,
    X: np.ndarray,
    precision: str = "float32",
    runs: int = 5
) -> float
Returns the average latency in seconds per sample, averaged over `runs` iterations.
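The timing pattern behind this helper can be sketched as below. The predict callable and the lack of warm-up handling are assumptions, since the source documents only the signature:

```python
import time

import numpy as np

def measure_latency_sketch(predict, X, runs=5):
    # Time full-batch inference `runs` times, then divide the mean elapsed
    # time by the number of samples to get per-sample latency.
    totals = []
    for _ in range(runs):
        start = time.perf_counter()
        predict(X)
        totals.append(time.perf_counter() - start)
    return sum(totals) / runs / len(X)

# Stand-in "model": a single matrix multiply over a (256, 16) batch.
X = np.zeros((256, 16), dtype=np.float32)
latency = measure_latency_sketch(lambda batch: batch @ np.ones((16, 4)), X)
print(latency > 0)  # True
```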

measure_batch_throughput

Measures inference throughput in samples per second.
def measure_batch_throughput(
    model: NeuralNetwork,
    X: np.ndarray,
    precision: str = "float32",
    runs: int = 5
) -> float
Returns throughput in samples per second.

measure_training_time_per_epoch

Measures average training time per epoch.
def measure_training_time_per_epoch(
    model: NeuralNetwork,
    X: np.ndarray,
    y: np.ndarray,
    epochs: int = 3,
    alpha: float = 0.1,
    batch_size: int = 32,
    seed: int = 42
) -> tuple[dict, float]
Returns (history, time_per_epoch) where history is training metrics and time_per_epoch is seconds per epoch.

Related Modules

  • train - Training pipeline that can be benchmarked
  • inference - Inference utilities for performance testing
