Overview

The benchmark.py module provides comprehensive performance profiling for neural network models, measuring training time, inference latency, throughput, memory usage, CPU utilization, and energy consumption.

run_benchmarks

Runs a complete benchmark suite across multiple configurations and saves results to CSV.
def run_benchmarks(
    batch_sizes: list[int],
    precision_modes: list[str],
    model_sizes: list[list[int]],
    activations: list[str] | None = None,
    output_csv: str = "benchmark_results.csv",
    n_samples: int = 512,
    epochs: int = 2,
    seed: int = 42,
    enable_profiling: bool = False,
) -> tuple[Path, list[dict]]

Parameters

batch_sizes
list[int]
required
List of batch sizes to benchmark, e.g., [16, 32, 64]
precision_modes
list[string]
required
List of precision modes to test, e.g., ["float32", "float16", "int8"]
model_sizes
list[list[int]]
required
List of model architectures to benchmark, e.g., [[784, 64, 10], [784, 128, 10]]
activations
list[string]
default:"['relu', 'softmax']"
Activation functions for each layer transition; if None, the default ['relu', 'softmax'] is used
output_csv
string
default:"benchmark_results.csv"
Output CSV filename (saved to benchmarks/ directory)
n_samples
int
default:"512"
Number of synthetic samples to generate for benchmarking
epochs
int
default:"2"
Number of training epochs per benchmark
seed
int
default:"42"
Random seed for reproducible benchmarks
enable_profiling
bool
default:"false"
Enable detailed profiling reports (requires profiler module)

Returns

output_path
Path
Path to the generated CSV file
results
list[dict]
List of benchmark result dictionaries

Example Usage

from benchmark import run_benchmarks

path, results = run_benchmarks(
    batch_sizes=[16, 32, 64],
    precision_modes=["float32", "float16"],
    model_sizes=[[784, 64, 10], [784, 128, 64, 10]],
    n_samples=1000,
    epochs=3,
    output_csv="my_benchmark.csv"
)

print(f"Saved {len(results)} results to {path}")

Output Format

Benchmark results are saved as CSV with the following columns:

Result Fields

seed
int
Random seed used for this benchmark run
layer_sizes
string
Model architecture in "AxBxC" format (e.g., "784x64x10")
precision_mode
string
Precision used: "float32", "float16", or "int8"
batch_size
int
Training batch size
epochs
int
Number of training epochs
n_samples
int
Total number of training samples
train_time_per_epoch_s
float
Average training time per epoch in seconds (6 decimal places)
inference_latency_per_sample_s
float
Average inference time per sample in seconds (8 decimal places)
batch_throughput_samples_per_s
float
Number of samples processed per second during inference (3 decimal places)
peak_memory_mb
float
Peak memory usage during training in megabytes (3 decimal places)
cpu_utilization_percent
float
Average CPU utilization percentage during training (3 decimal places)
final_train_accuracy
float
Final training accuracy after all epochs (6 decimal places)
energy_per_epoch_j
float
Estimated energy consumption per epoch in joules (6 decimal places)
profiling_report
string
Path to detailed profiling report (only present if enable_profiling=True)

Example CSV Output

seed,layer_sizes,precision_mode,batch_size,epochs,n_samples,train_time_per_epoch_s,inference_latency_per_sample_s,batch_throughput_samples_per_s,peak_memory_mb,cpu_utilization_percent,final_train_accuracy,energy_per_epoch_j
42,784x64x10,float32,32,2,512,0.125000,0.00000234,42735.043,12.456,78.234,0.856000,0.015000
42,784x64x10,float16,32,2,512,0.098000,0.00000187,53475.936,8.234,76.890,0.854000,0.011760
42,784x128x10,float32,64,2,512,0.156000,0.00000289,34602.076,18.567,82.145,0.892000,0.018720
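The CSV can be consumed with the standard library alone. The sketch below embeds the three example rows above and picks the configuration with the highest inference throughput; column names follow the Result Fields listing:

```python
import csv
import io

# Example rows from the CSV output above, embedded here for illustration.
CSV_TEXT = """\
seed,layer_sizes,precision_mode,batch_size,epochs,n_samples,train_time_per_epoch_s,inference_latency_per_sample_s,batch_throughput_samples_per_s,peak_memory_mb,cpu_utilization_percent,final_train_accuracy,energy_per_epoch_j
42,784x64x10,float32,32,2,512,0.125000,0.00000234,42735.043,12.456,78.234,0.856000,0.015000
42,784x64x10,float16,32,2,512,0.098000,0.00000187,53475.936,8.234,76.890,0.854000,0.011760
42,784x128x10,float32,64,2,512,0.156000,0.00000289,34602.076,18.567,82.145,0.892000,0.018720
"""

rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
# Pick the run with the highest inference throughput.
best = max(rows, key=lambda r: float(r["batch_throughput_samples_per_s"]))
print(best["precision_mode"])  # float16
```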

CLI Usage

Run benchmarks from the command line:
python benchmark.py
python benchmark.py --seed 123 --output-csv custom_results.csv

CLI Arguments

--seed
int
default:"42"
Global seed for reproducible benchmark generation
--output-csv
string
default:"benchmark_results.csv"
Output CSV filename (saved under benchmarks/ directory)

Default Configuration

When run from CLI, the benchmark uses these defaults:
  • Batch sizes: [16, 32]
  • Precision modes: ["float32", "float16", "int8"]
  • Model sizes: [[16, 32, 4], [16, 64, 4]]
  • Samples: 256
  • Epochs: 1
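Reproducing the CLI defaults programmatically means passing the same lists to run_benchmarks. The sketch below only builds the keyword dictionary (the call itself is commented out, since it requires benchmark.py on the path), and assumes one result row is produced per (batch size, precision, model) combination:

```python
# Defaults used by `python benchmark.py`, taken from the list above.
cli_defaults = {
    "batch_sizes": [16, 32],
    "precision_modes": ["float32", "float16", "int8"],
    "model_sizes": [[16, 32, 4], [16, 64, 4]],
    "n_samples": 256,
    "epochs": 1,
}

# from benchmark import run_benchmarks
# path, results = run_benchmarks(**cli_defaults, seed=42)

# Assumption: one result row per (batch size, precision, model) combination.
expected_rows = (
    len(cli_defaults["batch_sizes"])
    * len(cli_defaults["precision_modes"])
    * len(cli_defaults["model_sizes"])
)
print(expected_rows)  # 12
```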

Helper Functions

benchmark_one_setup

Runs a single benchmark configuration and returns metrics.
def benchmark_one_setup(
    layer_sizes: list[int],
    activations: list[str],
    precision_mode: str,
    batch_size: int,
    n_samples: int = 512,
    epochs: int = 2,
    seed: int = 42,
    enable_profiling: bool = False,
) -> dict
Returns a single result dictionary with all benchmark metrics.

make_synthetic_data

Generates synthetic training data for benchmarking.
def make_synthetic_data(
    n_samples: int,
    n_features: int,
    n_classes: int,
    seed: int = 42
) -> tuple[np.ndarray, np.ndarray]
n_samples
int
required
Number of samples to generate
n_features
int
required
Number of input features
n_classes
int
required
Number of output classes
seed
int
default:"42"
Random seed
Returns (X, y) where X is features array and y is labels array.
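For intuition, a hypothetical re-implementation of the generator might look like the following. This is a sketch only; the real make_synthetic_data may use a different distribution or label encoding:

```python
import numpy as np

def make_synthetic_data_sketch(n_samples, n_features, n_classes, seed=42):
    # Illustrative only: draw standard-normal features and uniform integer
    # labels. The actual module may generate structured, learnable data.
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n_samples, n_features)).astype(np.float32)
    y = rng.integers(0, n_classes, size=n_samples)
    return X, y

X, y = make_synthetic_data_sketch(512, 784, 10)
print(X.shape, y.shape)  # (512, 784) (512,)
```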

measure_inference_latency_per_sample

Measures average inference time per sample.
def measure_inference_latency_per_sample(
    model: NeuralNetwork,
    X: np.ndarray,
    precision: str = "float32",
    runs: int = 5
) -> float
Returns the average latency in seconds per sample, averaged over `runs` iterations.
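The timing pattern behind this helper can be sketched as below. The predict callable and the lack of warm-up handling are assumptions, since the source documents only the signature:

```python
import time

import numpy as np

def measure_latency_sketch(predict, X, runs=5):
    # Time full-batch inference `runs` times, then divide the mean elapsed
    # time by the number of samples to get per-sample latency.
    totals = []
    for _ in range(runs):
        start = time.perf_counter()
        predict(X)
        totals.append(time.perf_counter() - start)
    return sum(totals) / runs / len(X)

# Stand-in "model": a single matrix multiply over a (256, 16) batch.
X = np.zeros((256, 16), dtype=np.float32)
latency = measure_latency_sketch(lambda batch: batch @ np.ones((16, 4)), X)
print(latency > 0)  # True
```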

measure_batch_throughput

Measures inference throughput in samples per second.
def measure_batch_throughput(
    model: NeuralNetwork,
    X: np.ndarray,
    precision: str = "float32",
    runs: int = 5
) -> float
Returns throughput in samples per second.

measure_training_time_per_epoch

Measures average training time per epoch.
def measure_training_time_per_epoch(
    model: NeuralNetwork,
    X: np.ndarray,
    y: np.ndarray,
    epochs: int = 3,
    alpha: float = 0.1,
    batch_size: int = 32,
    seed: int = 42
) -> tuple[dict, float]
Returns (history, time_per_epoch) where history is training metrics and time_per_epoch is seconds per epoch.

Related Modules

  • train - Training pipeline that can be benchmarked
  • inference - Inference utilities for performance testing
