The benchmarking system measures training time, inference latency, throughput, memory usage, CPU utilization, and energy consumption across multiple model configurations.

Running Benchmarks

Basic Usage

Run a full benchmark sweep:
python benchmark.py --seed 42 --output-csv benchmark_results.csv

Programmatic Usage

from benchmark import run_benchmarks

path, results = run_benchmarks(
    batch_sizes=[16, 32],
    precision_modes=["float32", "float16", "int8"],
    model_sizes=[[16, 32, 4], [16, 64, 4]],
    n_samples=256,
    epochs=1,
    seed=42,
    output_csv="benchmark_results.csv"
)

print(f"Saved {len(results)} benchmark rows to {path}")

Benchmark Metrics

Training Metrics

  • train_time_per_epoch_s: Average time per training epoch in seconds
  • final_train_accuracy: Final training accuracy after all epochs
  • cpu_utilization_percent: Average CPU usage during training
  • peak_memory_mb: Peak memory consumption in megabytes

Inference Metrics

  • inference_latency_per_sample_s: Per-sample inference time
  • batch_throughput_samples_per_s: Throughput in samples per second

Resource Metrics

  • energy_per_epoch_j: Estimated energy consumption per epoch in joules
  • peak_memory_mb: Peak memory usage tracked with tracemalloc
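The inference metrics above are two views of the same timing data: per-sample latency is batch time divided by batch size, and throughput is its reciprocal scaled by batch size. As a minimal sketch of how such numbers can be derived from raw timings (measure_inference_metrics is a hypothetical helper for illustration, not part of benchmark.py):

```python
import time

def measure_inference_metrics(predict_fn, batch, n_repeats=10):
    """Time a batched predict function and derive per-sample metrics."""
    start = time.perf_counter()
    for _ in range(n_repeats):
        predict_fn(batch)
    elapsed = time.perf_counter() - start

    batch_time = elapsed / n_repeats
    return {
        "inference_latency_per_sample_s": batch_time / len(batch),
        "batch_throughput_samples_per_s": len(batch) / batch_time,
    }

# Example with a trivial stand-in "model"
metrics = measure_inference_metrics(lambda b: [x * 2 for x in b], list(range(32)))
```

By construction, latency and throughput multiply to exactly 1, which is a quick sanity check when validating benchmark output.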

Custom Benchmark Configuration

from benchmark import benchmark_one_setup

result = benchmark_one_setup(
    layer_sizes=[32, 64, 10],
    activations=["relu", "softmax"],
    precision_mode="float16",
    batch_size=32,
    n_samples=512,
    epochs=2,
    seed=42,
    enable_profiling=True  # Generate profiling report
)

print(f"Training time: {result['train_time_per_epoch_s']:.4f}s")
print(f"Inference latency: {result['inference_latency_per_sample_s']:.6f}s")
print(f"Peak memory: {result['peak_memory_mb']:.2f}MB")

Interpreting Results

CSV Output Format

Benchmark results are saved to benchmarks/benchmark_results.csv with the following columns:
  • seed: Random seed for reproducibility
  • layer_sizes: Model architecture (e.g., “16x32x4”)
  • precision_mode: Data type used (float32, float16, int8)
  • batch_size: Batch size used for training
  • epochs: Number of training epochs
  • n_samples: Total training samples
  • train_time_per_epoch_s: Average training time per epoch
  • inference_latency_per_sample_s: Per-sample inference latency
  • batch_throughput_samples_per_s: Inference throughput
  • peak_memory_mb: Peak memory usage
  • cpu_utilization_percent: Average CPU usage
  • final_train_accuracy: Final training accuracy
  • energy_per_epoch_j: Estimated energy per epoch

Performance Analysis

Compare precision modes:
import pandas as pd

df = pd.read_csv("benchmarks/benchmark_results.csv")
df.groupby("precision_mode").agg({
    "train_time_per_epoch_s": "mean",
    "peak_memory_mb": "mean",
    "final_train_accuracy": "mean"
})
Find optimal batch size:
df.groupby("batch_size").agg({
    "batch_throughput_samples_per_s": "mean",
    "peak_memory_mb": "mean"
}).sort_values("batch_throughput_samples_per_s", ascending=False)

Memory Measurement

Benchmarking uses tracemalloc to track peak memory usage:
import tracemalloc

tracemalloc.start()
result = model.fit(X, y, epochs=3)
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"Peak memory: {peak / 1024 / 1024:.2f} MB")

CPU Utilization

CPU usage is sampled continuously during training using psutil:
import psutil
import threading

process = psutil.Process()
samples = []
stop_flag = False  # must be defined before the sampler thread starts

def sampler():
    while not stop_flag:
        samples.append(process.cpu_percent(interval=0.05))

# Run training while sampling CPU
worker = threading.Thread(target=sampler)
worker.start()
model.fit(X, y, epochs=3)
stop_flag = True
worker.join()

avg_cpu = sum(samples) / len(samples) if samples else 0.0

Energy Estimation

Energy consumption is estimated based on runtime and precision mode:
from energy_estimation import estimate_runtime_energy_j

energy_j = estimate_runtime_energy_j(
    runtime_seconds=time_per_epoch,
    precision_mode="float32"
)
Lower precision modes (float16, int8) typically consume less energy due to faster computation.
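The exact model inside estimate_runtime_energy_j isn't shown here, but a runtime × average-power formulation captures the idea. A minimal sketch, assuming illustrative per-precision power draws (the wattages below are made-up values, not the coefficients used by energy_estimation):

```python
# Illustrative average power draw per precision mode, in watts (assumed values)
ASSUMED_POWER_W = {"float32": 15.0, "float16": 11.0, "int8": 8.0}

def estimate_energy_j(runtime_seconds, precision_mode):
    """Energy (joules) = average power (watts) x runtime (seconds)."""
    return ASSUMED_POWER_W[precision_mode] * runtime_seconds

fp32_energy = estimate_energy_j(2.0, "float32")  # 15.0 W x 2.0 s = 30.0 J
int8_energy = estimate_energy_j(2.0, "int8")     # 8.0 W x 2.0 s = 16.0 J
```

Because lower-precision runs also tend to finish faster, the real-world gap is usually larger than the power ratio alone suggests.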

Enabling Profiling

To generate detailed profiling reports during benchmarking:
result = benchmark_one_setup(
    layer_sizes=[32, 64, 10],
    activations=["relu", "softmax"],
    precision_mode="float32",
    batch_size=32,
    enable_profiling=True
)

if "profiling_report" in result:
    print(f"Profiling report: {result['profiling_report']}")
This generates a JSON profiling report in the profiling/ directory.

Next Steps

Profiling

Profile model parameters and memory usage

Statistical Analysis

Run repeated benchmarks with confidence intervals
