Overview
The hardware profiling module provides utilities for estimating computational costs, memory usage, and throughput characteristics of model operations under different hardware constraints.
Functions
build_hardware_profile_table
Builds a comprehensive hardware profile including operator-level costs, memory estimates, and precision tradeoffs.
Number of features in the input data
Batch size for processing
Time interval between stream chunks in milliseconds
A dictionary containing:
operator_profile (list[dict]): List of operator-level statistics with keys:
operator (str): Name of the operation
latency_ms (float): Estimated latency in milliseconds
memory_kb (float): Estimated memory usage in kilobytes
input_normalization
linear_projection
activation
decision_head
latency_ms (float): Total latency across all operators
memory_kb (float): Total memory usage
estimated_bandwidth_mb_s (float): Estimated memory bandwidth in MB/s
stream_utilization (float): Ratio of processing time to stream interval (0-1)
fp32_memory_kb (float): Memory usage with 32-bit floating point
fp16_memory_kb (float): Memory usage with 16-bit floating point (50% of fp32)
int8_memory_kb (float): Memory usage with 8-bit integer (25% of fp32)
note (str): Warning about deployment-dependent latency effects
cache_sensitivity (str): Notes on cache behavior
bottleneck (str): Identification of performance bottlenecks
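The derived fields described above follow simple arithmetic, which the sketch below illustrates. It is not the module's implementation: the helper names `precision_memory_kb` and `stream_utilization`, the parameter name `stream_interval_ms`, the key name `int8_memory_kb` for the 8-bit entry, and the clamping to the 0-1 range are all assumptions made for illustration.

```python
def precision_memory_kb(fp32_memory_kb: float) -> dict:
    """Scale an fp32 memory estimate by the documented ratios (hypothetical helper)."""
    return {
        "fp32_memory_kb": fp32_memory_kb,
        "fp16_memory_kb": fp32_memory_kb * 0.5,   # 16-bit: 50% of fp32
        "int8_memory_kb": fp32_memory_kb * 0.25,  # 8-bit: 25% of fp32 (key name assumed)
    }


def stream_utilization(latency_ms: float, stream_interval_ms: float) -> float:
    """Ratio of processing time to stream interval (hypothetical helper).

    Clamping to [0, 1] is an assumption based on the documented 0-1 range.
    """
    if stream_interval_ms <= 0:
        raise ValueError("stream_interval_ms must be positive")
    ratio = latency_ms / stream_interval_ms
    return min(max(ratio, 0.0), 1.0)


if __name__ == "__main__":
    print(precision_memory_kb(100.0))
    print(stream_utilization(5.0, 20.0))
```

A utilization close to 1 would mean that processing one chunk takes nearly the whole interval before the next chunk arrives, leaving little headroom.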
write_hardware_profile_artifacts
Writes hardware profile data to CSV files for analysis and reporting.
Hardware profile dictionary returned by build_hardware_profile_table
Directory where CSV files will be written. Created if it doesn’t exist
Dictionary mapping artifact names to their file paths:
operator_profile_csv: Path to operator-level profile CSV
hardware_totals_csv: Path to aggregate totals CSV
operator_profile.csv: Operator-level latency and memory statistics
hardware_totals.csv: Aggregate metrics and bandwidth estimates
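As a rough illustration of the behavior described above, the sketch below writes the two CSV files with Python's standard `csv` module and returns the artifact-to-path mapping. It is a stand-in, not the module's code: the function name `write_profile_csvs`, its arguments, and the way the totals are passed in as a separate dictionary are assumptions.

```python
import csv
from pathlib import Path


def write_profile_csvs(operator_rows, totals, output_dir):
    """Write operator-level and aggregate CSVs (hypothetical stand-in).

    operator_rows: list of dicts with keys operator, latency_ms, memory_kb.
    totals: flat dict of aggregate metrics (layout assumed for this sketch).
    """
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)  # create the directory if missing

    operator_path = out / "operator_profile.csv"
    with operator_path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["operator", "latency_ms", "memory_kb"])
        writer.writeheader()
        writer.writerows(operator_rows)

    totals_path = out / "hardware_totals.csv"
    with totals_path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(totals))
        writer.writeheader()
        writer.writerow(totals)

    return {
        "operator_profile_csv": str(operator_path),
        "hardware_totals_csv": str(totals_path),
    }
```

Returning the paths as strings keeps the mapping easy to log or serialize; a real implementation may return a different path type.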