Overview

The hardware simulation module (hardware_simulation.py) provides tools to model and enforce realistic hardware constraints during training and inference. This enables experimentation with resource-constrained scenarios without requiring specialized hardware.
These simulations are approximations designed for comparative studies and architecture exploration, not precise hardware predictions.

Hardware Simulation Configuration

Constraints are specified through the HardwareSimulationConfig dataclass:
hardware_simulation.py:17-24
@dataclass
class HardwareSimulationConfig:
    enabled: bool = False
    max_memory_mb: float = 512.0
    compute_speed_factor: float = 1.0
    precision_mode: str = "float32"  # float32 | float16 | int8
    batch_size_limit: int = 128

Configuration Parameters

enabled
bool
default:"False"
Enable hardware constraint enforcement
max_memory_mb
float
default:"512.0"
Maximum memory budget in megabytes for parameters and activations
compute_speed_factor
float
default:"1.0"
Artificial slowdown multiplier (1.0 = no slowdown, 2.0 = 2x slower)
precision_mode
string
default:"float32"
Numeric precision mode: float32, float16, or int8
batch_size_limit
int
default:"128"
Hard upper bound on batch size regardless of memory
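A minimal sketch of constructing the configuration. The dataclass is re-declared locally (copied from the definition above) so the snippet runs standalone; in practice you would import it from hardware_simulation.

```python
from dataclasses import dataclass

# Local copy of HardwareSimulationConfig (as shown above) so this snippet
# is self-contained; import it from hardware_simulation in real code.
@dataclass
class HardwareSimulationConfig:
    enabled: bool = False
    max_memory_mb: float = 512.0
    compute_speed_factor: float = 1.0
    precision_mode: str = "float32"  # float32 | float16 | int8
    batch_size_limit: int = 128

# Defaults leave simulation off; override only the fields you need.
default_cfg = HardwareSimulationConfig()
edge_cfg = HardwareSimulationConfig(
    enabled=True, max_memory_mb=64.0, precision_mode="float16"
)

print(default_cfg.enabled)    # False
print(edge_cfg.max_memory_mb) # 64.0
```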

Memory Estimation

Memory consumption is split into two components: parameter memory and activation memory.

Parameter Memory

Parameter memory is constant and depends only on model architecture and precision:
hardware_simulation.py:42-44
def estimate_parameter_memory_mb(model: Any, precision_mode: str = "float32") -> float:
    total_params = sum(_layer_param_count(layer) for layer in getattr(model, "layers", []))
    return (total_params * _dtype_bytes(precision_mode)) / (1024 ** 2)
Bytes per parameter by precision:
  • float32: 4 bytes per parameter
  • float16: 2 bytes per parameter
  • int8: 1 byte per parameter

Activation Memory

Activation memory scales linearly with batch size:
hardware_simulation.py:47-58
def estimate_activation_memory_mb(model: Any, batch_size: int, precision_mode: str = "float32") -> float:
    if not hasattr(model, "layer_sizes"):
        return 0.0

    dtype_bytes = _dtype_bytes(precision_mode)
    activation_elements = 0

    # input + each layer output
    for width in model.layer_sizes:
        activation_elements += int(batch_size) * int(width)

    return (activation_elements * dtype_bytes) / (1024 ** 2)
This estimate includes the input and each layer’s output but excludes gradient storage and interpreter overhead, so it is a lower bound: actual usage will be somewhat higher.

Total Memory Calculation

hardware_simulation.py:61-64
def estimate_total_memory_mb(model: Any, batch_size: int, precision_mode: str = "float32") -> float:
    return estimate_parameter_memory_mb(model, precision_mode) + estimate_activation_memory_mb(
        model, batch_size, precision_mode
    )

Memory Example: 784-64-10 Network

Let’s calculate memory for the default Fashion-MNIST architecture (784-64-10).
Parameters:
  • Layer 1: (784 × 64) + 64 = 50,240
  • Layer 2: (64 × 10) + 10 = 650
  • Total: 50,890 parameters
Activations (batch_size=32):
  • Input: 32 × 784 = 25,088
  • Hidden: 32 × 64 = 2,048
  • Output: 32 × 10 = 320
  • Total: 27,456 elements
Memory in float32:
  • Parameters: 50,890 × 4 = 203,560 bytes ≈ 0.194 MB
  • Activations: 27,456 × 4 = 109,824 bytes ≈ 0.105 MB
  • Total: ~0.3 MB
Memory in float16:
  • Parameters: 50,890 × 2 ≈ 0.097 MB
  • Activations: 27,456 × 2 ≈ 0.052 MB
  • Total: ~0.15 MB (50% reduction)
Memory in int8:
  • Parameters: 50,890 × 1 ≈ 0.048 MB
  • Activations: 27,456 × 1 ≈ 0.026 MB
  • Total: ~0.075 MB (75% reduction)
These small values allow experimentation on severely constrained devices or testing memory pressure with larger architectures.
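The arithmetic above can be checked with a few lines of standalone Python; the parameter and activation formulas are restated inline, so nothing is imported from the module:

```python
# Recompute the 784-64-10 worked example from first principles.
layer_sizes = [784, 64, 10]
batch_size = 32

# Parameters: weights (in x out) plus biases (out) per layer.
params = sum(i * o + o for i, o in zip(layer_sizes, layer_sizes[1:]))

# Activations: input plus each layer's output, per batch element.
activations = batch_size * sum(layer_sizes)

def memory_mb(elements: int, bytes_per_elem: int) -> float:
    return elements * bytes_per_elem / (1024 ** 2)

print(params)       # 50890
print(activations)  # 27456
for mode, nbytes in [("float32", 4), ("float16", 2), ("int8", 1)]:
    print(f"{mode}: {memory_mb(params + activations, nbytes):.3f} MB")
```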

Adaptive Batch Size

When memory constraints are active, the system automatically reduces batch size using binary search:
hardware_simulation.py:67-92
def adjust_batch_size_to_memory(
    model: Any,
    requested_batch_size: int,
    max_memory_mb: float,
    precision_mode: str = "float32",
    batch_size_limit: int = 128,
) -> int:
    capped_batch = max(1, min(int(requested_batch_size), int(batch_size_limit)))

    if estimate_total_memory_mb(model, capped_batch, precision_mode) <= max_memory_mb:
        return capped_batch

    low, high = 1, capped_batch
    feasible = 0

    while low <= high:
        mid = (low + high) // 2
        memory = estimate_total_memory_mb(model, mid, precision_mode)
        if memory <= max_memory_mb:
            feasible = mid
            low = mid + 1
        else:
            high = mid - 1

    return feasible
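To watch the search in action without the real module, the same algorithm can be paired with a stub memory model. The linear cost function below (10 MB of parameters plus 2 MB per batch element) is an assumption purely for illustration, and the pluggable-estimator signature is not part of the actual module:

```python
# Stub estimator standing in for estimate_total_memory_mb: a fixed 10 MB of
# parameters plus 2 MB of activations per batch element (illustration only).
def stub_total_memory_mb(model, batch_size, precision_mode="float32"):
    return 10.0 + 2.0 * batch_size

def adjust_batch_size(requested, max_memory_mb, batch_size_limit=128,
                      estimate=stub_total_memory_mb):
    """Same algorithm as adjust_batch_size_to_memory, with a pluggable estimator."""
    capped = max(1, min(int(requested), int(batch_size_limit)))
    if estimate(None, capped) <= max_memory_mb:
        return capped
    low, high, feasible = 1, capped, 0
    while low <= high:
        mid = (low + high) // 2
        if estimate(None, mid) <= max_memory_mb:
            feasible = mid
            low = mid + 1
        else:
            high = mid - 1
    return feasible

print(adjust_batch_size(128, max_memory_mb=100.0))  # 45 (10 + 2*45 = 100 MB)
print(adjust_batch_size(128, max_memory_mb=11.0))   # 0  (even batch 1 needs 12 MB)
```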

Algorithm Properties

  1. Cap to limit: First enforce the hard batch_size_limit cap
  2. Quick check: If the requested batch fits in memory, return it immediately
  3. Binary search: Find the largest feasible batch size in O(log n)
  4. Fallback: If no feasible size exists, return 0 (caller must handle)
If even batch_size=1 exceeds memory, the function returns 0. The caller should emit a warning and either abort or proceed with batch_size=1 anyway.

Compute Slowdown Simulation

To simulate slower hardware (e.g., edge devices, low-power CPUs), the system can artificially delay execution:
hardware_simulation.py:99-106
def apply_compute_slowdown(elapsed_seconds: float, compute_speed_factor: float) -> float:
    if compute_speed_factor <= 1.0:
        return 0.0

    delay = elapsed_seconds * (compute_speed_factor - 1.0)
    time.sleep(delay)
    return delay
Example scenarios:
| compute_speed_factor | Interpretation | If training takes 10s |
| --- | --- | --- |
| 1.0 | No slowdown | 10s (no added delay) |
| 1.5 | 50% slower | 15s (5s added delay) |
| 2.0 | 2x slower | 20s (10s added delay) |
| 3.0 | 3x slower | 30s (20s added delay) |
This is a linear model that assumes compute time scales proportionally. Real hardware differences are more complex (cache effects, instruction sets, etc.).
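The delay formula is easy to verify standalone. This restates apply_compute_slowdown from the excerpt (with the import it needs) and uses a tiny elapsed time so the sleep is negligible:

```python
import time

def apply_compute_slowdown(elapsed_seconds: float, compute_speed_factor: float) -> float:
    # No delay for factors <= 1.0 (speedups are not simulated).
    if compute_speed_factor <= 1.0:
        return 0.0
    delay = elapsed_seconds * (compute_speed_factor - 1.0)
    time.sleep(delay)
    return delay

# A 10 ms "training run" on simulated 3x-slower hardware adds 20 ms of delay.
added = apply_compute_slowdown(0.010, 3.0)
print(f"added delay: {added * 1000:.1f} ms")  # added delay: 20.0 ms
```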

Constraint Enforcement

The prepare_hardware_constrained_run function orchestrates all constraint checks:
hardware_simulation.py:108-157
def prepare_hardware_constrained_run(
    model: Any,
    requested_batch_size: int,
    simulation_config: HardwareSimulationConfig,
) -> Dict[str, Any]:
    if not simulation_config.enabled:
        return {
            "enabled": False,
            "batch_size": requested_batch_size,
            "warnings": [],
        }

    warnings: List[str] = []
    precision = simulation_config.precision_mode
    adjusted_batch_size = adjust_batch_size_to_memory(
        model=model,
        requested_batch_size=requested_batch_size,
        max_memory_mb=simulation_config.max_memory_mb,
        precision_mode=precision,
        batch_size_limit=simulation_config.batch_size_limit,
    )

    if adjusted_batch_size == 0:
        warnings.append(
            "Model cannot run under current memory and precision constraints; even batch_size=1 exceeds max_memory_mb."
        )
        adjusted_batch_size = 1

    projected_memory = estimate_total_memory_mb(model, adjusted_batch_size, precision)

    if projected_memory > simulation_config.max_memory_mb:
        warnings.append(
            f"Projected memory ({projected_memory:.4f} MB) exceeds limit ({simulation_config.max_memory_mb:.4f} MB)."
        )

    if adjusted_batch_size < requested_batch_size:
        warnings.append(
            f"Batch size reduced from {requested_batch_size} to {adjusted_batch_size} due to memory constraints."
        )

    apply_precision_constraint(model, precision)

    return {
        "enabled": True,
        "batch_size": adjusted_batch_size,
        "precision_mode": precision,
        "estimated_memory_mb": round(projected_memory, 6),
        "warnings": warnings,
    }

Return Structure

The function returns a dictionary with:
enabled
bool
Whether constraints were applied
batch_size
int
Adjusted batch size (may be smaller than requested)
precision_mode
string
Applied precision mode
estimated_memory_mb
float
Projected memory consumption
warnings
list
List of constraint violation messages

Training with Constraints

The complete training flow with hardware constraints:
hardware_simulation.py:159-193
def run_training_with_hardware_constraints(
    model: Any,
    X,
    y,
    epochs: int,
    alpha: float,
    batch_size: int,
    seed: int,
    simulation_config: HardwareSimulationConfig,
) -> Dict[str, Any]:
    setup = prepare_hardware_constrained_run(model, batch_size, simulation_config)
    effective_batch_size = int(setup["batch_size"])

    start = time.perf_counter()
    history = model.fit(
        X,
        y,
        epochs=epochs,
        alpha=alpha,
        batch_size=effective_batch_size,
        seed=seed,
    )
    elapsed = time.perf_counter() - start
    added_delay = apply_compute_slowdown(elapsed, simulation_config.compute_speed_factor)

    result = {
        "setup": setup,
        "training_time_s": round(elapsed, 6),
        "artificial_delay_s": round(added_delay, 6),
        "effective_time_s": round(elapsed + added_delay, 6),
        "final_accuracy": round(float(history["accuracy"][-1]), 6),
        "final_loss": round(float(history["loss"][-1]), 6),
    }
    return result

Example Usage

from hardware_simulation import HardwareSimulationConfig, run_training_with_hardware_constraints
from student import NeuralNetwork

# Create a constrained configuration
config = HardwareSimulationConfig(
    enabled=True,
    max_memory_mb=1.0,  # Very tight: 1 MB limit
    compute_speed_factor=2.0,  # Simulate 2x slower CPU
    precision_mode="float16",  # Use half precision
    batch_size_limit=64
)

model = NeuralNetwork(layer_sizes=[784, 64, 10], activations=["relu", "softmax"])

result = run_training_with_hardware_constraints(
    model=model,
    X=X_train,
    y=y_train,
    epochs=5,
    alpha=0.1,
    batch_size=32,  # Will be adjusted down if needed
    seed=42,
    simulation_config=config
)

print(f"Adjusted batch size: {result['setup']['batch_size']}")
print(f"Actual training time: {result['training_time_s']:.2f}s")
print(f"Simulated time: {result['effective_time_s']:.2f}s")
print(f"Warnings: {result['setup']['warnings']}")

Design Motivations

  • Accessibility: Enables constraint experiments without specialized hardware
  • Reproducibility: Software-based limits are deterministic and portable
  • Comparative studies: Easy to sweep parameters and compare trade-offs
  • Cost: No need for edge devices, low-power boards, or mobile hardware

Known Limitations

  • No cache modeling: Real hardware has complex cache hierarchies
  • No instruction-level effects: SIMD, vectorization, and compiler optimizations aren’t modeled
  • Linear slowdown model: Real compute differences are non-linear
  • No power/thermal modeling: Energy estimates are coarse (see benchmarking)
  • Underestimated memory: Doesn’t account for Python overhead or gradient storage

Appropriate Use Cases

  • Exploring batch size vs. accuracy trade-offs
  • Testing architecture changes under memory budgets
  • Comparing precision modes (float32 vs. float16 vs. int8)
  • Prototyping edge deployment scenarios
  • Educational demonstrations of resource constraints

Next Steps

Precision Modes

Deep dive into float32, float16, and int8 implementations

Reproducibility

Ensuring deterministic results across runs