Precision configuration is critical for balancing resource utilization and model accuracy in FPGA implementations. hls4ml provides both automatic and manual precision tuning capabilities.
Precision Types
hls4ml supports multiple precision types for fixed-point arithmetic:
FixedPrecisionType
Standard fixed-point representation:
from hls4ml.model.types import FixedPrecisionType
# ap_fixed<16,6> - 16 total bits, 6 integer bits (including sign)
precision = FixedPrecisionType(width=16, integer=6, signed=True)
# Equivalent to: 10 fractional bits, 1 sign bit, 5 integer bits
# Range: [-32, 31.999...]
# Resolution: 2^-10 ≈ 0.00098
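To make the range and resolution concrete, here is a small standalone sketch (plain NumPy, not an hls4ml API) that snaps a value onto the ap_fixed<16,6> grid. Note that it models truncation plus saturation; the hardware default is truncation plus wrap-around.
import numpy as np
def to_ap_fixed(x, width=16, integer=6):
    """Sketch: quantize x onto the ap_fixed<width,integer> grid (truncate, then saturate)."""
    frac_bits = width - integer                                   # 10 fractional bits
    lsb = 2.0 ** -frac_bits                                       # resolution ~0.00098
    lo, hi = -2.0 ** (integer - 1), 2.0 ** (integer - 1) - lsb    # [-32, 31.999...]
    return float(np.clip(np.floor(x / lsb) * lsb, lo, hi))
print(to_ap_fixed(0.1234))   # 0.123046875, the nearest representable value below 0.1234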
IntegerPrecisionType
Integer-only (no fractional bits):
from hls4ml.model.types import IntegerPrecisionType
# ap_int<8> or ap_uint<8>
signed_int = IntegerPrecisionType(width=8, signed=True)     # -128 to 127
unsigned_int = IntegerPrecisionType(width=8, signed=False)  # 0 to 255
ExponentPrecisionType
Power-of-2 representation (for po2 quantization):
from hls4ml.model.types import ExponentPrecisionType
# Values are 2^n
po2_precision = ExponentPrecisionType(width=8, signed=True)
XnorPrecisionType
Binary representation for XNOR operations:
from hls4ml.model.types import XnorPrecisionType
# Single-bit binary {0, 1} for XNOR networks
xnor_precision = XnorPrecisionType()
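In the configuration dictionaries shown in the following sections, precision is usually written as C++ type strings rather than constructed from these Python classes. The mapping below is illustrative; the config variable and the layer name 'fc1' are placeholders.
# String forms used in config dictionaries and their Python counterparts
config['LayerName']['fc1']['Precision'] = {
    'weight': 'ap_fixed<16,6>',  # FixedPrecisionType(width=16, integer=6, signed=True)
    'bias': 'ap_int<8>',         # IntegerPrecisionType(width=8, signed=True)
    'result': 'ap_fixed<16,6>',
}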
Automatic Precision Inference
The InferPrecisionTypes optimizer pass automatically calculates appropriate precision:
import hls4ml
config = hls4ml.utils.config_from_keras_model(
    model,
    granularity='name',
    default_precision='ap_fixed<16,6>'
)
hls_model = hls4ml.converters.convert_from_keras_model(
    model,
    hls_config=config,
    output_dir='auto_precision'
)
# Precision is automatically inferred for intermediate layers
How Inference Works
Analyze input precision
Start with user-defined or quantizer-specified input precision.
Propagate through operations
Calculate required precision for each operation based on:
Input bit-widths
Weight bit-widths
Operation type (multiplication, addition, etc.)
Number of accumulations
Apply maximum precision limits
If maximum precision is specified in config, constrain inferred precision.
Avoid overflow and underflow
Ensure sufficient integer bits to prevent overflow and fractional bits to maintain resolution.
Precision Inference Example
For a Dense layer with:
Input: ap_fixed<8,3> (8 bits, 3 integer)
Weights: ap_fixed<8,3>
Bias: ap_fixed<8,3>
128 inputs (n_in = 128), i.e. 128 products accumulated per output neuron
Inferred accumulator precision:
# Bitwidth = input_width + weight_width + ceil(log2(n_ops))
bitwidth = 8 + 8 + ceil(log2(128)) = 8 + 8 + 7 = 23
# Integer = input_int + weight_int + ceil(log2(n_ops))
integer = 3 + 3 + 7 = 13
# Result: ap_fixed<23,13> for accumulator
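The same sizing rule can be written as a small helper; this is a sketch of the rule above, not a call into hls4ml itself.
import math
def infer_accum_precision(in_width, in_int, w_width, w_int, n_ops):
    growth = math.ceil(math.log2(n_ops))   # bit growth from accumulating n_ops products
    width = in_width + w_width + growth
    integer = in_int + w_int + growth
    return f'ap_fixed<{width},{integer}>'
print(infer_accum_precision(8, 3, 8, 3, 128))   # ap_fixed<23,13>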
Maximum Precision Configuration
Limit inferred precision to control resource usage:
config = hls4ml.utils.config_from_keras_model(
    model,
    default_precision='ap_fixed<16,6>'
)
# Set maximum precision
config['Model']['Precision'] = {
    'default': 'ap_fixed<16,6>',
    'maximum': 'ap_fixed<32,16>'  # Cap all inferred precision
}
hls_model = hls4ml.converters.convert_from_keras_model(
    model,
    hls_config=config,
    output_dir='max_precision'
)
Maximum precision limiting can cause overflow if set too aggressively. Always verify with C simulation.
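One way to perform that check is to compile the model for C simulation and compare its predictions against Keras on representative data; X_test and the tolerance you accept are up to you.
import numpy as np
hls_model.compile()   # build the C simulation library
y_keras = model.predict(X_test)
y_hls = hls_model.predict(np.ascontiguousarray(X_test))
print('max abs difference:', np.max(np.abs(y_keras - y_hls)))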
Manual Precision Configuration
Override automatic inference with explicit precision:
Layer-Level Precision
config = hls4ml.utils.config_from_keras_model(
    model,
    granularity='name'
)
# Set precision for a specific layer (assigning to 'Precision' keeps the
# layer's other settings, such as ReuseFactor, intact)
config['LayerName']['fc1']['Precision'] = {
    'weight': 'ap_fixed<8,3>',
    'bias': 'ap_fixed<8,3>',
    'result': 'ap_fixed<16,6>',
    'accum': 'ap_fixed<24,12>'
}
hls_model = hls4ml.converters.convert_from_keras_model(
    model,
    hls_config=config,
    output_dir='manual_precision'
)
Type-Level Precision
# Set precision by layer type (requires a config generated with granularity='type')
config['LayerType']['Dense']['Precision'] = {
    'weight': 'ap_fixed<6,2>',
    'bias': 'ap_fixed<6,2>',
    'result': 'ap_fixed<12,4>'
}
Rounding and Saturation Modes
Control how values are rounded and saturated:
from hls4ml.model.types import FixedPrecisionType, RoundingMode, SaturationMode
precision = FixedPrecisionType(
    width=16,
    integer=6,
    signed=True,
    rounding_mode=RoundingMode.TRN,       # Truncate (default)
    saturation_mode=SaturationMode.WRAP,  # Wrap around (default)
    saturation_bits=0
)
Rounding Modes
RoundingMode.TRN (truncate, default): simply drops the fractional bits. Fastest, but can introduce a negative bias.
RoundingMode.RND (round to nearest): adds half an LSB and truncates. More accurate than TRN.
RoundingMode.RND_ZERO: round towards zero.
RoundingMode.RND_INF: round away from zero (towards plus/minus infinity).
RoundingMode.RND_CONV: convergent rounding (banker's rounding).
Saturation Modes
SaturationMode.WRAP (default): wrap around on overflow. Fastest, but can cause severe errors.
SaturationMode.SAT: saturate to the maximum/minimum representable value. Safer, but uses more resources.
SaturationMode.SAT_ZERO: saturate to zero on overflow.
SaturationMode.SAT_SYM: symmetric saturation; the minimum is the negative of the maximum (e.g. -127 instead of -128 for an 8-bit signed value).
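Rounding and saturation can also be requested directly in the precision strings used in the config, following the ap_fixed template syntax; the layer name below is a placeholder.
# ap_fixed<width, integer, rounding_mode, saturation_mode>
config['LayerName']['output_dense']['Precision']['result'] = 'ap_fixed<16,6,AP_RND_CONV,AP_SAT>'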
Bit-Exact Precision Inference
For properly quantized models (QKeras, HGQ), use bit-exact inference:
config = hls4ml.utils.config_from_keras_model(
    qkeras_model,
    granularity='name'
)
# Enable bit-exact inference (automatic for QKeras)
config['Model']['Precision'] = {
    'bit_exact': True
}
hls_model = hls4ml.converters.convert_from_keras_model(
    qkeras_model,
    hls_config=config,
    output_dir='bit_exact'
)
Bit-exact inference is automatically enabled for QKeras and HGQ models. It ignores user-defined precision and trusts the quantizers.
Requirements for Bit-Exact
Quantizers between all layers with non-trivial operations
Input quantization explicitly defined (QActivation as first layer)
All operations supported by bit-exact pass
Bit-exact inference will crash if it encounters unsupported operations or missing quantizers. Use automatic inference instead for unquantized models.
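A minimal QKeras model satisfying these requirements might look like the sketch below; the layer sizes and quantizer settings are arbitrary.
from tensorflow.keras.models import Sequential
from qkeras import QDense, QActivation, quantized_bits, quantized_relu
qkeras_model = Sequential([
    QActivation(quantized_bits(8, 3), input_shape=(16,)),   # explicit input quantization
    QDense(32, kernel_quantizer=quantized_bits(6, 0, alpha=1),
           bias_quantizer=quantized_bits(6, 0, alpha=1)),
    QActivation(quantized_relu(6)),                          # quantizer between layers
    QDense(5, kernel_quantizer=quantized_bits(6, 0, alpha=1),
           bias_quantizer=quantized_bits(6, 0, alpha=1)),
    QActivation(quantized_bits(8, 3)),                       # quantized output
])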
Precision Profiling
Use profiling to guide precision choices:
from hls4ml.model.profiling import numerical
import matplotlib.pyplot as plt
# Profile with test data
wp, wph, ap, aph = numerical(model=model, hls_model=hls_model, X=X_test)
# Grey boxes show current precision ranges
plt.show()
Interpret profiling results:
Weight profiles
Box-and-whisker plots show the weight value distribution per layer
Grey boxes show the range representable with the current precision
If whiskers extend beyond the grey box: increase precision
If the grey box is much larger than the whiskers: precision can be reduced
Activation profiles
Show the distribution of each layer's output values
More critical than weights, since errors propagate through the network
Ensure the grey boxes fully contain the whiskers
Leave headroom for variation beyond the test data
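The four returned objects are matplotlib figures (some may be None if the corresponding data is unavailable), so they can be saved for later comparison.
# Save the profiling figures alongside the interactive view
for fig, fname in [(wp, 'weights_keras.png'), (wph, 'weights_hls.png'),
                   (ap, 'activations_keras.png'), (aph, 'activations_hls.png')]:
    if fig is not None:
        fig.savefig(fname, dpi=150, bbox_inches='tight')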
Advanced Precision Techniques
Heterogeneous Precision
Use different precision for different parts of the network:
config = hls4ml.utils.config_from_keras_model(model, granularity='name')
# Early layers: higher precision (processing raw inputs)
config['LayerName']['conv1']['Precision'] = {
    'result': 'ap_fixed<16,6>'
}
# Middle layers: medium precision
config['LayerName']['conv2']['Precision'] = {
    'result': 'ap_fixed<12,4>'
}
# Late layers: lower precision (features already extracted)
config['LayerName']['fc1']['Precision'] = {
    'result': 'ap_fixed<8,3>'
}
hls_model = hls4ml.converters.convert_from_keras_model(
    model, hls_config=config, output_dir='heterogeneous'
)
Accumulator Precision Tuning
Carefully control accumulator precision to balance accuracy and resources:
# Dense layer with 512 inputs
# Dense layer with 512 inputs
config['LayerName']['fc_large']['Precision'] = {
    'weight': 'ap_fixed<8,3>',
    'bias': 'ap_fixed<8,3>',
    # Large accumulator for many additions
    'accum': 'ap_fixed<32,16>',
    # Smaller result after activation
    'result': 'ap_fixed<16,6>'
}
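Applying the sizing rule from the inference example above to this layer shows why the wide accumulator is justified; the arithmetic below is a sketch, and the extra bits of ap_fixed<32,16> over the computed minimum are headroom.
import math
growth = math.ceil(math.log2(512))   # 9 extra bits from accumulating 512 products
min_width = 8 + 8 + growth           # 25-bit minimum accumulator width
min_integer = 3 + 3 + growth         # 15 integer bits
print(f'ap_fixed<{min_width},{min_integer}>')   # ap_fixed<25,15>; ap_fixed<32,16> adds margin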
Dynamic Precision Selection
Automatically tune precision based on profiling:
import numpy as np
def calculate_required_precision(data):
    """Calculate a minimal ap_fixed precision string from a data distribution."""
    max_val = np.max(np.abs(data))
    integer_bits = int(np.ceil(np.log2(max_val + 1))) + 1  # +1 for the sign bit
    fractional_bits = 8  # fixed target resolution of 2^-8 (~1/256)
    total_bits = integer_bits + fractional_bits
    return f'ap_fixed<{total_bits},{integer_bits}>'
# Analyze each layer's weights and update the precision config
for layer in hls_model.get_layers():
    if 'weight' not in layer.weights:
        continue  # skip layers without trainable weights
    required = calculate_required_precision(layer.weights['weight'].data)
    config['LayerName'][layer.name]['Precision']['weight'] = required
# Re-convert the model with the updated config for the changes to take effect
Precision for Different Backends
Vivado/Vitis
# Standard ap_fixed notation
config['Model']['Precision'] = {
    'default': 'ap_fixed<16,6>',
    'maximum': 'ap_fixed<32,16>'
}
Quartus
# Uses ac_fixed (same semantics)
config['Model']['Precision'] = {
    'default': 'ac_fixed<16,6,true>',  # width, integer, signed
}
Catapult
# ac_fixed format
config['Model']['Precision'] = {
    'default': 'ac_fixed<16,6,true>'
}
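The backend itself is selected at conversion time, and the precision strings in the config should match that backend's type system; the backend and output directory below are illustrative.
hls_model = hls4ml.converters.convert_from_keras_model(
    model,
    hls_config=config,
    backend='Quartus',   # or 'Vivado', 'Vitis', 'Catapult'
    output_dir='quartus_precision'
)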
Best Practices
Always profile your model with representative test data before manually tuning precision. This provides data-driven guidance.
Use automatic inference first
Let hls4ml infer precision automatically, then selectively override problem layers identified through C simulation.
Prioritize activation precision
Activation precision is more critical than weight precision since errors propagate. Ensure activations don’t overflow.
After changing precision, always run C simulation with diverse test data to catch overflow/underflow issues.
Consider accumulator precision
Layers with many accumulations (large Dense, Conv) need higher accumulator precision to avoid overflow.
Use saturation in critical paths
For layers where overflow could be catastrophic, use SAT mode despite the resource cost.
Document precision choices
Keep notes on why specific precision was chosen for each layer. This helps future debugging and tuning.
Troubleshooting
C simulation results differ from Keras
Check for overflow: increase integer bits
Check for underflow: increase fractional bits
Profile activations to see actual value ranges
Try increasing precision incrementally
High resource utilization
Review profiling: are you using more precision than needed?
Set maximum precision to cap inferred types
Use heterogeneous precision: reduce precision in less critical layers
Consider lower precision for middle layers
Overflow warnings in synthesis
Increase integer bits in affected layers
Use SAT saturation mode instead of WRAP
Review accumulator precision for layers with many operations
Bit-exact inference fails
Ensure quantizers between all layers
Add QActivation as first layer for input quantization
Check that all operations are supported
Fall back to automatic inference if needed
API Reference
FixedPrecisionType
hls4ml.model.types.FixedPrecisionType(
    width,     # Total bits
    integer,   # Integer bits (including sign)
    signed=True,
    rounding_mode=RoundingMode.TRN,
    saturation_mode=SaturationMode.WRAP,
    saturation_bits=0
)
IntegerPrecisionType
hls4ml.model.types.IntegerPrecisionType(
    width,   # Total bits
    signed=True
)
InferPrecisionTypes Pass
from hls4ml.model.optimizer.passes.infer_precision import InferPrecisionTypes
# Automatically applied during conversion
# Can be configured:
pass_config = {
    'infer_no_bias': False  # If True, assume zero bias for tighter accumulator bounds
}