Precision configuration is critical for balancing resource utilization and model accuracy in FPGA implementations. hls4ml provides both automatic and manual precision tuning capabilities.
Precision Types
hls4ml supports multiple precision types for fixed-point arithmetic:
FixedPrecisionType
Standard fixed-point representation:
from hls4ml.model.types import FixedPrecisionType
# ap_fixed<16,6> - 16 total bits, 6 integer bits (including sign)
precision = FixedPrecisionType(width=16, integer=6, signed=True)
# Equivalent to: 10 fractional bits, 1 sign bit, 5 integer bits
# Range: [-32, 31.999...]
# Resolution: 2^-10 ≈ 0.00098
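To make the range and resolution concrete, here is a small standalone sketch (plain NumPy, not an hls4ml API) that snaps a value onto the ap_fixed<16,6> grid. Note that it models truncation plus saturation; the hardware default is truncation plus wrap-around.
import numpy as np
def to_ap_fixed(x, width=16, integer=6):
    """Sketch: quantize x onto the ap_fixed<width,integer> grid (truncate, then saturate)."""
    frac_bits = width - integer                                   # 10 fractional bits
    lsb = 2.0 ** -frac_bits                                       # resolution ~0.00098
    lo, hi = -2.0 ** (integer - 1), 2.0 ** (integer - 1) - lsb    # [-32, 31.999...]
    return float(np.clip(np.floor(x / lsb) * lsb, lo, hi))
print(to_ap_fixed(0.1234))   # 0.123046875, the nearest representable value below 0.1234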
IntegerPrecisionType
Integer-only (no fractional bits):
from hls4ml.model.types import IntegerPrecisionType
# ap_int<8> or ap_uint<8>
signed_int = IntegerPrecisionType(width=8, signed=True)     # -128 to 127
unsigned_int = IntegerPrecisionType(width=8, signed=False)  # 0 to 255
ExponentPrecisionType
Power-of-2 representation (for po2 quantization):
from hls4ml.model.types import ExponentPrecisionType
# Values are 2^n
po2_precision = ExponentPrecisionType(width=8, signed=True)
XnorPrecisionType
Binary representation for XNOR operations:
from hls4ml.model.types import XnorPrecisionType
# Single-bit binary {0, 1} for XNOR networks
xnor_precision = XnorPrecisionType()
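In the configuration dictionaries shown in the following sections, precision is usually written as C++ type strings rather than constructed from these Python classes. The mapping below is illustrative; the config variable and the layer name 'fc1' are placeholders.
# String forms used in config dictionaries and their Python counterparts
config['LayerName']['fc1']['Precision'] = {
    'weight': 'ap_fixed<16,6>',  # FixedPrecisionType(width=16, integer=6, signed=True)
    'bias': 'ap_int<8>',         # IntegerPrecisionType(width=8, signed=True)
    'result': 'ap_fixed<16,6>',
}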
Automatic Precision Inference
The InferPrecisionTypes optimizer pass automatically calculates appropriate precision:
import hls4ml
config = hls4ml.utils.config_from_keras_model(
    model,
    granularity='name',
    default_precision='ap_fixed<16,6>'
)
hls_model = hls4ml.converters.convert_from_keras_model(
    model,
    hls_config=config,
    output_dir='auto_precision'
)
# Precision is automatically inferred for intermediate layers
How Inference Works
Analyze input precision
Start with user-defined or quantizer-specified input precision.
Propagate through operations
Calculate required precision for each operation based on:
Input bit-widths
Weight bit-widths
Operation type (multiplication, addition, etc.)
Number of accumulations
Apply maximum precision limits
If maximum precision is specified in config, constrain inferred precision.
Avoid overflow and underflow
Ensure sufficient integer bits to prevent overflow and fractional bits to maintain resolution.
Precision Inference Example
For a Dense layer with:
Input: ap_fixed<8,3> (8 bits, 3 integer)
Weights: ap_fixed<8,3>
Bias: ap_fixed<8,3>
128 inputs (n_in = 128), i.e. 128 products accumulated per output neuron
Inferred accumulator precision:
# Bitwidth = input_width + weight_width + ceil(log2(n_ops))
bitwidth = 8 + 8 + ceil(log2(128)) = 8 + 8 + 7 = 23
# Integer = input_int + weight_int + ceil(log2(n_ops))
integer = 3 + 3 + 7 = 13
# Result: ap_fixed<23,13> for accumulator
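The same sizing rule can be written as a small helper; this is a sketch of the rule above, not a call into hls4ml itself.
import math
def infer_accum_precision(in_width, in_int, w_width, w_int, n_ops):
    growth = math.ceil(math.log2(n_ops))   # bit growth from accumulating n_ops products
    width = in_width + w_width + growth
    integer = in_int + w_int + growth
    return f'ap_fixed<{width},{integer}>'
print(infer_accum_precision(8, 3, 8, 3, 128))   # ap_fixed<23,13>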
Maximum Precision Configuration
Limit inferred precision to control resource usage:
config = hls4ml.utils.config_from_keras_model(
    model,
    default_precision='ap_fixed<16,6>'
)
# Set maximum precision
config['Model']['Precision'] = {
    'default': 'ap_fixed<16,6>',
    'maximum': 'ap_fixed<32,16>'  # Cap all inferred precision
}
hls_model = hls4ml.converters.convert_from_keras_model(
    model,
    hls_config=config,
    output_dir='max_precision'
)
Maximum precision limiting can cause overflow if set too aggressively. Always verify with C simulation.
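One way to perform that check is to compile the model for C simulation and compare its predictions against Keras on representative data; X_test and the tolerance you accept are up to you.
import numpy as np
hls_model.compile()   # build the C simulation library
y_keras = model.predict(X_test)
y_hls = hls_model.predict(np.ascontiguousarray(X_test))
print('max abs difference:', np.max(np.abs(y_keras - y_hls)))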
Manual Precision Configuration
Override automatic inference with explicit precision:
Layer-Level Precision
config = hls4ml.utils.config_from_keras_model(
    model,
    granularity='name'
)
# Set precision for a specific layer (assigning to 'Precision' keeps the
# layer's other settings, such as ReuseFactor, intact)
config['LayerName']['fc1']['Precision'] = {
    'weight': 'ap_fixed<8,3>',
    'bias': 'ap_fixed<8,3>',
    'result': 'ap_fixed<16,6>',
    'accum': 'ap_fixed<24,12>'
}
hls_model = hls4ml.converters.convert_from_keras_model(
    model,
    hls_config=config,
    output_dir='manual_precision'
)
Type-Level Precision
# Set precision by layer type (requires a config generated with granularity='type')
config['LayerType']['Dense']['Precision'] = {
    'weight': 'ap_fixed<6,2>',
    'bias': 'ap_fixed<6,2>',
    'result': 'ap_fixed<12,4>'
}
Rounding and Saturation Modes
Control how values are rounded and saturated:
from hls4ml.model.types import FixedPrecisionType, RoundingMode, SaturationMode
precision = FixedPrecisionType(
    width=16,
    integer=6,
    signed=True,
    rounding_mode=RoundingMode.TRN,       # Truncate (default)
    saturation_mode=SaturationMode.WRAP,  # Wrap around (default)
    saturation_bits=0
)
Rounding Modes
RoundingMode.TRN (truncate, default): simply drops the fractional bits. Fastest, but can introduce a negative bias.
RoundingMode.RND (round to nearest): adds half an LSB and truncates. More accurate than TRN.
RoundingMode.RND_ZERO: round towards zero.
RoundingMode.RND_INF: round away from zero (towards plus/minus infinity).
RoundingMode.RND_CONV: convergent rounding (banker's rounding).
Saturation Modes
SaturationMode.WRAP (default): wrap around on overflow. Fastest, but can cause severe errors.
SaturationMode.SAT: saturate to the maximum/minimum representable value. Safer, but uses more resources.
SaturationMode.SAT_ZERO: saturate to zero on overflow.
SaturationMode.SAT_SYM: symmetric saturation; the minimum is the negative of the maximum (e.g. -127 instead of -128 for an 8-bit signed value).
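Rounding and saturation can also be requested directly in the precision strings used in the config, following the ap_fixed template syntax; the layer name below is a placeholder.
# ap_fixed<width, integer, rounding_mode, saturation_mode>
config['LayerName']['output_dense']['Precision']['result'] = 'ap_fixed<16,6,AP_RND_CONV,AP_SAT>'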
Bit-Exact Precision Inference
For properly quantized models (QKeras, HGQ), use bit-exact inference:
config = hls4ml.utils.config_from_keras_model(
    qkeras_model,
    granularity='name'
)
# Enable bit-exact inference (automatic for QKeras)
config['Model']['Precision'] = {
    'bit_exact': True
}
hls_model = hls4ml.converters.convert_from_keras_model(
    qkeras_model,
    hls_config=config,
    output_dir='bit_exact'
)
Bit-exact inference is automatically enabled for QKeras and HGQ models. It ignores user-defined precision and trusts the quantizers.
Requirements for Bit-Exact
Quantizers between all layers with non-trivial operations
Input quantization explicitly defined (QActivation as first layer)
All operations supported by bit-exact pass
Bit-exact inference will crash if it encounters unsupported operations or missing quantizers. Use automatic inference instead for unquantized models.
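A minimal QKeras model satisfying these requirements might look like the sketch below; the layer sizes and quantizer settings are arbitrary.
from tensorflow.keras.models import Sequential
from qkeras import QDense, QActivation, quantized_bits, quantized_relu
qkeras_model = Sequential([
    QActivation(quantized_bits(8, 3), input_shape=(16,)),   # explicit input quantization
    QDense(32, kernel_quantizer=quantized_bits(6, 0, alpha=1),
           bias_quantizer=quantized_bits(6, 0, alpha=1)),
    QActivation(quantized_relu(6)),                          # quantizer between layers
    QDense(5, kernel_quantizer=quantized_bits(6, 0, alpha=1),
           bias_quantizer=quantized_bits(6, 0, alpha=1)),
    QActivation(quantized_bits(8, 3)),                       # quantized output
])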
Precision Profiling
Use profiling to guide precision choices:
from hls4ml.model.profiling import numerical
import matplotlib.pyplot as plt
# Profile with test data
wp, wph, ap, aph = numerical(model=model, hls_model=hls_model, X=X_test)
# Grey boxes show current precision ranges
plt.show()
Interpret profiling results:
Weight profiles
Box-and-whisker plots show the weight value distribution per layer
Grey boxes show the range representable with the current precision
If whiskers extend beyond the grey box: increase precision
If the grey box is much larger than the whiskers: precision can be reduced
Activation profiles
Show the distribution of each layer's output values
More critical than weights, since errors propagate through the network
Ensure the grey boxes fully contain the whiskers
Leave headroom for variation beyond the test data
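The four returned objects are matplotlib figures (some may be None if the corresponding data is unavailable), so they can be saved for later comparison.
# Save the profiling figures alongside the interactive view
for fig, fname in [(wp, 'weights_keras.png'), (wph, 'weights_hls.png'),
                   (ap, 'activations_keras.png'), (aph, 'activations_hls.png')]:
    if fig is not None:
        fig.savefig(fname, dpi=150, bbox_inches='tight')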
Advanced Precision Techniques
Heterogeneous Precision
Use different precision for different parts of the network:
config = hls4ml.utils.config_from_keras_model(model, granularity='name')
# Early layers: higher precision (processing raw inputs)
config['LayerName']['conv1']['Precision'] = {
    'result': 'ap_fixed<16,6>'
}
# Middle layers: medium precision
config['LayerName']['conv2']['Precision'] = {
    'result': 'ap_fixed<12,4>'
}
# Late layers: lower precision (features already extracted)
config['LayerName']['fc1']['Precision'] = {
    'result': 'ap_fixed<8,3>'
}
hls_model = hls4ml.converters.convert_from_keras_model(
    model, hls_config=config, output_dir='heterogeneous'
)
Accumulator Precision Tuning
Carefully control accumulator precision to balance accuracy and resources:
# Dense layer with 512 inputs
# Dense layer with 512 inputs
config['LayerName']['fc_large']['Precision'] = {
    'weight': 'ap_fixed<8,3>',
    'bias': 'ap_fixed<8,3>',
    # Large accumulator for many additions
    'accum': 'ap_fixed<32,16>',
    # Smaller result after activation
    'result': 'ap_fixed<16,6>'
}
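Applying the sizing rule from the inference example above to this layer shows why the wide accumulator is justified; the arithmetic below is a sketch, and the extra bits of ap_fixed<32,16> over the computed minimum are headroom.
import math
growth = math.ceil(math.log2(512))   # 9 extra bits from accumulating 512 products
min_width = 8 + 8 + growth           # 25-bit minimum accumulator width
min_integer = 3 + 3 + growth         # 15 integer bits
print(f'ap_fixed<{min_width},{min_integer}>')   # ap_fixed<25,15>; ap_fixed<32,16> adds margin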
Dynamic Precision Selection
Automatically tune precision based on profiling:
import numpy as np
def calculate_required_precision(data):
    """Calculate a minimal ap_fixed precision string from a data distribution."""
    max_val = np.max(np.abs(data))
    integer_bits = int(np.ceil(np.log2(max_val + 1))) + 1  # +1 for the sign bit
    fractional_bits = 8  # fixed target resolution of 2^-8 (~1/256)
    total_bits = integer_bits + fractional_bits
    return f'ap_fixed<{total_bits},{integer_bits}>'
# Analyze each layer's weights and update the precision config
for layer in hls_model.get_layers():
    if 'weight' not in layer.weights:
        continue  # skip layers without trainable weights
    required = calculate_required_precision(layer.weights['weight'].data)
    config['LayerName'][layer.name]['Precision']['weight'] = required
# Re-convert the model with the updated config for the changes to take effect
Precision for Different Backends
Vivado/Vitis
# Standard ap_fixed notation
config['Model']['Precision'] = {
    'default': 'ap_fixed<16,6>',
    'maximum': 'ap_fixed<32,16>'
}
Quartus
# Uses ac_fixed (same semantics)
config['Model']['Precision'] = {
    'default': 'ac_fixed<16,6,true>',  # width, integer, signed
}
Catapult
# ac_fixed format
config['Model']['Precision'] = {
    'default': 'ac_fixed<16,6,true>'
}
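The backend itself is selected at conversion time, and the precision strings in the config should match that backend's type system; the backend and output directory below are illustrative.
hls_model = hls4ml.converters.convert_from_keras_model(
    model,
    hls_config=config,
    backend='Quartus',   # or 'Vivado', 'Vitis', 'Catapult'
    output_dir='quartus_precision'
)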
Best Practices
Always profile your model with representative test data before manually tuning precision. This provides data-driven guidance.
Use automatic inference first
Let hls4ml infer precision automatically, then selectively override problem layers identified through C simulation.
Prioritize activation precision
Activation precision is more critical than weight precision since errors propagate. Ensure activations don’t overflow.
After changing precision, always run C simulation with diverse test data to catch overflow/underflow issues.
Consider accumulator precision
Layers with many accumulations (large Dense, Conv) need higher accumulator precision to avoid overflow.
Use saturation in critical paths
For layers where overflow could be catastrophic, use SAT mode despite the resource cost.
Document precision choices
Keep notes on why specific precision was chosen for each layer. This helps future debugging and tuning.
Troubleshooting
C simulation results differ from Keras
Check for overflow: increase integer bits
Check for underflow: increase fractional bits
Profile activations to see actual value ranges
Try increasing precision incrementally
High resource utilization
Review profiling: are you using more precision than needed?
Set maximum precision to cap inferred types
Use heterogeneous precision: reduce precision in less critical layers
Consider lower precision for middle layers
Overflow warnings in synthesis
Increase integer bits in affected layers
Use SAT saturation mode instead of WRAP
Review accumulator precision for layers with many operations
Bit-exact inference fails
Ensure quantizers between all layers
Add QActivation as first layer for input quantization
Check that all operations are supported
Fall back to automatic inference if needed
API Reference
FixedPrecisionType
hls4ml.model.types.FixedPrecisionType(
    width,     # Total bits
    integer,   # Integer bits (including sign)
    signed=True,
    rounding_mode=RoundingMode.TRN,
    saturation_mode=SaturationMode.WRAP,
    saturation_bits=0
)
IntegerPrecisionType
hls4ml.model.types.IntegerPrecisionType(
    width,   # Total bits
    signed=True
)
InferPrecisionTypes Pass
from hls4ml.model.optimizer.passes.infer_precision import InferPrecisionTypes
# Automatically applied during conversion
# Can be configured:
pass_config = {
    'infer_no_bias': False  # If True, assume zero bias for tighter accumulator bounds
}