
Why Fixed-Point?

FPGAs excel at custom fixed-point arithmetic, offering significant advantages over floating-point:
  • Resource Efficiency - Fixed-point operations use fewer LUTs and DSPs
  • Power Efficiency - Lower power consumption than floating-point
  • Performance - Higher throughput and lower latency
  • Predictable Behavior - Deterministic rounding and overflow
A 32-bit floating-point multiplier uses ~5x more resources than a 16-bit fixed-point multiplier on FPGAs.

Fixed-Point Representation

Format Specification

Fixed-point numbers are specified as fixed<W,I> or ap_fixed<W,I>:
  • W (Width) - Total number of bits
  • I (Integer) - Number of integer bits (left of decimal point)
  • F (Fractional) - Number of fractional bits = W - I
# Examples
'fixed<16,6>'   # 16 bits total, 6 integer, 10 fractional
'fixed<8,3>'    # 8 bits total, 3 integer, 5 fractional
'fixed<32,16>'  # 32 bits total, 16 integer, 16 fractional
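
To see how a value maps onto those bits, the pure-Python sketch below (illustrative only, not an hls4ml API) encodes a float as fixed<8,3> by scaling with 2^F and rounding to an integer code:
# Illustrative: encode a float as signed fixed<8,3> (W=8, I=3, F=5)
W, I = 8, 3
F = W - I

value = 2.625
code = round(value * 2**F)                          # 2.625 * 32 = 84
code = max(-2**(W - 1), min(2**(W - 1) - 1, code))  # clamp to the 8-bit signed range

decoded = code / 2**F
print(code, decoded)  # 84 2.625 (exactly representable)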

Representation Range

For signed fixed-point fixed<W,I>:
  • Minimum value: -2^(I-1)
  • Maximum value: 2^(I-1) - 2^(-F)
  • Resolution: 2^(-F)
# fixed<16,6> signed
min_value = -2^5 = -32
max_value = 2^5 - 2^-10 ≈ 31.999
resolution = 2^-10 ≈ 0.000977

# fixed<8,3> signed  
min_value = -2^2 = -4
max_value = 2^2 - 2^-5 = 3.96875
resolution = 2^-5 = 0.03125

Unsigned Types

Use ufixed for unsigned (non-negative) values:
'ufixed<16,8>'    # 0 to 255.996
'ap_ufixed<8,4>'  # 0 to 15.9375
For unsigned ufixed<W,I>:
  • Minimum value: 0
  • Maximum value: 2^I - 2^(-F)
Type definitions: hls4ml/model/types.py:87
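
A minimal sketch (plain Python, not part of hls4ml) that evaluates the range and resolution formulas above for both signed and unsigned formats:
# Illustrative: range and resolution of fixed<W,I> / ufixed<W,I>
def fixed_range(W, I, signed=True):
    F = W - I
    if signed:
        return -2**(I - 1), 2**(I - 1) - 2**(-F), 2**(-F)
    return 0, 2**I - 2**(-F), 2**(-F)

print(fixed_range(16, 6))                # (-32, 31.9990234375, 0.0009765625)
print(fixed_range(16, 8, signed=False))  # (0, 255.99609375, 0.00390625)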

Precision Configuration

Model-Level Precision

Set default precision for all layers:
import hls4ml

config = hls4ml.utils.config_from_keras_model(
    model,
    default_precision='fixed<16,6>'  # Default for all layers
)

Layer-Type Precision

Set precision by layer type:
config = {
    'Model': {
        'Precision': 'fixed<16,6>',
        'ReuseFactor': 1
    },
    'LayerType': {
        'Dense': {
            'Precision': 'fixed<18,8>'  # More precision for Dense layers
        },
        'Activation': {
            'Precision': 'fixed<16,6>'  # Standard precision for activations
        },
        'BatchNormalization': {
            'Precision': 'fixed<16,8>'  # Higher integer bits for BN
        }
    }
}

Layer-Specific Precision

Fine-tune individual layers:
config = {
    'Model': {
        'Precision': 'fixed<16,6>',
        'ReuseFactor': 1
    },
    'LayerName': {
        'dense_1': {
            'Precision': 'fixed<32,16>'  # Critical layer needs high precision
        },
        'dense_2': {
            'Precision': 'fixed<12,4>'   # Less critical, save resources
        },
        'activation_1': {
            'Precision': 'fixed<16,6>'
        }
    }
}
Configuration hierarchy: hls4ml/model/graph.py:127

Tensor-Level Precision

Control precision for specific tensors within a layer:
config['LayerName']['dense_1'] = {
    'Precision': {
        'weight': 'fixed<8,4>',      # 8-bit weights
        'bias': 'fixed<16,8>',       # 16-bit biases  
        'result': 'fixed<16,6>',     # 16-bit outputs
        'accum': 'fixed<24,12>'      # 24-bit accumulator
    }
}

Advanced Precision Types

Rounding Modes

Control how values are rounded when precision is reduced:
from hls4ml.model.types import RoundingMode

# Available rounding modes:
# - TRN: Truncate (default)
# - RND: Round to nearest
# - RND_ZERO: Round to nearest, ties to zero
# - RND_INF: Round to nearest, ties to infinity
# - RND_MIN_INF: Round to nearest, ties to -infinity
# - RND_CONV: Convergent rounding

config['LayerName']['dense_1'] = {
    'Precision': 'fixed<16,6,RND>'  # Round instead of truncate
}
Rounding modes: hls4ml/model/types.py:50
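
The practical difference between truncation and rounding is easiest to see numerically. The snippet below is a pure-Python emulation (illustrative only, not the HLS implementation) of quantizing a value to 5 fractional bits:
import math

# Illustrative: quantize 0.30 to 5 fractional bits (resolution 2^-5 = 0.03125)
x, F = 0.30, 5
lsb = 2**(-F)

trn = math.floor(x / lsb) * lsb  # TRN: truncate toward minus infinity -> 0.28125
rnd = round(x / lsb) * lsb       # RND: round to the nearest step      -> 0.3125
print(trn, rnd)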

Saturation Modes

Control overflow behavior:
from hls4ml.model.types import SaturationMode

# Available saturation modes:
# - WRAP: Wrap around (default)
# - SAT: Saturate at min/max
# - SAT_ZERO: Saturate to zero
# - SAT_SYM: Symmetric saturation

config['LayerName']['dense_1'] = {
    'Precision': 'fixed<16,6,RND,SAT>'  # Round and saturate
}
Saturation modes: hls4ml/model/types.py:70
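
To see why the default WRAP mode can be dangerous, the sketch below (plain Python, illustrative only) stores an out-of-range value in a signed fixed<8,3> with wrapping versus saturating:
# Illustrative: overflow behavior for signed fixed<8,3> (codes -128..127, value = code * 2^-5)
W, I = 8, 3
F = W - I

def wrap(code):  # WRAP: keep only the low W bits (two's complement)
    return (code + 2**(W - 1)) % 2**W - 2**(W - 1)

def sat(code):   # SAT: clamp to the representable range
    return max(-2**(W - 1), min(2**(W - 1) - 1, code))

code = round(5.0 * 2**F)   # 5.0 is above the fixed<8,3> maximum (~3.969)
print(wrap(code) / 2**F)   # -3.0     (wrapped around, large error)
print(sat(code) / 2**F)    # 3.96875  (pinned at the maximum)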

Full Precision Specification

# Format: fixed<W, I, rounding, saturation, saturation_bits>
config['LayerName']['dense_1'] = {
    'Precision': 'ap_fixed<16,6,AP_RND,AP_SAT_SYM,1>'
}

Precision Tuning Strategies

Strategy 1: Start Conservative

Begin with high precision, then reduce:
import numpy as np
import hls4ml

# Step 1: Start with high precision
config_high = hls4ml.utils.config_from_keras_model(
    model,
    default_precision='fixed<32,16>'
)

# Step 2: Verify accuracy
hls_model = hls4ml.converters.convert_from_keras_model(
    model, hls_config=config_high
)
hls_model.compile()
y_keras = model.predict(X_test)
y_hls = hls_model.predict(X_test)
error_high = np.mean(np.abs(y_keras - y_hls))  # mean absolute error vs. the Keras model

# Step 3: Reduce precision incrementally
config_medium = hls4ml.utils.config_from_keras_model(
    model,
    default_precision='fixed<16,8>'
)
# ... test again ...

config_low = hls4ml.utils.config_from_keras_model(
    model,
    default_precision='fixed<12,4>'
)
# ... test again ...

Strategy 2: Profiling-Based

Use built-in profiling to identify precision needs:
import hls4ml
import numpy as np
from hls4ml.model import profiling

# Create model with initial precision
config = hls4ml.utils.config_from_keras_model(
    model,
    default_precision='fixed<16,6>'
)
hls_model = hls4ml.converters.convert_from_keras_model(
    model, hls_config=config
)
hls_model.compile()

# Profile the model
X_test = np.random.rand(1000, input_dim)
profiling.numerical(model, hls_model, X_test)

# Analyze output to identify layers needing more precision

Strategy 3: Layer-by-Layer

Optimize precision layer by layer:
import numpy as np
import hls4ml

def find_optimal_precision(model, layer_name, X_test, y_reference):
    """Sweep candidate precisions for one layer and return the (precision, MSE) pair with lowest error"""
    precisions = ['fixed<8,3>', 'fixed<12,4>', 'fixed<16,6>', 'fixed<24,8>']
    
    results = {}
    for prec in precisions:
        # Per-layer overrides require name-level granularity
        config = hls4ml.utils.config_from_keras_model(model, granularity='name')
        config['LayerName'][layer_name]['Precision'] = prec
        
        hls_model = hls4ml.converters.convert_from_keras_model(
            model, hls_config=config
        )
        hls_model.compile()
        
        y_pred = hls_model.predict(X_test)
        mse = np.mean((y_reference - y_pred) ** 2)
        
        results[prec] = mse
    
    return min(results.items(), key=lambda x: x[1])

Strategy 4: Automatic Precision (AutoQKeras)

For QKeras models, precision is automatically inferred:
import hls4ml
import qkeras
import tensorflow as tf

# QKeras model with quantized layers
model = tf.keras.Sequential([
    qkeras.QDense(64, 
                  kernel_quantizer='quantized_bits(8,0,alpha=1)',
                  bias_quantizer='quantized_bits(8,0,alpha=1)',
                  input_shape=(16,)),
    qkeras.QActivation('quantized_relu(8,0)'),
    qkeras.QDense(10,
                  kernel_quantizer='quantized_bits(8,0,alpha=1)',
                  bias_quantizer='quantized_bits(8,0,alpha=1)')
])

# Precision is automatically extracted from quantizers
config = hls4ml.utils.config_from_keras_model(model, granularity='name')
hls_model = hls4ml.converters.convert_from_keras_model(
    model, hls_config=config
)

Common Precision Patterns

Pattern 1: Progressive Widening

Increase precision through the network:
config = {
    'Model': {'Precision': 'fixed<16,6>', 'ReuseFactor': 1},
    'LayerName': {
        'dense_1': {'Precision': 'fixed<16,6>'},   # Input layer
        'dense_2': {'Precision': 'fixed<18,7>'},   # Middle layer
        'dense_3': {'Precision': 'fixed<20,8>'},   # Output layer
    }
}

Pattern 2: Critical Path High Precision

High precision where it matters:
config = {
    'Model': {'Precision': 'fixed<12,4>', 'ReuseFactor': 1},  # Low default
    'LayerName': {
        'attention_layer': {'Precision': 'fixed<32,16>'},  # Critical
        'dense_1': {'Precision': 'fixed<12,4>'},           # Standard
        'dense_2': {'Precision': 'fixed<12,4>'},           # Standard
    }
}

Pattern 3: Activation-Specific

Different precision for different activation types:
config = {
    'Model': {'Precision': 'fixed<16,6>', 'ReuseFactor': 1},
    'LayerType': {
        'Activation': {'Precision': 'fixed<16,6>'},
        'Softmax': {'Precision': 'fixed<18,8>'},     # Softmax needs more precision
        'TanH': {'Precision': 'fixed<18,8>'},        # TanH needs more integer bits
    }
}

Pattern 4: Weight vs. Activation

Different precision for weights and activations:
config = {
    'Model': {'Precision': 'fixed<16,6>', 'ReuseFactor': 1},
    'LayerName': {
        'dense_1': {
            'Precision': {
                'weight': 'fixed<8,4>',    # Low precision weights
                'bias': 'fixed<16,8>',     # Higher precision bias
                'result': 'fixed<16,6>',   # Standard output
                'accum': 'fixed<24,12>'    # Wide accumulator
            }
        }
    }
}

Precision and Resource Usage

DSP Block Usage

DSP blocks on FPGAs typically support:
  • Xilinx DSP48E2: Up to 27×18-bit multiplication
  • Intel DSP: Up to 27×27-bit multiplication
# Efficient: Fits in one DSP block
config['LayerName']['dense_1'] = {
    'Precision': {
        'weight': 'fixed<18,9>',   # 18-bit multiplier input
        'result': 'fixed<16,6>'
    }
}

# Inefficient: Requires multiple DSP blocks
config['LayerName']['dense_2'] = {
    'Precision': {
        'weight': 'fixed<32,16>',  # 32-bit multiplier
        'result': 'fixed<32,16>'
    }
}

Memory Bandwidth

Lower precision reduces memory bandwidth:
# High bandwidth: 32 bits per weight
config_high_bw = {'Model': {'Precision': 'fixed<32,16>'}}

# Medium bandwidth: 16 bits per weight (50% reduction)
config_med_bw = {'Model': {'Precision': 'fixed<16,6>'}}

# Low bandwidth: 8 bits per weight (75% reduction)
config_low_bw = {'Model': {'Precision': 'fixed<8,3>'}}
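
The savings are easy to quantify. For example (illustrative arithmetic only), a Dense layer with 16 inputs and 64 outputs has 1024 weights:
# Illustrative: weight storage for a 16x64 Dense layer at different widths
n_weights = 16 * 64
for bits in (32, 16, 8):
    print(f"{bits}-bit weights: {n_weights * bits} bits ({n_weights * bits // 8} bytes)")
# 32-bit: 32768 bits (4096 bytes)
# 16-bit: 16384 bits (2048 bytes), a 50% reduction
#  8-bit:  8192 bits (1024 bytes), a 75% reduction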

Latency Impact

Precision affects pipeline depth:
# Lower precision → fewer pipeline stages → lower latency
config_fast = {'Model': {'Precision': 'fixed<12,4>'}}

# Higher precision → more pipeline stages → higher latency
config_accurate = {'Model': {'Precision': 'fixed<32,16>'}}

Quantization-Aware Training

QKeras Integration

hls4ml works seamlessly with QKeras for quantization-aware training:
import hls4ml
import qkeras
import tensorflow as tf

# Define quantized model
model = tf.keras.Sequential([
    qkeras.QDense(
        64,
        kernel_quantizer='quantized_bits(6,0,alpha=1)',
        bias_quantizer='quantized_bits(6,0,alpha=1)',
        input_shape=(16,)
    ),
    qkeras.QActivation('quantized_relu(6,0)'),
    qkeras.QDense(
        10,
        kernel_quantizer='quantized_bits(6,0,alpha=1)',
        bias_quantizer='quantized_bits(6,0,alpha=1)'
    )
])

# Train with quantization
model.compile(optimizer='adam', loss='categorical_crossentropy')
model.fit(X_train, y_train, epochs=10)

# Convert - precision extracted automatically
config = hls4ml.utils.config_from_keras_model(model, granularity='name')
hls_model = hls4ml.converters.convert_from_keras_model(
    model, hls_config=config
)

HGQ (High Granularity Quantization)

Advanced quantization with bit-width optimization:
import HGQ

# HGQ provides automatic bit-width selection
# Precision is learned during training

Debugging Precision Issues

Overflow Detection

Check for overflows in simulation:
import numpy as np

# Test with wide range of inputs
X_test = np.linspace(-10, 10, 1000).reshape(-1, 1)

y_keras = model.predict(X_test)
y_hls = hls_model.predict(X_test)

# Large errors may indicate overflow
errors = np.abs(y_keras - y_hls)
if np.max(errors) > 1.0:
    print("Possible overflow detected!")
    print(f"Max error: {np.max(errors)}")

Saturation Analysis

Enable saturation to detect overflow:
# Enable saturation to prevent wrapping
config['LayerName']['dense_1'] = {
    'Precision': 'fixed<16,6,RND,SAT>'
}

# If saturation improves results, you need more integer bits

Layer-by-Layer Comparison

Identify which layer has precision issues:
import hls4ml
from hls4ml.model import profiling

# Profile provides layer-by-layer comparison
hls_model.compile()
profiling.numerical(keras_model, hls_model, X_test)

# Check output - large differences indicate precision problems

Best Practices

Begin with fixed<16,6> and adjust based on results. This provides a good balance for most models.
Accumulators should have more bits than inputs to prevent overflow:
config['LayerName']['dense_1'] = {
    'Precision': {
        'weight': 'fixed<8,4>',
        'result': 'fixed<16,6>',
        'accum': 'fixed<24,12>'  # Extra bits for accumulation
    }
}
Use representative test data that covers the full input range to detect overflow and underflow.
Some activations need specific precision:
  • Softmax: Needs higher precision for exp() function
  • TanH: Needs more integer bits to cover the lookup-table input range (outputs lie in [-1, 1])
  • Sigmoid: Similar to TanH
Always profile in software simulation before running synthesis to catch precision issues early.

Precision Optimization Example

Complete example of precision optimization workflow:
import hls4ml
import numpy as np
import tensorflow as tf

# 1. Create baseline model
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(16,)),
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])

# 2. Start with conservative precision
config = hls4ml.utils.config_from_keras_model(
    model,
    default_precision='fixed<16,6>'
)

# 3. Convert and test
hls_model = hls4ml.converters.convert_from_keras_model(
    model, hls_config=config
)
hls_model.compile()

X_test = np.random.rand(1000, 16)
y_keras = model.predict(X_test)
y_hls = hls_model.predict(X_test)

baseline_mse = np.mean((y_keras - y_hls) ** 2)
print(f"Baseline MSE: {baseline_mse}")

# 4. Try reduced precision
config_opt = hls4ml.utils.config_from_keras_model(
    model,
    granularity='name',              # needed for per-layer overrides
    default_precision='fixed<12,4>'
)
config_opt['LayerName']['dense_2']['Precision'] = 'fixed<16,6>'  # Keep one layer high

hls_model_opt = hls4ml.converters.convert_from_keras_model(
    model, hls_config=config_opt
)
hls_model_opt.compile()

y_hls_opt = hls_model_opt.predict(X_test)
optimized_mse = np.mean((y_keras - y_hls_opt) ** 2)
print(f"Optimized MSE: {optimized_mse}")

# 5. If acceptable, build and check resources
if optimized_mse < 0.01:
    hls_model_opt.build(csim=True, synth=True)
    # Check report for resource usage

Next Steps

Model Conversion

Learn more about model conversion options

HLS Backends

Understand backend-specific features
