QKeras provides quantization-aware training (QAT) layers that seamlessly integrate with hls4ml. Quantizers define the bit-widths and numerical representations used in your model.

Overview

QKeras extends Keras with quantized layers and quantizers that:
  • Simulate fixed-point arithmetic during training
  • Support binary and ternary quantization
  • Enable power-of-2 (po2) quantization
  • Provide fine-grained control over precision
Install QKeras: pip install qkeras

QKeras Quantizers

quantized_bits

The most common quantizer for fixed-point values:
from qkeras import QDense, quantized_bits

# 8-bit fixed-point: 3 integer bits plus the sign bit (ap_fixed<8,4> in hls4ml)
layer = QDense(
    units=64,
    kernel_quantizer=quantized_bits(bits=8, integer=3),
    bias_quantizer=quantized_bits(bits=8, integer=3)
)
Parameters:
  • bits: Total bit-width
  • integer: Number of integer bits (not including sign)
  • symmetric: Use symmetric quantization (default: False)
  • alpha: Scaling factor (default: None)
  • keep_negative: Maintain negative values (default: True)
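To see the effect of a given setting, call the quantizer directly on sample values; QKeras quantizers are callable objects. A minimal sketch, assuming TensorFlow 2.x eager execution:
import tensorflow as tf
from qkeras import quantized_bits

q = quantized_bits(bits=8, integer=3)

# quantized_bits(8, 3) corresponds to ap_fixed<8,4> in hls4ml: a step size of
# 2^-4 = 0.0625 and a range of roughly [-8, 8)
x = tf.constant([0.1234, 1.5678, -3.21, 100.0])
print(q(x).numpy())  # values snap to the grid; out-of-range values saturate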

Binary Quantization

For extreme compression with binary weights:
from qkeras import QDense, binary

# Binary quantization: {-1, +1}
layer = QDense(
    units=32,
    kernel_quantizer=binary(alpha=1.0)
)
Binary quantizers in QKeras produce values in {-1, +1}. In hls4ml, this maps to:
  • 1-bit XnorPrecisionType for kernel weights
  • 2-bit IntegerPrecisionType for other uses
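A quick way to confirm the binary mapping is to apply the quantizer to a few sample values; a sketch, assuming TensorFlow 2.x:
import tensorflow as tf
from qkeras import binary

x = tf.constant([-0.7, -0.05, 0.3, 1.2])
print(binary(alpha=1.0)(x).numpy())  # every value maps to -1.0 or +1.0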

Ternary Quantization

Three-level quantization for better accuracy than binary:
from qkeras import QDense, ternary

# Ternary quantization: {-1, 0, +1}
layer = QDense(
    units=32,
    kernel_quantizer=ternary(alpha=1.0)
)
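As with binary, the quantizer can be applied directly to sample values to see the three levels; a sketch, noting that the exact cutoff around zero depends on the threshold setting:
import tensorflow as tf
from qkeras import ternary

x = tf.constant([-0.9, -0.1, 0.0, 0.2, 0.8])
print(ternary(alpha=1.0)(x).numpy())  # small magnitudes map to 0, the rest to -1 or +1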

Power-of-2 Quantization

Constrains weights to powers of two (no multipliers needed):
from qkeras import QDense, quantized_po2

# Weights are restricted to powers of 2
layer = QDense(
    units=64,
    kernel_quantizer=quantized_po2(bits=8, max_value=8)
)
Po2 quantization uses only bit shifts, eliminating multipliers entirely.
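To verify the effect, apply the quantizer to a few values; each output should be a signed power of two. A sketch, assuming TensorFlow 2.x:
import tensorflow as tf
from qkeras import quantized_po2

x = tf.constant([0.3, 0.7, 1.6, 5.0])
# Each value snaps to a nearby power of two (0.25, 0.5, 2.0, 4.0, ...), so the
# corresponding hardware multiply reduces to a shift
print(quantized_po2(bits=8, max_value=8)(x).numpy())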

QKeras Layers

QDense

Quantized fully-connected layer:
from tensorflow.keras.models import Sequential
from qkeras import QDense, quantized_bits

model = Sequential([
    QDense(
        units=128,
        kernel_quantizer=quantized_bits(8, 3),
        bias_quantizer=quantized_bits(8, 3),
        activation='relu'
    )
])

QConv2D

Quantized 2D convolution:
from tensorflow.keras.models import Sequential
from qkeras import QConv2D, quantized_bits

model = Sequential([
    QConv2D(
        filters=32,
        kernel_size=(3, 3),
        kernel_quantizer=quantized_bits(8, 3),
        bias_quantizer=quantized_bits(8, 3),
        activation='relu'
    )
])

QActivation

Quantized activation function:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from qkeras import QActivation, quantized_relu

model = Sequential([
    Dense(64),
    QActivation(quantized_relu(bits=8, integer=3))
])
Always place a QActivation immediately after the input layer to ensure input precision is properly inferred by hls4ml.

QBatchNormalization

Quantized batch normalization:
from tensorflow.keras.models import Sequential
from qkeras import QBatchNormalization, QConv2D, QActivation, quantized_bits

model = Sequential([
    QConv2D(32, (3, 3), kernel_quantizer=quantized_bits(8, 3)),
    QBatchNormalization(
        gamma_quantizer=quantized_bits(8, 3),
        beta_quantizer=quantized_bits(8, 3)
    ),
    QActivation('quantized_relu(8, 3)')
])

Quantized Activations

quantized_relu

from qkeras import QActivation, quantized_relu

QActivation(quantized_relu(bits=6, integer=2))

quantized_tanh

from qkeras import QActivation, quantized_tanh

QActivation(quantized_tanh(bits=8, integer=3))

quantized_sigmoid

from qkeras import QActivation, quantized_sigmoid

QActivation(quantized_sigmoid(bits=8, integer=3))

Complete QKeras Example

Building a quantized model from scratch:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from qkeras import QDense, QConv2D, QActivation, quantized_bits, quantized_relu

# Build quantized model
model = Sequential([
    # Input quantization
    QActivation(quantized_bits(8, 3), input_shape=(32, 32, 3)),
    
    # First conv block
    QConv2D(
        filters=32,
        kernel_size=(3, 3),
        kernel_quantizer=quantized_bits(8, 3),
        bias_quantizer=quantized_bits(8, 3),
        padding='same'
    ),
    QActivation(quantized_relu(8, 3)),
    
    # Second conv block
    QConv2D(
        filters=64,
        kernel_size=(3, 3),
        kernel_quantizer=quantized_bits(8, 3),
        bias_quantizer=quantized_bits(8, 3),
        padding='same',
        strides=(2, 2)
    ),
    QActivation(quantized_relu(8, 3)),
    
    # Dense layers
    tf.keras.layers.Flatten(),
    QDense(
        units=128,
        kernel_quantizer=quantized_bits(8, 3),
        bias_quantizer=quantized_bits(8, 3)
    ),
    QActivation(quantized_relu(8, 3)),
    
    QDense(
        units=10,
        kernel_quantizer=quantized_bits(8, 3),
        bias_quantizer=quantized_bits(8, 3),
        activation='softmax'
    )
])

# Compile and train as usual
model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

model.fit(X_train, y_train, epochs=10, validation_split=0.1)

Converting QKeras Models to hls4ml

QKeras models convert seamlessly to hls4ml:
import hls4ml

# Create configuration
config = hls4ml.utils.config_from_keras_model(
    model,
    granularity='name'
)

# Convert to hls4ml
hls_model = hls4ml.converters.convert_from_keras_model(
    model,
    hls_config=config,
    output_dir='qkeras_hls4ml',
    backend='Vivado'
)

# Quantizers are automatically extracted
hls_model.compile()
hls4ml automatically detects QKeras quantizers and applies the correct precision types. No manual precision configuration needed!
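To confirm what was extracted, inspect the per-layer precision in the generated config and compare Keras against the hls4ml C simulation. A minimal sketch: the config keys assume the 'name' granularity layout used above, and X_test is placeholder data shaped like the model from the earlier complete example:
import numpy as np

# Per-layer precision inferred from the QKeras quantizers
for layer_name, layer_cfg in config['LayerName'].items():
    print(layer_name, layer_cfg.get('Precision'))

# Numerical check: Keras vs. hls4ml C simulation
X_test = np.random.rand(100, 32, 32, 3).astype(np.float32)
print('max abs diff:', np.abs(model.predict(X_test) - hls_model.predict(X_test)).max())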

Binary Neural Networks

Extreme quantization with binary weights and activations:
from tensorflow.keras.models import Sequential
from qkeras import QDense, QActivation, binary, quantized_bits

# Binary neural network
model = Sequential([
    # Input still uses more bits for better accuracy
    QActivation(quantized_bits(8, 3), input_shape=(784,)),
    
    # Binary dense layer
    QDense(
        units=256,
        kernel_quantizer=binary(alpha=1.0),
        bias_quantizer=quantized_bits(8, 3)  # Bias usually not binary
    ),
    QActivation(binary(alpha=1.0)),
    
    QDense(
        units=256,
        kernel_quantizer=binary(alpha=1.0),
        bias_quantizer=quantized_bits(8, 3)
    ),
    QActivation(binary(alpha=1.0)),
    
    # Output layer with more precision
    QDense(
        units=10,
        kernel_quantizer=quantized_bits(8, 3),
        bias_quantizer=quantized_bits(8, 3),
        activation='softmax'
    )
])

Benefits of Binary Networks

  • Memory: 32x reduction in model size compared to FP32
  • Speed: multiplications are replaced with XNOR operations
  • Power: dramatically lower power consumption

Ternary Neural Networks

Slightly more precision than binary, thanks to the added zero level:
from tensorflow.keras.models import Sequential
from qkeras import QDense, QActivation, ternary, quantized_bits

model = Sequential([
    QActivation(quantized_bits(8, 3), input_shape=(784,)),
    
    QDense(
        units=256,
        kernel_quantizer=ternary(alpha=1.0),
        bias_quantizer=quantized_bits(8, 3)
    ),
    QActivation(ternary(alpha=1.0)),
    
    QDense(
        units=10,
        kernel_quantizer=quantized_bits(8, 3),
        bias_quantizer=quantized_bits(8, 3),
        activation='softmax'
    )
])

Stochastic Quantization

Add noise during training for better convergence:
from qkeras import QDense, stochastic_binary, stochastic_ternary

# Stochastic binary
layer = QDense(
    units=128,
    kernel_quantizer=stochastic_binary(alpha=1.0)
)

# Stochastic ternary
layer = QDense(
    units=128,
    kernel_quantizer=stochastic_ternary(alpha=1.0, threshold=0.5)
)
Stochastic quantizers add randomness during training to escape local minima. At inference time, they behave like their deterministic counterparts.

Advanced Techniques

Heterogeneous Precision

Use different bit-widths for different layers:
model = Sequential([
    # Early layers: higher precision
    QActivation(quantized_bits(8, 3), input_shape=(784,)),
    QDense(128, kernel_quantizer=quantized_bits(8, 3)),
    QActivation(quantized_relu(8, 3)),
    
    # Middle layers: medium precision
    QDense(64, kernel_quantizer=quantized_bits(6, 2)),
    QActivation(quantized_relu(6, 2)),
    
    # Late layers: lower precision acceptable
    QDense(32, kernel_quantizer=quantized_bits(4, 1)),
    QActivation(quantized_relu(4, 1)),
    
    # Output: back to higher precision
    QDense(10, kernel_quantizer=quantized_bits(8, 3))
])

Mixed Binary-Ternary-Float

model = Sequential([
    QActivation(quantized_bits(8, 3), input_shape=(784,)),
    
    # First layer: ternary (more precision for input processing)
    QDense(256, kernel_quantizer=ternary(alpha=1.0)),
    QActivation(quantized_relu(4, 2)),
    
    # Middle layers: binary (aggressive compression)
    QDense(128, kernel_quantizer=binary(alpha=1.0)),
    QActivation(binary(alpha=1.0)),
    
    QDense(64, kernel_quantizer=binary(alpha=1.0)),
    QActivation(binary(alpha=1.0)),
    
    # Output: standard quantized (need precision for classification)
    QDense(10, kernel_quantizer=quantized_bits(8, 3))
])

Best Practices

  • Begin with standard quantized_bits before trying binary/ternary. This establishes a baseline and helps you understand precision requirements.
  • Always add QActivation as the first layer to quantize inputs. This ensures hls4ml correctly infers input precision.
  • Input and output layers often benefit from higher precision. Reserve aggressive quantization for middle layers.
  • Fine-tune a pretrained FP32 model with QKeras quantizers instead of training from scratch. This typically gives better accuracy.
  • Overflow in fixed-point arithmetic causes severe accuracy loss. Use profiling to ensure integer bits are sufficient (see the sketch below).
  • Always validate quantized models with hls4ml C simulation before synthesis to catch precision issues early.
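A possible workflow for the last two points, using hls4ml's numerical profiling and C simulation. This is a sketch: it assumes the model and hls_model from the conversion example above, representative input data, and matplotlib installed for the profiling plots:
import numpy as np
from hls4ml.model.profiling import numerical

# Compare weight/activation distributions against the chosen fixed-point ranges
X_profile = np.random.rand(500, 32, 32, 3).astype(np.float32)
numerical(model=model, hls_model=hls_model, X=X_profile)

# C simulation check before launching synthesis
y_hls = hls_model.predict(X_profile)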

Troubleshooting

If accuracy drops sharply after quantization:
  • Ensure the input layer uses higher precision (8-bit or more)
  • Try ternary instead of binary for more flexibility
  • Increase network width to compensate for reduced precision
  • Use batch normalization between layers

If quantized training is unstable:
  • Reduce the learning rate (quantized training is more sensitive)
  • Add gradient clipping
  • Increase integer bits in quantizers
  • Check for overflow in accumulator types

If hls4ml conversion fails or gives wrong results (see the sanity check below):
  • Ensure all layers use the QKeras versions (QDense, not Dense)
  • Add explicit input quantization with QActivation
  • Check that quantizer configurations are supported
  • Verify QKeras is installed: pip install qkeras
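One way to check the last group of issues is to list each layer's class and quantizers before converting. A sketch that assumes the QKeras layers expose get_quantizers(); if your version differs, inspect attributes such as kernel_quantizer directly:
# Flag plain Keras layers and show the quantizers that QKeras layers carry
for layer in model.layers:
    if hasattr(layer, 'get_quantizers'):
        print(layer.name, type(layer).__name__, layer.get_quantizers())
    else:
        print(layer.name, type(layer).__name__, '<- not a QKeras layer')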

API Reference

Quantizer Functions

qkeras.quantized_bits(bits, integer=0, symmetric=False, alpha=None, keep_negative=True)
qkeras.binary(alpha=1.0)
qkeras.ternary(alpha=1.0, threshold=0.5)
qkeras.quantized_po2(bits, max_value=None)
qkeras.quantized_relu(bits, integer=0, use_sigmoid=False)
qkeras.quantized_tanh(bits, integer=0, symmetric=False)

QKeras Layers

qkeras.QDense(units, kernel_quantizer=None, bias_quantizer=None, ...)
qkeras.QConv1D(filters, kernel_size, kernel_quantizer=None, bias_quantizer=None, ...)
qkeras.QConv2D(filters, kernel_size, kernel_quantizer=None, bias_quantizer=None, ...)
qkeras.QDepthwiseConv2D(kernel_size, depthwise_quantizer=None, bias_quantizer=None, ...)
qkeras.QActivation(activation, ...)
qkeras.QBatchNormalization(gamma_quantizer=None, beta_quantizer=None, ...)
