General Questions

What is hls4ml?

hls4ml is a package for machine learning inference in FPGAs using High-Level Synthesis (HLS). It converts trained neural network models from Keras, PyTorch, and ONNX into optimized FPGA firmware.

hls4ml is designed for ultra-low-latency applications requiring microsecond-level inference, such as:
  • High-energy physics triggers at CERN’s Large Hadron Collider
  • Real-time control systems for quantum computing
  • Feedback loops in nuclear fusion reactors
  • Low-power environmental monitoring on satellites
  • Biomedical signal processing (e.g., arrhythmia classification)
See the official documentation for full details.

How does hls4ml work?

hls4ml takes models from Keras, PyTorch, or ONNX (optionally quantized) and produces high-level synthesis code in C++ that can be converted to FPGA firmware using vendor HLS compilers:
  1. Parse the trained model to extract architecture and weights
  2. Optimize by fusing layers (e.g., BatchNorm into Conv), applying precision inference
  3. Generate C++ HLS code implementing each layer
  4. Synthesize with vendor tools (Vivado HLS, Vitis, Quartus, etc.) to create RTL
  5. Integrate into FPGA designs or run standalone
The generated firmware stores all weights on-chip for fast access with configurable parallelism.
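
In Python, this flow is a handful of calls. A minimal sketch, assuming a trained Keras model in the variable model (the backend and output directory here are placeholders):
import hls4ml

# Steps 1-2: parse the trained model and build an optimized HLS model
config = hls4ml.utils.config_from_keras_model(model, granularity='name')
hls_model = hls4ml.converters.convert_from_keras_model(
    model, hls_config=config, backend='Vitis', output_dir='my-hls-test'
)

# Step 3: emit and compile the C++ (also enables bit-accurate prediction)
hls_model.compile()

# Step 4: run the vendor HLS compiler to produce RTL (requires Vitis installed)
hls_model.build(csim=False, synth=True)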

How does hls4ml achieve such low latency?

hls4ml achieves ultra-low latency through several techniques:
  • On-chip weight storage: All model parameters stored in FPGA block RAM for immediate access
  • Spatial dataflow architecture: Exploits FPGA parallelism by implementing layers as hardware pipelines
  • Configurable parallelism: Fully parallel (ReuseFactor=1) for minimum latency or resource-sharing for efficiency
  • Fixed-point arithmetic: Uses optimized fixed-point types instead of floating-point
  • Layer fusion: Merges operations (e.g., Conv+BatchNorm+ReLU) into single hardware blocks
Typical latencies:
  • io_parallel: 50-200 nanoseconds for small models
  • io_stream: 1-10 microseconds for larger models
Compare this to GPU inference (milliseconds) or CPU (tens of milliseconds).

What architectures and layers are supported?

hls4ml supports many common layers in MLP, CNN, and RNN architectures.

Supported:
  • Dense (fully connected) layers
  • Convolutional layers (Conv1D, Conv2D, DepthwiseConv)
  • Pooling layers (MaxPooling, AveragePooling, GlobalPooling)
  • Activations (ReLU, sigmoid, tanh, softmax, ELU, LeakyReLU, etc.)
  • Batch normalization
  • Recurrent layers (LSTM, GRU)
  • Residual/skip connections
  • Quantized layers (QKeras, HGQ, Brevitas; example sketch below)
Limited support:
  • Graph Neural Networks (experimental)
  • Transformers (early development, not stable)
  • Custom layers (requires extension API)
Not supported:
  • Large Language Models (LLMs)
  • Large vision transformers
  • Extremely novel architectures
If you encounter unsupported features, open an issue on GitHub.
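
Quantized models from quantization-aware training convert directly. A minimal QKeras sketch, assuming the qkeras package is installed (layer sizes and bit widths here are illustrative):
from tensorflow.keras.models import Sequential
from qkeras import QDense, QActivation, quantized_bits, quantized_relu

# 6-bit weights, biases, and activations
model = Sequential([
    QDense(64, input_shape=(16,),
           kernel_quantizer=quantized_bits(6, 0, alpha=1),
           bias_quantizer=quantized_bits(6, 0)),
    QActivation(quantized_relu(6)),
    QDense(5,
           kernel_quantizer=quantized_bits(6, 0, alpha=1),
           bias_quantizer=quantized_bits(6, 0)),
])
# hls4ml reads the quantizers during conversion and sets layer precisions to match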

How large can my model be?

It depends: parameter count alone is a poor predictor of FPGA resource usage. hls4ml has successfully deployed:
  • Small models: O(1,000) parameters on modest FPGAs
  • Medium models: O(10,000) parameters with quantization
  • Larger models: O(100,000) parameters on large FPGAs with aggressive optimization
Factors affecting resource usage:

Model Architecture

  • CNNs reuse parameters → lower resource usage
  • Dense layers → higher resource usage
  • Activation functions implemented as lookup tables → additional resources

Configuration

  • Precision: Lower bit widths → fewer resources
  • ReuseFactor: Higher values → more resource sharing
  • IOType: io_stream → lower resource usage than io_parallel
  • Strategy: Resource → more sharing than Latency
Use rule4ml or wa-hls4ml for quick estimates without synthesis.
Model compression techniques:
# Quantization - reduce bit widths
config['Model']['Precision'] = 'ap_fixed<8,3>'

# Increase reuse factor
config['Model']['ReuseFactor'] = 16

# Use streaming I/O
hls_model = hls4ml.converters.convert_from_keras_model(
    model, hls_config=config, io_type='io_stream'
)
LLMs and large vision transformers are not supported.

Getting Started

How do I get started?

We strongly recommend the hls4ml tutorials for hands-on learning:
  1. Part 1: Introduction to FPGAs and HLS
  2. Part 2: Your first hls4ml model conversion
  3. Part 3: Model optimization and compression
  4. Part 4: Advanced features (profiling, custom layers)
Also see the Quickstart guide for a rapid introduction.

Quick example:
import numpy as np
import hls4ml
from keras.models import Sequential
from keras.layers import Dense, Input

# Create model
model = Sequential([Input(shape=(16,)), Dense(64, activation='relu')])

# Convert to HLS
config = hls4ml.utils.config_from_keras_model(model)
hls_model = hls4ml.converters.convert_from_keras_model(
    model, hls_config=config, backend='Vitis'
)

# Compile the C++ emulation and test with random inputs
hls_model.compile()
y = hls_model.predict(np.random.rand(10, 16))

What hardware and software do I need?

For development and testing:
  • Linux or macOS machine with Python 3.10+
  • C++ compiler (g++ on Linux, Xcode on macOS)
  • 8GB+ RAM recommended
For synthesis:
  • Vendor HLS tools (see Installation):
    • Vivado HLS 2020.1+ or Vitis HLS 2022.2+ (Xilinx/AMD)
    • Quartus 20.1-21.4 or oneAPI 2024.1-2025.0 (Intel/Altera)
    • Catapult HLS 2024.1+ (Siemens/Mentor)
  • 16GB+ RAM recommended for synthesis
  • Multi-core CPU helpful (synthesis is parallelizable)
For deployment:
  • Target FPGA board matching your chosen part
  • JTAG programmer/debugger
  • Host interface (PCIe, Ethernet, etc.) if needed
You can do everything except final synthesis without FPGA hardware using C simulation.
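
For example, you can compare the compiled C simulation against the original floating-point model entirely in software. A sketch, reusing model and hls_model from the quick example above (sample count and input shape are placeholders):
import numpy as np

X = np.random.rand(100, 16)

# Floating-point Keras vs. fixed-point hls4ml C simulation
y_keras = model.predict(X)
y_hls = hls_model.predict(X)

# Expect close, but not bit-identical, agreement due to fixed-point rounding
print('max abs difference:', np.max(np.abs(y_keras - y_hls)))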

Which backend should I choose?

Choose based on your target FPGA vendor and design requirements:
Backend  | Vendor       | Best For                    | Notes
---------|--------------|-----------------------------|------------------------------
Vitis    | Xilinx/AMD   | New designs, UltraScale+    | Recommended for new projects
Vivado   | Xilinx/AMD   | Legacy 7-series, UltraScale | Mature, well-tested
Quartus  | Intel/Altera | Stratix, Arria, Cyclone     | Stratix 10, Arria 10
oneAPI   | Intel/Altera | Modern Intel FPGAs          | Experimental, uses SYCL
Catapult | Any          | ASICs and FPGAs             | Best for ASIC flows
Example:
# Xilinx UltraScale+
hls_model = hls4ml.converters.convert_from_keras_model(
    model, hls_config=config, backend='Vitis'
)

# Intel Stratix 10
hls_model = hls4ml.converters.convert_from_keras_model(
    model, hls_config=config, backend='Quartus',
    part='1SG280LU2F50E2VG'
)

Do I need FPGA design experience?

No! hls4ml abstracts away most FPGA details:
  • No HDL (Verilog/VHDL) knowledge required
  • No understanding of FPGA primitives needed
  • Basic Python and ML framework knowledge sufficient
However, some FPGA concepts help:
  • Fixed-point arithmetic and precision (illustrated in the sketch below)
  • Parallelism vs resource usage tradeoffs
  • Latency vs throughput
  • FPGA resources (LUTs, FFs, DSPs, BRAMs)
The tutorials provide this background.
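
To build intuition for fixed-point precision, here is an illustrative NumPy emulation of ap_fixed<W,I> behavior (simplified: real ap_fixed types default to truncation rather than rounding):
import numpy as np

def quantize_fixed(x, width=16, integer=6):
    """Emulate ap_fixed<width,integer>: quantize to 2^-(width-integer) steps, then saturate."""
    frac_bits = width - integer
    scale = 2.0 ** frac_bits
    lo = -(2.0 ** (integer - 1))               # most negative representable value
    hi = (2.0 ** (integer - 1)) - 1.0 / scale  # most positive representable value
    return np.clip(np.round(np.asarray(x) * scale) / scale, lo, hi)

print(quantize_fixed(3.14159))  # 3.1416015625 (10 fractional bits)
print(quantize_fixed(100.0))    # 31.9990234375 (saturates at the type maximum)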

Common Issues

Why does my hls4ml model give different results than the original?

This is usually a precision issue: hls4ml uses fixed-point arithmetic while ML frameworks use floating-point.

Solutions:
1. Increase bit widths:

# Default: ap_fixed<16,6> (16 total bits, 6 integer bits)
config['Model']['Precision'] = 'ap_fixed<32,16>'

# Or per-layer
config['LayerName']['my_layer']['Precision'] = 'ap_fixed<24,12>'

2. Use profiling to find optimal precision:

from hls4ml.model.profiling import numerical
import matplotlib.pyplot as plt

# Profile weights and activations
plots = numerical(
    model=keras_model,
    hls_model=hls_model,
    X=test_data
)
plt.show()

# Grey boxes show current precision coverage
# Adjust until boxes contain the distributions

3. Check accumulator precision:

# Accumulator needs more bits for sums
config['LayerName']['my_dense']['accum_t'] = 'ap_fixed<32,16>'
Some mismatch is expected for activation functions implemented as lookup tables (LUTs). Bit-exact behavior is not always possible.
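
For LUT-based activations you can also enlarge the table or widen its entries. A sketch, assuming name-granularity config; softmax1 is an illustrative layer name, and these attribute names may vary between hls4ml versions:
# Larger, more precise activation tables reduce LUT quantization error
config['LayerName']['softmax1']['table_size'] = 2048           # default is 1024
config['LayerName']['softmax1']['table_t'] = 'ap_fixed<18,8>'  # wider table entries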

Why does synthesis fail saying the design is too large?

This error means your model is too large for the chosen configuration.

Solutions:
# 1. Switch to streaming I/O
hls_model = hls4ml.converters.convert_from_keras_model(
    model,
    hls_config=config,
    io_type='io_stream',  # instead of io_parallel
    backend='Vitis'
)

# 2. Use Resource strategy
config['Model']['Strategy'] = 'Resource'  # instead of 'Latency'

# 3. Increase reuse factor
config['Model']['ReuseFactor'] = 8  # or 16, 32, etc.

What should I do when C synthesis crashes or errors out?

This indicates that HLS compilation failed, likely due to a bug in hls4ml's code generation.

Debugging steps:
  1. Check the log file:
    cat my-hls-test/vivado_hls.log
    # or
    cat my-hls-test/vitis_hls.log
    
  2. Look for C++ errors in the generated code:
    # Check for syntax errors
    g++ -c my-hls-test/firmware/myproject.cpp -Imy-hls-test/firmware
    
  3. Common causes:
    • Unsupported layer configuration
    • Edge case in layer shape handling
    • Missing layer handler for custom operations
  4. Report the bug: Open an issue at github.com/fastmachinelearning/hls4ml with:
    • Model architecture (code or summary)
    • Configuration used
    • Error messages from log
    • hls4ml version
Even if the error message looks similar to existing bugs, the cause may differ. Unless you’re certain it’s the same issue, open a new bug report.

Why does synthesis take hours or run out of memory?

Large models can take hours to synthesize and can use more than 32 GB of RAM.

Solutions:
# 1. Skip synthesis, just compile for simulation
hls_model.compile()  # Fast C++ compilation
hls_model.predict(X)  # Test without synthesis

# 2. Reduce parallelism
config['Model']['ReuseFactor'] = 16
config['Model']['Strategy'] = 'Resource'

# 3. Use streaming I/O
io_type='io_stream'

# 4. Run synthesis on a more powerful machine
# Or use cloud instances (AWS, GCP, Azure)
The first few layers often dominate synthesis time, so optimize those first.

Why do I get import errors?

This suggests a broken installation or missing dependencies.
# Reinstall hls4ml (shell)
pip uninstall hls4ml
pip install hls4ml

# Verify the installation (Python)
import hls4ml
print(hls4ml.__version__)

# Check specific imports
from hls4ml import converters, utils, report

# If imports still fail, upgrade common dependencies (shell)
pip install --upgrade numpy pyyaml h5py

How do I handle custom layers?

hls4ml doesn’t automatically support custom layers; you need to register a handler.

Options:
  1. Replace with equivalent supported layers (easiest)
  2. Use the extension API to add support:
    from hls4ml.converters import register_keras_v2_layer_handler
    from hls4ml.model.layers import Layer
    
    # Define your custom layer
    class MyCustomLayer(Layer):
        def initialize(self):
            # Setup layer attributes
            pass
    
    # Register handler
    @register_keras_v2_layer_handler('MyCustomLayer')
    def parse_my_layer(keras_layer, input_names, input_shapes, data_reader, config):
        # Parse layer and return hls4ml layer
        pass
    
  3. Request support via GitHub Discussions
See the extension documentation for details.

Optimization & Performance

How do I reduce resource usage?

Priority order (most impact first):
1. Reduce precision:

# Start with 16 bits, reduce gradually
config['Model']['Precision'] = 'ap_fixed<8,3>'

# Use profiling to find minimum viable precision

2. Increase ReuseFactor:

# 1 = fully parallel (max resources, min latency)
# Higher values = more resource sharing
config['Model']['ReuseFactor'] = 16

3. Use Resource strategy:

config['Model']['Strategy'] = 'Resource'

4. Switch to streaming I/O:

hls_model = hls4ml.converters.convert_from_keras_model(
    model, hls_config=config, io_type='io_stream'
)

5. Model compression (a pruning sketch follows the list):

  • Quantization-aware training (QKeras, HGQ, Brevitas)
  • Pruning
  • Knowledge distillation to smaller model
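
As a sketch of pruning, using the TensorFlow Model Optimization toolkit (assumes the tensorflow-model-optimization package; model, X_train, and y_train are placeholders). Multiplications by zeroed weights are optimized away during synthesis:
import tensorflow_model_optimization as tfmot

# Fine-tune with 75% of weights forced to zero (sparsity target is illustrative)
pruned = tfmot.sparsity.keras.prune_low_magnitude(
    model,
    pruning_schedule=tfmot.sparsity.keras.ConstantSparsity(0.75, begin_step=0),
)
pruned.compile(optimizer='adam', loss='categorical_crossentropy')
pruned.fit(X_train, y_train, epochs=5,
           callbacks=[tfmot.sparsity.keras.UpdatePruningStep()])

# Strip the pruning wrappers before handing the model to hls4ml
model_for_hls = tfmot.sparsity.keras.strip_pruning(pruned)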

How do I minimize latency?

Priority order:
# 1. Use parallel I/O
io_type='io_parallel'

# 2. Use Latency strategy
config['Model']['Strategy'] = 'Latency'

# 3. Set ReuseFactor = 1 (fully parallel)
config['Model']['ReuseFactor'] = 1

# 4. Reduce precision (narrower datapaths are faster)
config['Model']['Precision'] = 'ap_fixed<8,3>'  # fewer bits = shorter critical path

# 5. Optimize model architecture
# - Fewer layers
# - Smaller kernels
# - Depthwise convolutions
Lower latency = higher resource usage. There’s always a tradeoff.

What’s the difference between io_parallel and io_stream?

Feature       | io_parallel                | io_stream
--------------|----------------------------|------------------------------
Latency       | Lowest (50-200 ns)         | Higher (1-10 μs)
Resources     | Highest                    | Lower
Throughput    | One inference at a time    | Pipelined
Best for      | Single inputs, min latency | Streaming data, large models
I/O interface | All inputs/outputs at once | Sequential stream
# Parallel I/O - all inputs available simultaneously
y = hls_model.predict(x)  # ~100 ns latency in hardware; predict() itself runs C simulation

# Streaming I/O - inputs arrive sequentially  
# Lower resource usage, higher throughput for continuous streams

How do I use the profiling tools?

import numpy as np
import hls4ml
from hls4ml.model.profiling import numerical
from keras.models import load_model
import matplotlib.pyplot as plt

# Load your model and test data
model = load_model('model.h5')
X_test = np.load('test_data.npy')

# Convert to HLS
config = hls4ml.utils.config_from_keras_model(model, granularity='name')
hls_model = hls4ml.converters.convert_from_keras_model(
    model, hls_config=config
)

# Profile (requires: pip install hls4ml[profiling])
plots = numerical(
    model=model,
    hls_model=hls_model,
    X=X_test
)

plt.show()
Interpreting plots:
  • Box plots show distribution of weights/activations
  • Grey boxes show range representable with current precision
  • If box-and-whisker extends outside grey box → increase precision
  • If grey box is much larger → can reduce precision

Contributing

How can I contribute?

We welcome contributions! Here’s how to get involved:
1. Start a discussion

If you have a feature idea, start a GitHub Discussion first.

2. Review contribution guidelines

Read CONTRIBUTING.md for technical requirements:
  • Code style (using ruff formatter)
  • Testing requirements
  • Documentation standards

3. Fork and develop

git clone https://github.com/YOUR_USERNAME/hls4ml.git
cd hls4ml
pip install -e .[testing]

# Make changes
# Run tests
pytest test/

4. Submit a pull request

  • Describe changes clearly
  • Include tests for new features
  • Update documentation if needed
Join the community:
  • Attend online meetings (request invite via CERN e-group)
  • Chat on GitHub Discussions
  • Present your use cases
Acknowledgment:
We acknowledge the Fast Machine Learning collective as an open community of multi-domain experts and collaborators.

How do I report a bug?

Open an issue with:
  1. Environment info:
    import hls4ml
    print(hls4ml.__version__)
    import sys
    print(sys.version)
    
  2. Minimal reproducible example:
    # Code that demonstrates the bug
    
  3. Error messages:
    • Full stack trace
    • HLS log files if synthesis failed
  4. Expected vs actual behavior
Even if the error looks similar to existing issues, open a new one if you’re not certain - the root cause may differ.

Additional Resources

  • Documentation - Official hls4ml documentation
  • Tutorials - Hands-on Jupyter notebooks
  • GitHub Discussions - Ask questions and share projects
  • Example Models - Pre-trained models to experiment with
  • Video Tutorial - Intro to FPGAs, HLS, and ML inference
  • Research Papers - Citations and publications

Citation

If you use hls4ml in research, please cite:
@software{fastml_hls4ml,
  author    = {{FastML Team}},
  title     = {fastmachinelearning/hls4ml},
  year      = {2025},
  publisher = {Zenodo},
  version   = {v1.2.0},
  doi       = {10.5281/zenodo.1201549},
  url       = {https://github.com/fastmachinelearning/hls4ml}
}

@article{Duarte:2018ite,
  author  = {Duarte, Javier and others},
  title   = {{Fast inference of deep neural networks in FPGAs for particle physics}},
  journal = {JINST},
  volume  = {13},
  number  = {07},
  pages   = {P07027},
  year    = {2018},
  doi     = {10.1088/1748-0221/13/07/P07027}
}
