Overview
The Catapult backend enables deployment of neural networks on both FPGAs and ASICs using the Siemens Catapult HLS compiler. It can target either FPGA devices or ASIC technology libraries, making it suitable for both prototyping and production designs.
When to Use Catapult Backend
ASIC design flows: Target standard cell libraries for ASIC implementation
FPGA prototyping: Use Xilinx or other FPGA devices
Advanced HLS features: Leverage Catapult's optimization capabilities
Multi-target projects: Design once, deploy to FPGA or ASIC
Catapult HLS support was added in hls4ml version 1.0.0 and remains under active development.
Installation and Setup
Prerequisites
Siemens Catapult HLS (ensure catapult is on PATH or set MGC_HOME or CATAPULT_HOME)
Python 3.8 or higher
hls4ml library installed
FPGA or ASIC technology libraries
Environment Setup
# Option 1: catapult on PATH
export PATH=/path/to/catapult/bin:$PATH
command -v catapult

# Option 2: Set MGC_HOME
export MGC_HOME=/path/to/mentor/catapult

# Option 3: Set CATAPULT_HOME
export CATAPULT_HOME=/path/to/catapult
Configuration
Basic Configuration
Create a model configuration for the Catapult backend:
import hls4ml
config = hls4ml.utils.config_from_keras_model(
    model,
    granularity='name',
    backend='Catapult'
)

# Convert model for FPGA target
hls_model = hls4ml.converters.convert_from_keras_model(
    model,
    hls_config=config,
    output_dir='my_catapult_project',
    backend='Catapult',
    tech='fpga',
    part='xcku115-flvb2104-2-i',
    clock_period=5,
    io_type='io_parallel'
)

# Or convert for ASIC target
hls_model = hls4ml.converters.convert_from_keras_model(
    model,
    hls_config=config,
    output_dir='my_asic_project',
    backend='Catapult',
    tech='asic',
    asiclibs='nangate-45nm',
    clock_period=5,
    io_type='io_parallel'
)
Configuration Options
tech
string
Target technology:
fpga: FPGA implementation
asic: ASIC implementation

part
string
default: "xcvu13p-flga2577-2-e"
FPGA part number (when tech='fpga')

asiclibs
string
default: "nangate-45nm"
ASIC technology library (when tech='asic'):
nangate-45nm
nangate-15nm
Custom library name

clock_period
Clock period in nanoseconds

fifo
FIFO depth for streaming designs

io_type
string
default: "io_parallel"
I/O implementation type:
io_parallel: Parallel data processing
io_stream: Streaming dataflow architecture
Layer Configuration
Strategy Options
config['Model']['Strategy'] = 'Resource'  # or 'Latency'

# Per-layer configuration
config['dense_layer'] = {
    'ReuseFactor': 16,
    'Strategy': 'Resource',  # 'Latency' or 'Resource'
    'Precision': 'ac_fixed<16,6>',
    'accum_t': 'ac_fixed<24,12>'
}
Dense Layers
config['dense_layer'] = {
    'ReuseFactor': 8,
    'Strategy': 'Resource',
    'Precision': 'ac_fixed<16,6>'
}
Convolutional Layers
config['conv2d_layer'] = {
    'ReuseFactor': 8,
    'Strategy': 'Resource',
    'ParallelizationFactor': 4,
    'ConvImplementation': 'LineBuffer',  # or 'Encoded'
    'Precision': 'ac_fixed<16,6>'
}
Convolution Implementations:
LineBuffer: Streaming line buffer (efficient for io_stream)
Encoded: Encoded implementation for io_parallel
Recurrent Layers
config['lstm_layer'] = {
    'ReuseFactor': 1,
    'RecurrentReuseFactor': 1,
    'Strategy': 'Resource',
    'static': True,  # Static vs dynamic unrolling
    'table_size': 1024,
    'table_t': 'ac_fixed<18,8>'
}
Separable Convolution
config['sepconv2d_layer'] = {
    'ReuseFactor': 8,
    'Strategy': 'Resource',
    'dw_output': 'ac_fixed<16,8>',  # Depthwise output precision
    'ConvImplementation': 'LineBuffer'
}
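Per-layer settings like these can also be applied programmatically when many layers share the same configuration. The helper below is a hypothetical sketch, not part of hls4ml; it assumes the nested 'LayerName' layout that config_from_keras_model(granularity='name') produces.

```python
# Hypothetical helper (not an hls4ml API): merge shared settings into every
# per-layer entry whose name starts with a given prefix.
def apply_layer_defaults(config, prefix, settings):
    for name, layer_cfg in config.get('LayerName', {}).items():
        if name.startswith(prefix):
            layer_cfg.update(settings)
    return config

# Illustrative config dict standing in for the one hls4ml generates
cfg = {'LayerName': {'conv2d_1': {}, 'conv2d_2': {}, 'dense_1': {}}}
apply_layer_defaults(cfg, 'conv2d', {'ReuseFactor': 8, 'Strategy': 'Resource'})
```

This keeps convolution layers consistent without repeating the same dictionary for each one.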
Build Process
Synthesis Commands
# Compile the model
hls_model.compile()

# Build with Catapult HLS
report = hls_model.build(
    reset=False,        # Reset project
    csim=True,          # C simulation
    synth=True,         # HLS synthesis
    cosim=False,        # RTL co-simulation
    validation=False,   # Validation
    export=False,       # Export RTL
    vsynth=False,       # FPGA/ASIC synthesis
    fifo_opt=False,     # FIFO optimization
    bitfile=False,      # Generate bitfile
    vhdl=False,         # Generate VHDL
    verilog=True,       # Generate Verilog
    ran_frame=5,        # Random test frames
    sw_opt=False,       # Software optimization
    power=False,        # Power analysis
    da=False,           # Design Analyzer
    bup=False           # Backup project
)
Build Options
Catapult Build Parameters
reset: Reset project before building (default: False)
csim: Run C simulation (default: True)
synth: Run HLS synthesis (default: True)
cosim: Run RTL co-simulation (default: False)
validation: Run validation tests (default: False)
export: Export RTL (default: False)
vsynth: Run downstream synthesis (default: False)
fifo_opt: Optimize FIFO depths (default: False)
bitfile: Generate FPGA bitfile (default: False)
vhdl: Generate VHDL output (default: False)
verilog: Generate Verilog output (default: True)
ran_frame: Number of random test frames (default: 5)
sw_opt: Software optimization (default: False)
power: Power analysis (default: False)
da: Design Analyzer (default: False)
bup: Backup project (default: False)
Build Script
Catapult uses a TCL script for building:
cd my_catapult_project
catapult -product ultra -shell -f build_prj.tcl -eval 'set ::argv "synth=1 csim=1"'
Example Project Structure
my_catapult_project/
├── firmware/
│ ├── myproject.cpp # Top-level implementation
│ ├── myproject.h # Header declarations
│ ├── parameters.h # Network parameters
│ ├── defines.h # Macro definitions
│ ├── weights/ # Weight data
│ └── nnet_utils/ # Utility functions
├── tb_data/
│ ├── tb_input_features.dat
│ └── tb_output_predictions.dat
├── myproject_test.cpp # Testbench
├── build_prj.tcl # Catapult HLS script
└── Catapult/ # Catapult project (after build)
├── myproject.v1/
│ ├── concat_rtl.v # Generated RTL
│ ├── scverify/ # Verification files
│ └── cycle_reports/ # Timing reports
└── catapult.log
Precision Types
Catapult backend uses Algorithmic C (AC) datatypes:
# Fixed-point: ac_fixed<width, int_width, signed, quantization, overflow>
config['layer']['Precision'] = 'ac_fixed<16,6,true>'
config['layer']['accum_t'] = 'ac_fixed<24,12,true>'

# Integer: ac_int<width, signed>
config['layer']['index_t'] = 'ac_int<8,false>'
Common Precision Configurations
# 16-bit fixed-point
config['layer']['Precision'] = 'ac_fixed<16,6,true>'   # 6 integer bits

# 8-bit quantized
config['layer']['Precision'] = 'ac_fixed<8,3,true>'    # 3 integer bits

# Wide accumulator
config['layer']['accum_t'] = 'ac_fixed<32,16,true>'    # 16 integer bits
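When choosing a width, it helps to know what range and resolution a given ac_fixed<W,I> can represent. The following sketch computes both, assuming the AC datatype convention that the integer field of a signed type includes the sign bit:

```python
def ac_fixed_range(width, int_width, signed=True):
    """Representable range and step of ac_fixed<width, int_width, signed>.
    For signed types the integer field includes the sign bit, following
    the AC datatype convention."""
    lsb = 2.0 ** -(width - int_width)  # smallest step (resolution)
    if signed:
        lo = -(2.0 ** (int_width - 1))
        hi = 2.0 ** (int_width - 1) - lsb
    else:
        lo = 0.0
        hi = 2.0 ** int_width - lsb
    return lo, hi, lsb

# ac_fixed<16,6,true>: values in [-32, 32 - 2^-10] with step 2^-10
lo, hi, lsb = ac_fixed_range(16, 6)
```

If your layer's accumulations can exceed this range, widen the integer field of accum_t rather than the layer precision itself.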
Dataflow Architecture
Convolution layers in Catapult require dataflow pipeline style for proper operation.
# Automatically set for models with convolutions
config['Model']['PipelineStyle'] = 'dataflow'
# This enables:
# - Parallel execution of layers
# - Streaming between layers
# - Optimal throughput
FIFO Optimization
For streaming designs:
# Build with FIFO optimization
report = hls_model.build(
    synth=True,
    fifo_opt=True  # Optimize FIFO depths
)

# Or specify FIFO depth in config
hls_model = hls4ml.converters.convert_from_keras_model(
    model,
    hls_config=config,
    backend='Catapult',
    fifo=32  # Set FIFO depth
)
Reuse Factor Tuning
# Aggressive parallelization
config['conv2d']['ReuseFactor'] = 1
config['conv2d']['ParallelizationFactor'] = 8

# Balanced approach
config['conv2d']['ReuseFactor'] = 8
config['conv2d']['ParallelizationFactor'] = 4

# Resource-constrained
config['conv2d']['ReuseFactor'] = 64
config['conv2d']['ParallelizationFactor'] = 1
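The trade-off behind these settings can be estimated with a back-of-the-envelope count: for a dense layer, the Resource strategy time-multiplexes n_in × n_out multiplications over ReuseFactor clock cycles, so roughly that ratio of parallel multipliers (DSPs on FPGA, multiplier cells on ASIC) is instantiated. A rough sketch, not an hls4ml function:

```python
import math

def dense_multipliers(n_in, n_out, reuse_factor):
    """Rough count of parallel multipliers for a dense layer:
    n_in * n_out products time-multiplexed over reuse_factor cycles."""
    return math.ceil(n_in * n_out / reuse_factor)

# 16 -> 64 dense layer at different reuse factors
fully_parallel = dense_multipliers(16, 64, 1)    # one multiplier per product
balanced = dense_multipliers(16, 64, 8)
serialized = dense_multipliers(16, 64, 64)
```

Higher reuse factors trade multipliers for latency, which is why the resource-constrained example above uses ReuseFactor = 64.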
ASIC Design Flow
Technology Library Setup
# Configure for ASIC
hls_model = hls4ml.converters.convert_from_keras_model(
    model,
    hls_config=config,
    backend='Catapult',
    tech='asic',
    asiclibs='nangate-45nm',  # or your technology library
    clock_period=2.0          # Faster clock for ASIC (2 ns = 500 MHz)
)
ASIC-Specific Optimizations
# Lower reuse factors for ASIC (more area available)
config['Model']['ReuseFactor'] = 4

# Tighter precision for area optimization
config['Model']['Precision'] = 'ac_fixed<12,4>'

# Enable power analysis
report = hls_model.build(
    synth=True,
    power=True  # Analyze power consumption
)
Resource Usage Estimates
FPGA (Small MLP):
LUTs: 8K-20K
FFs: 5K-15K
DSPs: 15-40
BRAM: 10-30
ASIC (Small MLP on 45nm):
Area: 0.2-0.5 mm²
Gates: 50K-150K
Memory: 50-200 KB
Latency Patterns
io_parallel:
Latency = Σ(layer_latency)
II = 1 (fully pipelined)
io_stream with dataflow:
Throughput = 1 / max(layer_II)
Pipeline stages = number of layers
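The io_stream relation above can be turned into numbers: in a dataflow pipeline the stage with the largest initiation interval (II, in cycles) sets the steady-state rate. A small illustrative calculation, assuming per-layer IIs are known from the Catapult reports:

```python
def dataflow_throughput(layer_iis, clock_period_ns):
    """Steady-state inferences/second of an io_stream dataflow pipeline:
    the layer with the largest II (in clock cycles) is the bottleneck."""
    interval_ns = max(layer_iis) * clock_period_ns
    return 1e9 / interval_ns

# Three layers with IIs of 4, 16, and 8 cycles at a 5 ns clock:
# the 16-cycle layer limits throughput to one inference every 80 ns.
rate = dataflow_throughput([4, 16, 8], 5)
```

Reducing the bottleneck layer's ReuseFactor (lowering its II) is usually the most direct way to raise throughput.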
Clock Frequencies
FPGA:
Xilinx UltraScale+: 200-350 MHz
Intel Stratix 10: 250-400 MHz
ASIC:
45nm: 300-600 MHz
28nm: 500-1000 MHz
7nm: 1-2 GHz
Advanced Features
Automatic optimization for 3x3 convolutions:
# Enabled automatically during optimization passes
# Reduces multiplications for 3x3 convolutions
# Particularly beneficial for ASIC implementations
im2col Code Generation
For efficient convolution implementation:
config['conv2d']['ConvImplementation'] = 'LineBuffer'
# Generates im2col transformation for matrix multiplication
Custom Resource Strategies
# Mixed strategy design
config['conv2d_1']['Strategy'] = 'Latency'   # Unrolled
config['conv2d_2']['Strategy'] = 'Resource'  # Serialized
config['dense_1']['Strategy'] = 'Resource'   # Serialized
Troubleshooting
Catapult not found
# Check installation paths
echo $MGC_HOME
echo $CATAPULT_HOME
which catapult

# Set environment variable
export MGC_HOME=/path/to/mentor/catapult
# or
export CATAPULT_HOME=/path/to/catapult

# Verify
$MGC_HOME/bin/catapult -version
Dataflow pipeline error with convolutions
# Catapult requires dataflow for convolutions
# This is set automatically, but if you see errors:
config['Model']['PipelineStyle'] = 'dataflow'

# Rebuild the model
hls_model = hls4ml.converters.convert_from_keras_model(
    model, hls_config=config, backend='Catapult'
)
FIFO overflow/underflow warnings
# Option 1: Set explicit FIFO depth
hls_model = hls4ml.converters.convert_from_keras_model(
    model, backend='Catapult', fifo=64
)

# Option 2: Enable FIFO optimization
report = hls_model.build(fifo_opt=True)
Timing violations in ASIC flow
Increase clock period
Reduce precision to simplify logic
Increase reuse factors
Enable additional pipelining
Check critical paths in reports
Example: Complete Workflow
FPGA Target
import hls4ml
from tensorflow import keras
import numpy as np

# Load model
model = keras.models.load_model('my_cnn.h5')

# Create configuration
config = hls4ml.utils.config_from_keras_model(model, granularity='name')
config['Model']['Strategy'] = 'Resource'
config['Model']['ReuseFactor'] = 16

# Convert to Catapult HLS
hls_model = hls4ml.converters.convert_from_keras_model(
    model,
    hls_config=config,
    output_dir='catapult_fpga',
    backend='Catapult',
    tech='fpga',
    part='xcku115-flvb2104-2-i',
    clock_period=5,
    io_type='io_stream'
)

# Build and synthesize
hls_model.compile()
report = hls_model.build(
    csim=True,
    synth=True,
    cosim=False,
    export=True,
    verilog=True
)

print(f"Resources: LUT={report['LUT']}, FF={report['FF']}, DSP={report['DSP']}")
ASIC Target
# Configure for ASIC
hls_model_asic = hls4ml.converters.convert_from_keras_model(
    model,
    hls_config=config,
    output_dir='catapult_asic',
    backend='Catapult',
    tech='asic',
    asiclibs='nangate-45nm',
    clock_period=2.0,  # 500 MHz
    io_type='io_stream'
)

# Build with power analysis
hls_model_asic.compile()
report_asic = hls_model_asic.build(
    csim=True,
    synth=True,
    export=True,
    verilog=True,
    power=True  # Power analysis for ASIC
)

print(f"Area: {report_asic['Area']} um^2")
print(f"Power: {report_asic['Power']} mW")
See also:
Vivado Backend: Alternative Xilinx FPGA backend
Advanced Optimization: Optimize model performance
HLS Backends: Compare different backends
Precision Guide: Configure numeric precision