In io_stream mode, layers are connected through FIFO buffers. The FIFO Depth Optimization feature automatically sizes these buffers based on runtime profiling, reducing BRAM and LUT usage.
Overview
In streaming architectures, each layer output is buffered in a FIFO before the next layer consumes it. By default, hls4ml uses conservative FIFO depths that can over-utilize resources. FIFO depth optimization profiles the design during RTL co-simulation to determine the actual maximum FIFO occupancy.
FIFO depth optimization is available for the Vivado and Vitis backends.
How It Works
Set large profiling FIFOs
All FIFOs are initialized to a large depth (default: 100,000) and implemented in BRAM for profiling.
Run RTL co-simulation
The design is simulated with test data, and VCD (Value Change Dump) traces record FIFO occupancy.
Extract maximum depths
The optimization pass parses VCD files to determine the maximum depth reached by each FIFO.
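The measurement behind these steps can be illustrated with a toy model. The sketch below is not the hls4ml pass (which reads occupancy from VCD traces); it assumes a hypothetical trace of write (+1) and read (-1) events for one FIFO and tracks the peak occupancy, then applies the max + 1 rule described under Understanding Results.

```python
def max_occupancy(events):
    """Return the peak number of items held in a FIFO.

    `events` is a sequence of +1 (write) and -1 (read) steps, a
    hypothetical stand-in for the occupancy signal in a VCD trace.
    """
    occupancy = 0
    peak = 0
    for step in events:
        occupancy += step
        if occupancy < 0:
            raise ValueError("read from an empty FIFO")
        peak = max(peak, occupancy)
    return peak


# The producer bursts three writes before the consumer starts reading.
trace = [+1, +1, +1, -1, +1, -1, -1, -1]
peak = max_occupancy(trace)
optimized_depth = peak + 1  # assigned depth is max + 1
```

The peak depends entirely on the stimulus: a trace with longer write bursts would yield a larger depth, which is why the profiling data should be representative.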
Basic Usage
Vivado Backend
Vitis Backend
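A minimal sketch of the usual pattern for a Keras model follows, assuming the optimization is enabled through a flow named `<backend>:fifo_depth_optimization` and configured through the optimizer registry. The helper names `flow_name` and `build_with_fifo_optimization` are illustrative and not part of hls4ml; check the flow name and arguments against your installed hls4ml version before relying on it.

```python
def flow_name(backend):
    """Flow identifier, assuming the '<backend>:fifo_depth_optimization' convention."""
    return f"{backend.lower()}:fifo_depth_optimization"


def build_with_fifo_optimization(keras_model, output_dir, backend="Vivado",
                                 profiling_fifo_depth=100_000):
    """Convert a Keras model and build it with FIFO depth optimization.

    Requires hls4ml and the matching HLS toolchain to be installed.
    """
    import hls4ml  # imported lazily; only needed when the build is actually run

    flow = flow_name(backend)

    config = hls4ml.utils.config_from_keras_model(keras_model, granularity="name")
    config["Flows"] = [flow]

    # Set the large depth used while profiling.
    hls4ml.model.optimizer.get_optimizer(flow).configure(
        profiling_fifo_depth=profiling_fifo_depth
    )

    hls_model = hls4ml.converters.convert_from_keras_model(
        keras_model,
        hls_config=config,
        io_type="io_stream",  # FIFOs only exist in streaming mode
        output_dir=output_dir,
        backend=backend,
    )

    # Co-simulation is what produces the occupancy traces.
    hls_model.build(reset=False, csim=True, synth=True, cosim=True)
    return hls_model
```

Under these assumptions, passing `backend="Vitis"` is the only change needed to target the Vitis backend.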
Configuration Options
Profiling FIFO Depth
The initial FIFO depth for profiling:
- Large (100k+)
- Medium (10k-50k)
- Disabled (0)
Use a large depth (100k+) when:
- Complex models with deep pipelines
- Uncertain about peak FIFO usage
- First-time optimization
Understanding Results
After optimization completes, a max_depth.json file is created. Each entry contains:
- name: FIFO identifier
- max: Maximum occupancy observed during co-simulation
- depth: Assigned depth (max + 1)
The optimized FIFO depth is always max + 1 to ensure at least one empty slot.
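As a quick sanity check, the max + 1 rule can be verified programmatically. The sketch below assumes the file is a JSON list of objects with the three fields above; the exact layout may differ between hls4ml versions, and the FIFO names and values here are made up for illustration.

```python
import json

# Hypothetical contents; in practice, load max_depth.json from the project directory.
sample = """
[
  {"name": "layer2_out", "max": 15, "depth": 16},
  {"name": "layer4_out", "max": 1, "depth": 2}
]
"""


def check_depths(entries):
    """Return the names of entries whose depth is not max + 1."""
    return [e["name"] for e in entries if e["depth"] != e["max"] + 1]


entries = json.loads(sample)
mismatches = check_depths(entries)           # empty when every entry follows the rule
total_depth = sum(e["depth"] for e in entries)
```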
Integration with Build Flow
FIFO optimization integrates automatically with the build process.
Build Parameters
- csim=True: C simulation (required)
- synth=True: C synthesis (required to generate RTL)
- cosim=True: RTL co-simulation (required for profiling)
- fifo_opt=True: Enable FIFO optimization (auto-enabled by flow)
Skipping cosim=True will cause FIFO optimization to fail because VCD traces are not generated.
Resource Savings
Typical resource savings from FIFO depth optimization:
BRAM
20-60% reduction in BRAM usage for FIFO implementation
LUT
10-30% reduction in LUT usage when FIFOs use distributed RAM
Timing
May improve timing by reducing routing congestion
Example Resource Comparison
| Resource | Default FIFOs | Optimized FIFOs | Savings |
|---|---|---|---|
| BRAM_18K | 48 | 18 | 62.5% |
| LUT | 12,453 | 9,821 | 21.1% |
| FF | 15,672 | 14,109 | 10.0% |
| DSP | 64 | 64 | 0% |
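The Savings column is the relative reduction, (default - optimized) / default. A short sketch that recomputes it from the table's values:

```python
def savings_percent(default, optimized):
    """Relative reduction in percent, rounded to one decimal place."""
    if default == 0:
        return 0.0
    return round(100.0 * (default - optimized) / default, 1)


# (default, optimized) pairs copied from the table above.
usage = {
    "BRAM_18K": (48, 18),
    "LUT": (12453, 9821),
    "FF": (15672, 14109),
    "DSP": (64, 64),
}

savings = {name: savings_percent(d, o) for name, (d, o) in usage.items()}
```

The same helper can be applied to figures from your own synthesis reports to compare runs before and after optimization.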
Advanced Usage
Custom Test Data
Provide specific test data for profiling.
Multiple Optimization Iterations
Refine optimization with multiple passes.
Selective FIFO Optimization
Optimize only specific FIFOs.
Verification
After FIFO optimization, verify correctness:
Check Resource Reports
Run Additional Co-simulation
Compare Accuracy
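One way to compare accuracy is to run the same inputs through the design before and after optimization and compare the outputs. The sketch below assumes the outputs are already available as nested lists of class scores; how they are obtained (for example, from co-simulation logs) is not shown, and the helper names are illustrative.

```python
def argmax(row):
    """Index of the largest value in a row of scores."""
    return max(range(len(row)), key=row.__getitem__)


def agreement(reference, candidate):
    """Fraction of samples whose predicted class matches."""
    if len(reference) != len(candidate):
        raise ValueError("output sets differ in length")
    matches = sum(argmax(r) == argmax(c) for r, c in zip(reference, candidate))
    return matches / len(reference)


# Hypothetical outputs for two samples with three classes each.
baseline = [[0.1, 0.7, 0.2], [0.6, 0.3, 0.1]]
optimized = [[0.1, 0.7, 0.2], [0.6, 0.3, 0.1]]
corrupted = [[0.1, 0.7, 0.2], [0.2, 0.3, 0.5]]

same = agreement(baseline, optimized)
degraded = agreement(baseline, corrupted)
```

Since FIFO depth does not change the arithmetic, any disagreement between the baseline and optimized outputs suggests a data-flow problem rather than a precision effect.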
Troubleshooting
Optimization fails: no FIFOs found
Cause: FIFOs were not implemented in BRAM during profiling.
Solution:
- Increase profiling_fifo_depth (e.g., to 100,000)
- Check that io_type='io_stream' is set
- Verify the model has multiple layers (single-layer models may not have FIFOs)
Co-simulation hangs or fails
Cause: Profiling FIFOs are too small and overflow.
Solution:
- Increase profiling_fifo_depth
- Check the VCD file for overflow indicators
- Use more representative test data
Results incorrect after optimization
Cause: FIFO depths were underestimated during profiling.
Solution:
- Use more diverse test data for profiling
- Increase profiling FIFO depth
- Manually add safety margin to depths in max_depth.json
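Adding a safety margin can be scripted rather than edited by hand. The sketch below assumes entries shaped like those in max_depth.json (a list of objects with name, max, and depth); the 25% margin and the helper name `add_margin` are illustrative choices. How the padded depths are fed back into the project depends on your flow and is not shown.

```python
import math


def add_margin(entries, margin=0.25, minimum_extra=1):
    """Return a copy of the entries with each depth enlarged by a relative margin."""
    padded = []
    for entry in entries:
        extra = max(minimum_extra, math.ceil(entry["depth"] * margin))
        padded.append({**entry, "depth": entry["depth"] + extra})
    return padded


# Hypothetical profiled depths.
entries = [
    {"name": "layer2_out", "max": 15, "depth": 16},
    {"name": "layer4_out", "max": 1, "depth": 2},
]
padded = add_margin(entries)
```

A relative margin with a floor of one extra slot keeps small FIFOs small while giving deep FIFOs proportionally more headroom.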
VCD file not found
Cause: RTL co-simulation did not complete.
Solution:
- Ensure cosim=True is set in the build command
- Check for errors in the co-simulation logs
- Verify HLS tool installation and license
Best Practices
Use representative test data
Profile with data that exercises all network paths and edge cases. Diverse inputs ensure accurate FIFO depth measurement.
Start with large profiling depth
Use profiling_fifo_depth=100_000 for first-time optimization. You can reduce it in later iterations once you understand peak usage.
Verify after optimization
Always run additional co-simulation with different test data to ensure optimized FIFOs are sufficient.
Review max_depth.json
Manually inspect the JSON file to understand FIFO usage patterns. Large variations may indicate optimization opportunities elsewhere.
References
Research Paper
H. Borras et al., “Open-source FPGA-ML codesign for the MLPerf Tiny Benchmark” (2022)
Detailed analysis and results of FIFO depth optimization on benchmark models.
