This guide covers building ONNX Runtime GenAI from source on Linux, Windows, and macOS.

Prerequisites

Common Requirements

CMake

Version 3.26 or higher (3.28+ for macOS xcframework support)

Python

Python 3.8 or higher

Git

For cloning the repository

C++ Compiler

GCC 11+, Clang, or MSVC

Platform-Specific Requirements

# Ubuntu/Debian
sudo apt-get update
sudo apt-get install build-essential cmake git python3 python3-pip

# GCC 11 or higher required
gcc --version  # Should be >= 11.0
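The major-version check above can be scripted; a minimal sketch (the `version_ok` helper is hypothetical, not part of the repository):

```shell
# Return success if a "major.minor.patch" version string meets the GCC 11 minimum
version_ok() {
  [ "$(echo "$1" | cut -d. -f1)" -ge 11 ]
}

# Example: version_ok "$(gcc -dumpversion)" && echo "GCC is new enough"
```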

Optional: Hardware Acceleration

For NVIDIA GPU acceleration:
  • CUDA Toolkit 11.8 or higher
  • cuDNN compatible with your CUDA version
  • Set CUDA_HOME or CUDA_PATH environment variable
export CUDA_HOME=/usr/local/cuda
export PATH=$CUDA_HOME/bin:$PATH
For AMD GPU acceleration:
  • ROCm 5.0 or higher
  • Compatible AMD GPU
For Windows DirectML acceleration:
  • Windows 10/11
  • DirectX 12 capable GPU
  • No additional installation required
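The backend-to-flag mapping used later in this guide can be summarized in a small helper (illustrative only; the flags themselves are documented under Build Options below):

```shell
# Map an acceleration backend name to the corresponding build.py flag
accel_flag() {
  case "$1" in
    cuda) echo "--use_cuda" ;;
    rocm) echo "--use_rocm" ;;
    dml)  echo "--use_dml" ;;
    cpu)  echo "" ;;
    *)    echo "unknown backend: $1" >&2; return 1 ;;
  esac
}
```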

Clone the Repository

git clone https://github.com/microsoft/onnxruntime-genai.git
cd onnxruntime-genai

Build Phases

The build system supports three phases:
  1. Update (--update): Run CMake to generate makefiles
  2. Build (--build): Build all projects
  3. Test (--test): Run all unit tests
Default behavior:
  • Native builds: --update --build --test
  • Cross-compiled builds: --update --build (tests skipped)
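The default-phase behavior described above can be sketched as follows (the helper name is hypothetical; build.py's actual logic may differ):

```shell
# Given "native" or "cross", print the phases build.py runs when none are specified
default_phases() {
  if [ "$1" = "cross" ]; then
    echo "--update --build"
  else
    echo "--update --build --test"
  fi
}
```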

Basic Build

CPU-Only Build

python build.py --config Release
This will:
  • Generate build files in build/<platform>/Release
  • Build the C++ library
  • Build the Python wheel
  • Run tests (unless --skip_tests is specified)

Build with CUDA

python build.py --config Release --use_cuda --cuda_home /usr/local/cuda
If CUDA_HOME or CUDA_PATH environment variable is set, you can omit --cuda_home.
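The precedence implied here (explicit flag first, then CUDA_HOME, then CUDA_PATH) can be sketched as below; the helper is illustrative, so confirm the exact order against build.py itself:

```shell
# Resolve the CUDA root: explicit argument wins, then CUDA_HOME, then CUDA_PATH
resolve_cuda_home() {
  if [ -n "$1" ]; then
    echo "$1"
  elif [ -n "${CUDA_HOME:-}" ]; then
    echo "$CUDA_HOME"
  else
    echo "${CUDA_PATH:-}"
  fi
}
```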

Build with DirectML (Windows)

python build.py --config Release --use_dml

Build with ROCm

python build.py --config Release --use_rocm

Build Options

Configuration

--config
string
default:"RelWithDebInfo"
Build configuration: Debug, Release, RelWithDebInfo, or MinSizeRel
--build_dir
path
Custom build directory (default: build/<platform>/<config>)
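The default layout can be expressed as a one-liner (a sketch of the convention, not build.py's actual code):

```shell
# Compute the default build directory from platform and configuration
default_build_dir() {
  echo "build/$1/$2"
}

# e.g. default_build_dir Linux Release -> build/Linux/Release
```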

Build Phases

--update
flag
Run CMake to generate/update makefiles
--build
flag
Build the project
--test
flag
Run tests after building
--clean
flag
Clean build artifacts for the selected configuration
--package
flag
Package the build output

Hardware Acceleration

--use_cuda
flag
Enable CUDA support
--cuda_home
path
Path to CUDA installation (default: $CUDA_HOME or $CUDA_PATH)
--use_rocm
flag
Enable ROCm support for AMD GPUs
--use_dml
flag
Enable DirectML support (Windows only)
--use_trt_rtx
flag
Enable TensorRT-RTX support

Language Bindings

--build_csharp
flag
Build C# API bindings
--build_java
flag
Build Java bindings
--skip_wheel
flag
Skip building the Python wheel

Other Options

--parallel
flag
Enable parallel build
--skip_tests
flag
Skip running tests
--skip_examples
flag
Skip building examples
--use_guidance
flag
Enable guidance support for constrained decoding
--cmake_generator
string
CMake generator (default: "Visual Studio 17 2022" on Windows, "Unix Makefiles" elsewhere)
--cmake_extra_defines
list
Extra CMake definitions (without -D prefix)

Advanced Build Scenarios

Cross-Compilation for Android

python build.py \
  --android \
  --android_abi arm64-v8a \
  --android_api 27 \
  --android_home $ANDROID_HOME \
  --android_ndk_path $ANDROID_NDK_HOME \
  --config Release
--android_abi
string
default:"arm64-v8a"
Android ABI: armeabi-v7a, arm64-v8a, x86, or x86_64
--android_api
integer
default:"27"
Android API Level (27 = Android 8.1)
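A quick validity check against the ABI list above (a hypothetical helper; build.py performs its own validation):

```shell
# Accept only the Android ABIs this guide lists
valid_abi() {
  case "$1" in
    armeabi-v7a|arm64-v8a|x86|x86_64) return 0 ;;
    *) return 1 ;;
  esac
}
```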

Cross-Compilation for iOS

python build.py \
  --ios \
  --apple_sysroot iphoneos \
  --osx_arch arm64 \
  --apple_deploy_target 14.0 \
  --config Release
--ios
flag
Build for iOS
--apple_sysroot
string
Apple platform SDK to build against (e.g. iphoneos, iphonesimulator)
--osx_arch
string
Target architecture for iOS/macOS (e.g. arm64)
--apple_deploy_target
string
Minimum deployment target version (e.g. 14.0)

Build Apple Framework

python build.py \
  --build_apple_framework \
  --config Release

Windows ARM64

python build.py --config Release --arm64

Custom ONNX Runtime

Use a custom ONNX Runtime build:
python build.py \
  --config Release \
  --ort_home /path/to/onnxruntime/build

Build Examples

Development Build with Tests

python build.py \
  --config Debug \
  --parallel \
  --cmake_generator Ninja

Production Build with CUDA

python build.py \
  --config Release \
  --use_cuda \
  --parallel \
  --package

Build Only (Skip Update and Test)

python build.py --build --skip_tests

Clean and Rebuild

python build.py --clean --update --build

Build with Constrained Decoding Support

python build.py \
  --config Release \
  --use_guidance \
  --use_cuda

Installing the Python Package

After building, install the Python wheel (adjust the platform segment of the path for your OS and configuration):
pip install build/Linux/Release/dist/*.whl

Troubleshooting

Build Fails with CMake Errors

  • Ensure CMake version is 3.26 or higher (3.28+ for macOS)
  • Check that all dependencies are installed
  • Delete CMakeCache.txt and the build directory, then retry
  • Verify compiler version (GCC 11+ required)

CUDA Not Found

  • Set CUDA_HOME or CUDA_PATH environment variable
  • Ensure nvcc is in your PATH
  • Verify CUDA Toolkit version is 11.8 or higher
export CUDA_HOME=/usr/local/cuda
export PATH=$CUDA_HOME/bin:$PATH
nvcc --version

Windows Build Issues

  • Ensure Visual Studio 2022 is installed
  • Run the build from "Developer Command Prompt for VS 2022"
  • Check that the Windows SDK is installed
  • Try specifying the generator explicitly: --cmake_generator "Visual Studio 17 2022"

Out of Memory During Build

  • Reduce parallel builds: remove the --parallel flag
  • Use Release or MinSizeRel configuration instead of Debug
  • Close other applications
  • Increase system swap/page file

Test Failures

  • Ensure model files are accessible
  • Check that hardware acceleration is properly configured
  • Run tests individually to isolate issues
  • Use --skip_tests to skip testing during the build
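The CMake minimum can be checked with a small helper (illustrative; the thresholds come from the prerequisites above):

```shell
# Return success if "major.minor[.patch]" meets the 3.26 minimum
# (pass 28 as $2 for the macOS xcframework requirement)
cmake_ok() {
  maj=$(echo "$1" | cut -d. -f1)
  min=$(echo "$1" | cut -d. -f2)
  need=${2:-26}
  [ "$maj" -gt 3 ] || { [ "$maj" -eq 3 ] && [ "$min" -ge "$need" ]; }
}

# Example: cmake_ok "$(cmake --version | head -n1 | awk '{print $3}')"
```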

Build System Details

Build Directory Structure

onnxruntime-genai/
├── build/
│   ├── Linux/
│   │   ├── Debug/
│   │   └── Release/
│   ├── Windows/
│   │   ├── Debug/
│   │   └── Release/
│   └── macOS/
│       ├── Debug/
│       └── Release/
├── src/
├── test/
├── examples/
└── build.py

CMake Variables

You can pass custom CMake variables:
python build.py \
  --cmake_extra_defines \
    CMAKE_INSTALL_PREFIX=/custom/path \
    BUILD_SHARED_LIBS=ON
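Since the entries are given without the -D prefix, build.py presumably adds it before forwarding them to CMake; a sketch of that transformation (hypothetical helper, for illustration):

```shell
# Turn "KEY=VALUE" entries into CMake -DKEY=VALUE arguments
to_cmake_args() {
  out=""
  for d in "$@"; do
    out="$out -D$d"
  done
  echo "${out# }"
}
```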

Environment Variables

CUDA_HOME
path
Path to CUDA installation
CUDA_PATH
path
Alternative path to CUDA (Windows)
ANDROID_HOME
path
Path to Android SDK
ANDROID_NDK_HOME
path
Path to Android NDK

Next Steps

Quickstart

Run your first inference with the built package

Model Builder

Build optimized ONNX models

Contributing

Contribute to ONNX Runtime GenAI

Examples

Explore code examples
