This guide covers building ONNX Runtime GenAI from source on Linux, Windows, and macOS.

Prerequisites

Common Requirements

CMake

Version 3.26 or higher (3.28+ for macOS xcframework support)

Python

Python 3.8 or higher

Git

For cloning the repository

C++ Compiler

GCC 11+, Clang, or MSVC

Platform-Specific Requirements

# Ubuntu/Debian
sudo apt-get update
sudo apt-get install build-essential cmake git python3 python3-pip

# GCC 11 or higher required
gcc --version  # Should be >= 11.0
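The major-version check above can be scripted; a minimal sketch (the `version_ok` helper is hypothetical, not part of the repository):

```shell
# Return success if a "major.minor.patch" version string meets the GCC 11 minimum
version_ok() {
  [ "$(echo "$1" | cut -d. -f1)" -ge 11 ]
}

# Example: version_ok "$(gcc -dumpversion)" && echo "GCC is new enough"
```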

Optional: Hardware Acceleration

For NVIDIA GPU acceleration:
  • CUDA Toolkit 11.8 or higher
  • cuDNN compatible with your CUDA version
  • Set CUDA_HOME or CUDA_PATH environment variable
export CUDA_HOME=/usr/local/cuda
export PATH=$CUDA_HOME/bin:$PATH
For AMD GPU acceleration:
  • ROCm 5.0 or higher
  • Compatible AMD GPU
For Windows DirectML acceleration:
  • Windows 10/11
  • DirectX 12 capable GPU
  • No additional installation required
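The backend-to-flag mapping used later in this guide can be summarized in a small helper (illustrative only; the flags themselves are documented under Build Options below):

```shell
# Map an acceleration backend name to the corresponding build.py flag
accel_flag() {
  case "$1" in
    cuda) echo "--use_cuda" ;;
    rocm) echo "--use_rocm" ;;
    dml)  echo "--use_dml" ;;
    cpu)  echo "" ;;
    *)    echo "unknown backend: $1" >&2; return 1 ;;
  esac
}
```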

Clone the Repository

git clone https://github.com/microsoft/onnxruntime-genai.git
cd onnxruntime-genai

Build Phases

The build system supports three phases:
  1. Update (--update): Run CMake to generate makefiles
  2. Build (--build): Build all projects
  3. Test (--test): Run all unit tests
Default behavior:
  • Native builds: --update --build --test
  • Cross-compiled builds: --update --build (tests skipped)
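The default-phase behavior described above can be sketched as follows (the helper name is hypothetical; build.py's actual logic may differ):

```shell
# Given "native" or "cross", print the phases build.py runs when none are specified
default_phases() {
  if [ "$1" = "cross" ]; then
    echo "--update --build"
  else
    echo "--update --build --test"
  fi
}
```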

Basic Build

CPU-Only Build

python build.py --config Release
This will:
  • Generate build files in build/<platform>/Release
  • Build the C++ library
  • Build the Python wheel
  • Run tests (unless --skip_tests is specified)

Build with CUDA

python build.py --config Release --use_cuda --cuda_home /usr/local/cuda
If CUDA_HOME or CUDA_PATH environment variable is set, you can omit --cuda_home.
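The precedence implied here (explicit flag first, then CUDA_HOME, then CUDA_PATH) can be sketched as below; the helper is illustrative, so confirm the exact order against build.py itself:

```shell
# Resolve the CUDA root: explicit argument wins, then CUDA_HOME, then CUDA_PATH
resolve_cuda_home() {
  if [ -n "$1" ]; then
    echo "$1"
  elif [ -n "${CUDA_HOME:-}" ]; then
    echo "$CUDA_HOME"
  else
    echo "${CUDA_PATH:-}"
  fi
}
```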

Build with DirectML (Windows)

python build.py --config Release --use_dml

Build with ROCm

python build.py --config Release --use_rocm

Build Options

Configuration

--config
string
default:"RelWithDebInfo"
Build configuration: Debug, Release, RelWithDebInfo, or MinSizeRel
--build_dir
path
Custom build directory (default: build/<platform>/<config>)
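The default layout can be expressed as a one-liner (a sketch of the convention, not build.py's actual code):

```shell
# Compute the default build directory from platform and configuration
default_build_dir() {
  echo "build/$1/$2"
}

# e.g. default_build_dir Linux Release -> build/Linux/Release
```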

Build Phases

--update
flag
Run CMake to generate/update makefiles
--build
flag
Build the project
--test
flag
Run tests after building
--clean
flag
Clean build artifacts for the selected configuration
--package
flag
Package the build output

Hardware Acceleration

--use_cuda
flag
Enable CUDA support
--cuda_home
path
Path to CUDA installation (default: $CUDA_HOME or $CUDA_PATH)
--use_rocm
flag
Enable ROCm support for AMD GPUs
--use_dml
flag
Enable DirectML support (Windows only)
--use_trt_rtx
flag
Enable TensorRT-RTX support

Language Bindings

--build_csharp
flag
Build C# API bindings
--build_java
flag
Build Java bindings
--skip_wheel
flag
Skip building the Python wheel

Other Options

--parallel
flag
Enable parallel build
--skip_tests
flag
Skip running tests
--skip_examples
flag
Skip building examples
--use_guidance
flag
Enable guidance support for constrained decoding
--cmake_generator
string
CMake generator (default: "Visual Studio 17 2022" on Windows, "Unix Makefiles" elsewhere)
--cmake_extra_defines
list
Extra CMake definitions (without -D prefix)

Advanced Build Scenarios

Cross-Compilation for Android

python build.py \
  --android \
  --android_abi arm64-v8a \
  --android_api 27 \
  --android_home $ANDROID_HOME \
  --android_ndk_path $ANDROID_NDK_HOME \
  --config Release
--android_abi
string
default:"arm64-v8a"
Android ABI: armeabi-v7a, arm64-v8a, x86, or x86_64
--android_api
integer
default:"27"
Android API Level (27 = Android 8.1)
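A quick validity check against the ABI list above (a hypothetical helper; build.py performs its own validation):

```shell
# Accept only the Android ABIs this guide lists
valid_abi() {
  case "$1" in
    armeabi-v7a|arm64-v8a|x86|x86_64) return 0 ;;
    *) return 1 ;;
  esac
}
```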

Cross-Compilation for iOS

python build.py \
  --ios \
  --apple_sysroot iphoneos \
  --osx_arch arm64 \
  --apple_deploy_target 14.0 \
  --config Release
--ios
flag
Build for iOS
--apple_sysroot
string
Apple platform SDK to build against (e.g. iphoneos, iphonesimulator)
--osx_arch
string
Target architecture for iOS/macOS (e.g. arm64)
--apple_deploy_target
string
Minimum deployment target version (e.g. 14.0)

Build Apple Framework

python build.py \
  --build_apple_framework \
  --config Release

Windows ARM64

python build.py --config Release --arm64

Custom ONNX Runtime

Use a custom ONNX Runtime build:
python build.py \
  --config Release \
  --ort_home /path/to/onnxruntime/build

Build Examples

Development Build with Tests

python build.py \
  --config Debug \
  --parallel \
  --cmake_generator Ninja

Production Build with CUDA

python build.py \
  --config Release \
  --use_cuda \
  --parallel \
  --package

Build Only (Skip Update and Test)

python build.py --build --skip_tests

Clean and Rebuild

python build.py --clean --update --build

Build with Constrained Decoding Support

python build.py \
  --config Release \
  --use_guidance \
  --use_cuda

Installing the Python Package

After building, install the Python wheel (adjust the platform segment of the path for your OS and configuration):
pip install build/Linux/Release/dist/*.whl

Troubleshooting

Build Fails with CMake Errors

  • Ensure CMake version is 3.26 or higher (3.28+ for macOS)
  • Check that all dependencies are installed
  • Delete CMakeCache.txt and the build directory, then retry
  • Verify compiler version (GCC 11+ required)

CUDA Not Found

  • Set CUDA_HOME or CUDA_PATH environment variable
  • Ensure nvcc is in your PATH
  • Verify CUDA Toolkit version is 11.8 or higher
export CUDA_HOME=/usr/local/cuda
export PATH=$CUDA_HOME/bin:$PATH
nvcc --version

Windows Build Issues

  • Ensure Visual Studio 2022 is installed
  • Run the build from "Developer Command Prompt for VS 2022"
  • Check that the Windows SDK is installed
  • Try specifying the generator explicitly: --cmake_generator "Visual Studio 17 2022"

Out of Memory During Build

  • Reduce parallel builds: remove the --parallel flag
  • Use Release or MinSizeRel configuration instead of Debug
  • Close other applications
  • Increase system swap/page file

Test Failures

  • Ensure model files are accessible
  • Check that hardware acceleration is properly configured
  • Run tests individually to isolate issues
  • Use --skip_tests to skip testing during the build
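The CMake minimum can be checked with a small helper (illustrative; the thresholds come from the prerequisites above):

```shell
# Return success if "major.minor[.patch]" meets the 3.26 minimum
# (pass 28 as $2 for the macOS xcframework requirement)
cmake_ok() {
  maj=$(echo "$1" | cut -d. -f1)
  min=$(echo "$1" | cut -d. -f2)
  need=${2:-26}
  [ "$maj" -gt 3 ] || { [ "$maj" -eq 3 ] && [ "$min" -ge "$need" ]; }
}

# Example: cmake_ok "$(cmake --version | head -n1 | awk '{print $3}')"
```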

Build System Details

Build Directory Structure

onnxruntime-genai/
├── build/
│   ├── Linux/
│   │   ├── Debug/
│   │   └── Release/
│   ├── Windows/
│   │   ├── Debug/
│   │   └── Release/
│   └── macOS/
│       ├── Debug/
│       └── Release/
├── src/
├── test/
├── examples/
└── build.py

CMake Variables

You can pass custom CMake variables:
python build.py \
  --cmake_extra_defines \
    CMAKE_INSTALL_PREFIX=/custom/path \
    BUILD_SHARED_LIBS=ON
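Since the entries are given without the -D prefix, build.py presumably adds it before forwarding them to CMake; a sketch of that transformation (hypothetical helper, for illustration):

```shell
# Turn "KEY=VALUE" entries into CMake -DKEY=VALUE arguments
to_cmake_args() {
  out=""
  for d in "$@"; do
    out="$out -D$d"
  done
  echo "${out# }"
}
```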

Environment Variables

CUDA_HOME
path
Path to CUDA installation
CUDA_PATH
path
Alternative path to CUDA (Windows)
ANDROID_HOME
path
Path to Android SDK
ANDROID_NDK_HOME
path
Path to Android NDK

Next Steps

Quickstart

Run your first inference with the built package

Model Builder

Build optimized ONNX models

Contributing

Contribute to ONNX Runtime GenAI

Examples

Explore code examples
