Overview

llama.cpp provides extensive CMake options to customize your build. Pass each option to the configure step with the -D flag:
cmake -B build -DOPTION_NAME=VALUE

General Options

Build Type

| Option | Values | Default | Description |
| --- | --- | --- | --- |
| CMAKE_BUILD_TYPE | Debug, Release, RelWithDebInfo, MinSizeRel | Release | Build configuration type |
| BUILD_SHARED_LIBS | ON, OFF | Platform-dependent | Build shared libraries instead of static |
| GGML_STATIC | ON, OFF | OFF | Static link libraries |
Debug build
cmake -B build -DCMAKE_BUILD_TYPE=Debug
cmake --build build
Static build
cmake -B build -DBUILD_SHARED_LIBS=OFF
cmake --build build --config Release

Optimization Options

| Option | Values | Default | Description |
| --- | --- | --- | --- |
| GGML_NATIVE | ON, OFF | ON | Optimize for current system CPU |
| GGML_LTO | ON, OFF | OFF | Enable link-time optimization |
| GGML_CCACHE | ON, OFF | ON | Use ccache if available |
GGML_NATIVE=ON optimizes for your current CPU but may not work on other systems.
Portable build (no native optimization)
cmake -B build -DGGML_NATIVE=OFF
cmake --build build --config Release

Build Targets

| Option | Values | Default | Description |
| --- | --- | --- | --- |
| LLAMA_BUILD_TESTS | ON, OFF | ON (standalone) | Build test suite |
| LLAMA_BUILD_TOOLS | ON, OFF | ON (standalone) | Build command-line tools |
| LLAMA_BUILD_EXAMPLES | ON, OFF | ON (standalone) | Build example programs |
| LLAMA_BUILD_SERVER | ON, OFF | ON (standalone) | Build HTTP server |
| LLAMA_BUILD_COMMON | ON, OFF | ON (standalone) | Build common utilities |
Minimal build (library only)
cmake -B build \
    -DLLAMA_BUILD_TESTS=OFF \
    -DLLAMA_BUILD_TOOLS=OFF \
    -DLLAMA_BUILD_EXAMPLES=OFF \
    -DLLAMA_BUILD_SERVER=OFF
cmake --build build --config Release

GPU Backend Options

CUDA (NVIDIA)

| Option | Values | Default | Description |
| --- | --- | --- | --- |
| GGML_CUDA | ON, OFF | OFF | Enable CUDA support |
| GGML_CUDA_FORCE_MMQ | ON, OFF | OFF | Force MMQ kernels instead of cuBLAS |
| GGML_CUDA_FORCE_CUBLAS | ON, OFF | OFF | Force cuBLAS instead of custom kernels |
| GGML_CUDA_GRAPHS | ON, OFF | ON | Enable CUDA graphs |
| GGML_CUDA_FA | ON, OFF | ON | Compile FlashAttention kernels |
| GGML_CUDA_FA_ALL_QUANTS | ON, OFF | OFF | Compile all quants for FlashAttention |
| GGML_CUDA_PEER_MAX_BATCH_SIZE | Integer | 128 | Max batch size for peer access |
| GGML_CUDA_NO_PEER_COPY | ON, OFF | OFF | Disable peer-to-peer copies |
| GGML_CUDA_NO_VMM | ON, OFF | OFF | Disable CUDA Virtual Memory Management |
| CMAKE_CUDA_ARCHITECTURES | String | Auto | Specify compute capabilities (e.g., “86;89”) |
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release
GGML_CUDA_FA_ALL_QUANTS=ON significantly increases compilation time and binary size.
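Because the CMAKE_CUDA_ARCHITECTURES list uses ; as its separator, the value needs quoting in most shells; unquoted, the shell treats ; as the end of the command. The sketch below only assembles and prints the configure command (a dry run). The value 86;89 is the example from the table, not a recommendation for your GPU.

```shell
# Example compute capabilities from the table above; replace with your GPU's values.
cuda_archs="86;89"

# Single quotes around the -D argument keep the shell from splitting at ';'.
configure_cmd="cmake -B build -DGGML_CUDA=ON '-DCMAKE_CUDA_ARCHITECTURES=${cuda_archs}'"

# Dry run: print the command so it can be reviewed before running it.
echo "$configure_cmd"
```

Copy the printed line into your terminal once the architecture list matches your hardware.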

HIP (AMD)

| Option | Values | Default | Description |
| --- | --- | --- | --- |
| GGML_HIP | ON, OFF | OFF | Enable HIP support |
| GGML_HIP_GRAPHS | ON, OFF | OFF | Enable HIP graphs (experimental) |
| GGML_HIP_NO_VMM | ON, OFF | ON | Disable HIP Virtual Memory Management |
| GGML_HIP_ROCWMMA_FATTN | ON, OFF | OFF | Enable rocWMMA for FlashAttention |
| GGML_HIP_MMQ_MFMA | ON, OFF | ON | Enable MFMA MMA for CDNA in MMQ |
| GPU_TARGETS | String | All | Specify GPU architecture (e.g., “gfx1030”) |
Linux HIP build
HIPCXX="$(hipconfig -l)/clang" HIP_PATH="$(hipconfig -R)" \
    cmake -B build -DGGML_HIP=ON -DGPU_TARGETS=gfx1030
cmake --build build --config Release
Windows HIP build
set PATH=%HIP_PATH%\bin;%PATH%
cmake -B build -G Ninja -DGGML_HIP=ON -DGPU_TARGETS=gfx1100 ^
    -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++
cmake --build build
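GPU_TARGETS expects the gfx name of your card. One way to find it on Linux, assuming ROCm's rocminfo tool is installed and prints architecture names such as gfx1030 in its output, is to extract every gfx token and join the unique ones with semicolons. extract_gfx_targets is a hypothetical helper name, not part of llama.cpp or ROCm.

```shell
# Hypothetical helper: read text on stdin, collect unique gfx architecture
# names, and join them with ';' in the list form that CMake accepts.
extract_gfx_targets() {
    grep -o 'gfx[0-9][0-9a-z]*' | sort -u | paste -sd ';' -
}

# Typical use (requires rocminfo on PATH):
#   targets="$(rocminfo | extract_gfx_targets)"
#   cmake -B build -DGGML_HIP=ON "-DGPU_TARGETS=${targets}"
```

Quoting the -D argument matters here for the same reason as with any semicolon-separated CMake list: an unquoted ; ends the shell command.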

Metal (Apple)

| Option | Values | Default | Description |
| --- | --- | --- | --- |
| GGML_METAL | ON, OFF | ON (macOS) | Enable Metal GPU support |
| GGML_METAL_NDEBUG | ON, OFF | OFF | Disable Metal debugging |
| GGML_METAL_SHADER_DEBUG | ON, OFF | OFF | Compile Metal with -fno-fast-math |
| GGML_METAL_EMBED_LIBRARY | ON, OFF | ON | Embed Metal library in binary |
| GGML_METAL_MACOSX_VERSION_MIN | String | Empty | Metal minimum macOS version |
Disable Metal
cmake -B build -DGGML_METAL=OFF
cmake --build build --config Release

Vulkan

| Option | Values | Default | Description |
| --- | --- | --- | --- |
| GGML_VULKAN | ON, OFF | OFF | Enable Vulkan support |
| GGML_VULKAN_CHECK_RESULTS | ON, OFF | OFF | Run Vulkan operation checks |
| GGML_VULKAN_DEBUG | ON, OFF | OFF | Enable Vulkan debug output |
| GGML_VULKAN_MEMORY_DEBUG | ON, OFF | OFF | Enable Vulkan memory debug |
| GGML_VULKAN_VALIDATE | ON, OFF | OFF | Enable Vulkan validation layers |
Vulkan with validation
cmake -B build -DGGML_VULKAN=ON -DGGML_VULKAN_VALIDATE=ON
cmake --build build --config Release

SYCL (Intel GPU)

| Option | Values | Default | Description |
| --- | --- | --- | --- |
| GGML_SYCL | ON, OFF | OFF | Enable SYCL support |
| GGML_SYCL_F16 | ON, OFF | OFF | Use 16-bit floats for calculations |
| GGML_SYCL_GRAPH | ON, OFF | ON | Enable graphs in SYCL backend |
| GGML_SYCL_DNN | ON, OFF | ON | Enable oneDNN in SYCL backend |
| GGML_SYCL_TARGET | String | INTEL | SYCL target device |
cmake -B build -DGGML_SYCL=ON -DGGML_SYCL_F16=ON
cmake --build build --config Release

CPU Backend Options

BLAS Libraries

| Option | Values | Default | Description |
| --- | --- | --- | --- |
| GGML_BLAS | ON, OFF | OFF | Enable BLAS support |
| GGML_BLAS_VENDOR | String | Generic | BLAS vendor (OpenBLAS, Intel10_64lp, Apple, etc.) |
| GGML_ACCELERATE | ON, OFF | ON (macOS) | Use Accelerate framework |
cmake -B build -DGGML_BLAS=ON -DGGML_BLAS_VENDOR=OpenBLAS
cmake --build build --config Release

CPU Instruction Sets

| Option | Values | Default | Description |
| --- | --- | --- | --- |
| GGML_SSE42 | ON, OFF | ON* | Enable SSE 4.2 |
| GGML_AVX | ON, OFF | ON* | Enable AVX |
| GGML_AVX2 | ON, OFF | ON* | Enable AVX2 |
| GGML_AVX_VNNI | ON, OFF | OFF | Enable AVX-VNNI |
| GGML_AVX512 | ON, OFF | OFF | Enable AVX-512F |
| GGML_AVX512_VBMI | ON, OFF | OFF | Enable AVX-512 VBMI |
| GGML_AVX512_VNNI | ON, OFF | OFF | Enable AVX-512 VNNI |
| GGML_AVX512_BF16 | ON, OFF | OFF | Enable AVX-512 BF16 |
| GGML_FMA | ON, OFF | ON* | Enable FMA (non-MSVC) |
| GGML_F16C | ON, OFF | ON* | Enable F16C (non-MSVC) |
| GGML_BMI2 | ON, OFF | ON* | Enable BMI2 |
* Default is ON unless GGML_NATIVE=OFF
These options override automatic detection. Set them only if you know the capabilities of your target CPU.
Enable AVX-512
cmake -B build -DGGML_AVX512=ON -DGGML_AVX512_VNNI=ON
cmake --build build --config Release
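Before forcing an instruction set on, it helps to confirm that the target CPU actually reports it. The sketch below assumes a Linux x86_64 host, where /proc/cpuinfo lists feature names such as avx512f and avx512_vnni on its flags line. has_cpu_flag is a hypothetical helper, not part of llama.cpp.

```shell
# Hypothetical helper: succeed if feature $2 appears as a whole word
# in the space-separated flag list $1.
has_cpu_flag() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# On Linux, take the first "flags" line from /proc/cpuinfo (empty elsewhere).
cpu_flags="$(grep -m1 '^flags' /proc/cpuinfo 2>/dev/null | cut -d: -f2)"

# Only suggest the AVX-512 options when the CPU reports the matching features.
if has_cpu_flag "$cpu_flags" avx512f && has_cpu_flag "$cpu_flags" avx512_vnni; then
    echo "-DGGML_AVX512=ON -DGGML_AVX512_VNNI=ON"
else
    echo "avx512f/avx512_vnni not reported; leave the AVX-512 options OFF"
fi
```

Run the check on the machine that will execute the binary, which is not necessarily the machine doing the build.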

ARM and RISC-V CPU Options

| Option | Values | Default | Description |
| --- | --- | --- | --- |
| GGML_CPU_KLEIDIAI | ON, OFF | OFF | Use KleidiAI optimized kernels |
| GGML_CPU_ARM_ARCH | String | Empty | CPU architecture for ARM |
| GGML_RVV | ON, OFF | ON | Enable RISC-V vector extension |
Arm KleidiAI
cmake -B build -DGGML_CPU_KLEIDIAI=ON
cmake --build build --config Release

Special Purpose Backends

CANN (Ascend NPU)

cmake -B build -DGGML_CANN=ON
cmake --build build --config Release

ZenDNN (AMD EPYC)

| Option | Values | Default | Description |
| --- | --- | --- | --- |
| GGML_ZENDNN | ON, OFF | OFF | Enable ZenDNN support |
| ZENDNN_ROOT | Path | Empty | Path to ZenDNN installation |
cmake -B build -DGGML_ZENDNN=ON
cmake --build build --config Release

OpenCL (Qualcomm Adreno)

| Option | Values | Default | Description |
| --- | --- | --- | --- |
| GGML_OPENCL | ON, OFF | OFF | Enable OpenCL support |
| GGML_OPENCL_PROFILING | ON, OFF | OFF | Enable OpenCL profiling |
| GGML_OPENCL_EMBED_KERNELS | ON, OFF | ON | Embed kernels in binary |
| GGML_OPENCL_USE_ADRENO_KERNELS | ON, OFF | ON | Use optimized Adreno kernels |
cmake -B build -DGGML_OPENCL=ON
cmake --build build --config Release

RPC Backend

cmake -B build -DGGML_RPC=ON
cmake --build build --config Release

Advanced Options

Threading

| Option | Values | Default | Description |
| --- | --- | --- | --- |
| GGML_OPENMP | ON, OFF | ON | Enable OpenMP for threading |
Disable OpenMP
cmake -B build -DGGML_OPENMP=OFF
cmake --build build --config Release

Debugging & Diagnostics

| Option | Values | Default | Description |
| --- | --- | --- | --- |
| LLAMA_ALL_WARNINGS | ON, OFF | ON | Enable all compiler warnings |
| LLAMA_FATAL_WARNINGS | ON, OFF | OFF | Treat warnings as errors |
| LLAMA_SANITIZE_THREAD | ON, OFF | OFF | Enable thread sanitizer |
| LLAMA_SANITIZE_ADDRESS | ON, OFF | OFF | Enable address sanitizer |
| LLAMA_SANITIZE_UNDEFINED | ON, OFF | OFF | Enable undefined behavior sanitizer |
Build with sanitizers
cmake -B build -DCMAKE_BUILD_TYPE=Debug \
    -DLLAMA_SANITIZE_ADDRESS=ON \
    -DLLAMA_SANITIZE_UNDEFINED=ON
cmake --build build
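GCC and Clang do not allow the thread sanitizer and the address sanitizer in the same binary, so a thread-sanitizer build needs its own build directory. The sketch below maps sanitizer names to the options in the table and rejects that combination. sanitizer_flags is a hypothetical helper, not something llama.cpp ships.

```shell
# Hypothetical helper: turn sanitizer names (address, undefined, thread)
# into the matching -DLLAMA_SANITIZE_* options for a Debug build.
sanitizer_flags() {
    flags=""
    has_thread=0
    has_address=0
    for name in "$@"; do
        case "$name" in
            address)   has_address=1; flags="$flags -DLLAMA_SANITIZE_ADDRESS=ON" ;;
            undefined) flags="$flags -DLLAMA_SANITIZE_UNDEFINED=ON" ;;
            thread)    has_thread=1; flags="$flags -DLLAMA_SANITIZE_THREAD=ON" ;;
            *) echo "unknown sanitizer: $name" >&2; return 2 ;;
        esac
    done
    # The thread and address sanitizers are mutually exclusive at compile time.
    if [ "$has_thread" -eq 1 ] && [ "$has_address" -eq 1 ]; then
        echo "thread and address sanitizers cannot share a build" >&2
        return 1
    fi
    echo "-DCMAKE_BUILD_TYPE=Debug$flags"
}

# Print the options for an address + undefined-behavior build.
sanitizer_flags address undefined
```

A possible use is `cmake -B build-asan $(sanitizer_flags address undefined)`, with a separate `build-tsan` directory for `sanitizer_flags thread`; both directory names are illustrative.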

Backend Dynamic Loading

| Option | Values | Default | Description |
| --- | --- | --- | --- |
| GGML_BACKEND_DL | ON, OFF | OFF | Build backends as dynamic libraries |
| GGML_BACKEND_DIR | Path | Empty | Directory to load backends from |
GGML_BACKEND_DL requires BUILD_SHARED_LIBS=ON.
Dynamic backend loading
cmake -B build -DBUILD_SHARED_LIBS=ON -DGGML_BACKEND_DL=ON
cmake --build build --config Release
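The requirement that GGML_BACKEND_DL needs BUILD_SHARED_LIBS=ON can be checked before configuring. The sketch below scans a list of -D arguments and reports an error when GGML_BACKEND_DL=ON appears without BUILD_SHARED_LIBS=ON. check_backend_dl is a hypothetical helper; it treats a missing BUILD_SHARED_LIBS as a failure because that option's default is platform-dependent.

```shell
# Hypothetical helper: fail if GGML_BACKEND_DL=ON is requested without
# an explicit BUILD_SHARED_LIBS=ON among the given arguments.
check_backend_dl() {
    want_dl=0
    shared=0
    for arg in "$@"; do
        case "$arg" in
            -DGGML_BACKEND_DL=ON)   want_dl=1 ;;
            -DBUILD_SHARED_LIBS=ON) shared=1 ;;
        esac
    done
    if [ "$want_dl" -eq 1 ] && [ "$shared" -eq 0 ]; then
        echo "GGML_BACKEND_DL=ON requires -DBUILD_SHARED_LIBS=ON" >&2
        return 1
    fi
    return 0
}

# Passes: both options are set explicitly.
check_backend_dl -DBUILD_SHARED_LIBS=ON -DGGML_BACKEND_DL=ON && echo "ok"
```

Calling it with the same arguments you intend to pass to cmake catches the mismatch before a long configure step starts.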

HTTPS/SSL Support

| Option | Values | Default | Description |
| --- | --- | --- | --- |
| LLAMA_OPENSSL | ON, OFF | ON | Use OpenSSL for HTTPS support |
Disable OpenSSL
cmake -B build -DLLAMA_OPENSSL=OFF
cmake --build build --config Release

Common Build Configurations

Development Build

Fast compilation, full debug info:
cmake -B build -DCMAKE_BUILD_TYPE=Debug \
    -DLLAMA_ALL_WARNINGS=ON
cmake --build build

Production Build

Optimized for performance:
cmake -B build -DCMAKE_BUILD_TYPE=Release \
    -DGGML_NATIVE=ON \
    -DGGML_LTO=ON \
    -DBUILD_SHARED_LIBS=OFF
cmake --build build --config Release

Multi-GPU Build

Enables both the CUDA and Vulkan backends in one build:
cmake -B build -DGGML_CUDA=ON -DGGML_VULKAN=ON \
    -DGGML_CUDA_GRAPHS=ON
cmake --build build --config Release

Minimal Binary Size

Library only, static linking:
cmake -B build -DCMAKE_BUILD_TYPE=MinSizeRel \
    -DBUILD_SHARED_LIBS=OFF \
    -DLLAMA_BUILD_TESTS=OFF \
    -DLLAMA_BUILD_TOOLS=OFF \
    -DLLAMA_BUILD_EXAMPLES=OFF \
    -DLLAMA_BUILD_SERVER=OFF
cmake --build build --config MinSizeRel

Portable Build

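Runs on x86_64 CPUs that support AVX, AVX2, and FMA (for older CPUs without these extensions, set the corresponding options to OFF):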
cmake -B build -DGGML_NATIVE=OFF \
    -DGGML_AVX=ON \
    -DGGML_AVX2=ON \
    -DGGML_FMA=ON \
    -DBUILD_SHARED_LIBS=OFF
cmake --build build --config Release

Option Discovery

To list the non-advanced CMake options together with their help text:
cmake -B build -LH
To list every cached variable in the current configuration, including advanced ones:
cmake -B build -LA
For GUI-based configuration (requires cmake-gui):
cmake-gui -B build
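The cmake -L listings print each cache entry as NAME:TYPE=VALUE, and -H adds a // help line above each entry. A sketch of narrowing that output to one option family follows; filter_options is a hypothetical helper name.

```shell
# Hypothetical helper: keep only cache entries whose name starts with prefix $1.
# Reads `cmake -L...` output on stdin; entries look like NAME:TYPE=VALUE.
filter_options() {
    grep -E "^${1}[A-Za-z0-9_]*:[A-Z]+="
}

# Typical use from the source directory:
#   cmake -B build -LA | filter_options GGML_CUDA
```

Because the pattern is anchored to the start of the line, the // help lines are dropped and only the entries themselves remain.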