
Prerequisites

  • CMake 3.14 or later
  • A C11 / C++17 compiler: GCC, Clang, or MSVC
  • Git
Backend-specific requirements are listed in the relevant sections below.
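
The CMake and Git requirements above can be checked before configuring. The following is a minimal sketch, not part of ggml: the function names version_ge and check_prereqs are hypothetical, and the script assumes a POSIX shell with awk available. It does not check the compiler.

```shell
#!/bin/sh
# Hypothetical helpers (not part of ggml) for checking the prerequisites.

# version_ge A B: succeeds if dotted numeric version A >= version B.
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        n = split(a, x, "."); m = split(b, y, ".")
        max = (n > m) ? n : m
        for (i = 1; i <= max; i++) {
            xi = (i <= n) ? x[i] + 0 : 0   # missing components count as 0
            yi = (i <= m) ? y[i] + 0 : 0
            if (xi > yi) exit 0
            if (xi < yi) exit 1
        }
        exit 0                             # all components equal
    }'
}

# check_prereqs: reports missing tools and a CMake older than 3.14.
check_prereqs() {
    status=0
    for tool in cmake git; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            status=1
        fi
    done
    if command -v cmake >/dev/null 2>&1; then
        # The first line of `cmake --version` reads "cmake version X.Y.Z".
        found=$(cmake --version | head -n 1 | awk '{print $3}')
        if ! version_ge "$found" 3.14; then
            echo "cmake $found is older than 3.14"
            status=1
        fi
    fi
    return $status
}

# Usage: check_prereqs && echo "prerequisites look fine"
```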

Basic build

git clone https://github.com/ggml-org/ggml
cd ggml
mkdir build && cd build
cmake ..
cmake --build . --config Release -j 8
This produces the core ggml library and all examples in build/bin/. The default build targets the CPU backend with native ISA optimizations enabled.
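
The -j 8 in the build command fixes the number of parallel jobs at eight. As a sketch under stated assumptions, the job count could instead follow the number of CPUs on the host. The helper name detect_jobs is hypothetical, and the fallback value of 4 is an arbitrary choice for systems where none of the probes work.

```shell
#!/bin/sh
# Hypothetical helper (not part of ggml): print a parallel job count.
detect_jobs() {
    # Try nproc (GNU coreutils), then getconf, then sysctl (macOS/BSD).
    n=$(nproc 2>/dev/null) \
        || n=$(getconf _NPROCESSORS_ONLN 2>/dev/null) \
        || n=$(sysctl -n hw.ncpu 2>/dev/null) \
        || n=4   # arbitrary fallback when no probe is available
    echo "$n"
}

# Usage, from inside the build directory:
#   cmake --build . --config Release -j "$(detect_jobs)"
```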

Backend builds

The CUDA backend requires the NVIDIA CUDA Toolkit (tested with CUDA 11.x and 12.x) and a compatible NVIDIA GPU. Ensure nvcc is on your PATH before configuring.
cmake .. -DGGML_CUDA=ON
cmake --build . --config Release -j 8
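
Whether nvcc is reachable can be confirmed before configuring with GGML_CUDA=ON. The sketch below is not part of ggml; the function name find_nvcc is hypothetical. It assumes that a toolkit not on PATH may live under $CUDA_HOME or /usr/local/cuda, a common Linux install location that is not guaranteed on every system.

```shell
#!/bin/sh
# Hypothetical helper (not part of ggml): print the path to nvcc, or fail.
find_nvcc() {
    # Preferred case: nvcc is already on PATH.
    if command -v nvcc >/dev/null 2>&1; then
        command -v nvcc
        return 0
    fi
    # Assumed fallback locations; adjust for your installation.
    for dir in "${CUDA_HOME:-}" /usr/local/cuda; do
        if [ -n "$dir" ] && [ -x "$dir/bin/nvcc" ]; then
            echo "$dir/bin/nvcc"
            return 0
        fi
    done
    return 1
}

# Usage:
#   if nvcc_path=$(find_nvcc); then
#       echo "found nvcc at $nvcc_path"
#   else
#       echo "nvcc not found; add the CUDA Toolkit bin directory to PATH"
#   fi
```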
Additional CUDA options:
Flag                      Default  Description
GGML_CUDA_FORCE_MMQ       OFF      Use MMQ kernels instead of cuBLAS
GGML_CUDA_FORCE_CUBLAS    OFF      Always use cuBLAS instead of MMQ kernels
GGML_CUDA_FA              ON       Compile FlashAttention CUDA kernels
GGML_CUDA_GRAPHS          OFF      Enable CUDA graph capture (llama.cpp)
GGML_CUDA_NO_VMM          OFF      Disable CUDA virtual memory management
# Example: CUDA with cuBLAS forced on and FlashAttention for all quants
cmake .. -DGGML_CUDA=ON \
         -DGGML_CUDA_FORCE_CUBLAS=ON \
         -DGGML_CUDA_FA_ALL_QUANTS=ON

General CMake options

These options apply to all build configurations.
Flag                Default  Description
GGML_STATIC         OFF      Static link libraries
GGML_NATIVE         ON       Optimize for the host CPU (enables AVX2, etc.)
GGML_LTO            OFF      Enable link-time optimization
GGML_CCACHE         ON       Use ccache if available
GGML_BACKEND_DL     OFF      Build backends as dynamic libraries
BUILD_SHARED_LIBS   ON       Build shared instead of static libraries
GGML_OPENMP         ON       Use OpenMP for CPU multi-threading
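
After configuring, the values CMake actually recorded for these options can be read back from CMakeCache.txt in the build directory, where cache entries are stored as NAME:TYPE=VALUE lines. The sketch below is not part of ggml; the function name cache_value is hypothetical, and it assumes the option appears in the cache at all.

```shell
#!/bin/sh
# Hypothetical helper (not part of ggml): print a cached CMake variable.
# $1 = variable name, $2 = path to CMakeCache.txt
cache_value() {
    # Strip the "NAME:TYPE=" prefix from the matching line and print the rest.
    # The colon after the name keeps GGML_CUDA from matching GGML_CUDA_FA.
    sed -n "s/^$1:[A-Z]*=//p" "$2" | head -n 1
}

# Usage, from inside the build directory after running `cmake ..`:
#   cache_value GGML_NATIVE CMakeCache.txt
#   cache_value BUILD_SHARED_LIBS CMakeCache.txt
```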

CPU instruction set options

When GGML_NATIVE=ON (the default), the compiler detects and enables all supported ISA extensions automatically. Set individual flags only when cross-compiling or targeting a specific baseline.
Flag          Description
GGML_AVX      Enable AVX
GGML_AVX2     Enable AVX2
GGML_AVX512   Enable AVX-512F
GGML_FMA      Enable FMA
GGML_F16C     Enable F16C
GGML_SSE42    Enable SSE 4.2
GGML_BMI2     Enable BMI2
Example build for a fixed AVX2 baseline, without native detection:
cmake .. -DGGML_NATIVE=OFF \
         -DGGML_AVX=ON \
         -DGGML_AVX2=ON \
         -DGGML_FMA=ON \
         -DGGML_F16C=ON
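
A binary built for a fixed baseline will fail with an illegal-instruction error on a CPU that lacks one of the enabled extensions, so it can help to check the target machine first. The sketch below is not part of ggml and is Linux-only: it reads the flags line of /proc/cpuinfo, where the extensions above appear as avx, avx2, avx512f, fma, f16c, sse4_2, and bmi2. The function names has_cpu_flag and host_cpu_flags are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers (not part of ggml) for checking CPU extensions on Linux.

# has_cpu_flag FLAG LIST: succeeds if FLAG is a whole word in the
# space-separated LIST (so "avx" does not match "avx2").
has_cpu_flag() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# host_cpu_flags: print the flag list of the first CPU from /proc/cpuinfo.
host_cpu_flags() {
    grep '^flags' /proc/cpuinfo 2>/dev/null | head -n 1 | cut -d: -f2
}

# Usage: check the flags needed by the AVX2 baseline example.
#   flags=$(host_cpu_flags)
#   for f in avx avx2 fma f16c; do
#       has_cpu_flag "$f" "$flags" || echo "CPU lacks $f"
#   done
```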
