The CPU backend is ggml’s built-in execution target. It requires no external dependencies, works on every supported platform, and is always available as a fallback when no GPU backend is present.

Initialization

#include <stdio.h>
#include "ggml-cpu.h"

ggml_backend_t backend = ggml_backend_cpu_init();
if (!backend) {
    fprintf(stderr, "failed to initialize CPU backend\n");
    return 1;
}
You can also use the generic backend selector, which returns the CPU backend when no GPU is found:
// Returns the best GPU, or CPU if none is available
ggml_backend_t backend = ggml_backend_init_best();

// Always returns the CPU backend
ggml_backend_t cpu = ggml_backend_init_by_type(GGML_BACKEND_DEVICE_TYPE_CPU, NULL);
Call ggml_backend_load_all() before using ggml_backend_init_best() or ggml_backend_init_by_type() so that all available backends, including dynamically loaded ones, are registered.
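The selection flow above can be sketched end to end (a minimal sketch; real code would build and compute graphs where the comment indicates):

```c
#include <stdio.h>
#include "ggml-backend.h"

int main(void) {
    // Register all available backends before querying the registry
    ggml_backend_load_all();

    // Prefer the best GPU backend; falls back to CPU if none is found
    ggml_backend_t backend = ggml_backend_init_best();
    if (!backend) {
        fprintf(stderr, "no backend available\n");
        return 1;
    }

    // ... allocate tensors and compute graphs on `backend` ...

    ggml_backend_free(backend);
    return 0;
}
```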

Thread configuration

The CPU backend parallelises operations across threads. You control the thread count after initialization:
// Set the number of threads for graph compute
ggml_backend_cpu_set_n_threads(backend, 8);
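A sketch of how the thread count interacts with graph compute: it applies to subsequent computations and can be changed between them (assumes a backend and a built graph already exist):

```c
#include "ggml-cpu.h"
#include "ggml-backend.h"

// Sketch: run the same graph with different thread counts
void compute_twice(ggml_backend_t backend, struct ggml_cgraph * graph) {
    ggml_backend_cpu_set_n_threads(backend, 8);  // 8 worker threads
    ggml_backend_graph_compute(backend, graph);

    ggml_backend_cpu_set_n_threads(backend, 1);  // single-threaded
    ggml_backend_graph_compute(backend, graph);
}
```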

Custom thread pool

For finer control — including thread affinity and NUMA-awareness — create a ggml_threadpool and attach it:
#include "ggml-cpu.h"

struct ggml_threadpool_params tp_params = ggml_threadpool_params_default(8);
struct ggml_threadpool * pool = ggml_threadpool_new(&tp_params);

ggml_backend_cpu_set_threadpool(backend, pool);

// When done:
ggml_threadpool_free(pool);
Thread pool management functions:
Function                               Description
ggml_threadpool_new(params)            Create a thread pool with the given parameters
ggml_threadpool_free(pool)             Destroy the thread pool
ggml_threadpool_get_n_threads(pool)    Query the thread count
ggml_threadpool_pause(pool)            Suspend worker threads
ggml_threadpool_resume(pool)           Resume suspended threads
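Pause and resume are useful for keeping workers suspended between bursts of compute. A sketch (the `paused` field of ggml_threadpool_params is taken from ggml's header; verify against your version):

```c
#include "ggml-cpu.h"
#include "ggml-backend.h"

// Sketch: start the pool suspended and wake it only around compute
void burst_compute(ggml_backend_t backend, struct ggml_cgraph * graph) {
    struct ggml_threadpool_params tpp = ggml_threadpool_params_default(8);
    tpp.paused = true;  // workers start suspended (assumed field name)

    struct ggml_threadpool * pool = ggml_threadpool_new(&tpp);
    ggml_backend_cpu_set_threadpool(backend, pool);

    ggml_threadpool_resume(pool);                // wake workers
    ggml_backend_graph_compute(backend, graph);
    ggml_threadpool_pause(pool);                 // suspend between batches

    ggml_threadpool_free(pool);
}
```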

NUMA support

On systems with multiple NUMA nodes, initialise ggml’s NUMA support before creating backends:
// Choose a strategy appropriate for your system
ggml_numa_init(GGML_NUMA_STRATEGY_DISTRIBUTE);
Strategy                         Description
GGML_NUMA_STRATEGY_DISABLED      No NUMA awareness (default)
GGML_NUMA_STRATEGY_DISTRIBUTE    Distribute threads across nodes
GGML_NUMA_STRATEGY_ISOLATE       Pin all threads to one node
GGML_NUMA_STRATEGY_NUMACTL       Honour numactl binding from the shell
GGML_NUMA_STRATEGY_MIRROR        Mirror allocation across nodes

SIMD optimisations

ggml detects CPU features at runtime and selects the most capable implementation for each operation. You can query which extensions are available:
// Each returns 1 if the CPU supports the extension, 0 otherwise
ggml_cpu_has_avx()          // AVX
ggml_cpu_has_avx2()         // AVX2
ggml_cpu_has_avx512()       // AVX-512F
ggml_cpu_has_avx512_vnni()  // AVX-512 VNNI
ggml_cpu_has_avx512_bf16()  // AVX-512 BF16
ggml_cpu_has_avx_vnni()     // AVX-VNNI
ggml_cpu_has_fma()          // FMA3
ggml_cpu_has_f16c()         // F16C (CVT16)
ggml_cpu_has_amx_int8()     // Intel AMX INT8
ggml_cpu_has_bmi2()         // BMI2
You do not need to call these functions to get SIMD acceleration — ggml selects the best path automatically. Use them only if you need to log or assert specific capabilities.
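If you do want to log capabilities at startup, the checks can be collected in one place; a small sketch:

```c
#include <stdio.h>
#include "ggml-cpu.h"

// Print the detected x86 extensions, e.g. once at startup
static void log_cpu_features(void) {
    printf("AVX     : %d\n", ggml_cpu_has_avx());
    printf("AVX2    : %d\n", ggml_cpu_has_avx2());
    printf("AVX512F : %d\n", ggml_cpu_has_avx512());
    printf("FMA     : %d\n", ggml_cpu_has_fma());
    printf("F16C    : %d\n", ggml_cpu_has_f16c());
}
```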

Abort callback

You can register a callback that the CPU backend will call periodically during graph compute. Return true to abort execution:
// Flag set elsewhere (e.g. from another thread or a signal handler)
static volatile bool should_cancel = false;

bool my_abort(void * data) {
    (void) data;          // user data passed to ggml_backend_cpu_set_abort_callback
    return should_cancel; // return true to stop computation
}

ggml_backend_cpu_set_abort_callback(backend, my_abort, NULL);

Reference implementations

For debugging or correctness testing, force the backend to use unoptimised scalar code:
ggml_backend_cpu_set_use_ref(backend, true);

Build configuration

The CPU backend is compiled into ggml unconditionally. No additional CMake flags are required. SIMD paths are enabled automatically when the target compiler supports them.
cmake -B build
cmake --build build
To target a specific architecture on x86:
# Enable AVX2 and FMA explicitly
target_compile_options(ggml PRIVATE -mavx2 -mfma)
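ggml's build system also exposes options for these paths, which avoids patching compile flags directly (option names as found in ggml's CMakeLists; treat them as version-dependent):

```shell
# Build for a fixed x86 feature set instead of the host CPU
cmake -B build -DGGML_NATIVE=OFF -DGGML_AVX2=ON -DGGML_FMA=ON -DGGML_F16C=ON
cmake --build build
```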

API summary

Function                                                  Description
ggml_backend_cpu_init()                                   Create a CPU backend instance
ggml_backend_is_cpu(backend)                              Check whether a backend is the CPU backend
ggml_backend_cpu_set_n_threads(backend, n)                Set the thread count
ggml_backend_cpu_set_threadpool(backend, pool)            Attach a custom thread pool
ggml_backend_cpu_set_abort_callback(backend, cb, data)    Register an abort callback
ggml_backend_cpu_set_use_ref(backend, use_ref)            Force reference (scalar) implementations
ggml_backend_cpu_reg()                                    Return the CPU backend registry entry
