Core types
| Type | Description |
|---|---|
| ggml_backend_t | A live execution stream on a specific device |
| ggml_backend_buffer_t | A memory allocation owned by a backend |
| ggml_backend_buffer_type_t | A factory for creating buffers of a specific kind |
| ggml_backend_dev_t | A discoverable hardware device |
| ggml_backend_reg_t | A backend registration entry (groups devices of the same type) |
| ggml_backend_sched_t | A multi-backend scheduler |
ggml_backend_t
ggml_backend_t is an opaque handle to an initialized backend instance. It holds an execution stream and is the primary object you pass to graph compute calls.
ggml_backend_buffer_t and ggml_backend_buffer_type_t
Buffers hold the raw memory for tensors. A buffer type (ggml_backend_buffer_type_t) is a descriptor that tells ggml where and how to allocate memory. You get one from a backend and use it to allocate buffers:
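As a sketch (assuming the standard ggml-backend allocation helpers; the context setup is elided):

```c
#include "ggml.h"
#include "ggml-alloc.h"
#include "ggml-backend.h"

// Ask the backend for its preferred buffer type, then allocate every
// tensor previously created in a ggml context (with no_alloc = true)
// into a single buffer of that kind.
static ggml_backend_buffer_t alloc_tensors(ggml_backend_t backend, struct ggml_context * ctx) {
    ggml_backend_buffer_type_t buft = ggml_backend_get_default_buffer_type(backend);
    // returns one buffer sized to hold all tensors in ctx
    return ggml_backend_alloc_ctx_tensors_from_buft(ctx, buft);
}
```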
ggml_backend_dev_t and device discovery
Every registered backend exposes one or more ggml_backend_dev_t objects. You can enumerate all available devices at runtime:
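A minimal enumeration loop, assuming the device-registry functions from ggml-backend.h:

```c
#include <stdio.h>
#include "ggml-backend.h"

// Print every device ggml discovered at load time.
static void list_devices(void) {
    for (size_t i = 0; i < ggml_backend_dev_count(); i++) {
        ggml_backend_dev_t dev = ggml_backend_dev_get(i);
        printf("device %zu: %s (%s)\n", i,
               ggml_backend_dev_name(dev),
               ggml_backend_dev_description(dev));
    }
}
```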
Each device reports its kind via ggml_backend_dev_type:
| Enum value | Meaning |
|---|---|
| GGML_BACKEND_DEVICE_TYPE_CPU | CPU using system memory |
| GGML_BACKEND_DEVICE_TYPE_GPU | Discrete GPU with dedicated memory |
| GGML_BACKEND_DEVICE_TYPE_IGPU | Integrated GPU using host memory |
| GGML_BACKEND_DEVICE_TYPE_ACCEL | Accelerator used alongside the CPU (e.g. BLAS, AMX) |
The backend scheduler
ggml_backend_sched_t lets you run a single computation graph across multiple backends simultaneously. The scheduler:
- Assigns each graph node to the backend that best supports the operation
- Copies tensors between backends automatically when needed
- Allocates compute buffers on each backend
- Prioritises backends with a lower index in the array you supply
Tensors stored in buffers marked GGML_BACKEND_BUFFER_USAGE_WEIGHTS are preferentially assigned to the backend that owns those weights, so they are not copied between devices.
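A sketch of creating and driving a scheduler. Note that the exact ggml_backend_sched_new signature has varied across ggml versions; the five-argument form below is an assumption, and gpu_backend, cpu_backend, and graph are placeholders for objects you have already created:

```c
#include "ggml.h"
#include "ggml-backend.h"

// Prefer the GPU (index 0), fall back to the CPU (index 1).
// Passing NULL for the buffer types means "use each backend's default".
ggml_backend_t backends[2] = { gpu_backend, cpu_backend };
ggml_backend_sched_t sched = ggml_backend_sched_new(
    backends, /*bufts=*/NULL, /*n_backends=*/2,
    /*graph_size=*/GGML_DEFAULT_GRAPH_SIZE, /*parallel=*/false);

// The scheduler splits the graph, inserts the required tensor
// copies, and runs each split on its assigned backend.
ggml_backend_sched_graph_compute(sched, graph);

ggml_backend_sched_free(sched);
```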
Reserve (optional)
Pass a representative max-size graph to pre-allocate buffers. This avoids allocation at compute time.
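For example (build_worst_case_graph is a hypothetical helper that constructs your largest expected graph):

```c
// Pre-allocate compute buffers against a worst-case graph so that
// no allocation happens inside the hot compute loop.
struct ggml_cgraph * measure_graph = build_worst_case_graph(ctx);
if (!ggml_backend_sched_reserve(sched, measure_graph)) {
    fprintf(stderr, "failed to reserve compute buffers\n");
}
```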
Complete example
The following is drawn directly from examples/simple/simple-backend.cpp and shows the full lifecycle: backend selection, graph construction, scheduling, and result retrieval.
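A condensed sketch of that lifecycle (not the verbatim file contents; it uses the single-backend compute path rather than the scheduler, and the data upload is elided):

```c
#include <stdio.h>
#include "ggml.h"
#include "ggml-backend.h"

int main(void) {
    // 1. Backend selection: prefer the first discrete GPU, fall back to CPU.
    ggml_backend_t backend = NULL;
    for (size_t i = 0; i < ggml_backend_dev_count(); i++) {
        ggml_backend_dev_t dev = ggml_backend_dev_get(i);
        if (ggml_backend_dev_type(dev) == GGML_BACKEND_DEVICE_TYPE_GPU) {
            backend = ggml_backend_dev_init(dev, NULL);
            break;
        }
    }
    if (!backend) backend = ggml_backend_cpu_init();

    // 2. Graph construction: c = a * b, with no_alloc so that tensor
    //    data lives in a backend buffer rather than the ggml context.
    struct ggml_init_params params = {
        .mem_size   = ggml_tensor_overhead() * 8 + ggml_graph_overhead(),
        .mem_buffer = NULL,
        .no_alloc   = true,
    };
    struct ggml_context * ctx = ggml_init(params);
    struct ggml_tensor * a = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 4, 2);
    struct ggml_tensor * b = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 4, 3);
    struct ggml_tensor * c = ggml_mul_mat(ctx, a, b);

    ggml_backend_buffer_t buf = ggml_backend_alloc_ctx_tensors(ctx, backend);
    // ... upload input data with ggml_backend_tensor_set(a, ...), etc.

    struct ggml_cgraph * graph = ggml_new_graph(ctx);
    ggml_build_forward_expand(graph, c);

    // 3. Compute, then 4. retrieve the result back to host memory.
    ggml_backend_graph_compute(backend, graph);
    float result[2 * 3];
    ggml_backend_tensor_get(c, result, 0, sizeof(result));

    ggml_backend_buffer_free(buf);
    ggml_free(ctx);
    ggml_backend_free(backend);
    return 0;
}
```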
Available backends
| Backend | Platforms | Hardware | Build flag |
|---|---|---|---|
| CPU | All | x86, ARM, RISC-V, PowerPC | Always available |
| CUDA | Linux, Windows | NVIDIA GPUs | -DGGML_CUDA=ON |
| Metal | macOS 13+ | Apple Silicon, AMD GPUs | -DGGML_METAL=ON |
| Vulkan | Linux, Windows, Android | Cross-vendor GPUs | -DGGML_VULKAN=ON |
| OpenCL | Linux, Windows, Android | AMD, Intel, Qualcomm | -DGGML_OPENCL=ON |
| SYCL | Linux | Intel GPUs, oneAPI | -DGGML_SYCL=ON |
| RPC | All | Remote devices | -DGGML_RPC=ON |
CPU backend
SIMD-optimised execution on x86 and ARM with configurable thread pools.
CUDA backend
NVIDIA GPU acceleration with multi-GPU and split-tensor support.
Metal backend
Native Apple GPU compute for macOS and Apple Silicon.
Vulkan backend
Cross-vendor GPU support for Linux, Windows, and Android.
RPC backend
Distribute computation to remote machines over the network.
