The Vulkan backend runs tensor operations on any GPU with a Vulkan 1.2+ driver. It covers NVIDIA, AMD, Intel, Qualcomm, and ARM Mali GPUs on Linux, Windows, and Android, which makes it the most portable of the GPU backends.

Requirements

  • Vulkan 1.2 capable GPU and driver
  • Vulkan SDK (for building from source)
  • Linux, Windows, or Android

Build

cmake -B build -DGGML_VULKAN=ON
cmake --build build
Useful CMake options:
Option                            Default  Description
GGML_VULKAN=ON                    OFF      Enable the Vulkan backend
GGML_VULKAN_DEBUG=ON              OFF      Enable Vulkan validation layers and debug output
GGML_VULKAN_MEMORY_DEBUG=ON       OFF      Print memory allocation details
GGML_VULKAN_SHADER_DEBUG_INFO=ON  OFF      Include shader debug information

Install the Vulkan SDK from lunarg.com/vulkan-sdk on Linux and Windows. On Android, the Vulkan driver is provided by the device vendor.

Initialization

#include <stdio.h>        // fprintf
#include "ggml-vulkan.h"

// Initialize on Vulkan device 0; the result is NULL on failure
ggml_backend_t backend = ggml_backend_vk_init(0);
if (!backend) {
    fprintf(stderr, "failed to initialize Vulkan backend\n");
    return 1;
}

Device enumeration

Enumerate all available Vulkan devices before selecting one:
int n = ggml_backend_vk_get_device_count();
for (int i = 0; i < n; i++) {
    char desc[256];
    size_t free_mem, total_mem;  // bytes; named to avoid shadowing free()
    ggml_backend_vk_get_device_description(i, desc, sizeof(desc));
    ggml_backend_vk_get_device_memory(i, &free_mem, &total_mem);
    // Print the index, the device name, and free/total memory in GB
    printf("device %d: %s, %.1f GB free of %.1f GB\n",
           i, desc, free_mem / 1e9, total_mem / 1e9);
}

// Initialize the chosen device by index (0 is the first enumerated device,
// which is not necessarily a discrete GPU)
ggml_backend_t backend = ggml_backend_vk_init(0);
The maximum number of Vulkan devices is GGML_VK_MAX_DEVICES (16).
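
On a multi-GPU system, device 0 is not always the best choice. As a minimal sketch, the helper below picks the device index with the most free memory. `pick_device_most_free` is a hypothetical name and is not part of ggml; it assumes the caller has already filled `free_bytes[i]` for each device, for example from `ggml_backend_vk_get_device_memory`.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical helper (not part of ggml): return the index of the device
// with the most free memory, or -1 if there are no devices.
// In real use, free_bytes[i] would be filled by calling
// ggml_backend_vk_get_device_memory(i, &free_bytes[i], &total) per device.
static int pick_device_most_free(const size_t *free_bytes, int n_devices) {
    assert(n_devices <= 0 || free_bytes != NULL);
    int best = -1;
    for (int i = 0; i < n_devices; i++) {
        // Strict '>' keeps the lowest index when two devices tie.
        if (best < 0 || free_bytes[i] > free_bytes[best]) {
            best = i;
        }
    }
    return best;
}
```

The returned index can be passed to `ggml_backend_vk_init`. A result of -1 means no device was found, in which case a CPU-only path is the sensible fallback.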

Buffer types

The Vulkan backend provides two buffer types:
// GPU-local buffer (device memory)
ggml_backend_buffer_type_t buft = ggml_backend_vk_buffer_type(device_index);

// Pinned host buffer for faster CPU↔GPU transfers
ggml_backend_buffer_type_t host_buft = ggml_backend_vk_host_buffer_type();
Use the host buffer type for input tensors that change every forward pass. Pinned (page-locked) host memory can be transferred to the GPU directly, which avoids an intermediate staging copy on each upload.

Using Vulkan with the scheduler

Combine the Vulkan backend with a CPU fallback:
ggml_backend_t vk  = ggml_backend_vk_init(0);
ggml_backend_t cpu = ggml_backend_cpu_init();

ggml_backend_t backends[2] = { vk, cpu };
ggml_backend_sched_t sched = ggml_backend_sched_new(
    backends, NULL, 2, GGML_DEFAULT_GRAPH_SIZE, false, true
);
The scheduler assigns each graph node to the backend that supports it. Operations not yet implemented in Vulkan fall back to the CPU automatically.
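
The fallback behaviour can be illustrated with a toy model. The sketch below is not ggml's scheduler, which weighs more than operation support; it only shows the basic idea of walking the backends in array order and taking the first one that supports an operation. All `toy_` names and types are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

// Toy model only: these types are hypothetical and are not ggml types.
enum toy_op { TOY_OP_MATMUL = 0, TOY_OP_SOFTMAX = 1, TOY_OP_CUSTOM = 2 };

struct toy_backend {
    const char *name;
    unsigned    supported_ops;  // bit i set => op i is supported
};

// Return the index of the first backend (in array order) that supports
// the op, or -1 if none does.
static int toy_assign(const struct toy_backend *backends, int n_backends,
                      enum toy_op op) {
    assert(n_backends <= 0 || backends != NULL);
    for (int i = 0; i < n_backends; i++) {
        if (backends[i].supported_ops & (1u << op)) {
            return i;
        }
    }
    return -1;
}
```

With a Vulkan-like backend at index 0 that lacks `TOY_OP_CUSTOM` and a CPU-like backend at index 1 that supports everything, `toy_assign` returns 0 for the matmul and 1 for the custom op. Listing the CPU last makes it the catch-all for anything the GPU backend cannot run.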

API summary

Function                                                Description
ggml_backend_vk_init(dev_num)                           Create a Vulkan backend for the given device index
ggml_backend_is_vk(backend)                             Check whether a backend is the Vulkan backend
ggml_backend_vk_get_device_count()                      Number of available Vulkan devices
ggml_backend_vk_get_device_description(dev, buf, size)  Human-readable device name
ggml_backend_vk_get_device_memory(dev, free, total)     Available and total device memory
ggml_backend_vk_buffer_type(dev_num)                    Device-local buffer type
ggml_backend_vk_host_buffer_type()                      Pinned host memory buffer type
ggml_backend_vk_reg()                                   Return the Vulkan backend registry entry
