ggml_context is an arena allocator that owns all tensors, graphs, and work buffers created within it. You allocate a fixed-size memory pool at initialization time, then every subsequent allocation is a bump-pointer advance into that pool — no per-object malloc or free calls during computation.
All tensors created with the same context share its memory pool. Plan your memory budget carefully, or use ggml_used_mem() after graph construction to measure actual usage.

ggml_init_params
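A minimal lifecycle, as a sketch (the 16 MiB budget and the tensor shape are arbitrary assumptions, not recommendations):

```c
#include "ggml.h"

int main(void) {
    /* One fixed pool; every allocation below is a bump into it. */
    struct ggml_init_params params = {
        .mem_size   = 16 * 1024 * 1024,  /* 16 MiB budget */
        .mem_buffer = NULL,              /* let ggml malloc the pool */
        .no_alloc   = false,
    };
    struct ggml_context *ctx = ggml_init(params);
    if (!ctx) return 1;

    /* Struct and data both come out of the pool — no malloc here. */
    struct ggml_tensor *a = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 1024);
    (void)a;

    /* How much of the budget did metadata + data actually consume? */
    size_t used = ggml_used_mem(ctx);
    (void)used;

    ggml_free(ctx);  /* releases the whole pool at once */
    return 0;
}
```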
Passed to ggml_init() to configure the context.
mem_size
Size of the memory pool in bytes. All tensor metadata and (unless no_alloc is true) tensor data is allocated from this pool.

mem_buffer
Pointer to a caller-owned buffer of at least mem_size bytes. When NULL, ggml internally allocates the buffer with malloc.

no_alloc
When true, tensor structs are allocated from the pool but tensor data pointers are left NULL. Use this when you plan to bind external data buffers (e.g., from a GPU backend) after context creation.

Lifecycle functions
ggml_init
Creates a context configured by the given ggml_init_params. Returns a pointer to the new context, or NULL on failure. If params.mem_buffer is NULL, the library allocates the backing memory internally and owns it.
ggml_reset
Rewinds the pool cursor to the start of the context's buffer, invalidating all tensors created so far without releasing the backing memory. Much cheaper than a full cycle of ggml_free / ggml_init.
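For example, a long-running loop can rebuild its graph into the same pool each step (build_and_compute is a hypothetical per-step helper, not a ggml function):

```c
/* Reuse one pool across iterations instead of init/free each time. */
for (int step = 0; step < n_steps; ++step) {
    build_and_compute(ctx);  /* allocate tensors/graph, run them */
    ggml_reset(ctx);         /* rewind the cursor; the buffer survives */
}
```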
ggml_free
Destroys the context. The backing buffer is released only if it was allocated internally (i.e., params.mem_buffer was NULL); caller-owned buffers are left untouched.
Memory inspection
ggml_used_mem
Returns the number of bytes currently allocated from the pool. Useful for right-sizing mem_size after a trial run.
ggml_get_mem_buffer
Returns a pointer to the start of the context's memory pool.
ggml_get_mem_size
Returns the total size of the pool in bytes (the mem_size passed at initialization).
ggml_get_max_tensor_size
Returns the size in bytes of the largest tensor allocated in the context.
No-alloc control
The no_alloc flag can be toggled after context creation, which is useful when switching between planning and execution phases.
ggml_get_no_alloc
Returns the current value of the no_alloc flag.
ggml_set_no_alloc
Sets the no_alloc flag. When true, subsequent ggml_new_tensor_* calls allocate the tensor struct but leave tensor->data as NULL.
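A sketch of the two-phase pattern this enables. The direct data assignment is a simplification: backend_buffer_ptr stands in for whatever your backend's allocation API returns, and real backends typically have their own tensor-binding helpers.

```c
/* Phase 1: plan. Tensor structs only; no data bytes from the pool. */
ggml_set_no_alloc(ctx, true);
struct ggml_tensor *w = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, n_in, n_out);

/* Phase 2: bind. Point tensor->data at externally managed memory. */
w->data = backend_buffer_ptr;  /* hypothetical external allocation */
```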
Arena allocator model
ggml uses a simple bump-pointer (arena) allocator internally:

- ggml_init() acquires one contiguous block of memory of exactly mem_size bytes.
- Every ggml_new_tensor_*, ggml_new_graph, and internal work-buffer call advances the pool cursor forward.
- There is no per-object deallocation. All memory is released at once by ggml_free() or reclaimed by ggml_reset().
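The model is small enough to sketch in full. This is an illustrative miniature, not ggml's actual code; the names (arena, arena_alloc, arena_reset) are invented:

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative miniature of the arena model. */
typedef struct {
    uint8_t *base;  /* start of the pool (mem_buffer) */
    size_t   size;  /* total pool size (mem_size)     */
    size_t   used;  /* bump cursor                    */
} arena;

/* Hand out n bytes, 16-byte aligned; NULL when the pool is exhausted. */
static void *arena_alloc(arena *a, size_t n) {
    n = (n + 15) & ~(size_t)15;
    if (a->used + n > a->size) return NULL;
    void *p = a->base + a->used;
    a->used += n;  /* the only bookkeeping an allocation needs */
    return p;
}

/* ggml_reset analogue: reclaim everything without freeing the buffer. */
static void arena_reset(arena *a) { a->used = 0; }
```

Each allocation is just a cursor advance; freeing an individual object is impossible by design, which is why every tensor in a context shares one lifetime.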
Overhead helpers
Use these helpers to pre-calculate how much pool space ggml's internal bookkeeping structures consume before your tensor data: ggml_tensor_overhead() returns the bytes of per-tensor bookkeeping (the ggml_tensor struct plus alignment padding), and ggml_graph_overhead() returns the bytes consumed by a ggml_cgraph. Add these to your mem_size budget when estimating how many tensors will fit.
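For instance, a budget for a batch of same-shaped f32 tensors plus one graph might be estimated as follows (a sketch; the counts and shapes are assumptions):

```c
/* Rough budget: per-tensor bookkeeping + tensor data + one graph. */
size_t n_tensors  = 64;
size_t data_bytes = n_tensors * 1024 * sizeof(float);  /* 64 tensors of 1024 f32 */
size_t mem_size   = n_tensors * ggml_tensor_overhead()
                  + ggml_graph_overhead()
                  + data_bytes;
```

Pad the result somewhat (allocations are aligned), or verify it against ggml_used_mem() after a trial run.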