ggml_tensor is the fundamental data container in ggml. Tensors are always allocated inside a ggml_context and support up to four dimensions (GGML_MAX_DIMS = 4). They store data in row-major order using stride-based addressing, which allows views, transposes, and permutations without copying.
ggml_type enum
Every tensor has an associated element type:
| Value | Description |
|---|---|
| GGML_TYPE_F32 | 32-bit float |
| GGML_TYPE_F16 | 16-bit float (IEEE 754) |
| GGML_TYPE_BF16 | Brain float 16 |
| GGML_TYPE_F64 | 64-bit double |
| GGML_TYPE_I8 | 8-bit signed integer |
| GGML_TYPE_I16 | 16-bit signed integer |
| GGML_TYPE_I32 | 32-bit signed integer |
| GGML_TYPE_I64 | 64-bit signed integer |
| GGML_TYPE_Q4_0 | 4-bit quantization, symmetric |
| GGML_TYPE_Q4_1 | 4-bit quantization with scale/min |
| GGML_TYPE_Q5_0 | 5-bit quantization, symmetric |
| GGML_TYPE_Q5_1 | 5-bit quantization with scale/min |
| GGML_TYPE_Q8_0 | 8-bit quantization, symmetric |
| GGML_TYPE_Q8_1 | 8-bit quantization with scale/min |
| GGML_TYPE_Q2_K | 2-bit super-block quantization |
| GGML_TYPE_Q3_K | 3-bit super-block quantization |
| GGML_TYPE_Q4_K | 4-bit super-block quantization |
| GGML_TYPE_Q5_K | 5-bit super-block quantization |
| GGML_TYPE_Q6_K | 6-bit super-block quantization |
| GGML_TYPE_Q8_K | 8-bit super-block quantization |
| GGML_TYPE_IQ1_S | 1-bit importance-matrix quantization |
| GGML_TYPE_IQ1_M | 1-bit importance-matrix quantization (M) |
| GGML_TYPE_IQ2_XXS | 2-bit importance-matrix quantization (XXS) |
| GGML_TYPE_IQ2_XS | 2-bit importance-matrix quantization (XS) |
| GGML_TYPE_IQ2_S | 2-bit importance-matrix quantization (S) |
| GGML_TYPE_IQ3_XXS | 3-bit importance-matrix quantization (XXS) |
| GGML_TYPE_IQ3_S | 3-bit importance-matrix quantization (S) |
| GGML_TYPE_IQ4_NL | 4-bit importance-matrix quantization (NL) |
| GGML_TYPE_IQ4_XS | 4-bit importance-matrix quantization (XS) |
| GGML_TYPE_TQ1_0 | Ternary quantization, 1-bit packing |
| GGML_TYPE_TQ2_0 | Ternary quantization, 2-bit packing |
| GGML_TYPE_MXFP4 | MXFP4 (1 block) |
| GGML_TYPE_NVFP4 | NVFP4 (4 blocks, E4M3 scale) |
ggml_tensor struct
| Field | Description |
|---|---|
| type | Element data type. Determines the size of each element and whether the tensor is quantized. |
| ne | Number of elements in each dimension. ne[0] is the innermost (fastest-changing) dimension. Unused dimensions are set to 1. |
| nb | Stride in bytes for each dimension. nb[0] equals the element size; nb[1] equals the row size in bytes (may include padding). Strides allow non-contiguous memory layouts such as those produced by ggml_transpose and ggml_permute. |
| op | The operation that produced this tensor. GGML_OP_NONE for leaf tensors. |
| flags | Bitmask of ggml_tensor_flag values: GGML_TENSOR_FLAG_INPUT, GGML_TENSOR_FLAG_OUTPUT, GGML_TENSOR_FLAG_PARAM, GGML_TENSOR_FLAG_LOSS, GGML_TENSOR_FLAG_COMPUTE. |
| src | Pointers to the source tensors that this tensor was computed from. For example, after c = ggml_add(ctx, a, b), c->src[0] == a and c->src[1] == b. |
| data | Raw pointer to the tensor’s element data. When no_alloc is true on the context, this is NULL until externally assigned. |
| name | Human-readable label, up to 63 characters (GGML_MAX_NAME - 1). Set with ggml_set_name() or ggml_format_name(). |
Creating tensors
All creation functions allocate the tensor struct (and, unless no_alloc is set, the backing data) from the context’s memory pool.
ggml_new_tensor
Allocates a new tensor; ne is an array of n_dims element counts.
| Parameter | Description |
|---|---|
| ctx | Context that owns the new tensor. |
| type | Element data type. |
| n_dims | Number of dimensions (1–4). |
| ne | Array of element counts, one per dimension. ne[0] is the innermost dimension. |
Convenience constructors
These wrap ggml_new_tensor for common dimensionalities:
ggml_dup_tensor
Creates a new tensor with the same type and shape as src, but with its own independent data buffer. The data is not copied.
ggml_view_tensor
Creates a view that shares src’s data buffer (same shape, type, strides, and data pointer). Modifying the view’s data modifies src’s data.
Naming
ggml_set_name
Sets the tensor’s name (truncated to GGML_MAX_NAME - 1 characters). Returns tensor for chaining.
ggml_get_name
Returns the tensor’s name.
ggml_format_name
Sets the tensor’s name using printf-style formatting. Returns tensor for chaining.
Tensor flags
Flags mark tensors for special treatment during graph building and differentiation.
ggml_set_input
Marks the tensor as a graph input (sets GGML_TENSOR_FLAG_INPUT). Input tensors must be filled with data before ggml_graph_compute is called.
ggml_set_output
Marks the tensor as a graph output (sets GGML_TENSOR_FLAG_OUTPUT). Results are guaranteed to be written to host-accessible memory after computation.
ggml_set_param
Marks the tensor as a trainable parameter (sets GGML_TENSOR_FLAG_PARAM). The automatic differentiation engine tracks gradients for param tensors.
Data accessors
These functions are declared in ggml-cpu.h and operate on the CPU representation of tensor data.
Float accessors
ggml_get_f32_1d, ggml_set_f32_1d, ggml_get_f32_nd, and ggml_set_f32_nd read and write individual elements as float.
Integer accessors
ggml_get_i32_1d, ggml_set_i32_1d, ggml_get_i32_nd, and ggml_set_i32_nd read and write individual elements as int32_t.
For non-F32 or non-I32 tensor types, the accessors perform implicit conversion. Direct stride-based pointer arithmetic is more efficient for bulk access:
Type query functions
ggml_nelements
Returns the total number of elements (ne[0] * ne[1] * ne[2] * ne[3]).
ggml_nbytes
Returns the total size of the tensor’s data in bytes.
ggml_is_quantized
Returns true if the type is a quantized format (any Q* or IQ* type).
ggml_is_contiguous
Returns true if the tensor elements can be iterated with a flat index: no gaps and no permutation. Equivalent to ggml_is_contiguous_0().
