A ggml_tensor is the fundamental data container in ggml. Tensors are always allocated inside a ggml_context and support up to four dimensions (GGML_MAX_DIMS = 4). They store data in row-major order using stride-based addressing, which allows views, transposes, and permutations without copying.

ggml_type enum

Every tensor has an associated element type:
Value                Description
GGML_TYPE_F32        32-bit float
GGML_TYPE_F16        16-bit float (IEEE 754 half precision)
GGML_TYPE_BF16       bfloat16 (brain float)
GGML_TYPE_F64        64-bit double
GGML_TYPE_I8         8-bit signed integer
GGML_TYPE_I16        16-bit signed integer
GGML_TYPE_I32        32-bit signed integer
GGML_TYPE_I64        64-bit signed integer
GGML_TYPE_Q4_0       4-bit quantization, symmetric
GGML_TYPE_Q4_1       4-bit quantization with scale/min
GGML_TYPE_Q5_0       5-bit quantization, symmetric
GGML_TYPE_Q5_1       5-bit quantization with scale/min
GGML_TYPE_Q8_0       8-bit quantization, symmetric
GGML_TYPE_Q8_1       8-bit quantization with scale/min
GGML_TYPE_Q2_K       2-bit super-block quantization
GGML_TYPE_Q3_K       3-bit super-block quantization
GGML_TYPE_Q4_K       4-bit super-block quantization
GGML_TYPE_Q5_K       5-bit super-block quantization
GGML_TYPE_Q6_K       6-bit super-block quantization
GGML_TYPE_Q8_K       8-bit super-block quantization
GGML_TYPE_IQ1_S      1-bit importance-matrix quantization
GGML_TYPE_IQ1_M      1-bit importance-matrix quantization (M)
GGML_TYPE_IQ2_XXS    2-bit importance-matrix quantization (XXS)
GGML_TYPE_IQ2_XS     2-bit importance-matrix quantization (XS)
GGML_TYPE_IQ2_S      2-bit importance-matrix quantization (S)
GGML_TYPE_IQ3_XXS    3-bit importance-matrix quantization (XXS)
GGML_TYPE_IQ3_S      3-bit importance-matrix quantization (S)
GGML_TYPE_IQ4_NL     4-bit importance-matrix quantization (NL)
GGML_TYPE_IQ4_XS     4-bit importance-matrix quantization (XS)
GGML_TYPE_TQ1_0      ternary quantization, 1-bit
GGML_TYPE_TQ2_0      ternary quantization, 2-bit
GGML_TYPE_MXFP4      MXFP4 (1 block)
GGML_TYPE_NVFP4      NVFP4 (4 blocks, E4M3 scale)

ggml_tensor struct

struct ggml_tensor {
    enum ggml_type         type;
    int64_t                ne[GGML_MAX_DIMS]; // number of elements per dimension
    size_t                 nb[GGML_MAX_DIMS]; // stride in bytes per dimension
    enum ggml_op           op;
    int32_t                flags;
    struct ggml_tensor   * src[GGML_MAX_SRC]; // source (input) tensors for this op
    struct ggml_tensor   * view_src;          // source tensor for views
    size_t                 view_offs;         // byte offset into view_src
    void                 * data;
    char                   name[GGML_MAX_NAME];
};
type
enum ggml_type
Element data type. Determines the size of each element and whether the tensor is quantized.
ne[GGML_MAX_DIMS]
int64_t[4]
Number of elements in each dimension. ne[0] is the innermost (fastest-changing) dimension. Unused dimensions are set to 1.
nb[GGML_MAX_DIMS]
size_t[4]
Stride in bytes for each dimension. For non-quantized types, nb[0] equals the element size; nb[1] equals the row size in bytes (which may include padding). Strides allow non-contiguous memory layouts such as those produced by ggml_transpose and ggml_permute.
op
enum ggml_op
The operation that produced this tensor. GGML_OP_NONE for leaf tensors.
flags
int32_t
Bitmask of ggml_tensor_flag values: GGML_TENSOR_FLAG_INPUT, GGML_TENSOR_FLAG_OUTPUT, GGML_TENSOR_FLAG_PARAM, GGML_TENSOR_FLAG_LOSS.
src
struct ggml_tensor *[GGML_MAX_SRC]
Pointers to the source tensors that this tensor was computed from. For example, after c = ggml_add(ctx, a, b), c->src[0] == a and c->src[1] == b.
view_src
struct ggml_tensor *
For view tensors, the tensor that owns the underlying data buffer; NULL for tensors with their own data.
view_offs
size_t
Byte offset into view_src's data at which this view begins.
data
void *
Raw pointer to the tensor's element data. When no_alloc is true on the context, this is NULL until externally assigned.
name
char[GGML_MAX_NAME]
Human-readable label, up to 63 characters. Set with ggml_set_name() or ggml_format_name().

Creating tensors

All creation functions allocate the tensor struct (and, unless no_alloc is set, the backing data) from the context’s memory pool.

ggml_new_tensor

struct ggml_tensor * ggml_new_tensor(
    struct ggml_context * ctx,
    enum   ggml_type      type,
    int                   n_dims,
    const  int64_t      * ne);
General-purpose tensor creation. ne is an array of n_dims element counts.
ctx
struct ggml_context *
required
Context that owns the new tensor.
type
enum ggml_type
required
Element data type.
n_dims
int
required
Number of dimensions (1–4).
ne
const int64_t *
required
Array of element counts, one per dimension. ne[0] is the innermost dimension.

Convenience constructors

These wrap ggml_new_tensor for common dimensionalities:
struct ggml_tensor * ggml_new_tensor_1d(
    struct ggml_context * ctx,
    enum   ggml_type      type,
    int64_t ne0);

struct ggml_tensor * ggml_new_tensor_2d(
    struct ggml_context * ctx,
    enum   ggml_type      type,
    int64_t ne0, int64_t ne1);

struct ggml_tensor * ggml_new_tensor_3d(
    struct ggml_context * ctx,
    enum   ggml_type      type,
    int64_t ne0, int64_t ne1, int64_t ne2);

struct ggml_tensor * ggml_new_tensor_4d(
    struct ggml_context * ctx,
    enum   ggml_type      type,
    int64_t ne0, int64_t ne1, int64_t ne2, int64_t ne3);
// 1D vector of 512 floats
struct ggml_tensor * v = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 512);

// 2D matrix: 64 columns, 32 rows
struct ggml_tensor * m = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 64, 32);

ggml_dup_tensor

struct ggml_tensor * ggml_dup_tensor(
    struct ggml_context * ctx,
    const struct ggml_tensor * src);
Allocates a new tensor with the same shape and type as src, backed by its own independent data buffer. Only the metadata is duplicated; the contents of src are not copied.

ggml_view_tensor

struct ggml_tensor * ggml_view_tensor(
    struct ggml_context * ctx,
    struct ggml_tensor  * src);
Creates a new tensor that shares src’s data buffer (same shape, type, strides, and data pointer). Modifying the view’s data modifies src’s data.

Naming

ggml_set_name

struct ggml_tensor * ggml_set_name(
    struct ggml_tensor * tensor,
    const char         * name);
Sets the tensor’s name (truncated to GGML_MAX_NAME - 1 characters). Returns tensor for chaining.

ggml_get_name

const char * ggml_get_name(const struct ggml_tensor * tensor);
Returns the tensor’s name string.

ggml_format_name

struct ggml_tensor * ggml_format_name(
    struct ggml_tensor * tensor,
    const char         * fmt, ...);
Printf-style name assignment. Returns tensor.

Tensor flags

Flags mark tensors for special treatment during graph building and differentiation.

ggml_set_input

void ggml_set_input(struct ggml_tensor * tensor);
Marks the tensor as a graph input (GGML_TENSOR_FLAG_INPUT). Input tensors must be filled with data before ggml_graph_compute is called.

ggml_set_output

void ggml_set_output(struct ggml_tensor * tensor);
Marks the tensor as a graph output (GGML_TENSOR_FLAG_OUTPUT). Results are guaranteed to be written to host-accessible memory after computation.

ggml_set_param

void ggml_set_param(struct ggml_tensor * tensor);
Marks the tensor as a trainable parameter (GGML_TENSOR_FLAG_PARAM). The automatic differentiation engine tracks gradients for param tensors.

Data accessors

These functions are declared in ggml-cpu.h and operate on the CPU representation of tensor data.

Float accessors

// Fill all elements with a scalar value
struct ggml_tensor * ggml_set_f32(struct ggml_tensor * tensor, float value);

// Read/write a single element by flat index
float ggml_get_f32_1d(const struct ggml_tensor * tensor, int i);
void  ggml_set_f32_1d(const struct ggml_tensor * tensor, int i, float value);

// Read/write a single element by 4D coordinates
float ggml_get_f32_nd(const struct ggml_tensor * tensor, int i0, int i1, int i2, int i3);
void  ggml_set_f32_nd(const struct ggml_tensor * tensor, int i0, int i1, int i2, int i3, float value);

Integer accessors

// Fill all elements with a scalar integer value
struct ggml_tensor * ggml_set_i32(struct ggml_tensor * tensor, int32_t value);

// Read/write a single element by flat index
int32_t ggml_get_i32_1d(const struct ggml_tensor * tensor, int i);
void    ggml_set_i32_1d(const struct ggml_tensor * tensor, int i, int32_t value);

// Read/write a single element by 4D coordinates
int32_t ggml_get_i32_nd(const struct ggml_tensor * tensor, int i0, int i1, int i2, int i3);
void    ggml_set_i32_nd(const struct ggml_tensor * tensor, int i0, int i1, int i2, int i3, int32_t value);
For tensors whose type is not F32 (or not I32, for the integer accessors), these functions convert on the fly. For bulk access, direct stride-based pointer arithmetic is more efficient:
// Manual access to element (x, y) of a 2D F32 tensor
float * elem = (float *)((char *)tensor->data + y * tensor->nb[1] + x * tensor->nb[0]);

Type query functions

ggml_nelements

int64_t ggml_nelements(const struct ggml_tensor * tensor);
Returns the total number of elements across all dimensions (ne[0] * ne[1] * ne[2] * ne[3]).

ggml_nbytes

size_t ggml_nbytes(const struct ggml_tensor * tensor);
Returns the total size of the tensor’s data in bytes.

ggml_is_quantized

bool ggml_is_quantized(enum ggml_type type);
Returns true if the type is a block-quantized format (the Q*, IQ*, and TQ* types).

ggml_is_contiguous

bool ggml_is_contiguous(const struct ggml_tensor * tensor);
Returns true if the tensor elements can be iterated with a flat index — no gaps and no permutation. Equivalent to ggml_is_contiguous_0().

Additional query functions

bool ggml_is_transposed(const struct ggml_tensor * tensor);
bool ggml_is_permuted  (const struct ggml_tensor * tensor);
bool ggml_is_empty     (const struct ggml_tensor * tensor);
bool ggml_is_view      (const struct ggml_tensor * tensor);
bool ggml_is_scalar    (const struct ggml_tensor * tensor);
bool ggml_is_vector    (const struct ggml_tensor * tensor);
bool ggml_is_matrix    (const struct ggml_tensor * tensor);
int  ggml_n_dims       (const struct ggml_tensor * tensor);
int64_t ggml_nrows     (const struct ggml_tensor * tensor);

Context enumeration

Iterate over all tensors in a context:
struct ggml_tensor * ggml_get_first_tensor(const struct ggml_context * ctx);
struct ggml_tensor * ggml_get_next_tensor (
    const struct ggml_context * ctx,
    struct ggml_tensor        * tensor);

// Lookup by name
struct ggml_tensor * ggml_get_tensor(
    struct ggml_context * ctx,
    const char          * name);
