Every computation in ggml operates on tensors. A tensor is a typed, multi-dimensional array backed by a contiguous (or strided) memory region. ggml supports up to 4 dimensions and a wide range of numeric types, from 32-bit floats down to sub-4-bit quantized formats.

The ggml_tensor struct

The core data structure is defined in ggml.h:
struct ggml_tensor {
    enum ggml_type type;

    struct ggml_backend_buffer * buffer;

    int64_t ne[GGML_MAX_DIMS]; // number of elements per dimension
    size_t  nb[GGML_MAX_DIMS]; // stride in bytes per dimension:
                               // nb[0] = ggml_type_size(type)
                               // nb[1] = nb[0] * (ne[0] / ggml_blck_size(type)) + padding
                               // nb[i] = nb[i-1] * ne[i-1]

    enum ggml_op op;           // the operation that produced this tensor
    int32_t      flags;        // GGML_TENSOR_FLAG_INPUT, _OUTPUT, _PARAM, ...

    struct ggml_tensor * src[GGML_MAX_SRC]; // source (input) tensors

    struct ggml_tensor * view_src;  // non-NULL when this tensor is a view
    size_t               view_offs;

    void * data;               // raw data pointer
    char   name[GGML_MAX_NAME];
};
GGML_MAX_DIMS is 4, so every tensor is at most 4-dimensional. ne[0] is the innermost (fastest-varying) dimension — the number of columns in a matrix.

Data types

ggml_type covers floating-point formats, integer formats, and a large family of quantized types:
enum ggml_type {
    // Full-precision floats
    GGML_TYPE_F32  = 0,
    GGML_TYPE_F16  = 1,
    GGML_TYPE_BF16 = 30,
    GGML_TYPE_F64  = 28,

    // Integer types
    GGML_TYPE_I8   = 24,
    GGML_TYPE_I16  = 25,
    GGML_TYPE_I32  = 26,
    GGML_TYPE_I64  = 27,

    // Legacy quants (block-quantized)
    GGML_TYPE_Q4_0 = 2,  GGML_TYPE_Q4_1 = 3,
    GGML_TYPE_Q5_0 = 6,  GGML_TYPE_Q5_1 = 7,
    GGML_TYPE_Q8_0 = 8,

    // K-quants
    GGML_TYPE_Q2_K = 10, GGML_TYPE_Q3_K = 11,
    GGML_TYPE_Q4_K = 12, GGML_TYPE_Q5_K = 13,
    GGML_TYPE_Q6_K = 14, GGML_TYPE_Q8_K = 15,

    // i-quants
    GGML_TYPE_IQ1_S   = 19, GGML_TYPE_IQ1_M   = 29,
    GGML_TYPE_IQ2_XXS = 16, GGML_TYPE_IQ2_XS  = 17,
    GGML_TYPE_IQ2_S   = 22, GGML_TYPE_IQ3_XXS = 18,
    GGML_TYPE_IQ3_S   = 21, GGML_TYPE_IQ4_NL  = 20,
    GGML_TYPE_IQ4_XS  = 23,

    GGML_TYPE_COUNT = 41,
};
Use ggml_type_name(type) to get a human-readable string, and ggml_is_quantized(type) to test whether a type uses block quantization.

Creating tensors

Tensors are always allocated from a ggml_context. The context owns a fixed-size memory buffer; every tensor carves out space from it.
// 1-D vector of 128 floats
struct ggml_tensor * v = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 128);

// 2-D matrix: 4 columns × 3 rows (ne[0]=4, ne[1]=3)
struct ggml_tensor * m = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 4, 3);

// 3-D tensor
struct ggml_tensor * t = ggml_new_tensor_3d(ctx, GGML_TYPE_F32, 64, 32, 8);

// 4-D tensor
struct ggml_tensor * q = ggml_new_tensor_4d(ctx, GGML_TYPE_F32, 64, 32, 8, 2);
Dimension ordering follows column-major convention: ne[0] is the number of elements in the fastest-varying (innermost) dimension. For a matrix, ne[0] is columns and ne[1] is rows.
You can also use the generic constructor when the number of dimensions is determined at runtime:
int64_t dims[] = {64, 32, 8, 2};
struct ggml_tensor * t = ggml_new_tensor(ctx, GGML_TYPE_F32, 4, dims);

Reading and writing values

For CPU-resident tensors the scalar helpers from ggml-cpu.h are the safest way to access individual elements:
#include "ggml-cpu.h"

// Fill every element with a constant
ggml_set_f32(tensor, 1.0f);

// Read / write by flat index
float val = ggml_get_f32_1d(tensor, 42);
ggml_set_f32_1d(tensor, 42, 3.14f);

// Read / write by n-d coordinates
float v = ggml_get_f32_nd(tensor, col, row, slice, batch);
ggml_set_f32_nd(tensor, col, row, slice, batch, v);
For bulk initialization you can write directly through tensor->data using the stride fields:
const int nx = 2;
const int ny = 3;

struct ggml_tensor * a = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, nx, ny);

for (int y = 0; y < ny; y++) {
    for (int x = 0; x < nx; x++) {
        *(float *)((char *)a->data + y*a->nb[1] + x*a->nb[0]) = x + y;
    }
}
Or copy an existing array in one call:
const int rows_A = 4;
const int cols_A = 2;

float src[] = { 2, 8, 5, 1, 4, 2, 8, 6 };

struct ggml_tensor * a = ggml_new_tensor_2d(
    model.ctx, GGML_TYPE_F32, cols_A, rows_A);

memcpy(a->data, src, ggml_nbytes(a));

Tensor metadata

Each tensor carries metadata that describes how it was produced:
Field                Description
type                 Element type (F32, Q4_K, …)
ne[4]                Number of elements per dimension
nb[4]                Stride in bytes per dimension
op                   Operation that produced this tensor (GGML_OP_ADD, etc.)
src[GGML_MAX_SRC]    Pointers to the input tensors of op
data                 Raw data pointer
name[GGML_MAX_NAME]  Optional debug name
flags                Input / output / param / loss flags
The src array lets you walk the computation graph upward:
struct ggml_tensor * c = ggml_add(ctx, a, b);

assert(c->src[0] == a);
assert(c->src[1] == b);
Name a tensor for easier debugging:
ggml_set_name(tensor, "weights");
// or with printf-style formatting:
ggml_format_name(tensor, "layer_%d_weight", layer_idx);

Contiguous vs strided tensors

ggml supports non-contiguous tensors produced by operations such as ggml_transpose, ggml_permute, and ggml_view_*. A tensor is contiguous when its elements are laid out in memory with no gaps and in the expected order.
bool ok = ggml_is_contiguous(tensor);
Related predicates:
bool ggml_is_transposed(const struct ggml_tensor * tensor);
bool ggml_is_permuted  (const struct ggml_tensor * tensor);
bool ggml_is_contiguous_1(const struct ggml_tensor * tensor); // contiguous for dims >= 1
bool ggml_is_contiguous_2(const struct ggml_tensor * tensor); // contiguous for dims >= 2
Most ggml operations respect the nb strides and handle non-contiguous inputs, but some kernels require (and assert) contiguity on their arguments. If you need a contiguous copy, whether for such an operation or for an external library, call ggml_cont:
struct ggml_tensor * t_cont = ggml_cont(ctx, t_strided);

Utility functions

int64_t ggml_nelements(const struct ggml_tensor * tensor);  // total element count
int64_t ggml_nrows    (const struct ggml_tensor * tensor);  // ne[1] * ne[2] * ne[3]
size_t  ggml_nbytes   (const struct ggml_tensor * tensor);  // total byte size

size_t  ggml_type_size(enum ggml_type type);   // bytes per block
int64_t ggml_blck_size(enum ggml_type type);   // elements per block
size_t  ggml_row_size (enum ggml_type type, int64_t ne);

const char * ggml_type_name(enum ggml_type type);
bool         ggml_is_quantized(enum ggml_type type);
