ggml provides a high-level training API through ggml-opt.h. It handles dataset batching, gradient accumulation, forward and backward passes, and optimizer steps — so you can focus on defining your model graph.

Workflow

1. Select a loss type

Choose the loss function that matches your problem. The built-in options cover most supervised learning tasks:
enum ggml_opt_loss_type {
    GGML_OPT_LOSS_TYPE_MEAN,              // reduce outputs to mean (custom loss via graph)
    GGML_OPT_LOSS_TYPE_SUM,               // reduce outputs to sum (custom loss via graph)
    GGML_OPT_LOSS_TYPE_CROSS_ENTROPY,     // classification
    GGML_OPT_LOSS_TYPE_MEAN_SQUARED_ERROR // regression
};
Use MEAN or SUM when your graph already computes a meaningful scalar loss and you only need the optimizer to minimize it.
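For example, MEAN or SUM can reduce a per-element error tensor built directly in the graph to a scalar loss. A minimal sketch, assuming `predictions` and `targets` are tensors already defined in the compute context:

```c
// Custom L1 loss expressed in the graph itself: the optimizer reduces the
// per-element absolute errors to a scalar via GGML_OPT_LOSS_TYPE_SUM.
struct ggml_tensor * diff = ggml_sub(ctx_compute, predictions, targets);
struct ggml_tensor * l1   = ggml_abs(ctx_compute, diff);
// pass `l1` as the outputs tensor and select GGML_OPT_LOSS_TYPE_SUM
```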
2. Create a dataset

Allocate a dataset and populate its data and labels tensors with your training samples. See Datasets for full details.
ggml_opt_dataset_t dataset = ggml_opt_dataset_init(
    GGML_TYPE_F32, // type for data tensor
    GGML_TYPE_F32, // type for labels tensor
    ne_datapoint,  // elements per datapoint
    ne_label,      // elements per label
    ndata,         // total number of datapoints
    ndata_shard    // shuffle granularity
);
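The backing tensors can then be filled in host memory. A sketch, assuming the `ggml_opt_dataset_data` and `ggml_opt_dataset_labels` accessors from ggml-opt.h; `fill_datapoint` and `fill_label` are hypothetical application helpers:

```c
float * data   = ggml_get_data_f32(ggml_opt_dataset_data(dataset));   // [ne_datapoint, ndata]
float * labels = ggml_get_data_f32(ggml_opt_dataset_labels(dataset)); // [ne_label, ndata]

for (int64_t i = 0; i < ndata; ++i) {
    fill_datapoint(data   + i*ne_datapoint, i); // hypothetical helpers that write
    fill_label    (labels + i*ne_label,     i); // one sample / one label each
}
```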
3. Build a GGML graph

Define your model as a GGML computation graph, split across two separate contexts:
  • Parameters context — holds the model weights and the inputs tensor. Allocate it statically (no_alloc = false); its data remains valid throughout training.
  • Compute context — holds all intermediate tensors and must be created with no_alloc = true. The optimizer reallocates this context automatically; do not read its tensor data directly.
// Parameters context — allocated once
struct ggml_init_params params_ctx = {
    .mem_size   = model_mem_size,
    .mem_buffer = NULL,
    .no_alloc   = false,
};
struct ggml_context * ctx_model = ggml_init(params_ctx);

struct ggml_tensor * inputs  = ggml_new_tensor_2d(ctx_model, GGML_TYPE_F32, ne_input,  ndata_batch);
struct ggml_tensor * weights = ggml_new_tensor_2d(ctx_model, GGML_TYPE_F32, ne_input, ne_hidden);
ggml_set_param(weights); // mark as trainable so gradients are computed for it

// Compute context — reused each step
struct ggml_init_params compute_ctx = {
    .mem_size   = compute_mem_size,
    .mem_buffer = NULL,
    .no_alloc   = true, // no_alloc must be true
};
struct ggml_context * ctx_compute = ggml_init(compute_ctx);

struct ggml_tensor * hidden  = ggml_mul_mat(ctx_compute, weights, inputs);
struct ggml_tensor * outputs = ggml_relu(ctx_compute, hidden);
The second dimension of inputs and outputs is interpreted as the batch size (number of datapoints). Make sure it matches ndata_batch in your dataset batching.
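A cheap way to catch shape mismatches early is to assert the batch dimension before training. This sketch assumes `nbatch_logical` is the value later passed to ggml_opt_fit:

```c
GGML_ASSERT(inputs->ne[1]  == ndata_batch);     // physical batch size
GGML_ASSERT(outputs->ne[1] == ndata_batch);
GGML_ASSERT(nbatch_logical % ndata_batch == 0); // required for gradient accumulation
```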
4. Fit the model

Call ggml_opt_fit to run the full training loop. It handles shuffling, batching, gradient accumulation, validation splits, and epoch reporting.
ggml_opt_fit(
    backend_sched,                       // backend scheduler
    ctx_compute,                         // compute context
    inputs,                              // input tensor
    outputs,                             // output tensor
    dataset,                             // dataset
    GGML_OPT_LOSS_TYPE_CROSS_ENTROPY,    // loss function
    GGML_OPT_OPTIMIZER_TYPE_ADAMW,       // optimizer
    ggml_opt_get_default_optimizer_params, // optimizer params callback
    /*nepoch=*/30,                        // number of epochs
    /*nbatch_logical=*/500,               // datapoints per optimizer step
    /*val_split=*/0.05f,                  // 5% held out for validation
    /*silent=*/false                     // print progress to stderr
);
For more control over the loop — custom callbacks, per-batch metrics, or mid-epoch checkpointing — use ggml_opt_epoch instead.
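A sketch of such a loop, assuming the ggml_opt_epoch and ggml_opt_result_* signatures from ggml-opt.h and an already initialized opt_ctx:

```c
ggml_opt_result_t result_train = ggml_opt_result_init();
ggml_opt_result_t result_eval  = ggml_opt_result_init();

for (int64_t epoch = 0; epoch < nepoch; ++epoch) {
    ggml_opt_result_reset(result_train);
    ggml_opt_result_reset(result_eval);

    // the first idata_split datapoints are used for training, the rest for evaluation
    ggml_opt_epoch(opt_ctx, dataset, result_train, result_eval, idata_split,
                   ggml_opt_epoch_callback_progress_bar,   // per-batch training callback
                   ggml_opt_epoch_callback_progress_bar);  // per-batch eval callback

    double loss, loss_unc;
    ggml_opt_result_loss(result_eval, &loss, &loss_unc);
    // mid-epoch checkpointing / custom metrics would go here
}

ggml_opt_result_free(result_train);
ggml_opt_result_free(result_eval);
```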

ggml_opt_fit parameters

void ggml_opt_fit(
    ggml_backend_sched_t          backend_sched,
    struct ggml_context         * ctx_compute,
    struct ggml_tensor          * inputs,
    struct ggml_tensor          * outputs,
    ggml_opt_dataset_t            dataset,
    enum ggml_opt_loss_type       loss_type,
    enum ggml_opt_optimizer_type  optimizer,
    ggml_opt_get_optimizer_params get_opt_pars,
    int64_t                       nepoch,
    int64_t                       nbatch_logical,
    float                         val_split,
    bool                          silent);
  • backend_sched: Backend scheduler that controls which device(s) execute the compute graphs.
  • ctx_compute: GGML context holding the temporary (non-parameter) tensors of your graph. Must have been created with no_alloc = true.
  • inputs: Input tensor. Shape must be [ne_datapoint, ndata_batch].
  • outputs: Output tensor. When labels are used, shape must be [ne_label, ndata_batch].
  • dataset: Dataset created with ggml_opt_dataset_init.
  • loss_type: Loss function to minimize.
  • optimizer: GGML_OPT_OPTIMIZER_TYPE_ADAMW or GGML_OPT_OPTIMIZER_TYPE_SGD.
  • get_opt_pars: Callback invoked before each backward pass to supply optimizer hyperparameters. The userdata pointer passed to this callback is a pointer to the current epoch number (int64_t), which enables learning rate schedules.
  • nepoch: Number of full passes over the training portion of the dataset.
  • nbatch_logical: Number of datapoints between optimizer steps. Must be a multiple of the physical batch size (the second dimension of inputs). Values larger than the physical batch trigger gradient accumulation.
  • val_split: Fraction of the dataset reserved for validation. Must be in [0.0, 1.0). Pass 0.0f to skip validation.
  • silent: When true, suppresses all progress output to stderr.
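Because userdata points at the current epoch, get_opt_pars can implement a learning rate schedule. A sketch, assuming the adamw.alpha field of struct ggml_opt_optimizer_params holds the AdamW learning rate; the callback name and decay factor are illustrative:

```c
#include <math.h>

static struct ggml_opt_optimizer_params lr_schedule(void * userdata) {
    const int64_t epoch = *(const int64_t *) userdata;

    // start from the library defaults, then override the learning rate
    struct ggml_opt_optimizer_params p = ggml_opt_get_default_optimizer_params(NULL);
    p.adamw.alpha = 1e-3f * powf(0.9f, (float) epoch); // exponential decay per epoch
    return p;
}

// pass `lr_schedule` as the get_opt_pars argument of ggml_opt_fit
```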

Static vs dynamic graph allocation

The optimizer context supports two graph allocation modes: static (graphs built once) and dynamic (graphs rebuilt for each evaluation).

For static allocation, set ctx_compute, inputs, and outputs on the ggml_opt_params struct before calling ggml_opt_init. The optimizer allocates the forward, gradient, and optimizer graphs once at initialization and reuses them for every evaluation. This is the mode used by ggml_opt_fit. Prefer static allocation when the graph topology is fixed across all batches; if these fields are left unset, the optimizer instead allocates graphs dynamically on each evaluation.
struct ggml_opt_params params = ggml_opt_default_params(backend_sched, loss_type);
params.ctx_compute = ctx_compute;
params.inputs      = inputs;
params.outputs     = outputs;

ggml_opt_context_t opt_ctx = ggml_opt_init(params);
// graphs are allocated once here — no per-step reallocation
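With a statically allocated context, a single manual training step might look like the following sketch, assuming the ggml_opt_dataset_get_batch, ggml_opt_alloc, ggml_opt_eval functions and the ggml_opt_inputs/ggml_opt_labels accessors from ggml-opt.h:

```c
// copy batch `ibatch` from the dataset into the graph's input/label tensors
ggml_opt_dataset_get_batch(dataset, ggml_opt_inputs(opt_ctx), ggml_opt_labels(opt_ctx), ibatch);

ggml_opt_alloc(opt_ctx, /*backward=*/true); // prepare the forward+backward graphs
ggml_opt_eval(opt_ctx, result);             // run the step, accumulate stats in `result`
```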

Build types

The build_type field on ggml_opt_params controls which graphs the optimizer constructs:
enum ggml_opt_build_type {
    GGML_OPT_BUILD_TYPE_FORWARD = 10, // forward pass only (inference)
    GGML_OPT_BUILD_TYPE_GRAD    = 20, // forward + backward (gradient computation)
    GGML_OPT_BUILD_TYPE_OPT     = 30, // forward + backward + optimizer step (full training)
};
  • FORWARD: Evaluation or inference; no gradients are computed.
  • GRAD: Compute gradients without applying an optimizer step. Useful for inspecting gradients or implementing custom update rules.
  • OPT: Full training: forward pass, backward pass, and optimizer parameter update. This is the default for ggml_opt_fit.
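For example, GRAD makes the accumulated gradients readable after an evaluation. A sketch, assuming the ggml_opt_grad_acc accessor from ggml-opt.h:

```c
struct ggml_opt_params params = ggml_opt_default_params(backend_sched, loss_type);
params.build_type  = GGML_OPT_BUILD_TYPE_GRAD; // forward + backward, no optimizer step
params.ctx_compute = ctx_compute;
params.inputs      = inputs;
params.outputs     = outputs;

ggml_opt_context_t opt_ctx = ggml_opt_init(params);
// ... run an evaluation, then:
struct ggml_tensor * grad = ggml_opt_grad_acc(opt_ctx, weights); // accumulated gradient of `weights`
```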

Optimizers

Configure AdamW and SGD, set learning rate schedules, and manage the optimizer context.

Datasets

Initialize datasets, populate tensors, shuffle data, and write custom epoch callbacks.
