The optimizer API provides a complete training loop abstraction on top of GGML’s computation graph primitives. It manages graph construction, gradient accumulation, loss computation, and parameter updates in a unified interface.
This module is maintained by Johannes Gäßler. The high-level functions (ggml_opt_epoch, ggml_opt_fit) are designed to be copied and adapted directly into user code.

Enums

ggml_opt_loss_type

Controls how the scalar loss value is derived from the model outputs.
enum ggml_opt_loss_type {
    GGML_OPT_LOSS_TYPE_MEAN,
    GGML_OPT_LOSS_TYPE_SUM,
    GGML_OPT_LOSS_TYPE_CROSS_ENTROPY,
    GGML_OPT_LOSS_TYPE_MEAN_SQUARED_ERROR,
};
GGML_OPT_LOSS_TYPE_MEAN
Mean of all output values. Useful as a custom loss when you compute the loss yourself within the graph.
GGML_OPT_LOSS_TYPE_SUM
Sum of all output values. Like MEAN but without dividing by the number of elements.
GGML_OPT_LOSS_TYPE_CROSS_ENTROPY
Categorical cross-entropy between outputs (logits) and one-hot labels. The standard choice for classification.
GGML_OPT_LOSS_TYPE_MEAN_SQUARED_ERROR
Mean squared error between outputs and labels. The standard choice for regression.

ggml_opt_build_type

Controls how much of the computation graph is built.
enum ggml_opt_build_type {
    GGML_OPT_BUILD_TYPE_FORWARD = 10, // forward pass only
    GGML_OPT_BUILD_TYPE_GRAD    = 20, // forward + backward (gradients only)
    GGML_OPT_BUILD_TYPE_OPT     = 30, // forward + backward + optimizer step
};

ggml_opt_optimizer_type

enum ggml_opt_optimizer_type {
    GGML_OPT_OPTIMIZER_TYPE_ADAMW,
    GGML_OPT_OPTIMIZER_TYPE_SGD,
};

Optimizer parameters

ggml_opt_optimizer_params

Holds hyperparameters for both supported optimizers. Only the fields for the active optimizer type are used.
struct ggml_opt_optimizer_params {
    struct {
        float alpha; // learning rate
        float beta1; // first moment decay (default 0.9)
        float beta2; // second moment decay (default 0.999)
        float eps;   // epsilon for numerical stability (default 1e-8)
        float wd;    // weight decay, 0.0 to disable
    } adamw;
    struct {
        float alpha; // learning rate
        float wd;    // weight decay
    } sgd;
};
AdamW fields:
adamw.alpha
float
Learning rate. Controls the step size at each parameter update. Typical values: 1e-4 to 1e-2.
adamw.beta1
float
Exponential decay rate for the first moment estimate. Default: 0.9.
adamw.beta2
float
Exponential decay rate for the second moment estimate. Default: 0.999.
adamw.eps
float
Small constant added to the denominator for numerical stability. Default: 1e-8.
adamw.wd
float
Weight decay coefficient. Set to 0.0f to disable. Decoupled weight decay as in the AdamW paper.
SGD fields:
sgd.alpha
float
Learning rate.
sgd.wd
float
Weight decay.

ggml_opt_get_optimizer_params callback

typedef struct ggml_opt_optimizer_params (*ggml_opt_get_optimizer_params)(void * userdata);
A function pointer called before each backward pass to obtain the current optimizer hyperparameters. Use this to implement learning rate schedules. Two built-in implementations are provided:
  • ggml_opt_get_default_optimizer_params — returns hard-coded default values; ignores userdata.
  • ggml_opt_get_constant_optimizer_params — casts userdata to struct ggml_opt_optimizer_params * and returns it directly.

ggml_opt_params

Configuration struct for creating an optimization context.
struct ggml_opt_params {
    ggml_backend_sched_t   backend_sched; // scheduler defining which backends run the graphs
    struct ggml_context  * ctx_compute;   // context for temporary compute tensors (optional)
    struct ggml_tensor   * inputs;        // input tensor (optional, for static graphs)
    struct ggml_tensor   * outputs;       // output tensor (optional, for static graphs)
    enum ggml_opt_loss_type  loss_type;
    enum ggml_opt_build_type build_type;
    int32_t                  opt_period;  // gradient accumulation steps per optimizer step
    ggml_opt_get_optimizer_params get_opt_pars;
    void *                        get_opt_pars_ud; // userdata for get_opt_pars
    enum ggml_opt_optimizer_type  optimizer;
};
backend_sched
ggml_backend_sched_t
required
Backend scheduler used to build and execute the forward and backward graphs.
ctx_compute
struct ggml_context *
When set alongside inputs and outputs, graphs are allocated statically once and reused. When NULL, a new graph is built for each evaluation.
inputs
struct ggml_tensor *
Input tensor. The second dimension is interpreted as the batch size (number of datapoints).
outputs
struct ggml_tensor *
Output tensor. Must have shape [ne_label, ndata_batch] when labels are used.
loss_type
enum ggml_opt_loss_type
required
Which loss function to minimize.
build_type
enum ggml_opt_build_type
required
Whether to build a forward-only, gradient, or full optimization graph.
opt_period
int32_t
Number of gradient accumulation steps before each optimizer parameter update. Set to 1 for standard SGD/AdamW without accumulation.
get_opt_pars
ggml_opt_get_optimizer_params
Callback invoked before each backward pass to retrieve the current optimizer hyperparameters.
get_opt_pars_ud
void *
Userdata pointer passed to get_opt_pars.
optimizer
enum ggml_opt_optimizer_type
required
Which optimizer to use (ADAMW or SGD).
Get a params struct with sensible defaults using:
struct ggml_opt_params ggml_opt_default_params(
    ggml_backend_sched_t    backend_sched,
    enum ggml_opt_loss_type loss_type);
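A typical configuration, sketched under the assumption that a backend scheduler and the model's input/output tensors already exist (backend_sched, ctx_compute, inputs, and outputs are placeholders from your own setup code):

```c
// Start from defaults, then override selected fields.
struct ggml_opt_params params = ggml_opt_default_params(
    backend_sched, GGML_OPT_LOSS_TYPE_CROSS_ENTROPY);

params.ctx_compute = ctx_compute;               // enable static graphs
params.inputs      = inputs;
params.outputs     = outputs;
params.build_type  = GGML_OPT_BUILD_TYPE_OPT;   // forward + backward + optimizer step
params.opt_period  = 4;                         // accumulate gradients over 4 physical batches
params.optimizer   = GGML_OPT_OPTIMIZER_TYPE_ADAMW;

ggml_opt_context_t opt_ctx = ggml_opt_init(params);
// ... training ...
ggml_opt_free(opt_ctx);
```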

Context lifecycle

Creates and initializes an optimization context.
ggml_opt_context_t ggml_opt_init(struct ggml_opt_params params);
params
struct ggml_opt_params
required
Configuration for the optimizer. Use ggml_opt_default_params to start from sensible defaults.
Returns a new context. Free with ggml_opt_free.
Destroys an optimization context and releases all associated memory.
void ggml_opt_free(ggml_opt_context_t opt_ctx);
opt_ctx
ggml_opt_context_t
required
The context to free.
Zeroes gradients, resets the loss accumulator, and optionally resets optimizer state (e.g. Adam moment estimates).
void ggml_opt_reset(ggml_opt_context_t opt_ctx, bool optimizer);
opt_ctx
ggml_opt_context_t
required
The context to reset.
optimizer
bool
required
When true, also resets the optimizer’s internal state (first/second moment estimates for AdamW). Pass false to only zero gradients and the loss.

Tensor accessors

These functions return pointers to the internal tensors managed by the optimization context.
When not using static graphs, these pointers become invalid after the next call to ggml_opt_alloc.
Returns the input tensor of the forward graph.
struct ggml_tensor * ggml_opt_inputs(ggml_opt_context_t opt_ctx);
Returns the output tensor of the forward graph.
struct ggml_tensor * ggml_opt_outputs(ggml_opt_context_t opt_ctx);
Returns the labels tensor used to compute the loss.
struct ggml_tensor * ggml_opt_labels(ggml_opt_context_t opt_ctx);
Returns the scalar tensor that holds the current loss value after ggml_opt_eval.
struct ggml_tensor * ggml_opt_loss(ggml_opt_context_t opt_ctx);

Optimization result

ggml_opt_result_t accumulates statistics (loss, accuracy, number of datapoints) across multiple evaluation steps.
Creates a new, empty result object.
ggml_opt_result_t ggml_opt_result_init(void);
Free with ggml_opt_result_free.
Writes the total number of datapoints processed into *ndata.
void ggml_opt_result_ndata(
    ggml_opt_result_t result,
    int64_t         * ndata);
result
ggml_opt_result_t
required
The result to query.
ndata
int64_t *
required
Output: number of datapoints.
Writes the accumulated loss and its standard uncertainty into the output pointers.
void ggml_opt_result_loss(
    ggml_opt_result_t result,
    double          * loss,
    double          * unc);
result
ggml_opt_result_t
required
The result to query.
loss
double *
required
Output: mean loss over all datapoints.
unc
double *
Output: standard uncertainty of the loss estimate. Pass NULL to ignore.
Writes classification accuracy and its standard uncertainty into the output pointers.
void ggml_opt_result_accuracy(
    ggml_opt_result_t result,
    double          * accuracy,
    double          * unc);
result
ggml_opt_result_t
required
The result to query.
accuracy
double *
required
Output: fraction of correctly classified datapoints in [0, 1].
unc
double *
Output: standard uncertainty. Pass NULL to ignore.

Low-level computation

These functions give you fine-grained control over graph allocation and evaluation. Use them when ggml_opt_epoch or ggml_opt_fit do not offer enough flexibility.
Sets the graph, inputs, and outputs for the next call to ggml_opt_alloc. Required when not using static graphs.
void ggml_opt_prepare_alloc(
    ggml_opt_context_t    opt_ctx,
    struct ggml_context * ctx_compute,
    struct ggml_cgraph  * gf,
    struct ggml_tensor  * inputs,
    struct ggml_tensor  * outputs);
opt_ctx
ggml_opt_context_t
required
The optimization context.
ctx_compute
struct ggml_context *
required
The context containing temporarily allocated compute tensors.
gf
struct ggml_cgraph *
required
The forward computation graph.
inputs
struct ggml_tensor *
required
Input tensor in gf.
outputs
struct ggml_tensor *
required
Output tensor in gf.
Allocates the next graph for evaluation. Must be called exactly once before each call to ggml_opt_eval.
void ggml_opt_alloc(ggml_opt_context_t opt_ctx, bool backward);
opt_ctx
ggml_opt_context_t
required
The optimization context.
backward
bool
required
When true, the backward graph (for gradient computation and parameter update) is allocated in addition to the forward graph.
Executes the allocated graph. Performs a forward pass, increments the result, and (if the backward graph was allocated) performs the backward pass.
void ggml_opt_eval(ggml_opt_context_t opt_ctx, ggml_opt_result_t result);
opt_ctx
ggml_opt_context_t
required
The optimization context.
result
ggml_opt_result_t
Result object to increment with the statistics from this evaluation. Pass NULL to discard statistics.
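Putting these together, one logical optimizer step with dynamically built graphs might look like the following sketch. Here build_graph and copy_batch are hypothetical helpers, n_accum corresponds to opt_period, and error handling is omitted:

```c
// One logical optimizer step = n_accum physical batches (gradient accumulation).
for (int i = 0; i < n_accum; ++i) {
    // build_graph() is a hypothetical helper that constructs the forward graph
    // for one batch in the (no_alloc) compute context and returns its
    // input/output tensors.
    struct ggml_tensor * inputs;
    struct ggml_tensor * outputs;
    struct ggml_cgraph * gf = build_graph(ctx_compute, &inputs, &outputs);

    ggml_opt_prepare_alloc(opt_ctx, ctx_compute, gf, inputs, outputs);
    ggml_opt_alloc(opt_ctx, /*backward =*/ true); // forward + backward graphs

    // copy_batch() is a hypothetical helper that writes one batch of data
    // into the context-managed input and label tensors.
    copy_batch(ggml_opt_inputs(opt_ctx), ggml_opt_labels(opt_ctx), i);

    // Forward + backward; the parameter update happens every opt_period calls.
    ggml_opt_eval(opt_ctx, result);
}
```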

High-level training API

ggml_opt_epoch_callback

A callback invoked after each batch evaluation during ggml_opt_epoch.
typedef void (*ggml_opt_epoch_callback)(
    bool               train,       // true = training batch, false = validation batch
    ggml_opt_context_t opt_ctx,
    ggml_opt_dataset_t dataset,
    ggml_opt_result_t  result,      // result for the current dataset subsection
    int64_t            ibatch,      // batches evaluated so far
    int64_t            ibatch_max,  // total batches in this subsection
    int64_t            t_start_us); // start time in microseconds
A built-in implementation ggml_opt_epoch_callback_progress_bar prints a progress bar to stderr.
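A custom callback can read running statistics from the result object it receives. A minimal sketch (my_epoch_callback is a hypothetical name), assuming it is passed to ggml_opt_epoch as callback_train or callback_eval:

```c
// Hypothetical callback: print the running mean loss every 10 batches.
static void my_epoch_callback(
        bool               train,
        ggml_opt_context_t opt_ctx,
        ggml_opt_dataset_t dataset,
        ggml_opt_result_t  result,
        int64_t            ibatch,
        int64_t            ibatch_max,
        int64_t            t_start_us) {
    (void) opt_ctx; (void) dataset; (void) t_start_us;
    if (ibatch % 10 != 0) {
        return;
    }
    double loss;
    double unc;
    ggml_opt_result_loss(result, &loss, &unc); // stats accumulated so far
    fprintf(stderr, "%s batch %" PRId64 "/%" PRId64 ": loss = %f\n",
            train ? "train" : "val", ibatch, ibatch_max, loss);
}
```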
Runs one epoch: trains on the front portion of the dataset and evaluates on the back portion.
void ggml_opt_epoch(
    ggml_opt_context_t      opt_ctx,
    ggml_opt_dataset_t      dataset,
    ggml_opt_result_t       result_train,
    ggml_opt_result_t       result_eval,
    int64_t                 idata_split,
    ggml_opt_epoch_callback callback_train,
    ggml_opt_epoch_callback callback_eval);
opt_ctx
ggml_opt_context_t
required
The optimization context.
dataset
ggml_opt_dataset_t
required
The dataset to iterate over.
result_train
ggml_opt_result_t
Result object incremented during the training portion. Pass NULL to discard.
result_eval
ggml_opt_result_t
Result object incremented during the validation portion. Pass NULL to discard.
idata_split
int64_t
required
Datapoint index that separates training (indices [0, idata_split)) from validation (indices [idata_split, ndata)).
callback_train
ggml_opt_epoch_callback
Called after each training batch. Pass NULL for no callback.
callback_eval
ggml_opt_epoch_callback
Called after each validation batch. Pass NULL for no callback.
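A multi-epoch loop built on ggml_opt_epoch might look like the following sketch; opt_ctx, dataset, ndata, and nepoch are assumed to exist, and the last 10% of the dataset is used for validation:

```c
const int64_t idata_split = ndata - ndata/10; // train on first 90%, validate on last 10%

for (int epoch = 0; epoch < nepoch; ++epoch) {
    // Fresh result objects so each epoch's statistics are reported separately.
    ggml_opt_result_t result_train = ggml_opt_result_init();
    ggml_opt_result_t result_eval  = ggml_opt_result_init();

    ggml_opt_epoch(opt_ctx, dataset, result_train, result_eval, idata_split,
                   ggml_opt_epoch_callback_progress_bar,
                   ggml_opt_epoch_callback_progress_bar);

    double loss;
    double unc;
    ggml_opt_result_loss(result_eval, &loss, &unc);
    fprintf(stderr, "epoch %d: val loss = %f +- %f\n", epoch, loss, unc);

    ggml_opt_result_free(result_train);
    ggml_opt_result_free(result_eval);
}
```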
Fits the model to a dataset over multiple epochs. This is the highest-level training entry point.
void ggml_opt_fit(
    ggml_backend_sched_t          backend_sched,
    struct ggml_context         * ctx_compute,
    struct ggml_tensor          * inputs,
    struct ggml_tensor          * outputs,
    ggml_opt_dataset_t            dataset,
    enum ggml_opt_loss_type       loss_type,
    enum ggml_opt_optimizer_type  optimizer,
    ggml_opt_get_optimizer_params get_opt_pars,
    int64_t                       nepoch,
    int64_t                       nbatch_logical,
    float                         val_split,
    bool                          silent);
backend_sched
ggml_backend_sched_t
required
Backend scheduler used to build and run the compute graphs.
ctx_compute
struct ggml_context *
required
Context containing temporarily allocated tensors for the forward pass.
inputs
struct ggml_tensor *
required
Input tensor with shape [ne_datapoint, ndata_batch].
outputs
struct ggml_tensor *
required
Output tensor. Must have shape [ne_label, ndata_batch] when labels are used.
dataset
ggml_opt_dataset_t
required
Dataset containing training data and optionally labels.
loss_type
enum ggml_opt_loss_type
required
The loss function to minimize.
optimizer
enum ggml_opt_optimizer_type
required
Which optimizer to use.
get_opt_pars
ggml_opt_get_optimizer_params
required
Callback to retrieve optimizer hyperparameters. The userdata passed is a pointer to the current epoch number (int64_t *).
nepoch
int64_t
required
Number of times to iterate over the full dataset.
nbatch_logical
int64_t
required
Number of datapoints per logical optimizer step. Must be a multiple of the physical batch size (second dimension of inputs/outputs).
val_split
float
required
Fraction of the dataset reserved for validation. Must be in [0.0, 1.0). Pass 0.0 to skip validation.
silent
bool
required
When true, suppresses all progress output to stderr.
// 1. Choose a loss for your problem
//    - Classification: GGML_OPT_LOSS_TYPE_CROSS_ENTROPY
//    - Regression:     GGML_OPT_LOSS_TYPE_MEAN_SQUARED_ERROR

// 2. Build model graph (no_alloc = true; two contexts: one for weights, one for compute;
//    inputs and outputs are per-batch tensors, so they live in the compute context)
struct ggml_tensor * inputs  = ggml_new_tensor_2d(ctx_compute, GGML_TYPE_F32, ne_input,  ndata_batch);
struct ggml_tensor * outputs = ggml_new_tensor_2d(ctx_compute, GGML_TYPE_F32, ne_output, ndata_batch);
// ... build the graph that connects inputs -> outputs ...

// 3. Create dataset
ggml_opt_dataset_t dataset = ggml_opt_dataset_init(...);
// ... fill the tensors returned by ggml_opt_dataset_data(dataset) and ggml_opt_dataset_labels(dataset) ...

// 4. Fit
ggml_opt_fit(
    backend_sched, ctx_compute, inputs, outputs, dataset,
    GGML_OPT_LOSS_TYPE_CROSS_ENTROPY,
    GGML_OPT_OPTIMIZER_TYPE_ADAMW,
    ggml_opt_get_default_optimizer_params,
    /*nepoch=*/        10,
    /*nbatch_logical=*/256,
    /*val_split=*/     0.1f,
    /*silent=*/        false);