This module is maintained by Johannes Gäßler. The high-level functions (`ggml_opt_epoch`, `ggml_opt_fit`) are designed to be copied and adapted directly into user code.

Enums
ggml_opt_loss_type
Controls how the scalar loss value is derived from the model outputs.

| Value | Description |
|---|---|
| `GGML_OPT_LOSS_TYPE_MEAN` | Mean of all output values. Useful as a custom loss when you compute the loss yourself within the graph. |
| `GGML_OPT_LOSS_TYPE_SUM` | Sum of all output values. Like `MEAN` but without dividing by the number of elements. |
| `GGML_OPT_LOSS_TYPE_CROSS_ENTROPY` | Categorical cross-entropy between outputs (logits) and one-hot labels. The standard choice for classification. |
| `GGML_OPT_LOSS_TYPE_MEAN_SQUARED_ERROR` | Mean squared error between outputs and labels. The standard choice for regression. |
ggml_opt_build_type

Controls how much of the computation graph is built: the forward pass only, forward plus gradients, or the full optimization graph including the parameter update.

ggml_opt_optimizer_type

Selects which optimizer to use: `ADAMW` or `SGD`.
Optimizer parameters
ggml_opt_optimizer_params
Holds hyperparameters for both supported optimizers. Only the fields for the active optimizer type are used. (Field names below follow the ggml header.)

- `adamw.alpha`: Learning rate. Controls the step size at each parameter update. Typical values: `1e-4` to `1e-2`.
- `adamw.beta1`: Exponential decay rate for the first moment estimate. Default: `0.9`.
- `adamw.beta2`: Exponential decay rate for the second moment estimate. Default: `0.999`.
- `adamw.eps`: Small constant added to the denominator for numerical stability. Default: `1e-8`.
- `adamw.wd`: Weight decay coefficient. Decoupled weight decay as in the AdamW paper. Set to `0.0f` to disable.
- `sgd.alpha`: Learning rate.
- `sgd.wd`: Weight decay.
ggml_opt_get_optimizer_params callback
- `ggml_opt_get_default_optimizer_params`: returns hard-coded default values; ignores `userdata`.
- `ggml_opt_get_constant_optimizer_params`: casts `userdata` to `struct ggml_opt_optimizer_params *` and returns it directly.
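A custom callback lets you vary hyperparameters during training. As a sketch (assuming the `ggml-opt.h` header and the `adamw.alpha` field described above; the decay schedule itself is hypothetical):

```c
#include <math.h>
#include "ggml-opt.h" // ggml optimization header, assumed available

// Sketch of a custom ggml_opt_get_optimizer_params callback implementing
// exponential learning-rate decay. With ggml_opt_fit, userdata points to
// the current epoch number (int64_t *), as documented below.
static struct ggml_opt_optimizer_params decaying_adamw_params(void * userdata) {
    const int64_t epoch = *(const int64_t *) userdata;

    // Start from the library defaults, then override the learning rate.
    struct ggml_opt_optimizer_params p = ggml_opt_get_default_optimizer_params(NULL);
    p.adamw.alpha = 1e-3f * powf(0.9f, (float) epoch); // hypothetical schedule
    return p;
}
```

Pass the function as `get_opt_pars` and a pointer to your state as `get_opt_pars_ud` (or rely on the epoch pointer that `ggml_opt_fit` supplies).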
ggml_opt_params
Configuration struct for creating an optimization context.

- `backend_sched`: Backend scheduler used to build and execute the forward and backward graphs.
- `ctx_compute`: When set alongside `inputs` and `outputs`, graphs are allocated statically once and reused. When `NULL`, a new graph is built for each evaluation.
- `inputs`: Input tensor. The second dimension is interpreted as the batch size (number of datapoints).
- `outputs`: Output tensor. Must have shape `[ne_label, ndata_batch]` when labels are used.
- `loss_type`: Which loss function to minimize.
- `build_type`: Whether to build a forward-only, gradient, or full optimization graph.
- `opt_period`: Number of gradient accumulation steps before each optimizer parameter update. Set to `1` for standard SGD/AdamW without accumulation.
- `get_opt_pars`: Callback invoked before each backward pass to retrieve the current optimizer hyperparameters.
- `optimizer`: Which optimizer to use (`ADAMW` or `SGD`).

Context lifecycle
ggml_opt_init

Creates and initializes an optimization context. Returns a new context; free it with `ggml_opt_free`.

- `params`: Configuration for the optimizer. Use `ggml_opt_default_params` to start from sensible defaults.
ggml_opt_free
Destroys an optimization context and releases all associated memory.

- `opt_ctx`: The context to free.
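A minimal lifecycle sketch, assuming a ggml build with the `ggml-opt.h` header and an already-constructed backend scheduler plus input/output tensors (the `ggml_opt_default_params` signature shown is an assumption):

```c
#include "ggml-opt.h" // ggml optimization header, assumed available

// Sketch: create, use, and destroy an optimization context.
// backend_sched, inputs and outputs are assumed to exist already.
void train_setup(ggml_backend_sched_t backend_sched,
                 struct ggml_tensor * inputs, struct ggml_tensor * outputs) {
    struct ggml_opt_params params = ggml_opt_default_params(
        backend_sched, GGML_OPT_LOSS_TYPE_CROSS_ENTROPY); // assumed signature
    params.inputs  = inputs;
    params.outputs = outputs;

    ggml_opt_context_t opt_ctx = ggml_opt_init(params);

    // ... training ...

    ggml_opt_free(opt_ctx);
}
```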
ggml_opt_reset

Zeroes gradients, resets the loss accumulator, and optionally resets optimizer state (e.g. Adam moment estimates).

- `opt_ctx`: The context to reset.
- `optimizer`: When `true`, also resets the optimizer's internal state (first/second moment estimates for AdamW). Pass `false` to only zero gradients and the loss.

Tensor accessors
These functions return pointers to the internal tensors managed by the optimization context.

ggml_opt_inputs
Returns the input tensor of the forward graph.
ggml_opt_outputs
Returns the output tensor of the forward graph.
ggml_opt_labels
Returns the labels tensor used to compute the loss.
ggml_opt_loss

Returns the scalar tensor that holds the current loss value after `ggml_opt_eval`.

Optimization result
ggml_opt_result_t accumulates statistics (loss, accuracy, number of datapoints) across multiple evaluation steps.
ggml_opt_result_init

Creates a new, empty result object. Free with `ggml_opt_result_free`.
ggml_opt_result_ndata

Retrieves the number of datapoints accumulated in the result.

ggml_opt_result_loss

Retrieves the loss accumulated across evaluations.

ggml_opt_result_accuracy

Retrieves the accuracy accumulated across evaluations (requires labels).
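Putting the result API together, a sketch (the out-parameter shapes of the accessors, including the uncertainty outputs, are an assumption here):

```c
#include <stdio.h>
#include "ggml-opt.h" // ggml optimization header, assumed available

// Sketch: accumulate statistics across evaluations, then read them back.
void report(ggml_opt_context_t opt_ctx) {
    ggml_opt_result_t result = ggml_opt_result_init();

    // ... repeated ggml_opt_alloc/ggml_opt_eval calls passing `result` ...

    int64_t ndata;
    double  loss, loss_unc, accuracy, accuracy_unc;
    ggml_opt_result_ndata   (result, &ndata);                  // assumed signature
    ggml_opt_result_loss    (result, &loss, &loss_unc);        // assumed signature
    ggml_opt_result_accuracy(result, &accuracy, &accuracy_unc); // assumed signature
    fprintf(stderr, "ndata=%lld loss=%f acc=%f\n", (long long) ndata, loss, accuracy);

    ggml_opt_result_free(result);
}
```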
Low-level computation
These functions give you fine-grained control over graph allocation and evaluation. Use them when `ggml_opt_epoch` or `ggml_opt_fit` do not offer enough flexibility.
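The usage pattern these functions support looks roughly like this per training step, assuming dynamic graph building (no static `ctx_compute` set at init); `build_forward_graph` is a hypothetical user function:

```c
#include "ggml-opt.h" // ggml optimization header, assumed available

// Hypothetical user function that rebuilds the model's forward graph and
// reports its input and output tensors.
struct ggml_cgraph * build_forward_graph(struct ggml_context * ctx,
                                         struct ggml_tensor ** inputs,
                                         struct ggml_tensor ** outputs);

// Sketch of one low-level training step.
void train_step(ggml_opt_context_t opt_ctx, ggml_opt_result_t result,
                struct ggml_context * ctx_compute) {
    struct ggml_tensor * inputs  = NULL;
    struct ggml_tensor * outputs = NULL;
    struct ggml_cgraph * gf = build_forward_graph(ctx_compute, &inputs, &outputs);

    ggml_opt_prepare_alloc(opt_ctx, ctx_compute, gf, inputs, outputs);
    ggml_opt_alloc(opt_ctx, /*backward =*/ true); // also allocate the backward graph

    // ... copy the next batch into ggml_opt_inputs(opt_ctx) and the
    //     labels into ggml_opt_labels(opt_ctx) ...

    ggml_opt_eval(opt_ctx, result); // forward pass, then backward pass/update
}
```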
ggml_opt_prepare_alloc

Sets the graph, inputs, and outputs for the next call to `ggml_opt_alloc`. Required when not using static graphs.

- `opt_ctx`: The optimization context.
- `ctx_compute`: The context containing temporarily allocated compute tensors.
- `gf`: The forward computation graph.
- `inputs`: Input tensor in `gf`.
- `outputs`: Output tensor in `gf`.
ggml_opt_alloc
Allocates the next graph for evaluation. Must be called exactly once before each call to `ggml_opt_eval`.

- `opt_ctx`: The optimization context.
- `backward`: When `true`, the backward graph (for gradient computation and parameter update) is allocated in addition to the forward graph.
ggml_opt_eval
Executes the allocated graph. Performs a forward pass, increments the result, and (if the backward graph was allocated) performs the backward pass.

- `opt_ctx`: The optimization context.
- `result`: Result object to increment with the statistics from this evaluation. Pass `NULL` to discard statistics.

High-level training API
ggml_opt_epoch_callback
A callback invoked after each batch evaluation during `ggml_opt_epoch`. `ggml_opt_epoch_callback_progress_bar` prints a progress bar to stderr.
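A sketch of wiring the built-in progress bar into `ggml_opt_epoch` with a 90/10 train/validation split (the `ggml_opt_dataset_ndata` accessor used here is an assumption):

```c
#include "ggml-opt.h" // ggml optimization header, assumed available

// Sketch: run one epoch, training on the first 90% of the dataset and
// validating on the remaining 10%, with progress bars for both phases.
void run_epoch(ggml_opt_context_t opt_ctx, ggml_opt_dataset_t dataset,
               ggml_opt_result_t result_train, ggml_opt_result_t result_eval) {
    const int64_t ndata       = ggml_opt_dataset_ndata(dataset); // assumed accessor
    const int64_t idata_split = ndata * 9 / 10; // front 90% trains, rest validates

    ggml_opt_epoch(opt_ctx, dataset, result_train, result_eval, idata_split,
                   ggml_opt_epoch_callback_progress_bar,
                   ggml_opt_epoch_callback_progress_bar);
}
```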
ggml_opt_epoch

Runs one epoch: trains on the front portion of the dataset and evaluates on the back portion.

- `opt_ctx`: The optimization context.
- `dataset`: The dataset to iterate over.
- `result_train`: Result object incremented during the training portion. Pass `NULL` to discard.
- `result_eval`: Result object incremented during the validation portion. Pass `NULL` to discard.
- `idata_split`: Datapoint index that separates training (indices `[0, idata_split)`) from validation (indices `[idata_split, ndata)`).
- `callback_train`: Called after each training batch. Pass `NULL` for no callback.
- `callback_eval`: Called after each validation batch. Pass `NULL` for no callback.
ggml_opt_fit
Fits the model to a dataset over multiple epochs. This is the highest-level training entry point.

- `backend_sched`: Backend scheduler used to build and run the compute graphs.
- `ctx_compute`: Context containing temporarily allocated tensors for the forward pass.
- `inputs`: Input tensor with shape `[ne_datapoint, ndata_batch]`.
- `outputs`: Output tensor. Must have shape `[ne_label, ndata_batch]` when labels are used.
- `dataset`: Dataset containing training data and optionally labels.
- `loss_type`: The loss function to minimize.
- `optimizer`: Which optimizer to use.
- `get_opt_pars`: Callback to retrieve optimizer hyperparameters. The `userdata` passed is a pointer to the current epoch number (`int64_t *`).
- `nepoch`: Number of times to iterate over the full dataset.
- `nbatch_logical`: Number of datapoints per logical optimizer step. Must be a multiple of the physical batch size (second dimension of `inputs`/`outputs`).
- `val_split`: Fraction of the dataset reserved for validation. Must be in `[0.0, 1.0)`. Pass `0.0` to skip validation.
- `silent`: When `true`, suppresses all progress output to stderr.
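A sketch of a complete `ggml_opt_fit` call following the parameter list above (the exact argument order mirrors this document and is an assumption):

```c
#include "ggml-opt.h" // ggml optimization header, assumed available

// Sketch: train for 10 epochs with default AdamW hyperparameters, logical
// batches of 64 datapoints, and 10% of the dataset held out for validation.
void fit_model(ggml_backend_sched_t backend_sched, struct ggml_context * ctx_compute,
               struct ggml_tensor * inputs, struct ggml_tensor * outputs,
               ggml_opt_dataset_t dataset) {
    ggml_opt_fit(backend_sched, ctx_compute, inputs, outputs, dataset,
                 GGML_OPT_LOSS_TYPE_CROSS_ENTROPY,
                 GGML_OPT_OPTIMIZER_TYPE_ADAMW,
                 ggml_opt_get_default_optimizer_params,
                 /*nepoch         =*/ 10,
                 /*nbatch_logical =*/ 64,
                 /*val_split      =*/ 0.1f,
                 /*silent         =*/ false);
}
```

Because `nbatch_logical` must be a multiple of the physical batch size, a physical batch of 16 with `nbatch_logical = 64` implies 4 gradient accumulation steps per optimizer update.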