ggml supports automatic differentiation in reverse mode (backpropagation). Every operation with a differentiable implementation provides both a forward function and a backward function; the backward function computes the adjoint (gradient) of each input tensor given the adjoint of the output.
How it works
- Forward pass — define the function and compute its value.
- Backward pass — ggml automatically builds gradient nodes that propagate the loss gradient back through every operation in the graph.
- Read gradients — retrieve the gradient tensor for any parameter after computation.
Marking trainable parameters
Call ggml_set_param to mark a tensor as a trainable parameter. This sets GGML_TENSOR_FLAG_PARAM on the tensor and tells the autodiff engine to compute gradients for it.
struct ggml_tensor * x = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 1);
ggml_set_param(x); // x is an input variable / trainable parameter
ggml_set_param does not allocate a gradient tensor immediately. Gradient storage is allocated when you call ggml_build_backward_expand.
Full example: f(x) = a·x² + b
This example is taken directly from the ggml.h header comments.
Define the function
struct ggml_init_params params = {
.mem_size = 16*1024*1024,
.mem_buffer = NULL,
};
struct ggml_context * ctx = ggml_init(params);
struct ggml_tensor * x = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 1);
ggml_set_param(x); // x is the variable we differentiate with respect to
struct ggml_tensor * a = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 1);
struct ggml_tensor * b = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 1);
struct ggml_tensor * x2 = ggml_mul(ctx, x, x);
struct ggml_tensor * f = ggml_add(ctx, ggml_mul(ctx, a, x2), b);
Build forward and backward graphs
// Build the forward graph (grads=true is required for backward pass)
struct ggml_cgraph * gf = ggml_new_graph_custom(ctx, GGML_DEFAULT_GRAPH_SIZE, /*grads=*/true);
ggml_build_forward_expand(gf, f);
// Allocate gradient accumulator tensors and attach backward nodes.
// grad_accs is an array of ggml_tensor* with one entry per graph output node.
// The ggml-opt.h high-level API manages grad_accs automatically.
// For manual use, pass an array of NULL pointers (ggml allocates gradient tensors):
struct ggml_tensor * grad_accs[1] = { NULL };
ggml_build_backward_expand(ctx, gf, grad_accs);
Set values and compute
ggml_set_f32(x, 2.0f);
ggml_set_f32(a, 3.0f);
ggml_set_f32(b, 4.0f);
// Reset gradient accumulators to zero before each backward pass
ggml_graph_reset(gf);
// Run both forward and backward passes
ggml_graph_compute_with_ctx(ctx, gf, /*n_threads=*/1);
Read gradients
// f = a*x^2 + b => df/dx = 2*a*x = 2*3*2 = 12
struct ggml_tensor * grad_x = ggml_graph_get_grad(gf, x);
printf("df/dx = %f\n", ggml_get_f32_1d(grad_x, 0)); // 12.0
ggml_build_backward_expand
void ggml_build_backward_expand(
struct ggml_context * ctx,
struct ggml_cgraph * cgraph,
struct ggml_tensor ** grad_accs);
ctx — the context used to allocate gradient tensors
cgraph — a forward graph previously built with ggml_build_forward_expand; must have been created with grads = true
grad_accs — array of ggml_tensor * with one entry per output node in the forward graph; pass NULL entries to have ggml allocate gradient accumulator tensors automatically
After this call, the graph contains both forward and backward nodes. Calling ggml_graph_compute will execute them in the correct order.
Accessing gradients
// Gradient tensor for a node in the forward graph
struct ggml_tensor * ggml_graph_get_grad(
const struct ggml_cgraph * cgraph,
const struct ggml_tensor * node);
// Gradient accumulator (for gradient accumulation across multiple batches)
struct ggml_tensor * ggml_graph_get_grad_acc(
const struct ggml_cgraph * cgraph,
const struct ggml_tensor * node);
ggml_graph_get_grad returns NULL for tensors that have no gradient, i.e. tensors that do not lie on any differentiable path from a parameter to the graph output.
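In practice it is worth guarding against the NULL case when iterating over tensors that may or may not participate in differentiation. A minimal sketch (a hypothetical helper, assuming an F32 tensor in a graph built with ggml_build_backward_expand as in the example above):

```c
// Hypothetical helper: print a tensor's first gradient component, if it has one.
static void print_grad(const struct ggml_cgraph * gf, const struct ggml_tensor * t) {
    struct ggml_tensor * grad = ggml_graph_get_grad(gf, t);
    if (grad == NULL) {
        printf("%s: no gradient (no differentiable path from a parameter)\n", t->name);
        return;
    }
    printf("%s: grad[0] = %f\n", t->name, ggml_get_f32_1d(grad, 0));
}
```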
Gradient accumulation
ggml supports accumulating gradients across multiple forward/backward passes before applying an optimizer step, which simulates a larger effective batch size without the memory cost of a larger batch.
for (int step = 0; step < accumulation_steps; step++) {
// Load next mini-batch into input tensors
load_batch(inputs, step);
// Do NOT reset gradients between accumulation steps;
// call ggml_graph_reset only before the first step in an accumulation window.
if (step == 0) {
ggml_graph_reset(gf);
}
ggml_graph_compute_with_ctx(ctx, gf, n_threads);
}
// Now apply optimizer using the accumulated gradients
struct ggml_tensor * grad = ggml_graph_get_grad_acc(gf, param);
ggml_graph_reset zeroes gradient accumulators and sets the loss gradient seed to 1.0. Call it once at the start of each accumulation window.
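After the accumulation window, a parameter update can be applied from the accumulated gradient. Below is a minimal sketch of a plain SGD step, for illustration only; it assumes contiguous F32 tensors, and real training should go through ggml-opt.h instead:

```c
// Hypothetical manual SGD update: param -= lr * grad_acc.
// `param` is a trainable tensor, `gf` the graph with backward nodes attached.
static void sgd_step(struct ggml_cgraph * gf, struct ggml_tensor * param, float lr) {
    struct ggml_tensor * grad = ggml_graph_get_grad_acc(gf, param);
    if (grad == NULL) {
        return; // param has no gradient in this graph
    }
    const int64_t n = ggml_nelements(param);
    for (int64_t i = 0; i < n; i++) {
        const float w = ggml_get_f32_1d(param, i);
        const float g = ggml_get_f32_1d(grad,  i);
        ggml_set_f32_1d(param, i, w - lr * g);
    }
}
```

Note that the accumulated gradient is a sum over the window; if you want the mean, scale the loss or the learning rate by the number of accumulation steps.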
Loss tensors
Mark the final scalar output as a loss to signal the optimizer:
struct ggml_tensor * loss = ggml_cross_entropy_loss(ctx, logits, labels);
ggml_set_loss(loss); // sets GGML_TENSOR_FLAG_LOSS
When multiple tensors are marked as losses, their values are summed into a single scalar loss. The backward pass automatically seeds each loss tensor with a gradient of 1.0.
High-level training API
For training workloads, ggml-opt.h provides a higher-level interface that manages forward/backward graph construction, gradient accumulation, and optimizer steps:
#include "ggml-opt.h"
ggml_opt_fit(
backend_sched,
ctx_compute,
inputs,
outputs,
dataset,
GGML_OPT_LOSS_TYPE_CROSS_ENTROPY,
GGML_OPT_OPTIMIZER_TYPE_ADAMW,
get_opt_pars,
/*nepoch=*/10,
/*nbatch_logical=*/256,
/*val_split=*/0.1f,
/*silent=*/false);
See ggml-opt.h for the full API.
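The get_opt_pars argument is a callback that supplies optimizer hyperparameters on each step. A minimal sketch, assuming the ggml_opt_get_default_optimizer_params helper and the adamw field layout from ggml-opt.h (verify both against your ggml version):

```c
#include "ggml-opt.h"

// Hypothetical callback: start from the defaults, then override the AdamW
// learning rate. Passed as the get_opt_pars argument to ggml_opt_fit.
static struct ggml_opt_optimizer_params my_opt_pars(void * userdata) {
    struct ggml_opt_optimizer_params p = ggml_opt_get_default_optimizer_params(userdata);
    p.adamw.alpha = 1e-4f; // learning rate
    return p;
}
```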