ggml represents computations as graphs (ggml_cgraph) to separate the definition of computations from their execution. You define operations using the tensor API, which records nodes in the graph. You then call ggml_graph_compute to execute the graph.
Creating graphs
ggml_new_graph
Creates a graph with the default capacity (GGML_DEFAULT_GRAPH_SIZE = 2048 nodes) and no gradient storage. The graph memory is drawn from ctx's pool.
ggml_new_graph_custom
size: Maximum number of nodes (tensors) the graph can hold.
grads: When true, the graph allocates gradient accumulator storage. Required before calling ggml_build_backward_expand.
ggml_graph_overhead / ggml_graph_overhead_custom
Return the number of bytes of context memory a graph consumes, for the default and a custom node count respectively. Add the result to your mem_size budget before calling ggml_new_graph or ggml_new_graph_custom.
Building graphs
ggml_build_forward_expand
Adds tensor and all of its transitive dependencies (source tensors) to the graph as forward-pass nodes. Call this once for each output tensor you want to compute.
ggml_build_backward_expand
Expands cgraph with the backward pass for automatic differentiation. Must be called after all ggml_build_forward_expand calls. The graph must have been created with grads = true.
ctx: Context used to allocate gradient tensors.
cgraph: The forward graph to differentiate. Backward nodes are appended in place.
grad_accs: Array of gradient accumulator tensors, one per node in the graph. Typically obtained via ggml_graph_get_grad_acc.
Computing graphs
Compute functions are declared in ggml-cpu.h and operate on the CPU backend.
ggml_graph_compute
Executes the graph on the CPU using a plan created by ggml_graph_plan. Returns GGML_STATUS_SUCCESS on success.
Typical usage:
ggml_graph_plan
Creates an execution plan (ggml_cplan) for ggml_graph_compute. When cplan.work_size > 0, the caller must allocate cplan.work_data before passing the plan to ggml_graph_compute.
n_threads: Number of threads to use. Pass GGML_DEFAULT_N_THREADS (4) for the default.
threadpool: Pre-created thread pool. Pass NULL to create a temporary pool internally.
ggml_graph_compute_with_ctx
Like ggml_graph_compute, but allocates the work buffer from ctx instead of requiring the caller to manage it separately. The context must have enough remaining space for the work data.
The trade-off of ggml_graph_compute_with_ctx over ggml_graph_compute is that you must reserve extra memory in the context for the work buffer. Use ggml_graph_compute directly when memory is tight.
Graph inspection
ggml_graph_n_nodes
Returns the number of nodes currently in the graph.
ggml_graph_nodes
Returns a pointer to the graph's node array, which holds ggml_graph_n_nodes() entries.
ggml_graph_node
Returns the i-th node. Negative i counts from the end (nodes[n_nodes + i]).
ggml_graph_get_tensor
Looks up a tensor in the graph by name. Returns NULL if not found.
ggml_graph_get_grad
Returns the gradient tensor of node. Only valid after ggml_build_backward_expand has been called on a graph created with grads = true.
ggml_graph_get_grad_acc
Returns the gradient accumulator of node. Gradient accumulators accumulate gradients across multiple backward passes before being reset.
Graph utilities
ggml_graph_reset
Zeroes all gradient accumulators and sets the gradients of loss tensors to 1. Call before each backward pass in a training loop.
ggml_graph_clear
Removes all nodes from the graph so it can be rebuilt without reallocating.
ggml_graph_print
Prints a human-readable summary of the graph's nodes to stderr.
ggml_graph_dump_dot
Writes a .dot file representing the computation graph. Pass the backward graph as gb and the forward graph as cgraph. Open the output file with dot -Tsvg graph.dot -o graph.svg to visualize the graph.
Thread pool
The thread pool is declared in ggml.h and implemented in the CPU backend (ggml-cpu.h).
ggml_threadpool_params
cpumask: CPU affinity mask. All-zeros means use the OS default affinity settings.
n_threads: Number of worker threads in the pool.
prio: Scheduling priority: GGML_SCHED_PRIO_LOW, GGML_SCHED_PRIO_NORMAL, GGML_SCHED_PRIO_MEDIUM, GGML_SCHED_PRIO_HIGH, or GGML_SCHED_PRIO_REALTIME.
poll: Polling aggressiveness. 0 means the threads sleep when idle; 100 means aggressive spinning. Higher values reduce latency at the cost of CPU utilization.
paused: When true, worker threads start in a paused state and must be resumed with ggml_threadpool_resume before they process any work.
ggml_threadpool_params_default
Returns a ggml_threadpool_params struct populated with sensible defaults for n_threads threads.
ggml_threadpool_params_init
Initializes an existing ggml_threadpool_params struct in place with default values.
ggml_threadpool_new
Creates a thread pool from the given parameters. Returns NULL on failure.
