Automatic Differentiation
Deepbox implements reverse-mode automatic differentiation (backpropagation) through the GradTensor class. This enables gradient-based optimization for machine learning without manual derivative calculations.
Overview
Automatic differentiation (autograd) tracks operations on tensors to build a computation graph, then computes gradients via the chain rule during a backward pass.
import { parameter } from 'deepbox/ndarray';

// Create a parameter that requires gradients
const x = parameter([2, 3, 4]);

// Compute: f(x) = sum(x²)
const y = x.mul(x).sum();

// Compute gradients: df/dx = 2x
y.backward();

console.log(x.grad?.toString());
// Tensor([4, 6, 8], shape=[3], dtype=float32)
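To see what happens under the hood, here is a toy scalar version of reverse-mode autodiff: each value records its parents and a local backward rule, and backward() walks the graph applying the chain rule. This is a conceptual sketch, not Deepbox's internals.

```typescript
// Toy scalar reverse-mode AD. Each Value remembers how it was produced;
// backward() topologically sorts the graph and applies the chain rule.
class Value {
  grad = 0;
  constructor(
    public data: number,
    private parents: Value[] = [],
    private backwardFn: (outGrad: number) => void = () => {}
  ) {}

  mul(other: Value): Value {
    return new Value(this.data * other.data, [this, other], (g) => {
      this.grad += other.data * g; // d(xy)/dx = y
      other.grad += this.data * g; // d(xy)/dy = x
    });
  }

  add(other: Value): Value {
    return new Value(this.data + other.data, [this, other], (g) => {
      this.grad += g;
      other.grad += g;
    });
  }

  backward(): void {
    // Topological order, then apply each local rule from the output back.
    const topo: Value[] = [];
    const seen = new Set<Value>();
    const visit = (v: Value) => {
      if (seen.has(v)) return;
      seen.add(v);
      v.parents.forEach(visit);
      topo.push(v);
    };
    visit(this);
    this.grad = 1; // d(out)/d(out) = 1
    for (let i = topo.length - 1; i >= 0; i--) topo[i].backwardFn(topo[i].grad);
  }
}

// f(x) = x * x  →  df/dx = 2x
const x = new Value(3);
const y = x.mul(x);
y.backward();
console.log(x.grad); // 6
```

GradTensor follows the same pattern, except the nodes hold tensors and the local rules are tensor operations.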
Autograd only tracks gradients for numeric dtypes (float32, float64, int32, uint8, bool). String tensors and int64 (BigInt) are not supported for differentiation.
GradTensor vs Tensor
Feature             Tensor             GradTensor
Purpose             Numerical arrays   Differentiable arrays
Gradient tracking   No                 Yes (when requiresGrad=true)
Computation graph   Not recorded       Recorded for backward pass
Use case            Data, inference    Training, optimization
Memory overhead     Lower              Higher (stores graph)
Use Tensor for inference and data processing. Use GradTensor only when training models or computing derivatives.
Creating GradTensors
Using parameter()
The parameter() function creates a GradTensor with requiresGrad=true:
import { parameter } from 'deepbox/ndarray';

// Single-element parameter
const bias = parameter([0.5]);

// Weight matrix
const W = parameter([
  [0.1, 0.2],
  [0.3, 0.4],
]);

console.log(W.requiresGrad); // true
From Existing Tensors
Convert a Tensor to GradTensor:
import { GradTensor, tensor } from 'deepbox/ndarray';

const t = tensor([1, 2, 3]);
const gt = GradTensor.fromTensor(t, { requiresGrad: true });

console.log(gt.requiresGrad); // true
Creating Scalars
import { GradTensor } from 'deepbox/ndarray';

const learningRate = GradTensor.scalar(0.01, {
  requiresGrad: false,
  dtype: 'float32',
});

console.log(learningRate.shape); // []
Gradient Computation
Basic Example
import { parameter } from 'deepbox/ndarray';

// f(x) = x²
const x = parameter([2, 3, 4]);
const y = x.mul(x);

// Sum to get scalar loss
const loss = y.sum();

// Compute gradients
loss.backward();

console.log('x:', x.tensor.toString());
// Tensor([2, 3, 4], shape=[3], dtype=float32)

console.log('f(x):', y.tensor.toString());
// Tensor([4, 9, 16], shape=[3], dtype=float32)

console.log('df/dx:', x.grad?.toString());
// Tensor([4, 6, 8], shape=[3], dtype=float32)
// df/dx = 2x → [4, 6, 8]
Multi-Variable Gradients
Compute gradients for multiple parameters:
import { parameter } from 'deepbox/ndarray';

const a = parameter([
  [1, 2],
  [3, 4],
]);
const w = parameter([[0.5], [0.5]]);

// z = sum(a @ w)
const z = a.matmul(w).sum();
z.backward();

console.log('dz/da:', a.grad?.toString());
// Gradient with respect to a

console.log('dz/dw:', w.grad?.toString());
// Gradient with respect to w
Chained Operations
Autograd automatically handles complex computation graphs:
import { parameter, GradTensor, tensor } from 'deepbox/ndarray';

const p = parameter([1, 2, 3, 4]);

// f(p) = sum(relu(2p - 3))
const two = GradTensor.fromTensor(tensor([2, 2, 2, 2]), {
  requiresGrad: false,
});
const three = GradTensor.fromTensor(tensor([3, 3, 3, 3]), {
  requiresGrad: false,
});

const scaled = p.mul(two);         // 2p
const shifted = scaled.sub(three); // 2p - 3
const activated = shifted.relu();  // relu(2p - 3)
const loss = activated.sum();      // sum(...)

loss.backward();

console.log('p:', p.tensor.toString());
// Tensor([1, 2, 3, 4], shape=[4], dtype=float32)

console.log('activated:', activated.tensor.toString());
// Tensor([0, 1, 3, 5], shape=[4], dtype=float32)
// relu sets negatives to zero

console.log('grad:', p.grad?.toString());
// Tensor([0, 2, 2, 2], shape=[4], dtype=float32)
// Gradient is 2 where input > 1.5, else 0
Gradient Management
Zero Gradients
Gradients accumulate by default. Clear them between training steps:
import { parameter } from 'deepbox/ndarray';

const v = parameter([1, 2, 3]);

// First backward pass
const loss1 = v.mul(v).sum();
loss1.backward();
console.log(v.grad?.toString());
// Tensor([2, 4, 6], shape=[3], dtype=float32)

// Zero gradients before next pass
v.zeroGrad();
console.log(v.grad?.toString());
// Tensor([0, 0, 0], shape=[3], dtype=float32)

// Second backward pass
const loss2 = v.sum();
loss2.backward();
console.log(v.grad?.toString());
// Tensor([1, 1, 1], shape=[3], dtype=float32)
Always call zeroGrad() before computing gradients for a new batch, or gradients will accumulate from previous iterations.
Setting Gradients Manually
import { parameter, tensor } from 'deepbox/ndarray';

const x = parameter([1, 2, 3]);

// Set custom gradient
const customGrad = tensor([0.1, 0.2, 0.3]);
x.setGrad(customGrad);

console.log(x.grad?.toString());
// Tensor([0.1, 0.2, 0.3], shape=[3], dtype=float32)
Detaching from Graph
Create a copy without gradient tracking:
import { parameter } from 'deepbox/ndarray';

const x = parameter([1, 2, 3]);
const y = x.mul(x);

// Detach: stop tracking gradients
const detached = y.detach();

console.log(x.requiresGrad); // true
console.log(detached.requiresGrad); // false
Disabling Gradient Tracking
Using noGrad()
Temporarily disable gradient tracking for inference:
import { parameter, noGrad } from 'deepbox/ndarray';

const q = parameter([1, 2, 3]);

noGrad(() => {
  // Operations inside do NOT track gradients
  const result = q.mul(q);
  console.log(result.requiresGrad); // false
});

// Back to normal gradient tracking
const result2 = q.mul(q);
console.log(result2.requiresGrad); // true
noGrad() only accepts synchronous callbacks. Passing an async function throws an error, because the gradient flag would be restored before the async work completes.
Module-Level State
Gradient tracking is controlled by a module-level singleton gradEnabled:
// Conceptual (internal state):
let gradEnabled = true;

// noGrad() temporarily sets it to false
noGrad(() => {
  // gradEnabled = false
  // ...
});
// gradEnabled = true again
Because JavaScript is single-threaded, this global flag is safe in synchronous code. Do not rely on it across async boundaries.
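The flag pattern itself is easy to sketch in a few self-contained lines. This is a standalone re-implementation of the idea (not Deepbox's source); note how try/finally restores the flag even if the callback throws:

```typescript
// Module-level flag, as in the conceptual sketch above.
let gradEnabled = true;

function isGradEnabled(): boolean {
  return gradEnabled;
}

// Run fn with gradient tracking disabled, restoring the previous
// state afterwards, even when fn throws.
function noGrad<T>(fn: () => T): T {
  const prev = gradEnabled;
  gradEnabled = false;
  try {
    return fn();
  } finally {
    gradEnabled = prev;
  }
}

const inside = noGrad(() => isGradEnabled());
console.log(inside, isGradEnabled()); // false true
```

An async callback would break this: the flag would be restored at the first await, so later parts of the async function would run with tracking re-enabled, which is why only synchronous callbacks are accepted.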
Supported Operations
GradTensor supports automatic differentiation for:
Arithmetic
const a = parameter([1, 2]);
const b = parameter([3, 4]);

a.add(b) // Addition
a.sub(b) // Subtraction
a.mul(b) // Multiplication
a.div(b) // Division
a.neg()  // Negation
Mathematical Functions
const x = parameter([1, 2, 3]);

x.pow(2)     // Power (x²)
x.sqrt()     // Square root
x.exp()      // Exponential
x.log()      // Natural logarithm
x.abs()      // Absolute value
x.clip(0, 5) // Clipping
Activations
const x = parameter([1, 2, 3]);

x.relu()    // ReLU: max(0, x)
x.sigmoid() // Sigmoid: 1 / (1 + e^-x)
x.tanh()    // Hyperbolic tangent
Reductions
const x = parameter([[1, 2], [3, 4]]);

x.sum()        // Sum all elements
x.sum(0)       // Sum along axis 0
x.sum(0, true) // Sum with keepdims=true
x.mean()       // Mean
x.max()        // Maximum
x.min()        // Minimum
Linear Algebra
const a = parameter([[1, 2], [3, 4]]);
const b = parameter([[5, 6], [7, 8]]);

a.matmul(b) // Matrix multiplication
Shape Operations
const x = parameter([1, 2, 3, 4, 5, 6]);

x.reshape([2, 3]) // Reshape
x.flatten()       // Flatten to 1D
x.transpose()     // Transpose
x.view([3, 2])    // Create view
Indexing
const x = parameter([1, 2, 3, 4, 5]);
const indices = parameter([0, 2, 4]);

x.slice({ start: 1, end: 4 }) // Slice
x.gather(indices, 0)          // Gather
Advanced: Custom Gradients
For custom operations, define backward functions:
import { GradTensor } from 'deepbox/ndarray';

// Conceptual example (internal API)
function customOp(input: GradTensor): GradTensor {
  const outTensor = /* ... compute forward ... */;
  const requiresGrad = input.requiresGrad;
  return GradTensor.create({
    tensor: outTensor,
    requiresGrad,
    prev: requiresGrad ? [input] : [],
    backward: () => {
      if (!requiresGrad) return;
      const grad = /* ... compute gradient ... */;
      input.accumulateGrad(grad);
    },
  });
}
Most users don’t need to implement custom gradients. Deepbox provides gradients for all built-in operations.
Gradient Behavior Notes
Max/Min Tie-Breaking
When multiple elements share the max/min value, all tied positions receive the full gradient:
import { parameter } from 'deepbox/ndarray';

const x = parameter([1, 3, 3, 2]);
const y = x.max();
y.backward();

console.log(x.grad?.toString());
// Tensor([0, 1, 1, 0], shape=[4], dtype=float32)
// Both positions with value 3 get gradient 1
As a result, the total gradient flowing back is multiplied by the tie count. This differs from some frameworks, which divide the gradient evenly among tied positions.
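The backward rule for max() under this convention can be sketched on plain arrays (a conceptual sketch of the rule described above, not Deepbox code):

```typescript
// Backward rule for max() with "full gradient to every tie":
// each position equal to the maximum gets the entire upstream gradient.
function maxBackward(input: number[], upstreamGrad: number): number[] {
  const m = Math.max(...input);
  return input.map((v) => (v === m ? upstreamGrad : 0));
}

console.log(maxBackward([1, 3, 3, 2], 1)); // [0, 1, 1, 0]
```

A "divide among ties" convention would instead return upstreamGrad / tieCount at each tied position.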
Broadcasting in Gradients
Gradients are automatically reduced to match input shapes:
import { parameter, GradTensor, tensor } from 'deepbox/ndarray';

const a = parameter([[1], [2]]);
const b = GradTensor.fromTensor(tensor([[1, 2, 3]]), { requiresGrad: false });

// Broadcasting: [2,1] + [1,3] → [2,3]
const c = a.add(b);
const loss = c.sum();
loss.backward();

console.log(a.shape); // [2, 1]
console.log(a.grad?.shape); // [2, 1] (reduced from [2, 3])
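The reduction works by summing the gradient over every axis that was broadcast. For the [2,1] → [2,3] case above, that means summing along axis 1, which can be sketched on plain arrays (a conceptual sketch, not Deepbox code):

```typescript
// Reduce a gradient over a [rows, cols] broadcast result back to a
// [rows, 1] input by summing along the broadcast axis (axis 1).
function reduceGradToColumn(grad: number[][]): number[][] {
  return grad.map((row) => [row.reduce((acc, g) => acc + g, 0)]);
}

// d(sum)/dc is all ones over the [2,3] result
const upstream = [
  [1, 1, 1],
  [1, 1, 1],
];
console.log(reduceGradToColumn(upstream)); // [[3], [3]]
```

Summing is the right reduction because each input element contributed to every output position it was broadcast into, so its total gradient is the sum of those positions' gradients.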
Training Example
Putting it all together:
import { parameter, GradTensor, tensor } from 'deepbox/ndarray';

// Initialize parameters
const W = parameter([[0.1, 0.2], [0.3, 0.4]]);
const b = parameter([0, 0]);

// Training data
const X = GradTensor.fromTensor(
  tensor([[1, 2], [3, 4], [5, 6]]),
  { requiresGrad: false }
);
const y_true = GradTensor.fromTensor(
  tensor([[1], [2], [3]]),
  { requiresGrad: false }
);

// Training loop
for (let epoch = 0; epoch < 10; epoch++) {
  // Forward pass
  const logits = X.matmul(W).add(b); // Broadcasting
  const loss = logits.sub(y_true).pow(2).mean();

  // Backward pass
  W.zeroGrad();
  b.zeroGrad();
  loss.backward();

  // Manual gradient descent (0.01 learning rate)
  const lr = 0.01;
  // W -= lr * W.grad (conceptually)
  // In practice, use an optimizer

  console.log(`Epoch ${epoch}, Loss: ${loss.tensor.data[0]}`);
}
For real training, use optimizers from deepbox/nn (SGD, Adam, etc.) instead of manual parameter updates.
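The commented-out update in the loop ("W -= lr * W.grad") is the plain SGD rule. On raw arrays it amounts to one line per parameter (a sketch of the math, not the deepbox/nn optimizer API):

```typescript
// One SGD step: move each parameter against its gradient,
// scaled by the learning rate.
function sgdStep(param: number[], grad: number[], lr: number): number[] {
  return param.map((p, i) => p - lr * grad[i]);
}

console.log(sgdStep([1, 2], [2, -4], 0.5)); // [0, 4]
```

Real optimizers add state on top of this rule, such as momentum buffers (SGD with momentum) or per-parameter moment estimates (Adam).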
Next Steps
Neural Networks: Build models with layers and optimizers
Optimizers: Gradient descent, Adam, RMSprop, and more
API Reference
GradTensor API: Complete API documentation for GradTensor