Automatic Differentiation
Deepbox implements reverse-mode automatic differentiation (backpropagation) through the GradTensor class. This enables gradient-based optimization for machine learning without manual derivative calculations.
Overview
Automatic differentiation (autograd) tracks operations on tensors to build a computation graph, then computes gradients via the chain rule during a backward pass.
import { parameter } from 'deepbox/ndarray';

// Create a parameter that requires gradients
const x = parameter([2, 3, 4]);

// Compute: f(x) = sum(x²)
const y = x.mul(x).sum();

// Compute gradients: df/dx = 2x
y.backward();

console.log(x.grad?.toString());
// Tensor([4, 6, 8], shape=[3], dtype=float32)
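To see what happens under the hood, here is a toy scalar version of reverse-mode autodiff: each value records its parents and a local backward rule, and backward() walks the graph applying the chain rule. This is a conceptual sketch, not Deepbox's internals.

```typescript
// Toy scalar reverse-mode AD. Each Value remembers how it was produced;
// backward() topologically sorts the graph and applies the chain rule.
class Value {
  grad = 0;
  constructor(
    public data: number,
    private parents: Value[] = [],
    private backwardFn: (outGrad: number) => void = () => {}
  ) {}

  mul(other: Value): Value {
    return new Value(this.data * other.data, [this, other], (g) => {
      this.grad += other.data * g; // d(xy)/dx = y
      other.grad += this.data * g; // d(xy)/dy = x
    });
  }

  add(other: Value): Value {
    return new Value(this.data + other.data, [this, other], (g) => {
      this.grad += g;
      other.grad += g;
    });
  }

  backward(): void {
    // Topological order, then apply each local rule from the output back.
    const topo: Value[] = [];
    const seen = new Set<Value>();
    const visit = (v: Value) => {
      if (seen.has(v)) return;
      seen.add(v);
      v.parents.forEach(visit);
      topo.push(v);
    };
    visit(this);
    this.grad = 1; // d(out)/d(out) = 1
    for (let i = topo.length - 1; i >= 0; i--) topo[i].backwardFn(topo[i].grad);
  }
}

// f(x) = x * x  →  df/dx = 2x
const x = new Value(3);
const y = x.mul(x);
y.backward();
console.log(x.grad); // 6
```

GradTensor follows the same pattern, except the nodes hold tensors and the local rules are tensor operations.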
Autograd only tracks gradients for numeric dtypes (float32, float64, int32, uint8, bool). String tensors and int64 (BigInt) are not supported for differentiation.
GradTensor vs Tensor
Feature             Tensor             GradTensor
Purpose             Numerical arrays   Differentiable arrays
Gradient tracking   No                 Yes (when requiresGrad=true)
Computation graph   Not recorded       Recorded for backward pass
Use case            Data, inference    Training, optimization
Memory overhead     Lower              Higher (stores graph)
Use Tensor for inference and data processing. Use GradTensor only when training models or computing derivatives.
Creating GradTensors
Using parameter()
The parameter() function creates a GradTensor with requiresGrad=true:
import { parameter } from 'deepbox/ndarray';

// Single-element parameter
const bias = parameter([0.5]);

// Weight matrix
const W = parameter([
  [0.1, 0.2],
  [0.3, 0.4],
]);

console.log(W.requiresGrad); // true
From Existing Tensors
Convert a Tensor to GradTensor:
import { GradTensor, tensor } from 'deepbox/ndarray';

const t = tensor([1, 2, 3]);
const gt = GradTensor.fromTensor(t, { requiresGrad: true });

console.log(gt.requiresGrad); // true
Creating Scalars
import { GradTensor } from 'deepbox/ndarray';

const learningRate = GradTensor.scalar(0.01, {
  requiresGrad: false,
  dtype: 'float32',
});

console.log(learningRate.shape); // []
Gradient Computation
Basic Example
import { parameter } from 'deepbox/ndarray';

// f(x) = x²
const x = parameter([2, 3, 4]);
const y = x.mul(x);

// Sum to get scalar loss
const loss = y.sum();

// Compute gradients
loss.backward();

console.log('x:', x.tensor.toString());
// Tensor([2, 3, 4], shape=[3], dtype=float32)

console.log('f(x):', y.tensor.toString());
// Tensor([4, 9, 16], shape=[3], dtype=float32)

console.log('df/dx:', x.grad?.toString());
// Tensor([4, 6, 8], shape=[3], dtype=float32)
// df/dx = 2x → [4, 6, 8]
Multi-Variable Gradients
Compute gradients for multiple parameters:
import { parameter } from 'deepbox/ndarray';

const a = parameter([
  [1, 2],
  [3, 4],
]);
const w = parameter([[0.5], [0.5]]);

// z = sum(a @ w)
const z = a.matmul(w).sum();
z.backward();

console.log('dz/da:', a.grad?.toString());
// Gradient with respect to a

console.log('dz/dw:', w.grad?.toString());
// Gradient with respect to w
Chained Operations
Autograd automatically handles complex computation graphs:
import { parameter, GradTensor, tensor } from 'deepbox/ndarray';

const p = parameter([1, 2, 3, 4]);

// f(p) = sum(relu(2p - 3))
const two = GradTensor.fromTensor(tensor([2, 2, 2, 2]), {
  requiresGrad: false,
});
const three = GradTensor.fromTensor(tensor([3, 3, 3, 3]), {
  requiresGrad: false,
});

const scaled = p.mul(two);         // 2p
const shifted = scaled.sub(three); // 2p - 3
const activated = shifted.relu();  // relu(2p - 3)
const loss = activated.sum();      // sum(...)

loss.backward();

console.log('p:', p.tensor.toString());
// Tensor([1, 2, 3, 4], shape=[4], dtype=float32)

console.log('activated:', activated.tensor.toString());
// Tensor([0, 1, 3, 5], shape=[4], dtype=float32)
// relu sets negatives to zero

console.log('grad:', p.grad?.toString());
// Tensor([0, 2, 2, 2], shape=[4], dtype=float32)
// Gradient is 2 where input > 1.5, else 0
Gradient Management
Zero Gradients
Gradients accumulate by default. Clear them between training steps:
import { parameter } from 'deepbox/ndarray';

const v = parameter([1, 2, 3]);

// First backward pass
const loss1 = v.mul(v).sum();
loss1.backward();
console.log(v.grad?.toString());
// Tensor([2, 4, 6], shape=[3], dtype=float32)

// Zero gradients before next pass
v.zeroGrad();
console.log(v.grad?.toString());
// Tensor([0, 0, 0], shape=[3], dtype=float32)

// Second backward pass
const loss2 = v.sum();
loss2.backward();
console.log(v.grad?.toString());
// Tensor([1, 1, 1], shape=[3], dtype=float32)
Always call zeroGrad() before computing gradients for a new batch, or gradients will accumulate from previous iterations.
Setting Gradients Manually
import { parameter, tensor } from 'deepbox/ndarray';

const x = parameter([1, 2, 3]);

// Set custom gradient
const customGrad = tensor([0.1, 0.2, 0.3]);
x.setGrad(customGrad);

console.log(x.grad?.toString());
// Tensor([0.1, 0.2, 0.3], shape=[3], dtype=float32)
Detaching from Graph
Create a copy without gradient tracking:
import { parameter } from 'deepbox/ndarray';

const x = parameter([1, 2, 3]);
const y = x.mul(x);

// Detach: stop tracking gradients
const detached = y.detach();

console.log(x.requiresGrad); // true
console.log(detached.requiresGrad); // false
Disabling Gradient Tracking
Using noGrad()
Temporarily disable gradient tracking for inference:
import { parameter, noGrad } from 'deepbox/ndarray';

const q = parameter([1, 2, 3]);

noGrad(() => {
  // Operations inside do NOT track gradients
  const result = q.mul(q);
  console.log(result.requiresGrad); // false
});

// Back to normal gradient tracking
const result2 = q.mul(q);
console.log(result2.requiresGrad); // true
noGrad() only accepts synchronous callbacks. Passing an async function throws an error, because the gradient flag would be restored before the async work completes.
Module-Level State
Gradient tracking is controlled by a module-level singleton gradEnabled:
// Conceptual (internal state):
let gradEnabled = true;

// noGrad() temporarily sets it to false
noGrad(() => {
  // gradEnabled = false
  // ...
});
// gradEnabled = true again
Because JavaScript is single-threaded, this global flag is safe in synchronous code. Do not rely on it across async boundaries.
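The flag pattern itself is easy to sketch in a few self-contained lines. This is a standalone re-implementation of the idea (not Deepbox's source); note how try/finally restores the flag even if the callback throws:

```typescript
// Module-level flag, as in the conceptual sketch above.
let gradEnabled = true;

function isGradEnabled(): boolean {
  return gradEnabled;
}

// Run fn with gradient tracking disabled, restoring the previous
// state afterwards, even when fn throws.
function noGrad<T>(fn: () => T): T {
  const prev = gradEnabled;
  gradEnabled = false;
  try {
    return fn();
  } finally {
    gradEnabled = prev;
  }
}

const inside = noGrad(() => isGradEnabled());
console.log(inside, isGradEnabled()); // false true
```

An async callback would break this: the flag would be restored at the first await, so later parts of the async function would run with tracking re-enabled, which is why only synchronous callbacks are accepted.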
Supported Operations
GradTensor supports automatic differentiation for:
Arithmetic
const a = parameter([1, 2]);
const b = parameter([3, 4]);

a.add(b) // Addition
a.sub(b) // Subtraction
a.mul(b) // Multiplication
a.div(b) // Division
a.neg()  // Negation
Mathematical Functions
const x = parameter([1, 2, 3]);

x.pow(2)     // Power (x²)
x.sqrt()     // Square root
x.exp()      // Exponential
x.log()      // Natural logarithm
x.abs()      // Absolute value
x.clip(0, 5) // Clipping
Activations
const x = parameter([1, 2, 3]);

x.relu()    // ReLU: max(0, x)
x.sigmoid() // Sigmoid: 1 / (1 + e^-x)
x.tanh()    // Hyperbolic tangent
Reductions
const x = parameter([[1, 2], [3, 4]]);

x.sum()        // Sum all elements
x.sum(0)       // Sum along axis 0
x.sum(0, true) // Sum with keepdims=true
x.mean()       // Mean
x.max()        // Maximum
x.min()        // Minimum
Linear Algebra
const a = parameter([[1, 2], [3, 4]]);
const b = parameter([[5, 6], [7, 8]]);

a.matmul(b) // Matrix multiplication
Shape Operations
const x = parameter([1, 2, 3, 4, 5, 6]);

x.reshape([2, 3]) // Reshape
x.flatten()       // Flatten to 1D
x.transpose()     // Transpose
x.view([3, 2])    // Create view
Indexing
const x = parameter([1, 2, 3, 4, 5]);
const indices = parameter([0, 2, 4]);

x.slice({ start: 1, end: 4 }) // Slice
x.gather(indices, 0)          // Gather
Advanced: Custom Gradients
For custom operations, define backward functions:
import { GradTensor } from 'deepbox/ndarray';

// Conceptual example (internal API)
function customOp(input: GradTensor): GradTensor {
  const outTensor = /* ... compute forward ... */;
  const requiresGrad = input.requiresGrad;
  return GradTensor.create({
    tensor: outTensor,
    requiresGrad,
    prev: requiresGrad ? [input] : [],
    backward: () => {
      if (!requiresGrad) return;
      const grad = /* ... compute gradient ... */;
      input.accumulateGrad(grad);
    },
  });
}
Most users don’t need to implement custom gradients. Deepbox provides gradients for all built-in operations.
Gradient Behavior Notes
Max/Min Tie-Breaking
When multiple elements share the max/min value, all tied positions receive the full gradient:
import { parameter } from 'deepbox/ndarray';

const x = parameter([1, 3, 3, 2]);
const y = x.max();
y.backward();

console.log(x.grad?.toString());
// Tensor([0, 1, 1, 0], shape=[4], dtype=float32)
// Both positions with value 3 get gradient 1
As a result, the total gradient flowing back is multiplied by the tie count. This differs from some frameworks, which divide the gradient evenly among tied positions.
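The backward rule for max() under this convention can be sketched on plain arrays (a conceptual sketch of the rule described above, not Deepbox code):

```typescript
// Backward rule for max() with "full gradient to every tie":
// each position equal to the maximum gets the entire upstream gradient.
function maxBackward(input: number[], upstreamGrad: number): number[] {
  const m = Math.max(...input);
  return input.map((v) => (v === m ? upstreamGrad : 0));
}

console.log(maxBackward([1, 3, 3, 2], 1)); // [0, 1, 1, 0]
```

A "divide among ties" convention would instead return upstreamGrad / tieCount at each tied position.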
Broadcasting in Gradients
Gradients are automatically reduced to match input shapes:
import { parameter, GradTensor, tensor } from 'deepbox/ndarray';

const a = parameter([[1], [2]]);
const b = GradTensor.fromTensor(tensor([[1, 2, 3]]), { requiresGrad: false });

// Broadcasting: [2,1] + [1,3] → [2,3]
const c = a.add(b);
const loss = c.sum();
loss.backward();

console.log(a.shape); // [2, 1]
console.log(a.grad?.shape); // [2, 1] (reduced from [2, 3])
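The reduction works by summing the gradient over every axis that was broadcast. For the [2,1] → [2,3] case above, that means summing along axis 1, which can be sketched on plain arrays (a conceptual sketch, not Deepbox code):

```typescript
// Reduce a gradient over a [rows, cols] broadcast result back to a
// [rows, 1] input by summing along the broadcast axis (axis 1).
function reduceGradToColumn(grad: number[][]): number[][] {
  return grad.map((row) => [row.reduce((acc, g) => acc + g, 0)]);
}

// d(sum)/dc is all ones over the [2,3] result
const upstream = [
  [1, 1, 1],
  [1, 1, 1],
];
console.log(reduceGradToColumn(upstream)); // [[3], [3]]
```

Summing is the right reduction because each input element contributed to every output position it was broadcast into, so its total gradient is the sum of those positions' gradients.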
Training Example
Putting it all together:
import { parameter, GradTensor, tensor } from 'deepbox/ndarray';

// Initialize parameters
const W = parameter([[0.1, 0.2], [0.3, 0.4]]);
const b = parameter([0, 0]);

// Training data
const X = GradTensor.fromTensor(
  tensor([[1, 2], [3, 4], [5, 6]]),
  { requiresGrad: false }
);
const y_true = GradTensor.fromTensor(
  tensor([[1], [2], [3]]),
  { requiresGrad: false }
);

// Training loop
for (let epoch = 0; epoch < 10; epoch++) {
  // Forward pass
  const logits = X.matmul(W).add(b); // Broadcasting
  const loss = logits.sub(y_true).pow(2).mean();

  // Backward pass
  W.zeroGrad();
  b.zeroGrad();
  loss.backward();

  // Manual gradient descent (0.01 learning rate)
  const lr = 0.01;
  // W -= lr * W.grad (conceptually)
  // In practice, use an optimizer

  console.log(`Epoch ${epoch}, Loss: ${loss.tensor.data[0]}`);
}
For real training, use optimizers from deepbox/nn (SGD, Adam, etc.) instead of manual parameter updates.
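The commented-out update in the loop ("W -= lr * W.grad") is the plain SGD rule. On raw arrays it amounts to one line per parameter (a sketch of the math, not the deepbox/nn optimizer API):

```typescript
// One SGD step: move each parameter against its gradient,
// scaled by the learning rate.
function sgdStep(param: number[], grad: number[], lr: number): number[] {
  return param.map((p, i) => p - lr * grad[i]);
}

console.log(sgdStep([1, 2], [2, -4], 0.5)); // [0, 4]
```

Real optimizers add state on top of this rule, such as momentum buffers (SGD with momentum) or per-parameter moment estimates (Adam).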
Next Steps
Neural Networks: Build models with layers and optimizers
Optimizers: Gradient descent, Adam, RMSprop, and more
API Reference
GradTensor API: Complete API documentation for GradTensor