The Neural Networks (nn) module provides building blocks for constructing deep learning models. It includes layers, activation functions, loss functions, and a modular architecture for building custom neural networks with automatic differentiation.

Overview

The nn module offers everything needed to build and train neural networks:
  • Core Layers: Linear (Dense), Convolutional, Recurrent (LSTM, GRU)
  • Activation Functions: ReLU, Sigmoid, Tanh, Softmax, and variants
  • Normalization: BatchNorm, LayerNorm
  • Regularization: Dropout
  • Attention: Multi-head Attention, Transformer layers
  • Loss Functions: Cross-entropy, MSE, MAE, and more
  • Module System: Base class for custom layers and models

Key Features

  • PyTorch-like API: familiar Module-based architecture with forward hooks
  • Automatic Differentiation: built-in gradient computation with GradTensor
  • Modular Design: compose layers into complex architectures
  • Modern Architectures: transformers, attention, and recurrent networks

Building Neural Networks

Basic Module

All neural network components inherit from the Module base class:
import { Module } from 'deepbox/nn';
import { GradTensor, parameter } from 'deepbox/ndarray';

class CustomLayer extends Module {
  weight: GradTensor;
  bias: GradTensor;
  
  constructor(inFeatures: number, outFeatures: number) {
    super();
    this.weight = parameter([inFeatures, outFeatures]);
    this.bias = parameter([outFeatures]);
  }
  
  forward(x: GradTensor): GradTensor {
    return x.matmul(this.weight).add(this.bias);
  }
}

Sequential Model

import { Sequential, Linear, ReLU, Softmax } from 'deepbox/nn';

// Build a simple feedforward network
const model = new Sequential([
  new Linear(784, 128),
  new ReLU(),
  new Linear(128, 64),
  new ReLU(),
  new Linear(64, 10),
  new Softmax()
]);

// Forward pass
const output = model.forward(input);
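The layer sizes above fix the model's parameter count: each Linear(in, out) holds in × out weights plus out biases. As a quick sanity check in plain TypeScript (independent of the deepbox API):

```typescript
// Parameter count of a Linear layer: weights (in * out) plus biases (out).
function linearParams(inFeatures: number, outFeatures: number): number {
  return inFeatures * outFeatures + outFeatures;
}

// The 784 -> 128 -> 64 -> 10 network above:
const total =
  linearParams(784, 128) + // 100480
  linearParams(128, 64) +  // 8256
  linearParams(64, 10);    // 650

console.log(total); // 109386
```

Activation layers like ReLU and Softmax add no parameters, so the total comes entirely from the Linear layers.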

Core Layers

Linear (Dense) Layer

import { Linear } from 'deepbox/nn';
import { GradTensor } from 'deepbox/ndarray';

const layer = new Linear(10, 5);  // 10 inputs, 5 outputs

const x = new GradTensor([[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]]);
const output = layer.forward(x);  // Shape: [1, 5]

Convolutional Layers

import { Conv1d, Conv2d, MaxPool2d, AvgPool2d } from 'deepbox/nn';

// 1D Convolution (for sequences)
const conv1d = new Conv1d({
  inChannels: 3,
  outChannels: 16,
  kernelSize: 3,
  stride: 1,
  padding: 1
});

// 2D Convolution (for images)
const conv2d = new Conv2d({
  inChannels: 3,
  outChannels: 32,
  kernelSize: 3,
  stride: 1,
  padding: 1
});

// Max Pooling
const maxpool = new MaxPool2d({
  kernelSize: 2,
  stride: 2
});

// Average Pooling
const avgpool = new AvgPool2d({ kernelSize: 2 });
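The kernelSize, stride, and padding options jointly determine the output's spatial size. The standard formula, sketched in plain TypeScript (independent of the deepbox API):

```typescript
// Spatial output size of a convolution or pooling layer:
// out = floor((in + 2 * padding - kernel) / stride) + 1
function outSize(input: number, kernel: number, stride = 1, padding = 0): number {
  return Math.floor((input + 2 * padding - kernel) / stride) + 1;
}

// conv2d above (kernel 3, stride 1, padding 1) preserves spatial size:
console.log(outSize(32, 3, 1, 1)); // 32
// maxpool (kernel 2, stride 2) halves it:
console.log(outSize(32, 2, 2)); // 16
```

This is why kernelSize 3 with padding 1 is such a common pairing: it keeps the feature map the same size, leaving downsampling to the pooling layers.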

Recurrent Layers

import { LSTM, GRU, RNN } from 'deepbox/nn';

// LSTM for sequence modeling
const lstm = new LSTM({
  inputSize: 10,
  hiddenSize: 20,
  numLayers: 2,
  dropout: 0.2
});

const { output, hidden, cell } = lstm.forward(sequenceInput);  // sequenceInput: a batched sequence tensor

// GRU (simpler than LSTM)
const gru = new GRU({
  inputSize: 10,
  hiddenSize: 20
});

// Basic RNN
const rnn = new RNN({
  inputSize: 10,
  hiddenSize: 20
});

Activation Functions

import { 
  ReLU, 
  LeakyReLU, 
  ELU, 
  GELU,
  Sigmoid, 
  Tanh,
  Softmax,
  LogSoftmax 
} from 'deepbox/nn';

// ReLU family
const relu = new ReLU();
const leakyRelu = new LeakyReLU(0.01);
const elu = new ELU(1.0);
const gelu = new GELU();

// Sigmoid and Tanh
const sigmoid = new Sigmoid();
const tanh = new Tanh();

// For classification
const softmax = new Softmax();
const logSoftmax = new LogSoftmax();

// Usage
const activated = relu.forward(x);
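Softmax deserves a note on numerical stability: exponentiating large logits overflows, so implementations subtract the maximum first. A minimal sketch in plain TypeScript (not the deepbox implementation):

```typescript
// Numerically stable softmax: subtract the max logit before exponentiating
// so large values cannot overflow. The result is unchanged mathematically.
function softmax(logits: number[]): number[] {
  const m = Math.max(...logits);
  const exps = logits.map((v) => Math.exp(v - m));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

const probs = softmax([2.0, 1.0, 0.1]);
// probs sum to 1, and the largest logit gets the largest probability
```

LogSoftmax exists for the same reason: computing log(softmax(x)) directly is more stable than composing the two operations.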

Normalization Layers

import { BatchNorm1d, LayerNorm } from 'deepbox/nn';

// Batch Normalization
const batchNorm = new BatchNorm1d({
  numFeatures: 64,
  eps: 1e-5,
  momentum: 0.1
});

// Layer Normalization (normalizes per sample, independent of batch size; well suited to RNNs)
const layerNorm = new LayerNorm({
  normalizedShape: [64],
  eps: 1e-5
});

// Training mode
batchNorm.train();
const normedTrain = batchNorm.forward(x);

// Evaluation mode
batchNorm.eval();
const normedEval = batchNorm.forward(x);
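Both layers apply the same core transform; they differ only in which axis the statistics are computed over. The per-feature-vector case (layer norm, omitting the learned scale and shift) in plain TypeScript:

```typescript
// Layer normalization over one feature vector:
// y_i = (x_i - mean) / sqrt(variance + eps)
// (The learned scale/shift parameters are omitted in this sketch.)
function layerNorm(x: number[], eps = 1e-5): number[] {
  const mean = x.reduce((a, b) => a + b, 0) / x.length;
  const variance = x.reduce((a, b) => a + (b - mean) ** 2, 0) / x.length;
  const std = Math.sqrt(variance + eps);
  return x.map((v) => (v - mean) / std);
}

const y = layerNorm([1, 2, 3, 4]);
// y has mean ~0 and unit variance
```

BatchNorm computes the same statistics per feature across the batch instead, which is why it needs running averages for evaluation mode while LayerNorm does not.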

Regularization

import { Dropout } from 'deepbox/nn';

const dropout = new Dropout(0.5);  // Zero each activation with probability 0.5

// During training
dropout.train();
const dropped = dropout.forward(x);

// During inference (no dropout)
dropout.eval();
const unchanged = dropout.forward(x);
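The train/eval asymmetry comes from "inverted" dropout: surviving activations are scaled up during training so that the expected value matches evaluation, where nothing is dropped. A sketch in plain TypeScript (not the deepbox implementation):

```typescript
// Inverted dropout: during training, zero each value with probability p and
// scale survivors by 1 / (1 - p) so the expected activation is unchanged.
// During evaluation, the input passes through untouched.
function dropout(x: number[], p: number, training: boolean): number[] {
  if (!training || p === 0) return x.slice();
  return x.map((v) => (Math.random() < p ? 0 : v / (1 - p)));
}

// eval mode leaves the input unchanged:
console.log(dropout([1, 2, 3], 0.5, false)); // [1, 2, 3]
```

Because the scaling happens at training time, inference needs no correction factor, which keeps the eval path a simple identity.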

Attention Mechanisms

Multi-head Attention

import { MultiheadAttention } from 'deepbox/nn';

const attention = new MultiheadAttention({
  embedDim: 512,
  numHeads: 8,
  dropout: 0.1
});

const { output, attentionWeights } = attention.forward({
  query: q,
  key: k,
  value: v
});
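Under the hood, each head computes scaled dot-product attention: softmax(Q·Kᵀ/√d)·V. A minimal single-head sketch on plain 2-D arrays (not the deepbox implementation, which also applies learned projections and splits across heads):

```typescript
// Single-head scaled dot-product attention:
// Attention(Q, K, V) = softmax(Q . K^T / sqrt(d)) . V
function attention(Q: number[][], K: number[][], V: number[][]): number[][] {
  const d = Q[0].length;
  return Q.map((q) => {
    // one dot-product score per key, scaled by sqrt(d)
    const scores = K.map(
      (k) => q.reduce((s, qi, i) => s + qi * k[i], 0) / Math.sqrt(d)
    );
    // softmax over the scores (max-subtracted for stability)
    const m = Math.max(...scores);
    const exps = scores.map((s) => Math.exp(s - m));
    const sum = exps.reduce((a, b) => a + b, 0);
    const weights = exps.map((e) => e / sum);
    // output row: attention-weighted sum of the value rows
    return V[0].map((_, j) => weights.reduce((s, w, i) => s + w * V[i][j], 0));
  });
}
```

The √d scaling keeps the dot products from growing with the embedding dimension, which would otherwise push the softmax into near-one-hot saturation.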

Transformer Encoder Layer

import { TransformerEncoderLayer } from 'deepbox/nn';

const encoder = new TransformerEncoderLayer({
  dModel: 512,
  nHead: 8,
  dimFeedforward: 2048,
  dropout: 0.1
});

const encoded = encoder.forward(x);

Loss Functions

import { 
  mseLoss, 
  maeLoss,
  crossEntropyLoss,
  binaryCrossEntropyLoss,
  huberLoss 
} from 'deepbox/nn';
import { GradTensor } from 'deepbox/ndarray';

// Mean Squared Error (regression)
const predictions = new GradTensor([2.5, 3.0, 4.5]);
const targets = new GradTensor([2.0, 3.5, 4.0]);
const mse = mseLoss(predictions, targets);

// Cross-Entropy (classification)
const logits = new GradTensor([[2.0, 1.0, 0.1]]);
const labels = new GradTensor([0]);  // Class 0
const ceLoss = crossEntropyLoss(logits, labels);

// Binary Cross-Entropy (expects probabilities in [0, 1])
const probs = new GradTensor([0.9, 0.2, 0.7]);
const binaryTargets = new GradTensor([1, 0, 1]);
const bce = binaryCrossEntropyLoss(probs, binaryTargets);

// Mean Absolute Error
const mae = maeLoss(predictions, targets);

// Huber Loss (robust to outliers)
const huber = huberLoss(predictions, targets, { delta: 1.0 });
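Cross-entropy from raw logits is just the negative log of the softmax probability assigned to the true class. Working through the classification example above in plain TypeScript:

```typescript
// Cross-entropy from raw logits for a single example:
// CE = -log(softmax(logits)[label])
function crossEntropy(logits: number[], label: number): number {
  const m = Math.max(...logits);
  const exps = logits.map((v) => Math.exp(v - m));
  const sum = exps.reduce((a, b) => a + b, 0);
  return -Math.log(exps[label] / sum);
}

// Class 0 has the largest logit, so the loss is small:
console.log(crossEntropy([2.0, 1.0, 0.1], 0).toFixed(3)); // "0.417"
```

If the label were class 2 instead, the same logits would yield a much larger loss, which is exactly the gradient signal that pushes the model toward the correct class.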

Complete Example: Image Classification

import { 
  Sequential, 
  Conv2d, 
  MaxPool2d, 
  Linear, 
  ReLU, 
  Softmax,
  Dropout,
  crossEntropyLoss 
} from 'deepbox/nn';
import { GradTensor } from 'deepbox/ndarray';
import { Adam } from 'deepbox/optim';

// Define CNN architecture
class CNN extends Sequential {
  constructor() {
    super([
      // Conv block 1
      new Conv2d({ inChannels: 3, outChannels: 32, kernelSize: 3, padding: 1 }),
      new ReLU(),
      new MaxPool2d({ kernelSize: 2 }),
      
      // Conv block 2
      new Conv2d({ inChannels: 32, outChannels: 64, kernelSize: 3, padding: 1 }),
      new ReLU(),
      new MaxPool2d({ kernelSize: 2 }),
      
      // Fully connected (assumes 32×32 inputs: two 2× poolings leave 8×8 maps)
      new Linear(64 * 8 * 8, 128),
      new ReLU(),
      new Dropout(0.5),
      new Linear(128, 10),
      new Softmax()
    ]);
  }
}

const model = new CNN();
const optimizer = new Adam(model.parameters(), { lr: 0.001 });

// Training loop (assumes trainLoader yields { images, labels } batches)
for (let epoch = 0; epoch < 10; epoch++) {
  for (const { images, labels } of trainLoader) {
    optimizer.zeroGrad();
    
    const output = model.forward(images);
    const loss = crossEntropyLoss(output, labels);
    
    loss.backward();
    optimizer.step();
  }
}

Module Methods

Parameter Management

import { Module } from 'deepbox/nn';

const model = new Sequential([...]);

// Get all parameters
const params = model.parameters();

// Count parameters
let totalParams = 0;
for (const param of params) {
  totalParams += param.size;
}

// Get named parameters
const namedParams = model.namedParameters();

Training vs Evaluation Mode

// Training mode (dropout, batchnorm active)
model.train();

// Evaluation mode (no dropout, batchnorm uses running stats)
model.eval();

Hooks

import { Module, type ForwardHook } from 'deepbox/nn';

const hook: ForwardHook = (module, input, output) => {
  console.log('Layer output shape:', output.shape);
};

// Register forward hook
const handle = layer.registerForwardHook(hook);

// Remove hook later
handle.remove();

Use Cases

Build CNNs for image recognition:
import { Sequential, Conv2d, MaxPool2d, Linear, ReLU } from 'deepbox/nn';

const model = new Sequential([
  new Conv2d({ inChannels: 3, outChannels: 16, kernelSize: 3, padding: 1 }),
  new ReLU(),
  new MaxPool2d({ kernelSize: 2 }),
  new Linear(16 * 14 * 14, 10)  // assumes 28×28 inputs, pooled to 14×14
]);
Use LSTMs for time series or text:
import { Module, LSTM, Linear } from 'deepbox/nn';
import { GradTensor } from 'deepbox/ndarray';

class SequenceModel extends Module {
  lstm = new LSTM({ inputSize: 50, hiddenSize: 100 });
  fc = new Linear(100, 1);
  
  forward(x: GradTensor): GradTensor {
    const { output } = this.lstm.forward(x);
    return this.fc.forward(output);
  }
}
Build transformer models:
import { Module, TransformerEncoderLayer, Linear } from 'deepbox/nn';
import { GradTensor } from 'deepbox/ndarray';

class Transformer extends Module {
  encoder = new TransformerEncoderLayer({
    dModel: 512,
    nHead: 8,
    dimFeedforward: 2048
  });
  classifier = new Linear(512, 2);
  
  forward(x: GradTensor): GradTensor {
    const encoded = this.encoder.forward(x);
    return this.classifier.forward(encoded);
  }
}

Best Practices

Initialize weights properly. Most layers use Xavier/He initialization by default.
Use BatchNorm or LayerNorm to stabilize training and enable higher learning rates.
Add Dropout for regularization, especially before fully connected layers.
Always call model.train() before training and model.eval() before evaluation to ensure proper behavior of Dropout and BatchNorm.

Related Modules

  • Optimization: optimizers and learning rate schedulers
  • NDArray: tensor operations and gradients
  • Machine Learning: classical ML algorithms

Learn More

  • API Reference: complete API documentation
  • Tutorial: build your first neural network
