Overview

Dataset generators create synthetic data with controlled properties. All generators support reproducible output via the randomState parameter.
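
As a rough illustration of what seed-based reproducibility means, the sketch below uses a small Park-Miller linear congruential generator. It is not deepbox's internal generator: `seededUniform` and `drawSequence` are hypothetical names introduced here, and the only point is that the same seed replays the same stream.

```typescript
// Hypothetical helper, not part of deepbox: Park-Miller "minimal standard" LCG.
// Returns a function that yields uniform values strictly inside (0, 1).
function seededUniform(seed: number): () => number {
  const modulus = 2147483647; // 2^31 - 1
  // Map any finite seed to a valid starting state in [1, modulus - 1].
  let state = (Math.floor(Math.abs(seed)) % (modulus - 1)) + 1;
  return () => {
    // The product stays below 2^53, so the arithmetic is exact in doubles.
    state = (state * 48271) % modulus;
    return state / modulus;
  };
}

// Draw `count` values from a generator seeded with `seed`.
function drawSequence(seed: number, count: number): number[] {
  const next = seededUniform(seed);
  const values: number[] = [];
  for (let i = 0; i < count; i++) {
    values.push(next());
  }
  return values;
}

const runA = drawSequence(42, 5);
const runB = drawSequence(42, 5); // same seed: identical to runA
const runC = drawSequence(7, 5);  // different seed: a different stream
```

Passing the same `randomState` to a generator is expected to behave like `runA` and `runB` here, while omitting it would normally give a fresh stream on each call.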

Classification Generators

makeClassification

Generate a random n-class classification dataset.
function makeClassification(options?: {
  nSamples?: number;
  nFeatures?: number;
  nInformative?: number;
  nRedundant?: number;
  nClasses?: number;
  flipY?: number;
  randomState?: number;
}): [Tensor, Tensor]
options.nSamples
number
Number of samples (default: 100)
options.nFeatures
number
Total number of features (default: 20)
options.nInformative
number
Number of informative features (default: 2)
options.nRedundant
number
Number of redundant features (default: 2)
options.nClasses
number
Number of classes (default: 2)
options.flipY
number
Fraction of labels to flip for noise (default: 0.01)
options.randomState
number
Seed for reproducibility
Returns
[Tensor, Tensor]
Tuple of [X, y] where X has shape [nSamples, nFeatures] and y has shape [nSamples] (int32)
Produces informative features drawn from class-conditional Gaussians, redundant features as random linear combinations of the informative ones, and noise features sampled from N(0, 1). Example:
import { makeClassification } from 'deepbox/datasets';

const [X, y] = makeClassification({
  nSamples: 1000,
  nFeatures: 20,
  nInformative: 10,
  nRedundant: 5,
  nClasses: 3,
  randomState: 42
});
console.log(X.shape);  // [1000, 20]
console.log(y.shape);  // [1000]
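
To make "redundant features as random linear combinations of the informative ones" concrete, here is a minimal sketch on plain arrays. It does not use deepbox: `appendRedundant` and the `mixing` matrix are hypothetical, and a real generator would draw the mixing weights at random rather than take them as input.

```typescript
// Hypothetical helper, not part of deepbox.
// informative: [nSamples][nInformative]; mixing: [nInformative][nRedundant].
// Returns rows of length nInformative + nRedundant.
function appendRedundant(informative: number[][], mixing: number[][]): number[][] {
  const nRedundant = mixing.length > 0 ? mixing[0].length : 0;
  return informative.map((row) => {
    const redundant: number[] = [];
    for (let j = 0; j < nRedundant; j++) {
      // Each redundant column is a weighted sum of the informative columns.
      let sum = 0;
      for (let i = 0; i < row.length; i++) {
        sum += row[i] * mixing[i][j];
      }
      redundant.push(sum);
    }
    return row.concat(redundant);
  });
}

const informativeRows = [
  [1, 2],
  [3, 4],
];
// One redundant column: 0.5 * feature0 - 1 * feature1.
const mixingWeights = [[0.5], [-1]];
const withRedundant = appendRedundant(informativeRows, mixingWeights);
// withRedundant is [[1, 2, -1.5], [3, 4, -2.5]]
```

Because the extra column is fully determined by the first two, it adds width to X without adding new information, which is what makes such features useful for testing feature selection.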

makeBlobs

Generate isotropic Gaussian blobs for clustering.
function makeBlobs(options?: {
  nSamples?: number;
  nFeatures?: number;
  centers?: number | number[][];
  clusterStd?: number;
  randomState?: number;
  shuffle?: boolean;
}): [Tensor, Tensor]
options.nSamples
number
Total number of samples (default: 100)
options.nFeatures
number
Number of features per sample (default: 2). Ignored when centers is an array.
options.centers
number | number[][]
Number of cluster centers or explicit center coordinates (default: 3)
options.clusterStd
number
Standard deviation of each cluster (default: 1.0)
options.shuffle
boolean
Whether to shuffle the samples (default: true)
options.randomState
number
Seed for reproducibility
Returns
[Tensor, Tensor]
Tuple of [X, y] where X has shape [nSamples, nFeatures] and y has shape [nSamples] (int32)
Samples are drawn from Gaussian distributions centered at randomly generated or user-specified locations. Example:
import { makeBlobs } from 'deepbox/datasets';

// Random centers
const [X, y] = makeBlobs({
  nSamples: 300,
  centers: 3,
  clusterStd: 0.5,
  randomState: 42
});

// Explicit centers
const [X2, y2] = makeBlobs({
  nSamples: 300,
  centers: [[0, 0], [5, 5], [5, 0]],
  clusterStd: 0.8
});
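
The sampling idea behind blobs can be sketched without the library: pick a center, then add Gaussian noise scaled by the cluster standard deviation. Everything below (`seededUniform`, `gaussian`, `sketchBlobs`, the round-robin label assignment, and the lack of shuffling) is an assumption made for illustration, not deepbox's implementation.

```typescript
// Hypothetical seeded uniform generator in (0, 1) (Park-Miller LCG).
function seededUniform(seed: number): () => number {
  const modulus = 2147483647;
  let state = (Math.floor(Math.abs(seed)) % (modulus - 1)) + 1;
  return () => {
    state = (state * 48271) % modulus;
    return state / modulus;
  };
}

// Standard normal draw via the Box-Muller transform.
function gaussian(next: () => number): number {
  const u1 = next(); // strictly positive, so Math.log is finite
  const u2 = next();
  return Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
}

// Hypothetical sketch: samples around explicit centers, labels assigned round-robin.
function sketchBlobs(
  nSamples: number,
  centers: number[][],
  clusterStd: number,
  seed: number
): { X: number[][]; y: number[] } {
  const next = seededUniform(seed);
  const X: number[][] = [];
  const y: number[] = [];
  for (let i = 0; i < nSamples; i++) {
    const label = i % centers.length; // keeps cluster sizes balanced
    // Isotropic: the same standard deviation is applied to every coordinate.
    X.push(centers[label].map((c) => c + clusterStd * gaussian(next)));
    y.push(label);
  }
  return { X, y };
}

const blobs = sketchBlobs(300, [[0, 0], [5, 5], [5, 0]], 0.8, 42);
// blobs.X has 300 rows of 2 coordinates; blobs.y holds labels 0, 1, 2.
```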

makeMoons

Generate a dataset of two interleaving half-circles (moons).
function makeMoons(options?: {
  nSamples?: number;
  noise?: number;
  shuffle?: boolean;
  randomState?: number;
}): [Tensor, Tensor]
options.nSamples
number
Total number of samples, split evenly between the two moons (default: 100)
options.noise
number
Standard deviation of Gaussian noise (default: 0)
options.shuffle
boolean
Whether to shuffle the samples (default: true)
options.randomState
number
Seed for reproducibility
Returns
[Tensor, Tensor]
Tuple of [X, y] where X has shape [nSamples, 2] and y has shape [nSamples] (int32)
Useful for testing algorithms that handle non-linearly separable data. Example:
import { makeMoons } from 'deepbox/datasets';

const [X, y] = makeMoons({
  nSamples: 200,
  noise: 0.1,
  randomState: 42
});
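
For intuition about the geometry, the noise-free shape can be sketched directly. This assumes the parametrization used by scikit-learn's `make_moons` (an upper unit half-circle, and a lower half-circle shifted right by 1 and up by 0.5); whether deepbox uses exactly these offsets is not confirmed here, and `sketchMoons` is a hypothetical name.

```typescript
// Hypothetical sketch, not deepbox's implementation: noise-free moons, unshuffled.
function sketchMoons(nSamples: number): { X: number[][]; y: number[] } {
  const nOuter = Math.floor(nSamples / 2);
  const nInner = nSamples - nOuter;
  const X: number[][] = [];
  const y: number[] = [];
  // Upper moon: half of the unit circle, angle from 0 to pi.
  for (let i = 0; i < nOuter; i++) {
    const t = nOuter > 1 ? (Math.PI * i) / (nOuter - 1) : 0;
    X.push([Math.cos(t), Math.sin(t)]);
    y.push(0);
  }
  // Lower moon: mirrored half-circle, shifted so the two arcs interleave.
  for (let i = 0; i < nInner; i++) {
    const t = nInner > 1 ? (Math.PI * i) / (nInner - 1) : 0;
    X.push([1 - Math.cos(t), 0.5 - Math.sin(t)]);
    y.push(1);
  }
  return { X, y };
}

const moons = sketchMoons(200);
// moons.X[0] is [1, 0], the right tip of the upper moon.
```

With `noise` greater than 0, Gaussian noise of that standard deviation would be added to each coordinate after this step.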

makeCircles

Generate a large circle containing a smaller circle in 2D.
function makeCircles(options?: {
  nSamples?: number;
  noise?: number;
  factor?: number;
  shuffle?: boolean;
  randomState?: number;
}): [Tensor, Tensor]
options.nSamples
number
Total number of samples, split evenly between inner and outer circles (default: 100)
options.noise
number
Standard deviation of Gaussian noise (default: 0)
options.factor
number
Ratio of the inner circle's radius to the outer circle's radius; must be in (0, 1) (default: 0.8)
options.shuffle
boolean
Whether to shuffle the samples (default: true)
options.randomState
number
Seed for reproducibility
Returns
[Tensor, Tensor]
Tuple of [X, y] where X has shape [nSamples, 2] and y has shape [nSamples] (int32)
Useful for testing algorithms that handle non-linearly separable data: no straight line separates the inner circle from the outer one. Example:
import { makeCircles } from 'deepbox/datasets';

const [X, y] = makeCircles({
  nSamples: 200,
  noise: 0.05,
  factor: 0.5,
  randomState: 42
});
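
The role of `factor` is easiest to see in a noise-free sketch: the outer circle has radius 1 and the inner circle has radius `factor`. The code below is an illustration only. `sketchCircles` is a hypothetical name, and the label convention (0 for outer, 1 for inner) and evenly spaced angles are assumptions borrowed from scikit-learn's `make_circles`.

```typescript
// Hypothetical sketch, not deepbox's implementation: noise-free circles, unshuffled.
function sketchCircles(nSamples: number, factor: number): { X: number[][]; y: number[] } {
  if (!(factor > 0 && factor < 1)) {
    throw new RangeError("factor must be in (0, 1)");
  }
  const nOuter = Math.floor(nSamples / 2);
  const nInner = nSamples - nOuter;
  const X: number[][] = [];
  const y: number[] = [];
  // Outer circle: radius 1, angles evenly spaced over a full turn.
  for (let i = 0; i < nOuter; i++) {
    const angle = (2 * Math.PI * i) / nOuter;
    X.push([Math.cos(angle), Math.sin(angle)]);
    y.push(0);
  }
  // Inner circle: same angles, radius scaled down by `factor`.
  for (let i = 0; i < nInner; i++) {
    const angle = (2 * Math.PI * i) / nInner;
    X.push([factor * Math.cos(angle), factor * Math.sin(angle)]);
    y.push(1);
  }
  return { X, y };
}

const circles = sketchCircles(200, 0.5);
```

A smaller `factor` widens the gap between the two classes; values close to 1 make them nearly overlap once noise is added.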

makeGaussianQuantiles

Generate a dataset with classes separated by concentric Gaussian quantile shells.
function makeGaussianQuantiles(options?: {
  nSamples?: number;
  nFeatures?: number;
  nClasses?: number;
  randomState?: number;
}): [Tensor, Tensor]
options.nSamples
number
Number of samples (default: 100)
options.nFeatures
number
Number of features (default: 2)
options.nClasses
number
Number of classes (default: 3)
options.randomState
number
Seed for reproducibility
Returns
[Tensor, Tensor]
Tuple of [X, y] where X has shape [nSamples, nFeatures] and y has shape [nSamples] (int32)
Samples are drawn from an isotropic Gaussian and assigned to classes based on quantile boundaries of their Euclidean distance from the origin. Example:
import { makeGaussianQuantiles } from 'deepbox/datasets';

const [X, y] = makeGaussianQuantiles({
  nSamples: 500,
  nFeatures: 3,
  nClasses: 4,
  randomState: 42
});
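
The class assignment step described above can be sketched on its own: rank points by distance from the origin, then cut the ranking into `nClasses` equal slices. This sketch uses empirical quantiles of the sample, which is an assumption. deepbox may compute its boundaries differently, and `labelByDistanceQuantile` is a hypothetical helper.

```typescript
// Hypothetical helper, not part of deepbox.
// Assigns each point a class from 0 (closest shell) to nClasses - 1 (farthest shell).
function labelByDistanceQuantile(points: number[][], nClasses: number): number[] {
  // Euclidean distance of each point from the origin.
  const distances = points.map((p) => {
    let sumSquares = 0;
    for (let d = 0; d < p.length; d++) {
      sumSquares += p[d] * p[d];
    }
    return Math.sqrt(sumSquares);
  });
  // Point indices ordered from nearest to farthest.
  const order = distances.map((_, i) => i).sort((a, b) => distances[a] - distances[b]);
  const labels: number[] = [];
  for (let i = 0; i < points.length; i++) {
    labels.push(0);
  }
  // A point's rank in the ordering decides which equal-width quantile slice it falls in.
  order.forEach((pointIndex, rank) => {
    labels[pointIndex] = Math.floor((rank * nClasses) / points.length);
  });
  return labels;
}

const shellPoints = [[3, 0], [0, 0.1], [1, 1], [0, 2], [5, 5], [0.5, 0]];
const shellLabels = labelByDistanceQuantile(shellPoints, 3);
// shellLabels is [2, 0, 1, 1, 2, 0]
```

Because the boundaries depend only on distance, each class forms a shell around the origin, so no hyperplane separates one class from the others.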

Regression Generators

makeRegression

Generate a random regression dataset.
function makeRegression(options?: {
  nSamples?: number;
  nFeatures?: number;
  noise?: number;
  randomState?: number;
}): [Tensor, Tensor]
options.nSamples
number
Number of samples (default: 100)
options.nFeatures
number
Number of features (default: 100)
options.noise
number
Standard deviation of Gaussian noise on the target (default: 0)
options.randomState
number
Seed for reproducibility
Returns
[Tensor, Tensor]
Tuple of [X, y] where X has shape [nSamples, nFeatures] and y has shape [nSamples]
Features are drawn from N(0, 1) and the target is a linear combination of the features with optional Gaussian noise. Example:
import { makeRegression } from 'deepbox/datasets';

const [X, y] = makeRegression({
  nSamples: 1000,
  nFeatures: 10,
  noise: 5.0,
  randomState: 42
});
console.log(X.shape);  // [1000, 10]
console.log(y.shape);  // [1000]
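
The generative model described above, a linear combination of N(0, 1) features plus optional Gaussian noise, can be sketched as follows. This is an illustration under assumptions, not deepbox's code: `sketchRegression`, the returned `coef` array, and the choice to draw the weights from N(0, 1) are all hypothetical.

```typescript
// Hypothetical seeded uniform generator in (0, 1) (Park-Miller LCG).
function seededUniform(seed: number): () => number {
  const modulus = 2147483647;
  let state = (Math.floor(Math.abs(seed)) % (modulus - 1)) + 1;
  return () => {
    state = (state * 48271) % modulus;
    return state / modulus;
  };
}

// Standard normal draw via the Box-Muller transform.
function gaussian(next: () => number): number {
  return Math.sqrt(-2 * Math.log(next())) * Math.cos(2 * Math.PI * next());
}

// Hypothetical sketch: y = X . coef + noise * N(0, 1).
function sketchRegression(
  nSamples: number,
  nFeatures: number,
  noise: number,
  seed: number
): { X: number[][]; y: number[]; coef: number[] } {
  const next = seededUniform(seed);
  const coef: number[] = [];
  for (let j = 0; j < nFeatures; j++) {
    coef.push(gaussian(next)); // assumed weight distribution
  }
  const X: number[][] = [];
  const y: number[] = [];
  for (let i = 0; i < nSamples; i++) {
    const row: number[] = [];
    let target = 0;
    for (let j = 0; j < nFeatures; j++) {
      const value = gaussian(next); // feature drawn from N(0, 1)
      row.push(value);
      target += value * coef[j];
    }
    X.push(row);
    y.push(target + noise * gaussian(next)); // noise = 0 leaves the target exactly linear
  }
  return { X, y, coef };
}

const regression = sketchRegression(1000, 10, 5.0, 42);
// regression.X has 1000 rows of 10 features; regression.y has 1000 targets.
```

With `noise: 0` a linear model can recover the weights exactly, which makes the noise level a convenient dial for how hard the problem is.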
