
PCA

Principal Component Analysis (PCA). Linear dimensionality reduction using Singular Value Decomposition (SVD) to project data onto a lower-dimensional space. Algorithm:
  1. Center the data by subtracting the mean
  2. Compute SVD: X = U * Σ * V^T
  3. Principal components are columns of V
  4. Transform data by projecting onto principal components
Time Complexity: O(min(nd^2, dn^2)) where n=samples, d=features
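The four steps above can be sketched in plain TypeScript. This is an illustration of the math, not deepbox's implementation: for 2-D data it avoids a full SVD by eigendecomposing the 2x2 covariance matrix, which yields the same principal components; `pca2d` and its return shape are hypothetical names for this sketch.

```typescript
// Illustrative PCA for 2-D data, using plain arrays instead of Tensors.
// For 2 features, eigendecomposing the 2x2 covariance matrix gives the
// same components as the SVD route described above.
type Vec = [number, number];

function pca2d(X: Vec[]): { component: Vec; explainedVariance: [number, number] } {
  const n = X.length;
  // 1. Center the data by subtracting the column means.
  const mean: Vec = [0, 0];
  for (const [x, y] of X) { mean[0] += x / n; mean[1] += y / n; }
  const C = X.map(([x, y]): Vec => [x - mean[0], y - mean[1]]);
  // 2. Unbiased covariance matrix [[sxx, sxy], [sxy, syy]].
  let sxx = 0, sxy = 0, syy = 0;
  for (const [x, y] of C) { sxx += x * x; sxy += x * y; syy += y * y; }
  sxx /= n - 1; sxy /= n - 1; syy /= n - 1;
  // 3. Eigenvalues via the quadratic formula (l1 >= l2).
  const tr = sxx + syy;
  const det = sxx * syy - sxy * sxy;
  const disc = Math.sqrt((tr * tr) / 4 - det);
  const l1 = tr / 2 + disc;
  const l2 = tr / 2 - disc;
  // 4. Unit eigenvector for l1 = first principal component (assumes sxy != 0).
  const norm = Math.hypot(sxy, l1 - sxx);
  return {
    component: [sxy / norm, (l1 - sxx) / norm],
    explainedVariance: [l1, l2],
  };
}

// Same data as the Example section below.
const points: Vec[] = [[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0]];
const { component, explainedVariance } = pca2d(points);
```

The eigenvalues l1 and l2 play the role of the explained variances; dividing each by their sum gives the explained variance ratio.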

Constructor

new PCA(options?: {
  nComponents?: number;
  whiten?: boolean;
})
options.nComponents
number
Number of components to keep. If undefined, keeps min(n_samples, n_features).
options.whiten
boolean
default:"false"
Whether to whiten the data. When true, the components are divided by the square root of the explained variance, ensuring unit variance.
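The effect described for whiten can be sketched with plain arrays (an illustration of the semantics above, not deepbox internals; the score values are made up): dividing a column of projected scores by the square root of its variance leaves that column with unit variance.

```typescript
// Whitening sketch: rescaling projected scores by the square root of
// their variance yields unit variance.
function variance(xs: number[]): number {
  const n = xs.length;
  const mean = xs.reduce((a, b) => a + b, 0) / n;
  return xs.reduce((a, x) => a + (x - mean) ** 2, 0) / (n - 1);
}

const scores = [0.83, -1.78, 0.99, -0.27, 1.68]; // illustrative projected scores
const scale = Math.sqrt(variance(scores));
const whitened = scores.map((s) => s / scale);
```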

Methods

fit

fit(X: Tensor, y?: Tensor): this
Fit PCA on training data.
X
Tensor
required
Training data of shape (n_samples, n_features)
Returns: The fitted estimator

transform

transform(X: Tensor): Tensor
Transform data to principal component space.
X
Tensor
required
Data of shape (n_samples, n_features)
Returns: Transformed data of shape (n_samples, n_components)

fitTransform

fitTransform(X: Tensor, y?: Tensor): Tensor
Fit and transform in one step. Returns: Transformed data of shape (n_samples, n_components)

inverseTransform

inverseTransform(X: Tensor): Tensor
Transform data back to original space.
X
Tensor
required
Transformed data of shape (n_samples, n_components)
Returns: Reconstructed data of shape (n_samples, n_features)
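How transform and inverseTransform relate can be sketched for a single principal component (the mean and direction values below are illustrative, not computed by deepbox): inverseTransform maps a score back as mean + score * v, and re-projecting the reconstruction recovers the same score.

```typescript
// transform / inverseTransform sketch for one principal component.
type Vec2 = [number, number];

const mean: Vec2 = [2.04, 2.24];              // illustrative column means
const raw: Vec2 = [0.72, 0.69];               // illustrative component direction
const len = Math.hypot(raw[0], raw[1]);
const v: Vec2 = [raw[0] / len, raw[1] / len]; // unit principal component

// Project a point onto the component (after centering).
const transform = (x: Vec2): number =>
  (x[0] - mean[0]) * v[0] + (x[1] - mean[1]) * v[1];

// Map a score back to the original space: mean + score * v.
const inverseTransform = (score: number): Vec2 =>
  [mean[0] + score * v[0], mean[1] + score * v[1]];

const point: Vec2 = [2.5, 2.4];
const score = transform(point);
const reconstructed = inverseTransform(score);
```

With fewer components than features the reconstruction is lossy (reconstructed generally differs from the original point), but projecting it again yields the score exactly.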

Properties

components
Tensor
Principal components of shape (n_components, n_features)
explainedVariance
Tensor
Amount of variance explained by each component
explainedVarianceRatio
Tensor
Fraction of the total variance explained by each component

Example

import { PCA } from 'deepbox/ml';
import { tensor } from 'deepbox/ndarray';

const X = tensor([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0]]);
const pca = new PCA({ nComponents: 1 });
pca.fit(X);

const XTransformed = pca.transform(X);
console.log('Explained variance ratio:', pca.explainedVarianceRatio);

TSNE

t-Distributed Stochastic Neighbor Embedding (t-SNE). A nonlinear dimensionality reduction technique for embedding high-dimensional data into a low-dimensional space (typically 2D or 3D) for visualization. Algorithm: Exact t-SNE with an optional sampling-based approximation
  • Computes pairwise affinities in high-dimensional space using Gaussian kernel (exact)
  • Computes pairwise affinities in low-dimensional space using Student-t distribution
  • Minimizes KL divergence between the two distributions
Scalability Note: Exact t-SNE is O(n^2) in time and memory. For large datasets, use method: "approximate" (sampled neighbors + negative sampling) or reduce samples.
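The three bullets above can be illustrated with toy numbers (a sketch of the math, not deepbox's implementation; the distances are made up): for one point's squared distances to its neighbors, the high-dimensional side applies a Gaussian kernel, the low-dimensional side a Student-t kernel with one degree of freedom, and the objective is the KL divergence between the two normalized distributions.

```typescript
// The two affinity kernels t-SNE compares, for a single point's
// squared distances to its neighbors.
function normalize(w: number[]): number[] {
  const s = w.reduce((a, b) => a + b, 0);
  return w.map((x) => x / s);
}

// High-dimensional affinities: exp(-d^2 / (2 * sigma^2)), normalized.
function gaussianAffinities(sqDists: number[], sigma: number): number[] {
  return normalize(sqDists.map((d2) => Math.exp(-d2 / (2 * sigma * sigma))));
}

// Low-dimensional affinities: (1 + d^2)^-1 (Student-t, 1 dof), normalized.
function studentTAffinities(sqDists: number[]): number[] {
  return normalize(sqDists.map((d2) => 1 / (1 + d2)));
}

// KL(P || Q) = sum_i p_i * log(p_i / q_i), the quantity t-SNE minimizes.
function klDivergence(p: number[], q: number[]): number {
  return p.reduce((a, pi, i) => a + (pi > 0 ? pi * Math.log(pi / q[i]) : 0), 0);
}

const sqDists = [0.5, 2.0, 8.0]; // toy squared distances to 3 neighbors
const P = gaussianAffinities(sqDists, 1);
const Q = studentTAffinities(sqDists);
const kl = klDivergence(P, Q);
```

With these toy numbers, Q assigns more mass than P to the farthest neighbor: the Student-t kernel's heavy tail is what lets moderately distant points spread apart in the embedding.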

Constructor

new TSNE(options?: {
  nComponents?: number;
  perplexity?: number;
  learningRate?: number;
  nIter?: number;
  earlyExaggeration?: number;
  earlyExaggerationIter?: number;
  randomState?: number;
  minGradNorm?: number;
  method?: "exact" | "approximate";
  maxExactSamples?: number;
  approximateNeighbors?: number;
  negativeSamples?: number;
})
options.nComponents
number
default:"2"
Number of dimensions in the embedding (typically 2 or 3).
options.perplexity
number
default:"30"
Perplexity parameter (related to the number of nearest neighbors). Typical values are between 5 and 50; it must be less than n_samples.
options.learningRate
number
default:"200"
Learning rate for gradient descent.
options.nIter
number
default:"1000"
Number of iterations.
options.earlyExaggeration
number
default:"12"
Early exaggeration factor. Helps form tight clusters.
options.earlyExaggerationIter
number
default:"250"
Number of iterations with early exaggeration.
options.randomState
number
Random seed for reproducibility.
options.minGradNorm
number
default:"1e-7"
Minimum gradient norm for convergence.
options.method
string
default:"exact"
Method for computing affinities: "exact" (full pairwise) or "approximate" (sampling for large datasets).
options.maxExactSamples
number
default:"2000"
Maximum samples allowed for exact mode before requiring approximate.
options.approximateNeighbors
number
Number of neighbors to sample per point in approximate mode. Default: max(5, floor(perplexity * 3)).
options.negativeSamples
number
Number of negative samples per point in approximate mode. Default: max(10, floor(perplexity * 2)).
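Perplexity is the exponentiated entropy of the neighbor distribution, 2^H(P), which is why it reads as an effective neighbor count: a uniform distribution over k neighbors has perplexity exactly k. The sketch below (plain TypeScript, illustrative values) also evaluates the documented approximate-mode defaults at perplexity 30.

```typescript
// Perplexity as 2^H(P), with entropy H measured in bits.
function perplexity(p: number[]): number {
  const h = p.reduce((a, pi) => a + (pi > 0 ? -pi * Math.log2(pi) : 0), 0);
  return 2 ** h;
}

const uniform5 = Array(5).fill(1 / 5);       // 5 equally likely neighbors
const skewed = [0.7, 0.1, 0.1, 0.05, 0.05];  // fewer "effective" neighbors

// Derived defaults for approximate mode (from the option docs above):
const perp = 30;
const approximateNeighbors = Math.max(5, Math.floor(perp * 3));  // 90
const negativeSamples = Math.max(10, Math.floor(perp * 2));      // 60
```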

Methods

fit

fit(X: Tensor): this
Fit the t-SNE model. For t-SNE this computes the embedding, just like fitTransform, but returns the estimator instead of the embedding.

transform

transform(X?: Tensor): Tensor
Return the fitted embedding. Because t-SNE is non-parametric, transform cannot project new data; it returns the embedding computed during fitting. Returns: Low-dimensional embedding of shape (n_samples, n_components)

fitTransform

fitTransform(X: Tensor): Tensor
Fit the t-SNE model and return the embedding.
X
Tensor
required
Training data of shape (n_samples, n_features)
Returns: Low-dimensional embedding of shape (n_samples, n_components) Throws: InvalidParameterError if perplexity >= n_samples or if exact mode used with too many samples

Properties

embeddingResult
Tensor
The fitted embedding after calling fit or fitTransform.

Example

import { TSNE } from 'deepbox/ml';
import { tensor } from 'deepbox/ndarray';

const X = tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]);

// perplexity must be less than n_samples (4 here)
const tsne = new TSNE({ nComponents: 2, perplexity: 2 });
const embedding = tsne.fitTransform(X);

console.log('2D embedding:', embedding);
