The Machine Learning (ML) module provides scikit-learn-style implementations of classical machine learning algorithms: supervised learning (classification and regression), unsupervised learning (clustering and dimensionality reduction), and model selection utilities.

Overview

The ML module offers a comprehensive suite of machine learning algorithms:
  • Linear Models: Linear/Logistic Regression, Ridge, Lasso
  • Tree-Based: Decision Trees, Random Forests, Gradient Boosting
  • Support Vector Machines: LinearSVC, LinearSVR
  • Neighbors: K-Nearest Neighbors for classification and regression
  • Clustering: K-Means, DBSCAN
  • Dimensionality Reduction: PCA, t-SNE
  • Naive Bayes: Gaussian Naive Bayes classifier

Key Features

Scikit-learn API

A familiar fit/predict interface modeled on scikit-learn's estimator API.

Complete Pipeline

From data preprocessing to model evaluation.

Ensemble Methods

Random Forests and Gradient Boosting for better accuracy.

TypeScript Native

Full type safety and modern JavaScript features.

Linear Models

Linear Regression

import { LinearRegression } from 'deepbox/ml';
import { tensor } from 'deepbox/ndarray';

// Training data
const X = tensor([[1, 1], [1, 2], [2, 2], [2, 3]]);
const y = tensor([1, 2, 2, 3]);

// Create and fit model
const model = new LinearRegression({ fitIntercept: true });
model.fit(X, y);

// Make predictions
const X_test = tensor([[3, 5]]);
const predictions = model.predict(X_test);

// Evaluate
const score = model.score(X, y);  // R² score
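The R² value returned by score() is the coefficient of determination, 1 - SS_res / SS_tot. As an illustrative sketch in plain TypeScript (not the library's internal implementation):

```typescript
// Coefficient of determination: R² = 1 - SS_res / SS_tot
function r2Score(yTrue: number[], yPred: number[]): number {
  const mean = yTrue.reduce((a, b) => a + b, 0) / yTrue.length;
  const ssRes = yTrue.reduce((s, y, i) => s + (y - yPred[i]) ** 2, 0);
  const ssTot = yTrue.reduce((s, y) => s + (y - mean) ** 2, 0);
  return 1 - ssRes / ssTot;
}

r2Score([1, 2, 3], [1, 2, 3]);  // 1 (perfect fit)
r2Score([1, 2, 3], [2, 2, 2]);  // 0 (no better than predicting the mean)
```

A perfect fit scores 1; a model that always predicts the mean scores 0, and worse models can go negative.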

Logistic Regression

import { LogisticRegression } from 'deepbox/ml';

const X = tensor([[1, 2], [2, 3], [3, 1], [4, 2]]);
const y = tensor([0, 0, 1, 1]);

const model = new LogisticRegression({ 
  penalty: 'l2',
  C: 1.0,
  maxIter: 100 
});

model.fit(X, y);
const predictions = model.predict(X);
const probabilities = model.predictProba(X);
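For binary problems, predictProba derives the positive-class probability by passing the linear score through the logistic (sigmoid) function. A minimal sketch in plain TypeScript, with hypothetical weights w and intercept b standing in for the fitted parameters:

```typescript
// Sigmoid maps a linear score z = w·x + b into a probability in (0, 1)
function sigmoid(z: number): number {
  return 1 / (1 + Math.exp(-z));
}

// Probability of class 1 for a single sample x
function proba1(x: number[], w: number[], b: number): number {
  const z = x.reduce((s, xi, i) => s + xi * w[i], 0) + b;
  return sigmoid(z);
}

sigmoid(0);  // 0.5, a score of 0 is maximally uncertain
```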

Ridge and Lasso

import { Ridge, Lasso } from 'deepbox/ml';

// Ridge regression (L2 regularization)
const ridge = new Ridge({ alpha: 1.0 });
ridge.fit(X_train, y_train);

// Lasso regression (L1 regularization)
const lasso = new Lasso({ alpha: 0.1, maxIter: 1000 });
lasso.fit(X_train, y_train);

Tree-Based Models

Decision Trees

import { DecisionTreeClassifier, DecisionTreeRegressor } from 'deepbox/ml';

// Classification
const clf = new DecisionTreeClassifier({
  maxDepth: 5,
  minSamplesSplit: 2,
  minSamplesLeaf: 1
});

clf.fit(X_train, y_train);
const y_pred = clf.predict(X_test);

// Regression
const reg = new DecisionTreeRegressor({ maxDepth: 10 });
reg.fit(X_train, y_train);

Random Forest

import { RandomForestClassifier, RandomForestRegressor } from 'deepbox/ml';

// Random Forest Classifier
const rf = new RandomForestClassifier({
  nEstimators: 100,
  maxDepth: 10,
  minSamplesSplit: 2,
  randomState: 42
});

rf.fit(X_train, y_train);
const predictions = rf.predict(X_test);
const accuracy = rf.score(X_test, y_test);

// Feature importance
const importance = rf.featureImportances();

Gradient Boosting

import { GradientBoostingClassifier, GradientBoostingRegressor } from 'deepbox/ml';

// Gradient Boosting for classification
const gbc = new GradientBoostingClassifier({
  nEstimators: 100,
  learningRate: 0.1,
  maxDepth: 3,
  subsample: 0.8
});

gbc.fit(X_train, y_train);
const y_pred = gbc.predict(X_test);

// For regression
const gbr = new GradientBoostingRegressor({
  nEstimators: 100,
  learningRate: 0.1
});

gbr.fit(X_train, y_train);

Support Vector Machines

import { LinearSVC, LinearSVR } from 'deepbox/ml';

// Linear Support Vector Classifier
const svc = new LinearSVC({
  C: 1.0,
  maxIter: 1000,
  tol: 1e-4
});

svc.fit(X_train, y_train);
const predictions = svc.predict(X_test);

// Linear Support Vector Regressor
const svr = new LinearSVR({ C: 1.0, epsilon: 0.1 });
svr.fit(X_train, y_train);

K-Nearest Neighbors

import { KNeighborsClassifier, KNeighborsRegressor } from 'deepbox/ml';

// KNN Classifier
const knn_clf = new KNeighborsClassifier({
  nNeighbors: 5,
  weights: 'distance',
  metric: 'euclidean'
});

knn_clf.fit(X_train, y_train);
const y_pred = knn_clf.predict(X_test);

// KNN Regressor
const knn_reg = new KNeighborsRegressor({ nNeighbors: 3 });
knn_reg.fit(X_train, y_train);
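With weights: 'distance', closer neighbors count more: each neighbor's vote is weighted by the inverse of its distance under the chosen metric. An illustrative sketch of the euclidean metric and inverse-distance weights in plain TypeScript (not the library's internals):

```typescript
// Euclidean distance between two feature vectors
function euclidean(a: number[], b: number[]): number {
  return Math.sqrt(a.reduce((s, ai, i) => s + (ai - b[i]) ** 2, 0));
}

// Inverse-distance weights for a list of neighbor distances
function inverseDistanceWeights(distances: number[]): number[] {
  return distances.map((d) => 1 / d);
}

euclidean([0, 0], [3, 4]);  // 5
```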

Clustering

K-Means

import { KMeans } from 'deepbox/ml';
import { tensor } from 'deepbox/ndarray';

const X = tensor([
  [1, 2], [1.5, 1.8], [5, 8], [8, 8], [1, 0.6], [9, 11]
]);

const kmeans = new KMeans({
  nClusters: 3,
  maxIter: 300,
  randomState: 42
});

kmeans.fit(X);

// Get cluster assignments
const labels = kmeans.labels();

// Get cluster centers
const centers = kmeans.clusterCenters();

// Predict cluster for new data
const newPoint = tensor([[0, 0]]);
const cluster = kmeans.predict(newPoint);
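predict assigns each new point to the cluster whose center is nearest. The assignment step can be sketched in plain TypeScript (illustrative, not the library implementation):

```typescript
// Index of the center nearest to point x (squared euclidean distance)
function nearestCenter(x: number[], centers: number[][]): number {
  let best = 0;
  let bestDist = Infinity;
  centers.forEach((c, i) => {
    const d = c.reduce((s, ci, j) => s + (ci - x[j]) ** 2, 0);
    if (d < bestDist) {
      bestDist = d;
      best = i;
    }
  });
  return best;
}

nearestCenter([0, 0], [[5, 8], [1, 1], [9, 11]]);  // 1
```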

DBSCAN

import { DBSCAN } from 'deepbox/ml';

const dbscan = new DBSCAN({
  eps: 0.5,
  minSamples: 5,
  metric: 'euclidean'
});

dbscan.fit(X);
const labels = dbscan.labels();  // -1 indicates noise points

// Indices of core samples
const corePoints = dbscan.corePointIndices();

Dimensionality Reduction

PCA (Principal Component Analysis)

import { PCA } from 'deepbox/ml';
import { tensor } from 'deepbox/ndarray';

const X = tensor([
  [2.5, 2.4],
  [0.5, 0.7],
  [2.2, 2.9],
  [1.9, 2.2]
]);

const pca = new PCA({ nComponents: 1 });
pca.fit(X);

// Transform data to lower dimensions
const X_reduced = pca.transform(X);

// Get explained variance ratio
const variance = pca.explainedVarianceRatio();

// Get principal components
const components = pca.components();
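explainedVarianceRatio reports each component's share of the total variance; the ratios sum to 1 when all components are kept. As a quick sketch in plain TypeScript:

```typescript
// Share of total variance captured by each principal component
function explainedVarianceRatio(variances: number[]): number[] {
  const total = variances.reduce((a, b) => a + b, 0);
  return variances.map((v) => v / total);
}

explainedVarianceRatio([8, 2]);  // [0.8, 0.2]
```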

t-SNE

import { TSNE } from 'deepbox/ml';

const tsne = new TSNE({
  nComponents: 2,
  perplexity: 30,
  learningRate: 200,
  nIter: 1000
});

const X_embedded = tsne.fitTransform(X);

Naive Bayes

import { GaussianNB } from 'deepbox/ml';

const gnb = new GaussianNB();
gnb.fit(X_train, y_train);

const predictions = gnb.predict(X_test);
const probabilities = gnb.predictProba(X_test);

Use Cases

Classify data into two categories:
import { LogisticRegression } from 'deepbox/ml';
import { tensor } from 'deepbox/ndarray';
import { accuracy } from 'deepbox/metrics';

// Spam detection example
const X_train = tensor([...]); // Features
const y_train = tensor([0, 1, 0, 1, ...]);  // 0=ham, 1=spam

const model = new LogisticRegression();
model.fit(X_train, y_train);

const y_pred = model.predict(X_test);
const acc = accuracy(y_test, y_pred);

Classify into multiple categories:
import { RandomForestClassifier } from 'deepbox/ml';

// Iris species classification
const model = new RandomForestClassifier({ nEstimators: 100 });
model.fit(X_train, y_train);  // y has classes 0, 1, 2

const predictions = model.predict(X_test);
const probabilities = model.predictProba(X_test);

Group customers by behavior:
import { KMeans } from 'deepbox/ml';
import { tensor } from 'deepbox/ndarray';

// Customer features: [age, income, spending_score]
const customers = tensor([...]);

const kmeans = new KMeans({ nClusters: 4 });
kmeans.fit(customers);

const segments = kmeans.labels();
const centers = kmeans.clusterCenters();

Reduce dimensionality while preserving information:
import { PCA } from 'deepbox/ml';

// High-dimensional data
const X = tensor([...]);  // Shape: [n_samples, 100]

const pca = new PCA({ nComponents: 10 });
pca.fit(X);

const X_reduced = pca.transform(X);  // Shape: [n_samples, 10]
console.log(pca.explainedVarianceRatio().sum());

Model Selection

All estimators follow the same interface:
interface Estimator {
  fit(X: Tensor, y?: Tensor): this;
}

interface Classifier extends Estimator {
  predict(X: Tensor): Tensor;
  predictProba(X: Tensor): Tensor;
  score(X: Tensor, y: Tensor): number;
}

interface Regressor extends Estimator {
  predict(X: Tensor): Tensor;
  score(X: Tensor, y: Tensor): number;  // R² score
}

interface Clusterer extends Estimator {
  predict(X: Tensor): Tensor;
  labels(): Tensor;
}

Performance Tips

  • For large datasets, start with linear models (LinearRegression, LogisticRegression) before trying more complex models.
  • Use Random Forests or Gradient Boosting when you need high accuracy and can afford longer training times.
  • Scale your features before using distance-based algorithms (KNN, SVM, clustering).
  • Decision Trees and Random Forests can overfit on small datasets; use cross-validation and limit tree depth.
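Feature scaling in the tip above usually means z-score standardization: subtract the mean and divide by the standard deviation, per feature. A minimal sketch for a single feature column in plain TypeScript; the Preprocessing module's actual scaler API may differ:

```typescript
// Z-score standardization of one feature column: (x - mean) / std
function standardize(column: number[]): number[] {
  const n = column.length;
  const mean = column.reduce((a, b) => a + b, 0) / n;
  const variance = column.reduce((s, x) => s + (x - mean) ** 2, 0) / n;
  const std = Math.sqrt(variance);
  return column.map((x) => (x - mean) / std);
}

standardize([2, 4, 6]);  // mean 4, std ≈ 1.633, so roughly [-1.22, 0, 1.22]
```

After standardization each feature has mean 0 and unit variance, so no single feature dominates the distance computations.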

Preprocessing

Data scaling and encoding

Metrics

Model evaluation metrics

Neural Networks

Deep learning models

Learn More

API Reference

Complete API documentation

Examples

End-to-end ML examples
