The Machine Learning (ML) module provides scikit-learn style implementations of classical machine learning algorithms. It includes supervised learning (classification and regression), unsupervised learning (clustering and dimensionality reduction), and model selection utilities.
## Overview

The ML module offers a comprehensive suite of machine learning algorithms:

- **Linear Models**: Linear/Logistic Regression, Ridge, Lasso
- **Tree-Based**: Decision Trees, Random Forests, Gradient Boosting
- **Support Vector Machines**: LinearSVC, LinearSVR
- **Neighbors**: K-Nearest Neighbors for classification and regression
- **Clustering**: K-Means, DBSCAN
- **Dimensionality Reduction**: PCA, t-SNE
- **Naive Bayes**: Gaussian Naive Bayes classifier
## Key Features

- **Scikit-learn API**: Familiar fit/predict interface compatible with scikit-learn conventions.
- **Complete Pipeline**: From data preprocessing to model evaluation.
- **Ensemble Methods**: Random Forests and Gradient Boosting for better accuracy.
- **TypeScript Native**: Full type safety and modern JavaScript features.
## Linear Models

### Linear Regression

```ts
import { LinearRegression } from 'deepbox/ml';
import { tensor } from 'deepbox/ndarray';

// Training data
const X = tensor([[1, 1], [1, 2], [2, 2], [2, 3]]);
const y = tensor([1, 2, 2, 3]);

// Create and fit model
const model = new LinearRegression({ fitIntercept: true });
model.fit(X, y);

// Make predictions
const X_test = tensor([[3, 5]]);
const predictions = model.predict(X_test);

// Evaluate
const score = model.score(X, y); // R² score
```
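The `score` method returns the coefficient of determination, R². As a minimal sketch of what such a score computes (plain TypeScript on number arrays, independent of deepbox):

```ts
// R² = 1 - SS_res / SS_tot, where SS_res is the sum of squared residuals
// and SS_tot is the total variance of y around its mean.
function r2Score(yTrue: number[], yPred: number[]): number {
  const mean = yTrue.reduce((a, b) => a + b, 0) / yTrue.length;
  const ssRes = yTrue.reduce((s, yi, i) => s + (yi - yPred[i]) ** 2, 0);
  const ssTot = yTrue.reduce((s, yi) => s + (yi - mean) ** 2, 0);
  return 1 - ssRes / ssTot;
}
```

A perfect fit yields 1, and a model that always predicts the mean of `y` yields 0; negative values are possible for models worse than the mean.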
### Logistic Regression

```ts
import { LogisticRegression } from 'deepbox/ml';

const X = tensor([[1, 2], [2, 3], [3, 1], [4, 2]]);
const y = tensor([0, 0, 1, 1]);

const model = new LogisticRegression({
  penalty: 'l2',
  C: 1.0,
  maxIter: 100
});
model.fit(X, y);

const predictions = model.predict(X);
const probabilities = model.predictProba(X);
```
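For a binary logistic model, `predictProba` amounts to passing the linear score through the sigmoid, and `predict` thresholds the result. A plain-TypeScript sketch of that idea (the weights here are illustrative, not fitted):

```ts
// Probability of class 1 under a binary logistic model: p = sigmoid(w · x + b)
const sigmoid = (z: number): number => 1 / (1 + Math.exp(-z));

function proba(x: number[], w: number[], b: number): number {
  const z = x.reduce((s, xi, i) => s + xi * w[i], b);
  return sigmoid(z);
}

// predict() then thresholds the probability at 0.5
const predictOne = (p: number): number => (p >= 0.5 ? 1 : 0);
```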
### Ridge and Lasso

```ts
import { Ridge, Lasso } from 'deepbox/ml';

// Ridge regression (L2 regularization)
const ridge = new Ridge({ alpha: 1.0 });
ridge.fit(X_train, y_train);

// Lasso regression (L1 regularization)
const lasso = new Lasso({ alpha: 0.1, maxIter: 1000 });
lasso.fit(X_train, y_train);
```
## Tree-Based Models

### Decision Trees

```ts
import { DecisionTreeClassifier, DecisionTreeRegressor } from 'deepbox/ml';

// Classification
const clf = new DecisionTreeClassifier({
  maxDepth: 5,
  minSamplesSplit: 2,
  minSamplesLeaf: 1
});
clf.fit(X_train, y_train);
const y_pred = clf.predict(X_test);

// Regression
const reg = new DecisionTreeRegressor({ maxDepth: 10 });
reg.fit(X_train, y_train);
```
### Random Forest

```ts
import { RandomForestClassifier, RandomForestRegressor } from 'deepbox/ml';

// Random Forest Classifier
const rf = new RandomForestClassifier({
  nEstimators: 100,
  maxDepth: 10,
  minSamplesSplit: 2,
  randomState: 42
});
rf.fit(X_train, y_train);

const predictions = rf.predict(X_test);
const accuracy = rf.score(X_test, y_test);

// Feature importance
const importance = rf.featureImportances();
```
### Gradient Boosting

```ts
import { GradientBoostingClassifier, GradientBoostingRegressor } from 'deepbox/ml';

// Gradient Boosting for classification
const gbc = new GradientBoostingClassifier({
  nEstimators: 100,
  learningRate: 0.1,
  maxDepth: 3,
  subsample: 0.8
});
gbc.fit(X_train, y_train);
const y_pred = gbc.predict(X_test);

// For regression
const gbr = new GradientBoostingRegressor({
  nEstimators: 100,
  learningRate: 0.1
});
```
## Support Vector Machines

```ts
import { LinearSVC, LinearSVR } from 'deepbox/ml';

// Linear Support Vector Classifier
const svc = new LinearSVC({
  C: 1.0,
  maxIter: 1000,
  tol: 1e-4
});
svc.fit(X_train, y_train);
const predictions = svc.predict(X_test);

// Linear Support Vector Regressor
const svr = new LinearSVR({ C: 1.0, epsilon: 0.1 });
svr.fit(X_train, y_train);
```
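Linear SVMs are typically trained by minimizing a regularized margin loss. As a plain-TypeScript sketch of the two loss functions involved (the labels are assumed to be encoded as -1/+1 for the classifier):

```ts
// Hinge loss for one sample: max(0, 1 - y * f(x)), with labels y ∈ {-1, +1}.
// Points classified correctly and beyond the margin contribute zero loss.
function hingeLoss(y: number, decision: number): number {
  return Math.max(0, 1 - y * decision);
}

// LinearSVR's epsilon parameter controls an epsilon-insensitive loss:
// prediction errors smaller than epsilon are ignored entirely.
function epsilonInsensitive(yTrue: number, yPred: number, eps: number): number {
  return Math.max(0, Math.abs(yTrue - yPred) - eps);
}
```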
## K-Nearest Neighbors

```ts
import { KNeighborsClassifier, KNeighborsRegressor } from 'deepbox/ml';

// KNN Classifier
const knn_clf = new KNeighborsClassifier({
  nNeighbors: 5,
  weights: 'distance',
  metric: 'euclidean'
});
knn_clf.fit(X_train, y_train);
const y_pred = knn_clf.predict(X_test);

// KNN Regressor
const knn_reg = new KNeighborsRegressor({ nNeighbors: 3 });
knn_reg.fit(X_train, y_train);
```
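Conceptually, k-NN classification finds the k training points nearest to the query and takes a majority vote over their labels. A self-contained sketch with uniform weights and Euclidean distance (plain TypeScript, independent of deepbox):

```ts
// Classify a query point by majority vote among its k nearest neighbors.
function knnPredict(
  X: number[][], y: number[], query: number[], k: number
): number {
  const dist = (a: number[], b: number[]) =>
    Math.sqrt(a.reduce((s, ai, i) => s + (ai - b[i]) ** 2, 0));
  // Rank all training points by distance to the query, keep the k closest
  const nearest = X
    .map((x, i) => ({ d: dist(x, query), label: y[i] }))
    .sort((a, b) => a.d - b.d)
    .slice(0, k);
  // Majority vote over neighbor labels
  const votes = new Map<number, number>();
  for (const { label } of nearest) votes.set(label, (votes.get(label) ?? 0) + 1);
  return [...votes.entries()].sort((a, b) => b[1] - a[1])[0][0];
}
```

With `weights: 'distance'`, each vote would instead be weighted by the inverse of the neighbor's distance.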
## Clustering

### K-Means

```ts
import { KMeans } from 'deepbox/ml';
import { tensor } from 'deepbox/ndarray';

const X = tensor([
  [1, 2], [1.5, 1.8], [5, 8], [8, 8], [1, 0.6], [9, 11]
]);

const kmeans = new KMeans({
  nClusters: 3,
  maxIter: 300,
  randomState: 42
});
kmeans.fit(X);

// Get cluster assignments
const labels = kmeans.labels();

// Get cluster centers
const centers = kmeans.clusterCenters();

// Predict cluster for new data
const newPoint = tensor([[0, 0]]);
const cluster = kmeans.predict(newPoint);
```
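The `predict` step above is the assignment half of the k-means loop: each point goes to its nearest center. A plain-TypeScript sketch of that step (the other half of the algorithm then moves each center to the mean of its assigned points, and the two steps alternate until convergence):

```ts
// Assign each sample to the index of its nearest cluster center
// (squared Euclidean distance; the sqrt is unnecessary for comparison).
function assignToClusters(X: number[][], centers: number[][]): number[] {
  const sqDist = (a: number[], b: number[]) =>
    a.reduce((s, ai, i) => s + (ai - b[i]) ** 2, 0);
  return X.map((x) => {
    let best = 0;
    for (let c = 1; c < centers.length; c++) {
      if (sqDist(x, centers[c]) < sqDist(x, centers[best])) best = c;
    }
    return best;
  });
}
```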
### DBSCAN

```ts
import { DBSCAN } from 'deepbox/ml';

const dbscan = new DBSCAN({
  eps: 0.5,
  minSamples: 5,
  metric: 'euclidean'
});
dbscan.fit(X);

const labels = dbscan.labels(); // -1 indicates noise points
const corePoints = dbscan.corePointIndices();
```
## Dimensionality Reduction

### PCA (Principal Component Analysis)

```ts
import { PCA } from 'deepbox/ml';
import { tensor } from 'deepbox/ndarray';

const X = tensor([
  [2.5, 2.4],
  [0.5, 0.7],
  [2.2, 2.9],
  [1.9, 2.2]
]);

const pca = new PCA({ nComponents: 1 });
pca.fit(X);

// Transform data to lower dimensions
const X_reduced = pca.transform(X);

// Get explained variance ratio
const variance = pca.explainedVarianceRatio();

// Get principal components
const components = pca.components();
```
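The explained variance ratio is each component's eigenvalue of the data covariance matrix expressed as a fraction of the total variance. A plain-TypeScript sketch of that final step (the eigenvalues here are assumed to have been computed already):

```ts
// Fraction of total variance captured by each principal component,
// given the eigenvalues of the covariance matrix in descending order.
function explainedVarianceRatio(eigenvalues: number[]): number[] {
  const total = eigenvalues.reduce((a, b) => a + b, 0);
  return eigenvalues.map((v) => v / total);
}
```

The ratios always sum to 1 over all components, which is why summing the first few tells you how much information a truncated representation retains.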
### t-SNE

```ts
import { TSNE } from 'deepbox/ml';

const tsne = new TSNE({
  nComponents: 2,
  perplexity: 30,
  learningRate: 200,
  nIter: 1000
});
const X_embedded = tsne.fitTransform(X);
```
## Naive Bayes

```ts
import { GaussianNB } from 'deepbox/ml';

const gnb = new GaussianNB();
gnb.fit(X_train, y_train);

const predictions = gnb.predict(X_test);
const probabilities = gnb.predictProba(X_test);
```
## Use Cases

### Binary Classification

Classify data into two categories:

```ts
import { LogisticRegression } from 'deepbox/ml';
import { tensor } from 'deepbox/ndarray';
import { accuracy } from 'deepbox/metrics';

// Spam detection example
const X_train = tensor([...]); // Features
const y_train = tensor([0, 1, 0, 1, ...]); // 0=ham, 1=spam

const model = new LogisticRegression();
model.fit(X_train, y_train);

const y_pred = model.predict(X_test);
const acc = accuracy(y_test, y_pred);
```
### Multi-class Classification

Classify into multiple categories:

```ts
import { RandomForestClassifier } from 'deepbox/ml';

// Iris species classification
const model = new RandomForestClassifier({ nEstimators: 100 });
model.fit(X_train, y_train); // y has classes 0, 1, 2

const predictions = model.predict(X_test);
const probabilities = model.predictProba(X_test);
```
### Customer Segmentation

Group customers by behavior:

```ts
import { KMeans } from 'deepbox/ml';
import { tensor } from 'deepbox/ndarray';

// Customer features: [age, income, spending_score]
const customers = tensor([...]);

const kmeans = new KMeans({ nClusters: 4 });
kmeans.fit(customers);

const segments = kmeans.labels();
const centers = kmeans.clusterCenters();
```
### Dimensionality Reduction

Reduce dimensionality while preserving information:

```ts
import { PCA } from 'deepbox/ml';

// High-dimensional data
const X = tensor([...]); // Shape: [n_samples, 100]

const pca = new PCA({ nComponents: 10 });
pca.fit(X);

const X_reduced = pca.transform(X); // Shape: [n_samples, 10]
console.log(pca.explainedVarianceRatio().sum());
```
## Common Interface

All estimators follow the same interface:

```ts
interface Estimator {
  fit(X: Tensor, y?: Tensor): this;
}

interface Classifier extends Estimator {
  predict(X: Tensor): Tensor;
  predictProba(X: Tensor): Tensor;
  score(X: Tensor, y: Tensor): number;
}

interface Regressor extends Estimator {
  predict(X: Tensor): Tensor;
  score(X: Tensor, y: Tensor): number; // R² score
}

interface Clusterer extends Estimator {
  predict(X: Tensor): Tensor;
  labels(): Tensor;
}
```
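Because the contract is structural, any object with matching `fit`/`predict`/`score` methods plugs into the same workflow. As an illustration, here is a toy "always predict the training mean" regressor, with `number[]` standing in for `Tensor` so the sketch stays self-contained:

```ts
// Minimal Regressor-shaped baseline: memorizes the mean of y at fit time.
class MeanRegressor {
  private mean = 0;

  fit(_X: number[][], y: number[]): this {
    this.mean = y.reduce((a, b) => a + b, 0) / y.length;
    return this;
  }

  predict(X: number[][]): number[] {
    return X.map(() => this.mean);
  }

  // R² score; the mean predictor scores exactly 0 on its own training data.
  score(_X: number[][], y: number[]): number {
    const mean = y.reduce((a, b) => a + b, 0) / y.length;
    const ssRes = y.reduce((s, yi) => s + (yi - this.mean) ** 2, 0);
    const ssTot = y.reduce((s, yi) => s + (yi - mean) ** 2, 0);
    return ssTot === 0 ? 1 : 1 - ssRes / ssTot;
  }
}
```

Baselines like this are useful as a sanity check: any real model should beat a `score` of 0.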
## Tips

- For large datasets, start with linear models (LinearRegression, LogisticRegression) before trying more complex models.
- Use Random Forests or Gradient Boosting when you need high accuracy and can afford longer training times.
- Scale your features before using distance-based algorithms (KNN, SVM, clustering).
- Decision Trees and Random Forests can overfit on small datasets; use cross-validation and limit tree depth.
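The scaling advice above can be sketched as a z-score standardization in plain TypeScript (the library's preprocessing module may provide its own scaler; this is just the underlying computation):

```ts
// Standardize each feature column to zero mean and unit variance, so no
// single feature dominates distance computations in KNN, SVM, or clustering.
function standardize(X: number[][]): number[][] {
  const n = X.length;
  const d = X[0].length;
  const mean = Array.from({ length: d }, (_, j) =>
    X.reduce((s, row) => s + row[j], 0) / n
  );
  const std = Array.from({ length: d }, (_, j) =>
    Math.sqrt(X.reduce((s, row) => s + (row[j] - mean[j]) ** 2, 0) / n)
  );
  // Constant columns (std = 0) are mapped to 0 to avoid division by zero
  return X.map((row) =>
    row.map((v, j) => (std[j] === 0 ? 0 : (v - mean[j]) / std[j]))
  );
}
```

Note that the scaler's mean and std should be computed on the training set only and then reused to transform the test set.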
## Related Modules

- **Preprocessing**: Data scaling and encoding
- **Metrics**: Model evaluation metrics
- **Neural Networks**: Deep learning models

## Learn More

- **API Reference**: Complete API documentation
- **Examples**: End-to-end ML examples