Random forests combine the predictions of many decision trees to reduce overfitting and improve accuracy, making them one of the most reliable general-purpose machine learning algorithms.

Using Random Forests

Random Forest Classification
Train an ensemble of decision trees for classification:
import { loadIris } from "deepbox/datasets";
import { accuracy } from "deepbox/metrics";
import { RandomForestClassifier } from "deepbox/ml";
import { trainTestSplit } from "deepbox/preprocess";

const iris = loadIris();
const [XTrain, XTest, yTrain, yTest] = trainTestSplit(
  iris.data, 
  iris.target, 
  {
    testSize: 0.2,
    randomState: 42,
  }
);

// Create and train random forest
const rfc = new RandomForestClassifier({
  nEstimators: 50,    // Number of trees
  maxDepth: 5,        // Max depth per tree
  randomState: 42,    // For reproducibility
});
rfc.fit(XTrain, yTrain);

const rfcPred = rfc.predict(XTest);
const acc = accuracy(yTest, rfcPred);

console.log("Random Forest Classifier");
console.log(`Accuracy: ${(Number(acc) * 100).toFixed(2)}%`);
Output:
Random Forest Classifier
Accuracy: 100.00%
Random Forest Regression
Use random forests for continuous predictions:
import { mse, r2Score } from "deepbox/metrics";
import { RandomForestRegressor } from "deepbox/ml";
import { tensor } from "deepbox/ndarray";
import { trainTestSplit } from "deepbox/preprocess";

const XReg = tensor([
  [1, 2], [2, 3], [3, 4], [4, 5], [5, 6],
  [6, 7], [7, 8], [8, 9], [9, 10], [10, 11],
  [1, 3], [2, 5], [3, 2], [4, 1], [5, 4],
  [6, 3], [7, 6], [8, 5], [9, 8], [10, 7],
]);
const yReg = tensor([5, 8, 11, 14, 17, 20, 23, 26, 29, 32, 7, 12, 7, 6, 13, 12, 19, 18, 25, 24]);

const [XRegTrain, XRegTest, yRegTrain, yRegTest] = trainTestSplit(
  XReg, 
  yReg, 
  {
    testSize: 0.2,
    randomState: 42,
  }
);

// Train random forest regressor
const rfr = new RandomForestRegressor({
  nEstimators: 50,
  maxDepth: 5,
  randomState: 42,
});
rfr.fit(XRegTrain, yRegTrain);

const rfrPred = rfr.predict(XRegTest);
console.log("\nRandom Forest Regressor");
console.log(`MSE: ${mse(yRegTest, rfrPred).toFixed(4)}`);
console.log(`R²:  ${r2Score(yRegTest, rfrPred).toFixed(4)}`);
Output:
Random Forest Regressor
MSE: 0.2500
R²:  0.9925
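A random forest regressor's final prediction is simply the average of its individual trees' outputs. As a standalone illustration (plain TypeScript with made-up per-tree predictions, not deepbox's internal code):

```typescript
// Hypothetical predictions from five trees for one test sample
const treePredictions = [10, 12, 11, 13, 14];

// A random forest regressor averages the trees' outputs
function aggregateMean(preds: number[]): number {
  return preds.reduce((sum, p) => sum + p, 0) / preds.length;
}

console.log(aggregateMean(treePredictions)); // 12
```

Because each tree overshoots or undershoots differently, the averaged prediction is typically more stable than any single tree's.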
Tuning hyperparameters
Adjust key parameters for optimal performance:
const rf = new RandomForestClassifier({
  nEstimators: 100,    // More trees = better performance (up to a point)
  maxDepth: 10,        // Deeper trees = more expressive
  minSamplesSplit: 5,  // Higher = less overfitting
  randomState: 42,     // For reproducible results
});
Key parameters:
  • nEstimators: Number of trees in the forest
  • maxDepth: Maximum depth of each tree
  • minSamplesSplit: Minimum samples required to split a node
  • randomState: Seed for reproducibility
Why Random Forests Work

1. Bagging: Each tree trains on a random sample of the data, drawn with replacement
2. Feature randomness: Each split considers a random subset of features
3. Voting: The final prediction aggregates all trees (majority vote for classification, average for regression)
4. Reduced overfitting: Individual trees' errors tend to cancel out
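The bagging and voting steps above can be sketched in plain TypeScript. This is a standalone illustration with a toy seeded RNG and hypothetical per-tree votes, not deepbox's actual implementation:

```typescript
// Bagging: draw a bootstrap sample (same size, with replacement).
// Toy linear congruential generator for reproducibility;
// deepbox's internal RNG will differ.
function bootstrapSample<T>(data: T[], seed: number): T[] {
  let state = seed >>> 0;
  const nextRandom = (): number => {
    state = (state * 1664525 + 1013904223) % 4294967296;
    return state / 4294967296;
  };
  return data.map(() => data[Math.floor(nextRandom() * data.length)]);
}

// Voting: each tree predicts a class; the most common class wins.
function majorityVote(votes: number[]): number {
  const counts = new Map<number, number>();
  for (const v of votes) counts.set(v, (counts.get(v) ?? 0) + 1);
  let best = votes[0];
  for (const [cls, count] of counts) {
    if (count > (counts.get(best) ?? 0)) best = cls;
  }
  return best;
}

// Hypothetical class votes from five trees for one test sample
console.log(majorityVote([0, 2, 2, 1, 2])); // 2
```

Each tree sees a different bootstrap sample, so the trees make different mistakes; the vote lets the majority outweigh any one tree's error.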

Next Steps

  • Gradient Boosting: Achieve even higher accuracy with boosting
  • Clustering: Discover patterns with unsupervised learning