Overview
The DecisionTreeClassifier class implements a decision tree algorithm for classification problems. It builds a tree structure by recursively splitting the data based on features that maximize information gain or minimize impurity.
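To make the recursive splitting concrete, here is a small standalone sketch of how one split might be chosen: for a candidate feature, every midpoint between sorted values is scored by the weighted Gini impurity of the two resulting partitions, and the lowest-impurity threshold wins. This illustrates the idea only; `gini` and `best_threshold` are hypothetical helpers, not the library's internal code.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <vector>

// Gini impurity of a set of integer class labels: 1 - sum(p_i^2).
double gini(const std::vector<int>& labels) {
    std::map<int, std::size_t> counts;
    for (int y : labels) ++counts[y];
    double impurity = 1.0;
    for (const auto& kv : counts) {
        double p = static_cast<double>(kv.second) / labels.size();
        impurity -= p * p;
    }
    return impurity;
}

// Best split threshold for one feature: try every midpoint between sorted
// values and keep the one minimizing the weighted child impurity.
double best_threshold(const std::vector<double>& feature,
                      const std::vector<int>& labels) {
    std::vector<double> sorted = feature;
    std::sort(sorted.begin(), sorted.end());
    double best_t = sorted.front();
    double best_impurity = 1e9;
    for (std::size_t i = 0; i + 1 < sorted.size(); ++i) {
        double t = (sorted[i] + sorted[i + 1]) / 2.0;
        std::vector<int> left, right;
        for (std::size_t j = 0; j < feature.size(); ++j)
            (feature[j] <= t ? left : right).push_back(labels[j]);
        if (left.empty() || right.empty()) continue;
        double weighted =
            (left.size() * gini(left) + right.size() * gini(right))
            / feature.size();
        if (weighted < best_impurity) { best_impurity = weighted; best_t = t; }
    }
    return best_t;
}
```

The real classifier repeats this search over every feature at every node, stopping when a node is pure or a stopping parameter (see the constructor) is hit.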
Constructor
criterion
Criterion
default: Criterion::gini
The function to measure the quality of a split. Supported criteria are:
Criterion::gini - Gini impurity
Criterion::entropy - Information gain
max_depth
std::size_t
default: std::numeric_limits<std::size_t>::max()
The maximum depth of the tree. If not set, nodes are expanded until all leaves are pure or contain fewer than min_samples_split samples.
min_samples_split
std::size_t
The minimum number of samples required to split an internal node.
min_samples_leaf
std::size_t
The minimum number of samples required to be at a leaf node.
min_impurity_decrease
double
A node is split only if the split induces a decrease in impurity greater than or equal to this value.
#include "decision_tree.h"
using namespace decision_trees;
// Create a classifier with default parameters
DecisionTreeClassifier clf;
// Create a classifier with custom parameters
DecisionTreeClassifier clf_custom(
    Criterion::entropy,
    10,    // max_depth
    5,     // min_samples_split
    2,     // min_samples_leaf
    0.01   // min_impurity_decrease
);
Methods
fit (with string labels)
Train the decision tree classifier on labeled data.
X
const std::vector<std::vector<double>>&
Training feature matrix where each inner vector represents a sample.
y
const std::vector<std::string>&
Target class labels as strings.
std::vector<std::vector<double>> X = {
    {5.1, 3.5, 1.4, 0.2},
    {4.9, 3.0, 1.4, 0.2},
    {6.2, 3.4, 5.4, 2.3},
    {5.9, 3.0, 5.1, 1.8}
};
std::vector<std::string> y = {"setosa", "setosa", "virginica", "virginica"};
DecisionTreeClassifier clf;
clf.fit(X, y);
fit (with numeric labels)
Train the decision tree classifier on data with numeric class labels.
X
const std::vector<std::vector<double>>&
Training feature matrix where each inner vector represents a sample.
y
const std::vector<double>&
Target class labels as numeric values.
std::vector<std::vector<double>> X = {
    {5.1, 3.5, 1.4, 0.2},
    {4.9, 3.0, 1.4, 0.2},
    {6.2, 3.4, 5.4, 2.3}
};
std::vector<double> y = {0.0, 0.0, 1.0};
DecisionTreeClassifier clf;
clf.fit(X, y);
predict (single sample)
Predict the class label for a single sample, returning a numeric code.
x
const std::vector<double>&
A single feature vector to predict.
The predicted class as a numeric code.
std::vector<double> sample = {5.0, 3.3, 1.4, 0.2};
double prediction = clf.predict(sample);
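The numeric code presumably indexes into the label vector returned by classes() (see the classes section). Under that assumption, a helper like the hypothetical `code_to_label` below maps a prediction back to its name:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: map a numeric prediction back to its string label,
// assuming predictions index into the vector returned by clf.classes().
std::string code_to_label(double code,
                          const std::vector<std::string>& labels) {
    return labels.at(static_cast<std::size_t>(code));
}

// Usage (assuming clf has been fitted):
//   std::string name = code_to_label(clf.predict(sample), clf.classes());
```

If you only need the name, predict_class does this mapping for you.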
predict (multiple samples)
Predict class labels for multiple samples, returning numeric codes.
X
const std::vector<std::vector<double>>&
Feature matrix where each inner vector represents a sample.
Vector of predicted class codes.
std::vector<std::vector<double>> X_test = {
    {5.0, 3.3, 1.4, 0.2},
    {6.1, 2.8, 4.7, 1.2}
};
std::vector<double> predictions = clf.predict(X_test);
predict_class (single sample)
Predict the class label for a single sample, returning the class name as a string.
x
const std::vector<double>&
A single feature vector to predict.
The predicted class label as a string.
std::vector<double> sample = {5.0, 3.3, 1.4, 0.2};
std::string class_name = clf.predict_class(sample);
// class_name might be "setosa"
predict_class (multiple samples)
Predict class labels for multiple samples, returning class names as strings.
X
const std::vector<std::vector<double>>&
Feature matrix where each inner vector represents a sample.
Vector of predicted class labels as strings.
std::vector<std::vector<double>> X_test = {
    {5.0, 3.3, 1.4, 0.2},
    {6.1, 2.8, 4.7, 1.2}
};
std::vector<std::string> class_names = clf.predict_class(X_test);
predict_proba
Predict class probabilities for a single sample.
x
const std::vector<double>&
A single feature vector to predict.
Vector of class probabilities, one for each class in the order returned by classes().
std::vector<double> sample = {5.0, 3.3, 1.4, 0.2};
std::vector<double> probabilities = clf.predict_proba(sample);
// Get class names to interpret probabilities
auto class_labels = clf.classes();
for (size_t i = 0; i < probabilities.size(); ++i) {
    std::cout << class_labels[i] << ": " << probabilities[i] << std::endl;
}
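If only the most likely class is needed, the index of the largest probability identifies it in classes() order. The small sketch below (a hypothetical helper; it assumes predict_class agrees with the argmax of predict_proba) shows the relationship:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Index of the largest probability; combined with classes(), this gives
// the most likely label.
std::size_t argmax(const std::vector<double>& probs) {
    return static_cast<std::size_t>(
        std::max_element(probs.begin(), probs.end()) - probs.begin());
}

// Usage (assuming clf has been fitted):
//   auto probs = clf.predict_proba(sample);
//   std::string likely = clf.classes()[argmax(probs)];
```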
root
Get a pointer to the root node of the decision tree.
Pointer to the root TreeNode, or nullptr if the tree hasn’t been fitted.
const TreeNode* tree_root = clf.root();
if (tree_root != nullptr) {
    // Access tree structure
}
classes
Get the class labels known to the classifier.
return
const std::vector<std::string>&
Vector of class label strings in the order they map to numeric codes.
auto class_labels = clf.classes();
for (const auto& label : class_labels) {
    std::cout << label << std::endl;
}
Enumerations
Criterion
Criteria for measuring split quality in classification trees.
enum class Criterion {
    gini,          // Gini impurity (default for classification)
    entropy,       // Information gain / entropy
    mse,           // Mean squared error (for regression)
    friedman_mse,  // Friedman's improvement on MSE (for regression)
    mae            // Mean absolute error (for regression)
};
For classification, use Criterion::gini or Criterion::entropy.
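Both classification criteria score a node from its class-probability distribution: Gini impurity is 1 - Σ p_i², and entropy is -Σ p_i log₂ p_i. A minimal standalone sketch of the two formulas (illustrative helpers, not library functions):

```cpp
#include <cmath>
#include <vector>

// Gini impurity: 0 for a pure node, maximal for a uniform distribution.
double gini_impurity(const std::vector<double>& p) {
    double g = 1.0;
    for (double pi : p) g -= pi * pi;
    return g;
}

// Entropy in bits: 0 for a pure node, log2(k) for k equally likely classes.
double entropy(const std::vector<double>& p) {
    double h = 0.0;
    for (double pi : p)
        if (pi > 0.0) h -= pi * std::log2(pi);
    return h;
}
```

Both reach zero on pure nodes and their maximum on uniform distributions, so in practice they usually produce similar trees; entropy penalizes mixed nodes slightly more strongly.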
Data structures
TreeNode
Represents a node in the decision tree.
is_leaf
bool
Whether this node is a leaf node.
feature_index
std::size_t
Index of the feature used for splitting at this node (for internal nodes).
threshold
double
Threshold value for the split (for internal nodes).
The prediction value stored at this node.
The predicted class label (for classification leaf nodes).
Count of samples per class at this node (used for probability estimates).
left
std::unique_ptr<TreeNode>
Pointer to the left child node.
right
std::unique_ptr<TreeNode>
Pointer to the right child node.
const TreeNode* node = clf.root();
if (node && !node->is_leaf) {
    std::cout << "Split on feature " << node->feature_index
              << " at threshold " << node->threshold << std::endl;
}
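Because left and right are owning pointers, the tree can be walked recursively. The sketch below uses a minimal stand-in struct carrying only the fields documented above (the real TreeNode also holds the prediction fields) so the traversal is self-contained:

```cpp
#include <algorithm>
#include <cstddef>
#include <memory>

// Minimal stand-in mirroring the documented TreeNode fields.
struct Node {
    bool is_leaf = true;
    std::size_t feature_index = 0;
    double threshold = 0.0;
    std::unique_ptr<Node> left;
    std::unique_ptr<Node> right;
};

// Depth of the tree: 0 for a null pointer, 1 for a lone leaf, and so on.
std::size_t tree_depth(const Node* node) {
    if (node == nullptr) return 0;
    return 1 + std::max(tree_depth(node->left.get()),
                        tree_depth(node->right.get()));
}
```

The same pattern (check is_leaf, recurse into left.get() and right.get()) works for printing the tree, counting leaves, or exporting it to another format.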
Example usage
#include "decision_tree.h"
#include <iostream>
#include <string>
#include <vector>
using namespace decision_trees;
int main() {
    // Prepare training data (Iris dataset subset)
    std::vector<std::vector<double>> X_train = {
        {5.1, 3.5, 1.4, 0.2},
        {4.9, 3.0, 1.4, 0.2},
        {7.0, 3.2, 4.7, 1.4},
        {6.4, 3.2, 4.5, 1.5},
        {6.3, 3.3, 6.0, 2.5},
        {5.8, 2.7, 5.1, 1.9}
    };
    std::vector<std::string> y_train = {
        "setosa", "setosa",
        "versicolor", "versicolor",
        "virginica", "virginica"
    };

    // Create and train the classifier
    DecisionTreeClassifier clf(Criterion::gini, 5, 2, 1, 0.0);
    clf.fit(X_train, y_train);

    // Make a prediction
    std::vector<double> sample = {6.0, 3.0, 4.8, 1.8};
    std::string predicted_class = clf.predict_class(sample);
    std::cout << "Predicted class: " << predicted_class << std::endl;

    // Get probability estimates
    auto probabilities = clf.predict_proba(sample);
    auto classes = clf.classes();
    std::cout << "Class probabilities:\n";
    for (size_t i = 0; i < classes.size(); ++i) {
        std::cout << "  " << classes[i] << ": "
                  << probabilities[i] << std::endl;
    }
    return 0;
}
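To evaluate a fitted tree on held-out data, compare the output of predict_class against the true labels. The hypothetical `accuracy` helper below sketches this; it takes already-computed predictions, e.g. from clf.predict_class(X_test):

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Fraction of predictions matching the true labels. In practice `predicted`
// would come from clf.predict_class(X_test).
double accuracy(const std::vector<std::string>& predicted,
                const std::vector<std::string>& actual) {
    if (predicted.empty() || predicted.size() != actual.size()) return 0.0;
    std::size_t correct = 0;
    for (std::size_t i = 0; i < predicted.size(); ++i)
        if (predicted[i] == actual[i]) ++correct;
    return static_cast<double>(correct) / predicted.size();
}
```

Keep the evaluation set disjoint from the training set: an unpruned decision tree can memorize its training data, so training accuracy is usually an overestimate.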