
Overview

The DecisionTreeClassifier class implements a decision tree algorithm for classification problems. It builds a tree structure by recursively splitting the data based on features that maximize information gain or minimize impurity.
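The core splitting idea can be sketched in plain C++: for a candidate feature, try each value as a threshold and keep the one that minimizes the size-weighted Gini impurity of the two children. This is an illustrative sketch of the technique only, not the library's actual implementation:

```cpp
#include <cstddef>
#include <vector>

// Gini impurity of a binary label vector (labels are 0 or 1).
double gini(const std::vector<int>& labels) {
    if (labels.empty()) return 0.0;
    double ones = 0.0;
    for (int l : labels) ones += l;
    double p = ones / labels.size();
    return 1.0 - p * p - (1.0 - p) * (1.0 - p);
}

// Try every feature value as a threshold and return the one whose split
// minimizes the size-weighted impurity of the two children.
double best_threshold(const std::vector<double>& feature,
                      const std::vector<int>& labels) {
    double best_t = feature.front();
    double best_impurity = gini(labels);
    for (double t : feature) {
        std::vector<int> left, right;
        for (std::size_t i = 0; i < feature.size(); ++i)
            (feature[i] <= t ? left : right).push_back(labels[i]);
        if (left.empty() || right.empty()) continue;  // degenerate split
        double w = (left.size() * gini(left) + right.size() * gini(right))
                   / feature.size();
        if (w < best_impurity) { best_impurity = w; best_t = t; }
    }
    return best_t;
}
```

The real classifier repeats this search over all features and recurses into each child until a stopping condition (depth, sample count, impurity decrease) is hit.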

Constructor

criterion : Criterion (default: Criterion::gini)
The function to measure the quality of a split. Supported criteria are:
  • Criterion::gini - Gini impurity
  • Criterion::entropy - Information gain

max_depth : std::size_t (default: std::numeric_limits<std::size_t>::max())
The maximum depth of the tree. If not set, nodes are expanded until all leaves are pure or contain fewer than min_samples_split samples.

min_samples_split : std::size_t (default: 2)
The minimum number of samples required to split an internal node.

min_samples_leaf : std::size_t (default: 1)
The minimum number of samples required at a leaf node.

min_impurity_decrease : double (default: 0.0)
A node is split only if the split decreases the impurity by at least this value.

#include "decision_tree.h"
using namespace decision_trees;

// Create a classifier with default parameters
DecisionTreeClassifier clf;

// Create a classifier with custom parameters
DecisionTreeClassifier clf_custom(
    Criterion::entropy,
    10,  // max_depth
    5,   // min_samples_split
    2,   // min_samples_leaf
    0.01 // min_impurity_decrease
);
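The min_impurity_decrease threshold is compared against the drop in impurity a candidate split achieves: the parent's impurity minus the size-weighted average impurity of the two children. A standalone sketch of that quantity follows; the library's exact weighting convention is an assumption here (some implementations additionally scale by the node's share of all training samples):

```cpp
#include <cstddef>

// Impurity decrease of a candidate split: parent impurity minus the
// size-weighted average impurity of the two children. A split is kept
// only if this value is >= min_impurity_decrease.
double impurity_decrease(double parent_impurity,
                         double left_impurity, std::size_t n_left,
                         double right_impurity, std::size_t n_right) {
    double n = static_cast<double>(n_left + n_right);
    return parent_impurity
           - (n_left / n) * left_impurity
           - (n_right / n) * right_impurity;
}
```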

Methods

fit (with string labels)

Train the decision tree classifier on labeled data.
X : const std::vector<std::vector<double>>&
Training feature matrix where each inner vector represents a sample.

y : const std::vector<std::string>&
Target class labels as strings.

std::vector<std::vector<double>> X = {
    {5.1, 3.5, 1.4, 0.2},
    {4.9, 3.0, 1.4, 0.2},
    {6.2, 3.4, 5.4, 2.3},
    {5.9, 3.0, 5.1, 1.8}
};

std::vector<std::string> y = {"setosa", "setosa", "virginica", "virginica"};

DecisionTreeClassifier clf;
clf.fit(X, y);

fit (with numeric labels)

Train the decision tree classifier on data with numeric class labels.
X : const std::vector<std::vector<double>>&
Training feature matrix where each inner vector represents a sample.

y : const std::vector<double>&
Target class labels as numeric values.

std::vector<std::vector<double>> X = {
    {5.1, 3.5, 1.4, 0.2},
    {4.9, 3.0, 1.4, 0.2},
    {6.2, 3.4, 5.4, 2.3}
};

std::vector<double> y = {0.0, 0.0, 1.0};

DecisionTreeClassifier clf;
clf.fit(X, y);

predict (single sample)

Predict the class label for a single sample, returning a numeric code.
x : const std::vector<double>&
A single feature vector to predict.

Returns : double
The predicted class as a numeric code.

std::vector<double> sample = {5.0, 3.3, 1.4, 0.2};
double prediction = clf.predict(sample);

predict (multiple samples)

Predict class labels for multiple samples, returning numeric codes.
X : const std::vector<std::vector<double>>&
Feature matrix where each inner vector represents a sample.

Returns : std::vector<double>
Vector of predicted class codes.

std::vector<std::vector<double>> X_test = {
    {5.0, 3.3, 1.4, 0.2},
    {6.1, 2.8, 4.7, 1.2}
};

std::vector<double> predictions = clf.predict(X_test);

predict_class (single sample)

Predict the class label for a single sample, returning the class name as a string.
x : const std::vector<double>&
A single feature vector to predict.

Returns : std::string
The predicted class label as a string.

std::vector<double> sample = {5.0, 3.3, 1.4, 0.2};
std::string class_name = clf.predict_class(sample);
// class_name might be "setosa"

predict_class (multiple samples)

Predict class labels for multiple samples, returning class names as strings.
X : const std::vector<std::vector<double>>&
Feature matrix where each inner vector represents a sample.

Returns : std::vector<std::string>
Vector of predicted class labels as strings.

std::vector<std::vector<double>> X_test = {
    {5.0, 3.3, 1.4, 0.2},
    {6.1, 2.8, 4.7, 1.2}
};

std::vector<std::string> class_names = clf.predict_class(X_test);

predict_proba

Predict class probabilities for a single sample.
x : const std::vector<double>&
A single feature vector to predict.

Returns : std::vector<double>
Vector of class probabilities, one for each class in the order returned by classes().

std::vector<double> sample = {5.0, 3.3, 1.4, 0.2};
std::vector<double> probabilities = clf.predict_proba(sample);

// Get class names to interpret probabilities
auto class_labels = clf.classes();
for (size_t i = 0; i < probabilities.size(); ++i) {
    std::cout << class_labels[i] << ": " << probabilities[i] << std::endl;
}
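Leaf probabilities are typically the normalized per-class sample counts at the leaf the sample reaches (the class_counts field described under TreeNode). The normalization below is a plausible sketch of that computation, an assumption about the internals rather than part of the documented API:

```cpp
#include <cstddef>
#include <vector>

// Convert a leaf's per-class sample counts into probabilities by
// dividing each count by the total number of samples at the leaf.
std::vector<double> counts_to_proba(const std::vector<std::size_t>& counts) {
    std::size_t total = 0;
    for (std::size_t c : counts) total += c;
    std::vector<double> proba(counts.size(), 0.0);
    if (total == 0) return proba;  // unfitted/empty leaf: all zeros
    for (std::size_t i = 0; i < counts.size(); ++i)
        proba[i] = static_cast<double>(counts[i]) / total;
    return proba;
}
```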

root

Get a pointer to the root node of the decision tree.
Returns : const TreeNode*
Pointer to the root TreeNode, or nullptr if the tree hasn't been fitted.

const TreeNode* tree_root = clf.root();
if (tree_root != nullptr) {
    // Access tree structure
}

classes

Get the class labels known to the classifier.
Returns : const std::vector<std::string>&
Vector of class label strings, in the order they map to numeric codes.

auto class_labels = clf.classes();
for (const auto& label : class_labels) {
    std::cout << label << std::endl;
}
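Because predict returns numeric codes that index into classes() in this order, decoding codes back to labels is a simple lookup. The decode helper below is hypothetical, shown only to illustrate the mapping; it is not part of the library:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Map numeric prediction codes back to label strings: each code is an
// index into the classes() vector.
std::vector<std::string> decode(const std::vector<double>& codes,
                                const std::vector<std::string>& classes) {
    std::vector<std::string> labels;
    labels.reserve(codes.size());
    for (double c : codes)
        labels.push_back(classes[static_cast<std::size_t>(c)]);
    return labels;
}
```

This is effectively what predict_class does for you, so prefer predict_class when you want string labels directly.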

Enumerations

Criterion

Criteria for measuring split quality in classification trees.
enum class Criterion {
    gini,           // Gini impurity (default for classification)
    entropy,        // Information gain / entropy
    mse,            // Mean squared error (for regression)
    friedman_mse,   // Friedman's improvement on MSE (for regression)
    mae             // Mean absolute error (for regression)
};
For classification, use Criterion::gini or Criterion::entropy.
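Both classification criteria are simple functions of the class proportions at a node: Gini impurity is 1 - Σ p_k², and entropy is -Σ p_k log₂ p_k. A standalone sketch of the two formulas (not the library's internal code):

```cpp
#include <cmath>
#include <vector>

// Gini impurity: 1 - sum of squared class proportions.
// 0 for a pure node; larger means more mixed.
double gini_impurity(const std::vector<double>& proportions) {
    double sum_sq = 0.0;
    for (double p : proportions) sum_sq += p * p;
    return 1.0 - sum_sq;
}

// Entropy: -sum(p_k * log2(p_k)), skipping zero proportions.
// 0 for a pure node; 1 for a 50/50 binary split.
double entropy(const std::vector<double>& proportions) {
    double h = 0.0;
    for (double p : proportions)
        if (p > 0.0) h -= p * std::log2(p);
    return h;
}
```

The two criteria usually produce similar trees; Gini is marginally cheaper to compute since it avoids the logarithm.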

Data structures

TreeNode

Represents a node in the decision tree.
is_leaf : bool
Whether this node is a leaf node.

feature_index : std::size_t
Index of the feature used for splitting at this node (internal nodes only).

threshold : double
Threshold value for the split (internal nodes only).

value : double
The prediction value stored at this node.

class_label : std::string
The predicted class label (classification leaf nodes).

class_counts : std::vector<std::size_t>
Count of samples per class at this node (used for probability estimates).

left : std::unique_ptr<TreeNode>
Pointer to the left child node.

right : std::unique_ptr<TreeNode>
Pointer to the right child node.

const TreeNode* node = clf.root();
if (node && !node->is_leaf) {
    std::cout << "Split on feature " << node->feature_index 
              << " at threshold " << node->threshold << std::endl;
}
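Walking the fitted tree is a matter of recursing from root() through left and right. The Node struct below is a minimal hypothetical stand-in mirroring the documented fields (the real TreeNode lives in decision_tree.h), used here so the traversal sketch is self-contained:

```cpp
#include <cstddef>
#include <iostream>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for the documented TreeNode fields.
struct Node {
    bool is_leaf = false;
    std::size_t feature_index = 0;
    double threshold = 0.0;
    std::string class_label;
    std::unique_ptr<Node> left, right;
};

// Print the tree depth-first, indenting children by depth.
void print_tree(const Node* node, int depth = 0) {
    if (!node) return;
    std::string indent(2 * depth, ' ');
    if (node->is_leaf) {
        std::cout << indent << "leaf: " << node->class_label << "\n";
    } else {
        std::cout << indent << "feature " << node->feature_index
                  << " <= " << node->threshold << "\n";
        print_tree(node->left.get(), depth + 1);
        print_tree(node->right.get(), depth + 1);
    }
}

// Collect leaf labels left-to-right.
void collect_leaves(const Node* node, std::vector<std::string>& out) {
    if (!node) return;
    if (node->is_leaf) { out.push_back(node->class_label); return; }
    collect_leaves(node->left.get(), out);
    collect_leaves(node->right.get(), out);
}
```

With the real library, the same traversal applies to the const TreeNode* returned by clf.root().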

Example usage

#include "decision_tree.h"
#include <iostream>
#include <vector>

using namespace decision_trees;

int main() {
    // Prepare training data (Iris dataset subset)
    std::vector<std::vector<double>> X_train = {
        {5.1, 3.5, 1.4, 0.2},
        {4.9, 3.0, 1.4, 0.2},
        {7.0, 3.2, 4.7, 1.4},
        {6.4, 3.2, 4.5, 1.5},
        {6.3, 3.3, 6.0, 2.5},
        {5.8, 2.7, 5.1, 1.9}
    };
    
    std::vector<std::string> y_train = {
        "setosa", "setosa",
        "versicolor", "versicolor",
        "virginica", "virginica"
    };
    
    // Create and train classifier
    DecisionTreeClassifier clf(Criterion::gini, 5, 2, 1, 0.0);
    clf.fit(X_train, y_train);
    
    // Make predictions
    std::vector<double> sample = {6.0, 3.0, 4.8, 1.8};
    std::string predicted_class = clf.predict_class(sample);
    std::cout << "Predicted class: " << predicted_class << std::endl;
    
    // Get probability estimates
    auto probabilities = clf.predict_proba(sample);
    auto classes = clf.classes();
    
    std::cout << "Class probabilities:\n";
    for (size_t i = 0; i < classes.size(); ++i) {
        std::cout << "  " << classes[i] << ": " 
                  << probabilities[i] << std::endl;
    }
    
    return 0;
}
