Decision trees recursively partition the feature space to make predictions. MLPP provides CART-style decision trees for both classification and regression tasks.
Overview
Decision trees learn a hierarchy of if-then-else rules by selecting the best feature and threshold at each node to maximize information gain (classification) or minimize variance (regression).
Key concepts
- Internal nodes: Test a feature against a threshold
- Leaf nodes: Make predictions (class label or regression value)
- Splitting criterion: Metric to evaluate split quality (Gini, entropy, MSE)
- Node samples: Training samples that reach each node, used to evaluate candidate splits
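The if-then-else hierarchy above can be made concrete: a fitted tree is equivalent to nested comparisons of features against thresholds. A hypothetical hand-written two-split tree over a 2-feature sample (labels "A", "B", "C" are illustrative, not from the library):

```cpp
#include <string>
#include <vector>

// Hand-written equivalent of a small fitted tree: each internal node
// compares one feature to a threshold; each leaf returns a class label.
std::string classify(const std::vector<double>& x) {
    if (x[0] <= 2.5) return "A";   // root: test feature 0
    if (x[1] <= 1.0) return "B";   // right subtree: test feature 1
    return "C";
}
```

Fitting a tree amounts to choosing these features and thresholds automatically from the training data.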
Decision tree classifier
Classifies samples by learning decision rules from features.
Basic usage
#include <decision_trees/decision_tree.h>
using namespace decision_trees;
// Create classifier with Gini criterion
DecisionTreeClassifier classifier(
Criterion::gini, // splitting criterion
10, // max_depth
2, // min_samples_split
1, // min_samples_leaf
0.0 // min_impurity_decrease
);
// Prepare training data
std::vector<std::vector<double>> X_train = {
{5.1, 3.5, 1.4, 0.2},
{4.9, 3.0, 1.4, 0.2},
// ...
};
std::vector<std::string> y_train = {"setosa", "setosa", /* ... */};
// Fit the tree
classifier.fit(X_train, y_train);
// Predict class for new sample
std::vector<double> x_new = {5.0, 3.2, 1.2, 0.2};
std::string prediction = classifier.predict_class(x_new);
// Predict probabilities
auto probs = classifier.predict_proba(x_new);
// Batch predictions
auto predictions = classifier.predict_class(X_test);
Constructor parameters
Splitting criterion for classification:
Criterion::gini - Gini impurity (default, faster)
Criterion::entropy - Information gain (slower, more precise)
Maximum depth of the tree. Use smaller values to prevent overfitting.
Minimum number of samples required to split an internal node.
Minimum number of samples required at each leaf node.
Split only if impurity decrease is at least this value.
Methods
void fit(const std::vector<std::vector<double>>& X,
const std::vector<std::string>& y)
Fit the classifier to training data.
X: Feature matrix (n_samples × n_features)
y: Class labels as strings
std::string predict_class(const std::vector<double>& x) const
Predict class label for a single sample.
std::vector<std::string> predict_class(
const std::vector<std::vector<double>>& X) const
Predict class labels for multiple samples.
std::vector<double> predict_proba(const std::vector<double>& x) const
Return class probabilities (one value per class in sorted order).
double predict(const std::vector<double>& x) const
Predict encoded class label (internal numeric representation).
const TreeNode* root() const noexcept
Access the root node for tree inspection.
const std::vector<std::string>& classes() const noexcept
Get the list of class labels learned during training (in the same sorted order used by predict_proba).
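Because predict_proba returns one probability per class in the same sorted order as classes(), the index of the largest probability identifies the predicted label. A minimal sketch with hypothetical values standing in for the two calls:

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Map a probability vector (as returned by predict_proba) back to a
// label, assuming it is index-aligned with the sorted classes() list.
std::string most_likely(const std::vector<std::string>& classes,
                        const std::vector<double>& probs) {
    auto it = std::max_element(probs.begin(), probs.end());
    return classes[it - probs.begin()];
}
```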
Splitting criteria
Gini impurity
Measures how often a randomly chosen element would be incorrectly classified:
Gini(S) = 1 - Σ pᵢ²
where pᵢ is the proportion of class i in set S.
DecisionTreeClassifier classifier(Criterion::gini);
Faster to compute and works well in practice.
Entropy
Measures information content:
Entropy(S) = -Σ pᵢ log₂(pᵢ)
DecisionTreeClassifier classifier(Criterion::entropy);
More theoretically grounded but computationally slower.
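The two criteria are easy to compute by hand from class proportions; this standalone sketch (not part of the library API) implements both formulas directly:

```cpp
#include <cmath>
#include <vector>

// Gini impurity: 1 - sum of squared class proportions p_i
double gini(const std::vector<double>& p) {
    double s = 0.0;
    for (double pi : p) s += pi * pi;
    return 1.0 - s;
}

// Entropy: -sum(p_i * log2(p_i)), with the convention 0 * log2(0) = 0
double entropy(const std::vector<double>& p) {
    double h = 0.0;
    for (double pi : p)
        if (pi > 0.0) h -= pi * std::log2(pi);
    return h;
}
```

Both measures are zero for a pure node (a single class) and maximal for a uniform class mix, which is why a split that lowers them separates the classes better.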
Decision tree regressor
Predicts continuous values by learning decision rules.
Basic usage
#include <decision_trees/decision_tree.h>
using namespace decision_trees;
// Create regressor with MSE criterion
DecisionTreeRegressor regressor(
Criterion::mse, // splitting criterion
10, // max_depth
2, // min_samples_split
1, // min_samples_leaf
0.0 // min_impurity_decrease
);
// Prepare training data
std::vector<std::vector<double>> X_train = {
{0.5, 1.2},
{1.0, 1.8},
// ...
};
std::vector<double> y_train = {2.3, 4.5, /* ... */};
// Fit the tree
regressor.fit(X_train, y_train);
// Predict value for new sample
std::vector<double> x_new = {0.8, 1.5};
double prediction = regressor.predict(x_new);
// Batch predictions
auto predictions = regressor.predict(X_test);
Constructor parameters
Splitting criterion for regression:
Criterion::mse - Mean squared error (default)
Criterion::friedman_mse - Friedman’s improvement on MSE
Criterion::mae - Mean absolute error (more robust to outliers)
Maximum depth of the tree. Controls tree complexity.
Minimum samples required to split a node.
Minimum samples required at leaf nodes.
Minimum variance reduction required to split.
Methods
void fit(const std::vector<std::vector<double>>& X,
const std::vector<double>& y)
Fit the regressor to training data.
X: Feature matrix (n_samples × n_features)
y: Continuous target values
double predict(const std::vector<double>& x) const
Predict target value for a single sample.
std::vector<double> predict(
const std::vector<std::vector<double>>& X) const
Predict target values for multiple samples.
const TreeNode* root() const noexcept
Access the root node for tree inspection.
Splitting criteria for regression
Mean squared error (MSE)
MSE(S) = (1/n) Σ (yᵢ - ȳ)²
DecisionTreeRegressor regressor(Criterion::mse);
Standard choice for regression trees.
Friedman MSE
Friedman’s improvement score:
Score = (n_left · n_right / n²) · (ȳ_left - ȳ_right)²
DecisionTreeRegressor regressor(Criterion::friedman_mse);
Often produces better trees for regression.
Mean absolute error (MAE)
MAE(S) = (1/n) Σ |yᵢ - median(y)|
DecisionTreeRegressor regressor(Criterion::mae);
More robust to outliers but slower to compute.
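The MSE criterion picks the split that most reduces the weighted child impurity relative to the parent. This standalone sketch (helper names are illustrative, not library functions) computes that reduction directly from the formula above:

```cpp
#include <vector>

// Mean of the target values in a node
double mean(const std::vector<double>& y) {
    double s = 0.0;
    for (double v : y) s += v;
    return s / y.size();
}

// MSE(S) = (1/n) * sum((y_i - mean)^2)
double mse(const std::vector<double>& y) {
    double m = mean(y), s = 0.0;
    for (double v : y) s += (v - m) * (v - m);
    return s / y.size();
}

// Impurity decrease of splitting parent into (left, right):
// MSE(parent) - weighted average of MSE(left) and MSE(right)
double mse_reduction(const std::vector<double>& parent,
                     const std::vector<double>& left,
                     const std::vector<double>& right) {
    double n = parent.size();
    return mse(parent) - (left.size() / n) * mse(left)
                       - (right.size() / n) * mse(right);
}
```

A split that perfectly separates low from high targets reduces the impurity to zero, so its reduction equals the parent's full variance.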
Tree structure
Inspect the learned tree structure using the TreeNode class:
struct TreeNode {
bool is_leaf; // True for leaf nodes
std::size_t feature_index; // Feature to split on (internal nodes)
double threshold; // Split threshold (internal nodes)
double value; // Predicted value (leaf nodes - regression)
std::string class_label; // Predicted class (leaf nodes - classification)
std::vector<std::size_t> class_counts; // Class distribution (classification)
std::unique_ptr<TreeNode> left; // Left child (feature <= threshold)
std::unique_ptr<TreeNode> right; // Right child (feature > threshold)
};
Traversing the tree
#include <iostream>
#include <string>

void print_tree(const TreeNode* node, int depth = 0) {
if (!node) return;
std::string indent(depth * 2, ' ');
if (node->is_leaf) {
std::cout << indent << "Leaf: " << node->class_label
<< " (value: " << node->value << ")" << std::endl;
} else {
std::cout << indent << "Split: feature " << node->feature_index
<< " <= " << node->threshold << std::endl;
print_tree(node->left.get(), depth + 1);
print_tree(node->right.get(), depth + 1);
}
}
// Usage
auto root = classifier.root();
print_tree(root);
Preventing overfitting
Decision trees easily overfit without constraints. Use these techniques:
Pre-pruning (preferred)
Set constraints during tree construction:
DecisionTreeClassifier classifier(
Criterion::gini,
5, // max_depth: Limit tree depth
10, // min_samples_split: Require more samples to split
5, // min_samples_leaf: Require more samples per leaf
0.01 // min_impurity_decrease: Only split if gain > 0.01
);
Start with max_depth = 5-10 and min_samples_leaf = 5-10. Tune these hyperparameters via cross-validation.
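The depth search can be sketched generically. Here `evaluate` is a placeholder standing in for training a DecisionTreeClassifier with the given max_depth and scoring it on held-out data; the helper below is illustrative, not part of the library:

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Pick the candidate depth with the highest validation score.
// evaluate(d) is assumed to fit a tree with max_depth = d and
// return its accuracy on a held-out set.
int best_max_depth(const std::vector<int>& depths,
                   const std::function<double(int)>& evaluate) {
    int best = depths.front();
    double best_score = evaluate(best);
    for (std::size_t i = 1; i < depths.size(); ++i) {
        double s = evaluate(depths[i]);
        if (s > best_score) { best_score = s; best = depths[i]; }
    }
    return best;
}
```

The same loop extends to the other pre-pruning parameters; a full grid search simply nests it.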
Hyperparameter guidelines
| Parameter | Effect | Recommended range |
|---|---|---|
| max_depth | Limits tree depth | 3-20 |
| min_samples_split | Requires more samples to split | 2-50 |
| min_samples_leaf | Requires more samples per leaf | 1-20 |
| min_impurity_decrease | Requires minimum gain | 0.0-0.1 |
Example: Complete workflow
#include <decision_trees/decision_tree.h>
#include <iostream>
#include <vector>
#include <algorithm>
using namespace decision_trees;
// Helper: depth of the subtree rooted at node
std::size_t calculate_depth(const TreeNode* node) {
    if (!node) return 0;
    return 1 + std::max(calculate_depth(node->left.get()),
                        calculate_depth(node->right.get()));
}
// Helper: number of leaf nodes in the subtree
std::size_t count_leaves(const TreeNode* node) {
    if (!node) return 0;
    if (node->is_leaf) return 1;
    return count_leaves(node->left.get()) + count_leaves(node->right.get());
}
int main() {
// Prepare classification data
std::vector<std::vector<double>> X_train = {
{5.1, 3.5, 1.4, 0.2},
{4.9, 3.0, 1.4, 0.2},
{7.0, 3.2, 4.7, 1.4},
{6.4, 3.2, 4.5, 1.5},
{6.3, 3.3, 6.0, 2.5},
{5.8, 2.7, 5.1, 1.9}
};
std::vector<std::string> y_train = {
"setosa", "setosa", "versicolor",
"versicolor", "virginica", "virginica"
};
// Create and train classifier
DecisionTreeClassifier classifier(
Criterion::gini,
3, // max_depth
2, // min_samples_split
1 // min_samples_leaf
);
classifier.fit(X_train, y_train);
// Make predictions
std::vector<double> x_new = {6.0, 3.0, 4.8, 1.8};
std::string prediction = classifier.predict_class(x_new);
auto probs = classifier.predict_proba(x_new);
std::cout << "Predicted class: " << prediction << std::endl;
std::cout << "Class probabilities: ";
for (double p : probs) {
std::cout << p << " ";
}
std::cout << std::endl;
// Inspect tree structure
auto root = classifier.root();
std::cout << "Tree depth: " << calculate_depth(root) << std::endl;
std::cout << "Number of leaves: " << count_leaves(root) << std::endl;
// Try regression
std::vector<std::vector<double>> X_reg = {{1.0}, {2.0}, {3.0}, {4.0}};
std::vector<double> y_reg = {2.0, 4.0, 6.0, 8.0};
DecisionTreeRegressor regressor(Criterion::mse, 3);
regressor.fit(X_reg, y_reg);
double pred = regressor.predict({2.5});
std::cout << "Regression prediction: " << pred << std::endl;
return 0;
}
Advantages and limitations
Advantages
- Interpretable: Easy to visualize and understand
- Non-parametric: No assumptions about data distribution
- Handles non-linear relationships: Automatically learns interactions
- Mixed data types: Works with numerical and categorical features
- Feature importance: Can rank feature contributions
Limitations
- Overfitting: Prone to overfitting without constraints
- Instability: Small data changes can produce very different trees
- Bias: Biased toward features with many values
- Not smooth: Predictions are piecewise constant (regression)
For better performance, consider ensemble methods like Random Forests or Gradient Boosting, which combine multiple decision trees to reduce overfitting and improve stability.
Comparison: Classification vs regression
| Aspect | DecisionTreeClassifier | DecisionTreeRegressor |
|---|---|---|
| Output | Discrete class labels | Continuous values |
| Leaf prediction | Majority class | Mean of target values |
| Splitting criteria | Gini, entropy | MSE, Friedman MSE, MAE |
| Probability estimates | Yes (via predict_proba) | No |
| Use case | Spam detection, medical diagnosis | House price prediction, sales forecasting |