Decision trees recursively partition the feature space to make predictions. MLPP provides CART-style decision trees for both classification and regression tasks.

Overview

Decision trees learn a hierarchy of if-then-else rules by selecting the best feature and threshold at each node to maximize information gain (classification) or minimize variance (regression).

Key concepts

  • Internal nodes: Test a feature against a threshold
  • Leaf nodes: Make predictions (class label or regression value)
  • Splitting criterion: Metric to evaluate split quality (Gini, entropy, MSE)
  • Impurity decrease: Reduction in the splitting criterion achieved by a candidate split

Decision tree classifier

Classifies samples by learning decision rules from features.

Basic usage

#include <decision_trees/decision_tree.h>

using namespace decision_trees;

// Create classifier with Gini criterion
DecisionTreeClassifier classifier(
    Criterion::gini,          // splitting criterion
    10,                       // max_depth
    2,                        // min_samples_split
    1,                        // min_samples_leaf
    0.0                       // min_impurity_decrease
);

// Prepare training data
std::vector<std::vector<double>> X_train = {
    {5.1, 3.5, 1.4, 0.2},
    {4.9, 3.0, 1.4, 0.2},
    // ...
};
std::vector<std::string> y_train = {"setosa", "setosa", /* ... */};

// Fit the tree
classifier.fit(X_train, y_train);

// Predict class for new sample
std::vector<double> x_new = {5.0, 3.2, 1.2, 0.2};
std::string prediction = classifier.predict_class(x_new);

// Predict probabilities
auto probs = classifier.predict_proba(x_new);

// Batch predictions
auto predictions = classifier.predict_class(X_test);

Constructor parameters

criterion
Criterion
default:"gini"
Splitting criterion for classification:
  • Criterion::gini - Gini impurity (default, faster)
  • Criterion::entropy - Entropy-based information gain (slightly slower to compute)
max_depth
std::size_t
default:"∞"
Maximum depth of the tree. Use smaller values to prevent overfitting.
min_samples_split
std::size_t
default:"2"
Minimum number of samples required to split an internal node.
min_samples_leaf
std::size_t
default:"1"
Minimum number of samples required at each leaf node.
min_impurity_decrease
double
default:"0.0"
Split only if impurity decrease is at least this value.

Methods

fit
void
void fit(const std::vector<std::vector<double>>& X,
         const std::vector<std::string>& y)
Fit the classifier to training data.
  • X: Feature matrix (n_samples × n_features)
  • y: Class labels as strings
predict_class
std::string
std::string predict_class(const std::vector<double>& x) const
Predict class label for a single sample.
predict_class (batch)
std::vector<std::string>
std::vector<std::string> predict_class(
    const std::vector<std::vector<double>>& X) const
Predict class labels for multiple samples.
predict_proba
std::vector<double>
std::vector<double> predict_proba(const std::vector<double>& x) const
Return class probabilities (one value per class in sorted order).
predict
double
double predict(const std::vector<double>& x) const
Predict encoded class label (internal numeric representation).
root
const TreeNode*
const TreeNode* root() const noexcept
Access the root node for tree inspection.
classes
const std::vector<std::string>&
const std::vector<std::string>& classes() const noexcept
Get list of class labels learned during training.

Splitting criteria

Gini impurity

Measures how often a randomly chosen element would be incorrectly classified:
Gini(S) = 1 - Σ pᵢ²
where pᵢ is the proportion of class i in set S.
DecisionTreeClassifier classifier(Criterion::gini);
Faster to compute and works well in practice.

Decision tree regressor

Predicts continuous values by learning decision rules.

Basic usage

#include <decision_trees/decision_tree.h>

using namespace decision_trees;

// Create regressor with MSE criterion
DecisionTreeRegressor regressor(
    Criterion::mse,           // splitting criterion
    10,                       // max_depth
    2,                        // min_samples_split
    1,                        // min_samples_leaf
    0.0                       // min_impurity_decrease
);

// Prepare training data
std::vector<std::vector<double>> X_train = {
    {0.5, 1.2},
    {1.0, 1.8},
    // ...
};
std::vector<double> y_train = {2.3, 4.5, /* ... */};

// Fit the tree
regressor.fit(X_train, y_train);

// Predict value for new sample
std::vector<double> x_new = {0.8, 1.5};
double prediction = regressor.predict(x_new);

// Batch predictions
auto predictions = regressor.predict(X_test);

Constructor parameters

criterion
Criterion
default:"mse"
Splitting criterion for regression:
  • Criterion::mse - Mean squared error (default)
  • Criterion::friedman_mse - Friedman’s improvement on MSE
  • Criterion::mae - Mean absolute error (more robust to outliers)
max_depth
std::size_t
default:"∞"
Maximum depth of the tree. Control tree complexity.
min_samples_split
std::size_t
default:"2"
Minimum samples required to split a node.
min_samples_leaf
std::size_t
default:"1"
Minimum samples required at leaf nodes.
min_impurity_decrease
double
default:"0.0"
Minimum variance reduction required to split.

Methods

fit
void
void fit(const std::vector<std::vector<double>>& X,
         const std::vector<double>& y)
Fit the regressor to training data.
  • X: Feature matrix (n_samples × n_features)
  • y: Continuous target values
predict
double
double predict(const std::vector<double>& x) const
Predict target value for a single sample.
predict (batch)
std::vector<double>
std::vector<double> predict(
    const std::vector<std::vector<double>>& X) const
Predict target values for multiple samples.
root
const TreeNode*
const TreeNode* root() const noexcept
Access the root node for tree inspection.

Splitting criteria for regression

Mean squared error:
MSE(S) = (1/n) Σ (yᵢ - ȳ)²
DecisionTreeRegressor regressor(Criterion::mse);
Standard choice for regression trees.

Tree structure

Inspect the learned tree structure using the TreeNode class:
struct TreeNode {
    bool is_leaf;                     // True for leaf nodes
    std::size_t feature_index;        // Feature to split on (internal nodes)
    double threshold;                 // Split threshold (internal nodes)
    double value;                     // Predicted value (leaf nodes - regression)
    std::string class_label;          // Predicted class (leaf nodes - classification)
    std::vector<std::size_t> class_counts;  // Class distribution (classification)
    std::unique_ptr<TreeNode> left;   // Left child (feature <= threshold)
    std::unique_ptr<TreeNode> right;  // Right child (feature > threshold)
};

Traversing the tree

void print_tree(const TreeNode* node, int depth = 0) {
    if (!node) return;
    
    std::string indent(depth * 2, ' ');
    
    if (node->is_leaf) {
        std::cout << indent << "Leaf: " << node->class_label 
                  << " (value: " << node->value << ")" << std::endl;
    } else {
        std::cout << indent << "Split: feature " << node->feature_index
                  << " <= " << node->threshold << std::endl;
        print_tree(node->left.get(), depth + 1);
        print_tree(node->right.get(), depth + 1);
    }
}

// Usage
auto root = classifier.root();
print_tree(root);

Preventing overfitting

Decision trees easily overfit without constraints. Use these techniques:

Pre-pruning (preferred)

Set constraints during tree construction:
DecisionTreeClassifier classifier(
    Criterion::gini,
    5,        // max_depth: Limit tree depth
    10,       // min_samples_split: Require more samples to split
    5,        // min_samples_leaf: Require more samples per leaf
    0.01      // min_impurity_decrease: Only split if gain > 0.01
);
Start with max_depth = 5-10 and min_samples_leaf = 5-10. Tune these hyperparameters via cross-validation.

Hyperparameter guidelines

Parameter               Effect                           Recommended range
max_depth               Limits tree depth                3-20
min_samples_split       Requires more samples to split   2-50
min_samples_leaf        Requires more samples per leaf   1-20
min_impurity_decrease   Requires minimum gain            0.0-0.1

Example: Complete workflow

#include <decision_trees/decision_tree.h>
#include <algorithm>
#include <iostream>
#include <vector>

using namespace decision_trees;

// Helpers used below to inspect the learned tree
std::size_t calculate_depth(const TreeNode* node) {
    if (!node || node->is_leaf) return 0;
    return 1 + std::max(calculate_depth(node->left.get()),
                        calculate_depth(node->right.get()));
}

std::size_t count_leaves(const TreeNode* node) {
    if (!node) return 0;
    if (node->is_leaf) return 1;
    return count_leaves(node->left.get()) + count_leaves(node->right.get());
}

int main() {
    // Prepare classification data
    std::vector<std::vector<double>> X_train = {
        {5.1, 3.5, 1.4, 0.2},
        {4.9, 3.0, 1.4, 0.2},
        {7.0, 3.2, 4.7, 1.4},
        {6.4, 3.2, 4.5, 1.5},
        {6.3, 3.3, 6.0, 2.5},
        {5.8, 2.7, 5.1, 1.9}
    };
    std::vector<std::string> y_train = {
        "setosa", "setosa", "versicolor", 
        "versicolor", "virginica", "virginica"
    };
    
    // Create and train classifier
    DecisionTreeClassifier classifier(
        Criterion::gini,
        3,    // max_depth
        2,    // min_samples_split
        1     // min_samples_leaf
    );
    
    classifier.fit(X_train, y_train);
    
    // Make predictions
    std::vector<double> x_new = {6.0, 3.0, 4.8, 1.8};
    std::string prediction = classifier.predict_class(x_new);
    auto probs = classifier.predict_proba(x_new);
    
    std::cout << "Predicted class: " << prediction << std::endl;
    std::cout << "Class probabilities: ";
    for (double p : probs) {
        std::cout << p << " ";
    }
    std::cout << std::endl;
    
    // Inspect tree structure
    auto root = classifier.root();
    std::cout << "Tree depth: " << calculate_depth(root) << std::endl;
    std::cout << "Number of leaves: " << count_leaves(root) << std::endl;
    
    // Try regression
    std::vector<std::vector<double>> X_reg = {{1.0}, {2.0}, {3.0}, {4.0}};
    std::vector<double> y_reg = {2.0, 4.0, 6.0, 8.0};
    
    DecisionTreeRegressor regressor(Criterion::mse, 3);
    regressor.fit(X_reg, y_reg);
    
    double pred = regressor.predict({2.5});
    std::cout << "Regression prediction: " << pred << std::endl;
    
    return 0;
}

Advantages and limitations

Advantages

  • Interpretable: Easy to visualize and understand
  • Non-parametric: No assumptions about data distribution
  • Handles non-linear relationships: Automatically learns interactions
  • Mixed data types: Works with numerical and categorical features
  • Feature importance: Can rank feature contributions

Limitations

  • Overfitting: Prone to overfitting without constraints
  • Instability: Small data changes can produce very different trees
  • Bias: Biased toward features with many values
  • Not smooth: Predictions are piecewise constant (regression)
For better performance, consider ensemble methods like Random Forests or Gradient Boosting, which combine multiple decision trees to reduce overfitting and improve stability.

Comparison: Classification vs regression

Aspect                  DecisionTreeClassifier              DecisionTreeRegressor
Output                  Discrete class labels               Continuous values
Leaf prediction         Majority class                      Mean of target values
Splitting criteria      Gini, entropy                       MSE, Friedman MSE, MAE
Probability estimates   Yes (via predict_proba)             No
Use case                Spam detection, medical diagnosis   House price prediction, sales forecasting
