Overview

The DecisionTreeRegressor class implements a decision tree algorithm for regression problems. It builds a tree structure by recursively splitting the data based on features that minimize variance or error in the target values.

Constructor

criterion
Criterion
default:"Criterion::mse"
The function to measure the quality of a split. Supported criteria are:
  • Criterion::mse - Mean squared error (default for regression)
  • Criterion::friedman_mse - Friedman’s improvement on MSE
  • Criterion::mae - Mean absolute error
max_depth
std::size_t
default:"std::numeric_limits<std::size_t>::max()"
The maximum depth of the tree. If not set, nodes are expanded until all leaves contain fewer than min_samples_split samples.
min_samples_split
std::size_t
default:"2"
The minimum number of samples required to split an internal node.
min_samples_leaf
std::size_t
default:"1"
The minimum number of samples required to be at a leaf node.
min_impurity_decrease
double
default:"0.0"
A node will be split if this split induces a decrease of the impurity greater than or equal to this value.
#include "decision_tree.h"
using namespace decision_trees;

// Create a regressor with default parameters
DecisionTreeRegressor reg;

// Create a regressor with custom parameters
DecisionTreeRegressor reg_custom(
    Criterion::mae,
    8,   // max_depth
    10,  // min_samples_split
    5,   // min_samples_leaf
    0.05 // min_impurity_decrease
);

Methods

fit (primary method)

Train the decision tree regressor on continuous target data.
X
const std::vector<std::vector<double>>&
Training feature matrix where each inner vector represents a sample.
y
const std::vector<double>&
Target values (continuous, real-valued).
std::vector<std::vector<double>> X = {
    {0.5, 1.2},
    {1.5, 1.8},
    {2.5, 2.1},
    {3.5, 2.9},
    {4.5, 3.2}
};

std::vector<double> y = {2.3, 3.1, 4.8, 6.2, 7.5};

DecisionTreeRegressor reg;
reg.fit(X, y);
fit (string labels overload)

Overload provided for compatibility with the base class interface; string targets are converted to numeric values internally. Not typically used for regression.
X
const std::vector<std::vector<double>>&
Training feature matrix where each inner vector represents a sample.
y
const std::vector<std::string>&
Target values as strings (will be converted to numeric internally).
// Not the typical usage pattern for regression
std::vector<std::string> y_str = {"2.3", "3.1", "4.8"};
reg.fit(X, y_str);

predict (single sample)

Predict the target value for a single sample.
x
const std::vector<double>&
A single feature vector to predict.
return
double
The predicted continuous value.
std::vector<double> sample = {2.0, 1.9};
double prediction = reg.predict(sample);
std::cout << "Predicted value: " << prediction << std::endl;

predict (multiple samples)

Predict target values for multiple samples.
X
const std::vector<std::vector<double>>&
Feature matrix where each inner vector represents a sample.
return
std::vector<double>
Vector of predicted continuous values.
std::vector<std::vector<double>> X_test = {
    {1.0, 1.5},
    {2.5, 2.0},
    {4.0, 3.0}
};

std::vector<double> predictions = reg.predict(X_test);
for (size_t i = 0; i < predictions.size(); ++i) {
    std::cout << "Sample " << i << ": " << predictions[i] << std::endl;
}

root

Get a pointer to the root node of the decision tree.
return
const TreeNode*
Pointer to the root TreeNode, or nullptr if the tree hasn’t been fitted.
const TreeNode* tree_root = reg.root();
if (tree_root != nullptr) {
    // Access tree structure for inspection or visualization
}

classes

Get the class labels (not typically meaningful for regression).
return
const std::vector<std::string>&
Vector of label strings (empty or minimally populated for regression tasks).
auto labels = reg.classes();
// Typically empty for pure regression tasks

Enumerations

Criterion

Criteria for measuring split quality in regression trees.
enum class Criterion {
    gini,           // Gini impurity (for classification)
    entropy,        // Information gain / entropy (for classification)
    mse,            // Mean squared error (default for regression)
    friedman_mse,   // Friedman's improvement on MSE (for regression)
    mae             // Mean absolute error (for regression)
};
For regression, use Criterion::mse, Criterion::friedman_mse, or Criterion::mae:
  • Mean squared error (MSE): minimizes the average squared difference between predictions and actual values. Best when errors are roughly normally distributed.
  • Friedman MSE: an improved version of MSE that can lead to better splits in some cases.
  • Mean absolute error (MAE): minimizes the average absolute difference. More robust to outliers than MSE.

Data structures

TreeNode

Represents a node in the decision tree.
is_leaf
bool
Whether this node is a leaf node.
feature_index
std::size_t
Index of the feature used for splitting at this node (for internal nodes).
threshold
double
Threshold value for the split (for internal nodes).
value
double
The predicted value stored at this node (mean of training samples for regression).
class_label
std::string
Not used for regression tasks.
class_counts
std::vector<std::size_t>
Not used for regression tasks.
left
std::unique_ptr<TreeNode>
Pointer to the left child node (samples where feature <= threshold).
right
std::unique_ptr<TreeNode>
Pointer to the right child node (samples where feature > threshold).
const TreeNode* node = reg.root();
if (node && !node->is_leaf) {
    std::cout << "Split on feature " << node->feature_index 
              << " at threshold " << node->threshold << std::endl;
    std::cout << "Left subtree has " << (node->left ? "children" : "no data") << std::endl;
    std::cout << "Right subtree has " << (node->right ? "children" : "no data") << std::endl;
} else if (node && node->is_leaf) {
    std::cout << "Leaf node with predicted value: " << node->value << std::endl;
}

Example usage

#include "decision_tree.h"
#include <cstdlib>
#include <iostream>
#include <vector>

using namespace decision_trees;

int main() {
    // Generate synthetic regression data
    std::vector<std::vector<double>> X_train;
    std::vector<double> y_train;
    
    for (double x = 0.0; x <= 10.0; x += 0.5) {
        X_train.push_back({x});
        // y = 2x + 3 + noise
        y_train.push_back(2.0 * x + 3.0 + (std::rand() % 100 - 50) / 100.0);
    }
    
    // Create and train regressor with custom parameters
    DecisionTreeRegressor reg(
        Criterion::mse,
        5,   // max_depth
        2,   // min_samples_split
        1,   // min_samples_leaf
        0.1  // min_impurity_decrease
    );
    
    reg.fit(X_train, y_train);
    
    // Make predictions on new data
    std::vector<std::vector<double>> X_test = {
        {2.5},
        {5.0},
        {7.5}
    };
    
    std::vector<double> predictions = reg.predict(X_test);
    
    std::cout << "Predictions:\n";
    for (size_t i = 0; i < X_test.size(); ++i) {
        std::cout << "  X = " << X_test[i][0] 
                  << " => y = " << predictions[i] << std::endl;
    }
    
    // Inspect tree structure
    const TreeNode* root = reg.root();
    if (root) {
        std::cout << "\nTree root is a " 
                  << (root->is_leaf ? "leaf" : "decision node") << std::endl;
        if (!root->is_leaf) {
            std::cout << "Root splits on feature " << root->feature_index
                      << " at threshold " << root->threshold << std::endl;
        }
    }
    
    return 0;
}

Comparison with DecisionTreeClassifier

Feature            | DecisionTreeRegressor          | DecisionTreeClassifier
Task               | Regression (continuous values) | Classification (discrete classes)
Default criterion  | Criterion::mse                 | Criterion::gini
Output type        | double (continuous)            | double (class code) or std::string (class label)
Additional methods | None                           | predict_class(), predict_proba()
Leaf node value    | Mean of training samples       | Most common class
