Overview

The DecisionTreeRegressor class implements a decision tree algorithm for regression problems. It builds a tree structure by recursively splitting the data based on features that minimize variance or error in the target values.

Constructor

criterion
Criterion
default:"Criterion::mse"
The function to measure the quality of a split. Supported criteria are:
  • Criterion::mse - Mean squared error (default for regression)
  • Criterion::friedman_mse - Friedman’s improvement on MSE
  • Criterion::mae - Mean absolute error
max_depth
std::size_t
default:"std::numeric_limits<std::size_t>::max()"
The maximum depth of the tree. If not set, nodes are expanded until all leaves contain fewer than min_samples_split samples.
min_samples_split
std::size_t
default:"2"
The minimum number of samples required to split an internal node.
min_samples_leaf
std::size_t
default:"1"
The minimum number of samples required to be at a leaf node.
min_impurity_decrease
double
default:"0.0"
A node will be split if this split induces a decrease of the impurity greater than or equal to this value.
#include "decision_tree.h"
using namespace decision_trees;

// Create a regressor with default parameters
DecisionTreeRegressor reg;

// Create a regressor with custom parameters
DecisionTreeRegressor reg_custom(
    Criterion::mae,
    8,   // max_depth
    10,  // min_samples_split
    5,   // min_samples_leaf
    0.05 // min_impurity_decrease
);

Methods

fit (primary method)

Train the decision tree regressor on continuous target data.
X
const std::vector<std::vector<double>>&
Training feature matrix where each inner vector represents a sample.
y
const std::vector<double>&
Target values (continuous, real-valued).
std::vector<std::vector<double>> X = {
    {0.5, 1.2},
    {1.5, 1.8},
    {2.5, 2.1},
    {3.5, 2.9},
    {4.5, 3.2}
};

std::vector<double> y = {2.3, 3.1, 4.8, 6.2, 7.5};

DecisionTreeRegressor reg;
reg.fit(X, y);
fit (string labels overload)

Overload provided for compatibility with the base class interface; string targets are converted to numeric values internally. Not typically used for regression.
X
const std::vector<std::vector<double>>&
Training feature matrix where each inner vector represents a sample.
y
const std::vector<std::string>&
Target values as strings (will be converted to numeric internally).
// Not the typical usage pattern for regression
std::vector<std::string> y_str = {"2.3", "3.1", "4.8"};
reg.fit(X, y_str);

predict (single sample)

Predict the target value for a single sample.
x
const std::vector<double>&
A single feature vector to predict.
return
double
The predicted continuous value.
std::vector<double> sample = {2.0, 1.9};
double prediction = reg.predict(sample);
std::cout << "Predicted value: " << prediction << std::endl;

predict (multiple samples)

Predict target values for multiple samples.
X
const std::vector<std::vector<double>>&
Feature matrix where each inner vector represents a sample.
return
std::vector<double>
Vector of predicted continuous values.
std::vector<std::vector<double>> X_test = {
    {1.0, 1.5},
    {2.5, 2.0},
    {4.0, 3.0}
};

std::vector<double> predictions = reg.predict(X_test);
for (size_t i = 0; i < predictions.size(); ++i) {
    std::cout << "Sample " << i << ": " << predictions[i] << std::endl;
}

root

Get a pointer to the root node of the decision tree.
return
const TreeNode*
Pointer to the root TreeNode, or nullptr if the tree hasn’t been fitted.
const TreeNode* tree_root = reg.root();
if (tree_root != nullptr) {
    // Access tree structure for inspection or visualization
}

classes

Get the class labels (not typically meaningful for regression).
return
const std::vector<std::string>&
Vector of label strings (empty or minimally populated for regression tasks).
auto labels = reg.classes();
// Typically empty for pure regression tasks

Enumerations

Criterion

Criteria for measuring split quality in regression trees.
enum class Criterion {
    gini,           // Gini impurity (for classification)
    entropy,        // Information gain / entropy (for classification)
    mse,            // Mean squared error (default for regression)
    friedman_mse,   // Friedman's improvement on MSE (for regression)
    mae             // Mean absolute error (for regression)
};
For regression, use Criterion::mse, Criterion::friedman_mse, or Criterion::mae:
  • Mean squared error (MSE): minimizes the average squared difference between predictions and actual values. Best when errors are roughly normally distributed.
  • Friedman MSE: an improved version of MSE that can lead to better splits in some cases.
  • Mean absolute error (MAE): minimizes the average absolute difference. More robust to outliers than MSE.

Data structures

TreeNode

Represents a node in the decision tree.
is_leaf
bool
Whether this node is a leaf node.
feature_index
std::size_t
Index of the feature used for splitting at this node (for internal nodes).
threshold
double
Threshold value for the split (for internal nodes).
value
double
The predicted value stored at this node (mean of training samples for regression).
class_label
std::string
Not used for regression tasks.
class_counts
std::vector<std::size_t>
Not used for regression tasks.
left
std::unique_ptr<TreeNode>
Pointer to the left child node (samples where feature <= threshold).
right
std::unique_ptr<TreeNode>
Pointer to the right child node (samples where feature > threshold).
const TreeNode* node = reg.root();
if (node && !node->is_leaf) {
    std::cout << "Split on feature " << node->feature_index 
              << " at threshold " << node->threshold << std::endl;
    std::cout << "Left subtree has " << (node->left ? "children" : "no data") << std::endl;
    std::cout << "Right subtree has " << (node->right ? "children" : "no data") << std::endl;
} else if (node && node->is_leaf) {
    std::cout << "Leaf node with predicted value: " << node->value << std::endl;
}

Example usage

#include "decision_tree.h"
#include <cstdlib>
#include <iostream>
#include <vector>

using namespace decision_trees;

int main() {
    // Generate synthetic regression data
    std::vector<std::vector<double>> X_train;
    std::vector<double> y_train;
    
    for (double x = 0.0; x <= 10.0; x += 0.5) {
        X_train.push_back({x});
        // y = 2x + 3 + noise
        y_train.push_back(2.0 * x + 3.0 + (std::rand() % 100 - 50) / 100.0);
    }
    
    // Create and train regressor with custom parameters
    DecisionTreeRegressor reg(
        Criterion::mse,
        5,   // max_depth
        2,   // min_samples_split
        1,   // min_samples_leaf
        0.1  // min_impurity_decrease
    );
    
    reg.fit(X_train, y_train);
    
    // Make predictions on new data
    std::vector<std::vector<double>> X_test = {
        {2.5},
        {5.0},
        {7.5}
    };
    
    std::vector<double> predictions = reg.predict(X_test);
    
    std::cout << "Predictions:\n";
    for (size_t i = 0; i < X_test.size(); ++i) {
        std::cout << "  X = " << X_test[i][0] 
                  << " => y = " << predictions[i] << std::endl;
    }
    
    // Inspect tree structure
    const TreeNode* root = reg.root();
    if (root) {
        std::cout << "\nTree root is a " 
                  << (root->is_leaf ? "leaf" : "decision node") << std::endl;
        if (!root->is_leaf) {
            std::cout << "Root splits on feature " << root->feature_index
                      << " at threshold " << root->threshold << std::endl;
        }
    }
    
    return 0;
}

Comparison with DecisionTreeClassifier

Feature            | DecisionTreeRegressor          | DecisionTreeClassifier
Task               | Regression (continuous values) | Classification (discrete classes)
Default criterion  | Criterion::mse                 | Criterion::gini
Output type        | double (continuous)            | double (class code) or std::string (class label)
Additional methods | None                           | predict_class(), predict_proba()
Leaf node value    | Mean of training samples       | Most common class
