Decision trees recursively partition the feature space to make predictions. MLPP provides CART-style decision trees for both classification and regression tasks.
Overview
Decision trees learn a hierarchy of if-then-else rules by selecting the best feature and threshold at each node to maximize information gain (classification) or minimize variance (regression).
Key concepts
- Internal nodes: Test a feature against a threshold
- Leaf nodes: Make predictions (class label or regression value)
- Splitting criterion: Metric to evaluate split quality (Gini, entropy, MSE)
- Node samples: Training samples that reach each node, used to evaluate candidate splits
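The if-then-else hierarchy above can be made concrete: a fitted tree is equivalent to nested comparisons of features against thresholds. A hypothetical hand-written two-split tree over a 2-feature sample (labels "A", "B", "C" are illustrative, not from the library):

```cpp
#include <string>
#include <vector>

// Hand-written equivalent of a small fitted tree: each internal node
// compares one feature to a threshold; each leaf returns a class label.
std::string classify(const std::vector<double>& x) {
    if (x[0] <= 2.5) return "A";   // root: test feature 0
    if (x[1] <= 1.0) return "B";   // right subtree: test feature 1
    return "C";
}
```

Fitting a tree amounts to choosing these features and thresholds automatically from the training data.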
Decision tree classifier
Classifies samples by learning decision rules from features.
Basic usage
#include <decision_trees/decision_tree.h>
using namespace decision_trees;
// Create classifier with Gini criterion
DecisionTreeClassifier classifier(
Criterion::gini, // splitting criterion
10, // max_depth
2, // min_samples_split
1, // min_samples_leaf
0.0 // min_impurity_decrease
);
// Prepare training data
std::vector<std::vector<double>> X_train = {
{5.1, 3.5, 1.4, 0.2},
{4.9, 3.0, 1.4, 0.2},
// ...
};
std::vector<std::string> y_train = {"setosa", "setosa", /* ... */};
// Fit the tree
classifier.fit(X_train, y_train);
// Predict class for new sample
std::vector<double> x_new = {5.0, 3.2, 1.2, 0.2};
std::string prediction = classifier.predict_class(x_new);
// Predict probabilities
auto probs = classifier.predict_proba(x_new);
// Batch predictions
auto predictions = classifier.predict_class(X_test);
Constructor parameters
Splitting criterion for classification:
Criterion::gini - Gini impurity (default, faster)
Criterion::entropy - Information gain (slower, more precise)
Maximum depth of the tree. Use smaller values to prevent overfitting.
Minimum number of samples required to split an internal node.
Minimum number of samples required at each leaf node.
Split only if impurity decrease is at least this value.
Methods
void fit(const std::vector<std::vector<double>>& X,
const std::vector<std::string>& y)
Fit the classifier to training data.
X: Feature matrix (n_samples × n_features)
y: Class labels as strings
std::string predict_class(const std::vector<double>& x) const
Predict class label for a single sample.
std::vector<std::string> predict_class(
const std::vector<std::vector<double>>& X) const
Predict class labels for multiple samples.
std::vector<double> predict_proba(const std::vector<double>& x) const
Return class probabilities (one value per class in sorted order).
double predict(const std::vector<double>& x) const
Predict encoded class label (internal numeric representation).
const TreeNode* root() const noexcept
Access the root node for tree inspection.
const std::vector<std::string>& classes() const noexcept
Get the list of class labels learned during training (in the same sorted order used by predict_proba).
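Because predict_proba returns one probability per class in the same sorted order as classes(), the index of the largest probability identifies the predicted label. A minimal sketch with hypothetical values standing in for the two calls:

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Map a probability vector (as returned by predict_proba) back to a
// label, assuming it is index-aligned with the sorted classes() list.
std::string most_likely(const std::vector<std::string>& classes,
                        const std::vector<double>& probs) {
    auto it = std::max_element(probs.begin(), probs.end());
    return classes[it - probs.begin()];
}
```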
Splitting criteria
Gini impurity
Measures how often a randomly chosen element would be incorrectly classified:
Gini(S) = 1 - Σ pᵢ²
where pᵢ is the proportion of class i in set S.
DecisionTreeClassifier classifier(Criterion::gini);
Faster to compute and works well in practice.
Entropy
Measures information content:
Entropy(S) = -Σ pᵢ log₂(pᵢ)
DecisionTreeClassifier classifier(Criterion::entropy);
More theoretically grounded but computationally slower.
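The two criteria are easy to compute by hand from class proportions; this standalone sketch (not part of the library API) implements both formulas directly:

```cpp
#include <cmath>
#include <vector>

// Gini impurity: 1 - sum of squared class proportions p_i
double gini(const std::vector<double>& p) {
    double s = 0.0;
    for (double pi : p) s += pi * pi;
    return 1.0 - s;
}

// Entropy: -sum(p_i * log2(p_i)), with the convention 0 * log2(0) = 0
double entropy(const std::vector<double>& p) {
    double h = 0.0;
    for (double pi : p)
        if (pi > 0.0) h -= pi * std::log2(pi);
    return h;
}
```

Both measures are zero for a pure node (a single class) and maximal for a uniform class mix, which is why a split that lowers them separates the classes better.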
Decision tree regressor
Predicts continuous values by learning decision rules.
Basic usage
#include <decision_trees/decision_tree.h>
using namespace decision_trees;
// Create regressor with MSE criterion
DecisionTreeRegressor regressor(
Criterion::mse, // splitting criterion
10, // max_depth
2, // min_samples_split
1, // min_samples_leaf
0.0 // min_impurity_decrease
);
// Prepare training data
std::vector<std::vector<double>> X_train = {
{0.5, 1.2},
{1.0, 1.8},
// ...
};
std::vector<double> y_train = {2.3, 4.5, /* ... */};
// Fit the tree
regressor.fit(X_train, y_train);
// Predict value for new sample
std::vector<double> x_new = {0.8, 1.5};
double prediction = regressor.predict(x_new);
// Batch predictions
auto predictions = regressor.predict(X_test);
Constructor parameters
Splitting criterion for regression:
Criterion::mse - Mean squared error (default)
Criterion::friedman_mse - Friedman’s improvement on MSE
Criterion::mae - Mean absolute error (more robust to outliers)
Maximum depth of the tree. Controls tree complexity.
Minimum samples required to split a node.
Minimum samples required at leaf nodes.
Minimum variance reduction required to split.
Methods
void fit(const std::vector<std::vector<double>>& X,
const std::vector<double>& y)
Fit the regressor to training data.
X: Feature matrix (n_samples × n_features)
y: Continuous target values
double predict(const std::vector<double>& x) const
Predict target value for a single sample.
std::vector<double> predict(
const std::vector<std::vector<double>>& X) const
Predict target values for multiple samples.
const TreeNode* root() const noexcept
Access the root node for tree inspection.
Splitting criteria for regression
Mean squared error (MSE)
MSE(S) = (1/n) Σ (yᵢ - ȳ)²
DecisionTreeRegressor regressor(Criterion::mse);
Standard choice for regression trees.
Friedman MSE
Friedman’s improvement score:
Score = (n_left · n_right / n²) · (ȳ_left - ȳ_right)²
DecisionTreeRegressor regressor(Criterion::friedman_mse);
Often produces better trees for regression.
Mean absolute error (MAE)
MAE(S) = (1/n) Σ |yᵢ - median(y)|
DecisionTreeRegressor regressor(Criterion::mae);
More robust to outliers but slower to compute.
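The MSE criterion picks the split that most reduces the weighted child impurity relative to the parent. This standalone sketch (helper names are illustrative, not library functions) computes that reduction directly from the formula above:

```cpp
#include <vector>

// Mean of the target values in a node
double mean(const std::vector<double>& y) {
    double s = 0.0;
    for (double v : y) s += v;
    return s / y.size();
}

// MSE(S) = (1/n) * sum((y_i - mean)^2)
double mse(const std::vector<double>& y) {
    double m = mean(y), s = 0.0;
    for (double v : y) s += (v - m) * (v - m);
    return s / y.size();
}

// Impurity decrease of splitting parent into (left, right):
// MSE(parent) - weighted average of MSE(left) and MSE(right)
double mse_reduction(const std::vector<double>& parent,
                     const std::vector<double>& left,
                     const std::vector<double>& right) {
    double n = parent.size();
    return mse(parent) - (left.size() / n) * mse(left)
                       - (right.size() / n) * mse(right);
}
```

A split that perfectly separates low from high targets reduces the impurity to zero, so its reduction equals the parent's full variance.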
Tree structure
Inspect the learned tree structure using the TreeNode class:
struct TreeNode {
bool is_leaf; // True for leaf nodes
std::size_t feature_index; // Feature to split on (internal nodes)
double threshold; // Split threshold (internal nodes)
double value; // Predicted value (leaf nodes - regression)
std::string class_label; // Predicted class (leaf nodes - classification)
std::vector<std::size_t> class_counts; // Class distribution (classification)
std::unique_ptr<TreeNode> left; // Left child (feature <= threshold)
std::unique_ptr<TreeNode> right; // Right child (feature > threshold)
};
Traversing the tree
#include <iostream>
#include <string>

void print_tree(const TreeNode* node, int depth = 0) {
if (!node) return;
std::string indent(depth * 2, ' ');
if (node->is_leaf) {
std::cout << indent << "Leaf: " << node->class_label
<< " (value: " << node->value << ")" << std::endl;
} else {
std::cout << indent << "Split: feature " << node->feature_index
<< " <= " << node->threshold << std::endl;
print_tree(node->left.get(), depth + 1);
print_tree(node->right.get(), depth + 1);
}
}
// Usage
auto root = classifier.root();
print_tree(root);
Preventing overfitting
Decision trees easily overfit without constraints. Use these techniques:
Pre-pruning (preferred)
Set constraints during tree construction:
DecisionTreeClassifier classifier(
Criterion::gini,
5, // max_depth: Limit tree depth
10, // min_samples_split: Require more samples to split
5, // min_samples_leaf: Require more samples per leaf
0.01 // min_impurity_decrease: Only split if gain > 0.01
);
Start with max_depth = 5-10 and min_samples_leaf = 5-10. Tune these hyperparameters via cross-validation.
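The depth search can be sketched generically. Here `evaluate` is a placeholder standing in for training a DecisionTreeClassifier with the given max_depth and scoring it on held-out data; the helper below is illustrative, not part of the library:

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Pick the candidate depth with the highest validation score.
// evaluate(d) is assumed to fit a tree with max_depth = d and
// return its accuracy on a held-out set.
int best_max_depth(const std::vector<int>& depths,
                   const std::function<double(int)>& evaluate) {
    int best = depths.front();
    double best_score = evaluate(best);
    for (std::size_t i = 1; i < depths.size(); ++i) {
        double s = evaluate(depths[i]);
        if (s > best_score) { best_score = s; best = depths[i]; }
    }
    return best;
}
```

The same loop extends to the other pre-pruning parameters; a full grid search simply nests it.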
Hyperparameter guidelines
| Parameter | Effect | Recommended range |
|---|---|---|
| max_depth | Limits tree depth | 3-20 |
| min_samples_split | Requires more samples to split | 2-50 |
| min_samples_leaf | Requires more samples per leaf | 1-20 |
| min_impurity_decrease | Requires minimum gain | 0.0-0.1 |
Example: Complete workflow
#include <decision_trees/decision_tree.h>
#include <iostream>
#include <vector>
#include <algorithm>
using namespace decision_trees;
// Helper: depth of the subtree rooted at node
std::size_t calculate_depth(const TreeNode* node) {
    if (!node) return 0;
    return 1 + std::max(calculate_depth(node->left.get()),
                        calculate_depth(node->right.get()));
}
// Helper: number of leaf nodes in the subtree
std::size_t count_leaves(const TreeNode* node) {
    if (!node) return 0;
    if (node->is_leaf) return 1;
    return count_leaves(node->left.get()) + count_leaves(node->right.get());
}
int main() {
// Prepare classification data
std::vector<std::vector<double>> X_train = {
{5.1, 3.5, 1.4, 0.2},
{4.9, 3.0, 1.4, 0.2},
{7.0, 3.2, 4.7, 1.4},
{6.4, 3.2, 4.5, 1.5},
{6.3, 3.3, 6.0, 2.5},
{5.8, 2.7, 5.1, 1.9}
};
std::vector<std::string> y_train = {
"setosa", "setosa", "versicolor",
"versicolor", "virginica", "virginica"
};
// Create and train classifier
DecisionTreeClassifier classifier(
Criterion::gini,
3, // max_depth
2, // min_samples_split
1 // min_samples_leaf
);
classifier.fit(X_train, y_train);
// Make predictions
std::vector<double> x_new = {6.0, 3.0, 4.8, 1.8};
std::string prediction = classifier.predict_class(x_new);
auto probs = classifier.predict_proba(x_new);
std::cout << "Predicted class: " << prediction << std::endl;
std::cout << "Class probabilities: ";
for (double p : probs) {
std::cout << p << " ";
}
std::cout << std::endl;
// Inspect tree structure
auto root = classifier.root();
std::cout << "Tree depth: " << calculate_depth(root) << std::endl;
std::cout << "Number of leaves: " << count_leaves(root) << std::endl;
// Try regression
std::vector<std::vector<double>> X_reg = {{1.0}, {2.0}, {3.0}, {4.0}};
std::vector<double> y_reg = {2.0, 4.0, 6.0, 8.0};
DecisionTreeRegressor regressor(Criterion::mse, 3);
regressor.fit(X_reg, y_reg);
double pred = regressor.predict({2.5});
std::cout << "Regression prediction: " << pred << std::endl;
return 0;
}
Advantages and limitations
Advantages
- Interpretable: Easy to visualize and understand
- Non-parametric: No assumptions about data distribution
- Handles non-linear relationships: Automatically learns interactions
- Mixed data types: Works with numerical and categorical features
- Feature importance: Can rank feature contributions
Limitations
- Overfitting: Prone to overfitting without constraints
- Instability: Small data changes can produce very different trees
- Bias: Biased toward features with many values
- Not smooth: Predictions are piecewise constant (regression)
For better performance, consider ensemble methods like Random Forests or Gradient Boosting, which combine multiple decision trees to reduce overfitting and improve stability.
Comparison: Classification vs regression
| Aspect | DecisionTreeClassifier | DecisionTreeRegressor |
|---|---|---|
| Output | Discrete class labels | Continuous values |
| Leaf prediction | Majority class | Mean of target values |
| Splitting criteria | Gini, entropy | MSE, Friedman MSE, MAE |
| Probability estimates | Yes (via predict_proba) | No |
| Use case | Spam detection, medical diagnosis | House price prediction, sales forecasting |