DecisionTreeClassifier

A CART (Classification and Regression Trees) decision tree classifier. The algorithm recursively partitions the feature space with binary splits, choosing at each node the feature and threshold that best separate the classes.

Splitting Criterion — Gini Impurity

Gini impurity measures the probability that a randomly chosen sample from node t would be misclassified if labeled according to the class distribution at that node:

\text{Gini}(t) = 1 - \sum_{k=1}^{K} p_k^2

where p_k is the proportion of class-k samples at node t. A pure node (Gini = 0) contains only one class.

At each split, the algorithm searches over all features and thresholds for the partition that minimizes the weighted Gini impurity of the two child nodes:

\text{Gini}_{\text{split}} = \frac{n_L}{n}\,\text{Gini}(t_L) + \frac{n_R}{n}\,\text{Gini}(t_R)

The information gain of a split is Gini(t) − Gini_split.

CART Algorithm

The tree is built recursively using the CART algorithm:

  1. At each node, evaluate every candidate split (j, θ), i.e. feature j with threshold θ, and select the one that minimizes Gini_split.
  2. Create a left child (X_j ≤ θ) and a right child (X_j > θ).
  3. Recurse until a stopping condition is met: max_depth reached, fewer than min_samples_split samples, or the node is pure.

Prediction assigns the majority class of the leaf node reached by the query sample.

When to Use

  • Interpretability: Decision trees are easy to visualize and explain.
  • No scaling needed: Splits depend only on the ordering of feature values, so trees are invariant to strictly monotonic transformations of individual features.
  • High variance: Single trees are prone to overfitting — consider ensemble methods (Random Forest, Gradient Boosting) for better generalization.

Mirrors sklearn.tree.DecisionTreeClassifier.

Constructor

Skigen::DecisionTreeClassifier<Scalar> tree(int max_depth = -1,
                                            int min_samples_split = 2);

Parameter          Default  Description
max_depth          -1       Maximum tree depth (-1 = unlimited)
min_samples_split  2        Minimum samples required to split a node

Methods

Method       Description
fit(X, y)    Build the decision tree from training data
predict(X)   Predict class labels
score(X, y)  Return classification accuracy

Example

#include <Skigen/Tree>
#include <iostream>

Skigen::DecisionTreeClassifier<double> tree(/*max_depth=*/5);
tree.fit(X_train, y_train);
std::cout << "Accuracy: " << tree.score(X_test, y_test) << "\n";