HistGradientBoostingClassifier
Histogram-based gradient boosting: features are binned up-front so split finding scans bin histograms rather than raw values, making training near-linear in the sample count.
Algorithm
Each feature is quantile-binned into at most max_bins buckets. Split finding then operates on per-bin gradient/hessian histograms with a second-order (Newton) split-gain criterion, grown leaf-wise and bounded by max_leaf_nodes. The grower supports L2 regularisation, per-feature monotonic constraints, and holdout-based early stopping.
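The histogram scan described above can be sketched as follows. This is a minimal illustration, not Skigen's internal code: the `Histogram` and `Split` types and the `best_split` function are hypothetical names, and the gain uses the common XGBoost-style form with a factor of 1/2, which may differ from the library's exact scaling.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Per-bin sums of gradients and hessians for one feature at one node
// (hypothetical type, for illustration only).
struct Histogram {
    std::vector<double> grad;
    std::vector<double> hess;
};

struct Split {
    int bin = -1;       // samples with bin index <= bin go left; -1 means no split found
    double gain = 0.0;  // second-order gain of the chosen split
};

// G^2 / (H + lambda): the objective reduction achieved by the optimal Newton leaf value.
inline double leaf_term(double G, double H, double l2) {
    return (G * G) / (H + l2);
}

// Scan the bin boundaries left to right, keeping the boundary with the largest gain.
inline Split best_split(const Histogram& hist, double l2) {
    assert(hist.grad.size() == hist.hess.size());
    const std::size_t n_bins = hist.grad.size();

    double G = 0.0, H = 0.0;
    for (std::size_t b = 0; b < n_bins; ++b) {
        G += hist.grad[b];
        H += hist.hess[b];
    }

    Split best;
    if (H <= 0.0) return best;  // empty node: nothing to split
    const double parent = leaf_term(G, H, l2);

    double GL = 0.0, HL = 0.0;
    for (std::size_t b = 0; b + 1 < n_bins; ++b) {
        GL += hist.grad[b];
        HL += hist.hess[b];
        const double GR = G - GL;
        const double HR = H - HL;
        if (HL <= 0.0 || HR <= 0.0) continue;  // one side would be empty
        const double gain =
            0.5 * (leaf_term(GL, HL, l2) + leaf_term(GR, HR, l2) - parent);
        if (gain > best.gain) {
            best.bin = static_cast<int>(b);
            best.gain = gain;
        }
    }
    return best;
}
```

The cost of the scan depends on the number of bins, not the number of samples at the node, which is where the histogram approach gets its speed. With gradients `{-2, -2, 2, 2}`, unit hessians and `l2 = 0`, the sketch picks the boundary after bin 1 with a gain of 8.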
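Binary problems boost a single log-odds score $f_i$ with gradient $g_i = p_i - y_i$ and hessian $h_i = p_i (1 - p_i)$, where $p_i = \sigma(f_i)$ is the predicted probability of the positive class. Multiclass problems boost one tree per class per iteration against the softmax cross-entropy gradient $g_{ik} = p_{ik} - \mathbb{1}[y_i = k]$, with predictions normalised by softmax.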
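As a rough sketch of the per-sample derivatives, the functions below compute the standard log-loss quantities: gradient `p - y` and hessian `p * (1 - p)` for the binary case, and `softmax(raw)[k] - [y == k]` for the multiclass case. The names `GradHess`, `binary_grad_hess` and `softmax_gradient` are hypothetical and not part of the Skigen API, and class labels are assumed to be encoded as indices starting at 0.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct GradHess {
    double grad;
    double hess;
};

// Binary log-loss derivatives with respect to the raw log-odds score.
inline GradHess binary_grad_hess(double raw, int y) {
    assert(y == 0 || y == 1);
    const double p = 1.0 / (1.0 + std::exp(-raw));  // sigmoid
    return {p - static_cast<double>(y), p * (1.0 - p)};
}

// Softmax cross-entropy gradient for one sample with K raw scores and true class y.
inline std::vector<double> softmax_gradient(const std::vector<double>& raw,
                                            std::size_t y) {
    assert(!raw.empty() && y < raw.size());
    // Subtract the maximum score before exponentiating for numerical stability.
    const double shift = *std::max_element(raw.begin(), raw.end());
    std::vector<double> g(raw.size());
    double total = 0.0;
    for (std::size_t k = 0; k < raw.size(); ++k) {
        g[k] = std::exp(raw[k] - shift);
        total += g[k];
    }
    for (std::size_t k = 0; k < raw.size(); ++k) {
        g[k] = g[k] / total - (k == y ? 1.0 : 0.0);
    }
    return g;
}
```

The multiclass gradient always sums to zero across classes, which makes a convenient sanity check when debugging a custom loss.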
Constructor
Skigen::HistGradientBoostingClassifier<Scalar> model(
    Loss loss = Loss::LogLoss,
    Scalar learning_rate = 0.1,
    int max_iter = 100,
    std::optional<int> max_leaf_nodes = 31,
    std::optional<int> max_depth = std::nullopt,
    int min_samples_leaf = 20,
    Scalar l2_regularization = 0.0,
    int max_bins = 255,
    std::optional<std::vector<int>> monotonic_cst = std::nullopt,
    bool early_stopping = false,
    Scalar validation_fraction = 0.1,
    int n_iter_no_change = 10,
    Scalar tol = 1e-7,
    std::optional<uint64_t> random_state = std::nullopt);
Parameters
| Parameter | Default | Description |
|---|---|---|
| learning_rate | 0.1 | Shrinkage per iteration. |
| max_iter | 100 | Number of boosting iterations. |
| max_leaf_nodes | 31 | Leaf-wise growth bound (nullopt = unbounded). |
| min_samples_leaf | 20 | Minimum samples per leaf. |
| l2_regularization | 0.0 | L2 penalty on the Newton step. |
| max_bins | 255 | Feature quantisation resolution (2–255). |
| monotonic_cst | nullopt | Per-feature +1 / -1 / 0 constraint. |
| early_stopping | false | Enable holdout-based stopping. |
| validation_fraction | 0.1 | Holdout size for early stopping. |
| n_iter_no_change | 10 | Patience before stopping. |
| random_state | nullopt | Seed for the holdout split. |
Both binary and multiclass log-loss are supported.
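To illustrate how `n_iter_no_change` and `tol` might interact, here is a minimal sketch of a patience rule modelled on scikit-learn's histogram gradient boosting. It is an assumption that Skigen applies the same rule. The function `should_stop` and the `val_loss` history are hypothetical names, not library API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns true when none of the last n_iter_no_change holdout losses improved
// on the loss recorded just before that window by more than tol.
// ASSUMPTION: this mirrors scikit-learn's rule; Skigen's exact rule may differ.
inline bool should_stop(const std::vector<double>& val_loss,
                        int n_iter_no_change, double tol) {
    assert(n_iter_no_change > 0);
    const std::size_t window = static_cast<std::size_t>(n_iter_no_change);
    if (val_loss.size() < window + 1) return false;  // not enough history yet

    // Reference: the loss immediately before the patience window, tightened by tol.
    const double reference = val_loss[val_loss.size() - window - 1] - tol;
    for (std::size_t i = val_loss.size() - window; i < val_loss.size(); ++i) {
        if (val_loss[i] < reference) return false;  // a recent iteration improved
    }
    return true;
}
```

Under this rule, a holdout loss history of `{1.0, 0.9, 0.9, 0.9}` with `n_iter_no_change = 2` stops training, because neither of the last two iterations improved on 0.9 by more than `tol`.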
Methods
| Method | Description |
|---|---|
| fit(X, y) | Bin features, then boost. |
| predict(X) | Class labels. |
| predict_proba(X) | Class probabilities (sigmoid for binary, softmax for multiclass). |
| decision_function(X) | Raw scores: (n,) log-odds for binary, (n, K) for multiclass. |
| score(X, y) | Mean accuracy. |
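The relationship between raw scores, labels and accuracy can be sketched independently of the library. The helpers below are hypothetical, not Skigen functions. They assume classes are encoded as indices 0..K-1 and that a log-odds of exactly 0 maps to class 0; the library may break that tie differently or map indices back to the original labels.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Binary case: sigmoid(raw) > 0.5 exactly when raw > 0, so thresholding the
// log-odds at zero gives the same label as thresholding the probability.
inline int label_from_log_odds(double raw) {
    return raw > 0.0 ? 1 : 0;
}

// Multiclass case: softmax preserves ordering, so the argmax of the raw scores
// equals the argmax of the class probabilities.
inline std::size_t label_from_scores(const std::vector<double>& raw) {
    assert(!raw.empty());
    std::size_t best = 0;
    for (std::size_t k = 1; k < raw.size(); ++k) {
        if (raw[k] > raw[best]) best = k;
    }
    return best;
}

// Mean accuracy: the fraction of predictions that match the true labels.
inline double accuracy(const std::vector<int>& y_true,
                       const std::vector<int>& y_pred) {
    assert(y_true.size() == y_pred.size() && !y_true.empty());
    std::size_t hits = 0;
    for (std::size_t i = 0; i < y_true.size(); ++i) {
        if (y_true[i] == y_pred[i]) ++hits;
    }
    return static_cast<double>(hits) / static_cast<double>(y_true.size());
}
```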
Fitted Attributes
| Accessor | Description |
|---|---|
| bin_edges() | Per-feature quantile bin edges. |
| train_score() | Per-iteration training log-loss. |
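As an illustration of how per-feature bin edges could be used, the sketch below maps a raw value to a bin index with a binary search. It assumes the edges for one feature are interior thresholds sorted in ascending order and that a value equal to an edge falls into the lower bin. The return type of `bin_edges()` is not specified here, so the plain `std::vector<double>` and the `bin_index` helper are hypothetical. Missing values are not handled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the number of edges strictly below x, i.e. the index of the first
// edge that is >= x. With E edges there are E + 1 possible bins.
inline std::size_t bin_index(const std::vector<double>& edges, double x) {
    assert(std::is_sorted(edges.begin(), edges.end()));
    return static_cast<std::size_t>(
        std::lower_bound(edges.begin(), edges.end(), x) - edges.begin());
}
```

With edges `{1.0, 2.0, 3.0}`, the value 1.5 lands in bin 1 and any value above 3.0 lands in bin 3.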
Example
Skigen::HistGradientBoostingClassifier<double> gb;
gb.fit(X, y);
auto preds = gb.predict(X_test);
This estimator is checked by the parity suite. See the generator tests/parity/generate_ensemble_reference.py and the reference fixtures in tests/parity/data/hist_gradient_boosting_classifier/, exercised by tests/parity/parity_ensemble.cpp.
For full signatures see the HistGradientBoostingClassifier API Reference.