HistGradientBoostingRegressor

Histogram-based gradient boosting for regression: the fast gradient-boosting path for large datasets.

Algorithm

Features are quantile-binned into a compact uint8 representation, then squared-error boosting proceeds with a native gradient/hessian histogram split finder. Each tree is grown leaf-wise (best-first): the leaf with the highest second-order split gain

$$\text{gain} = \frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda} - \frac{G^2}{H + \lambda}$$

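is split next, until the tree reaches `max_leaf_nodes` leaves. Here $G$ and $H$ are the sums of gradients and hessians over the samples in the node being split, and the subscripts $L$ and $R$ denote its left and right children. Leaf values are the Newton step $-G/(H + \lambda)$, computed from each leaf's own sums. The grower honours L2 regularisation ($\lambda$), per-feature monotonic constraints, and holdout-based early stopping.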

For squared error, the per-sample gradient is $g_i = F_i - y_i$, where $F_i$ is the current ensemble prediction for sample $i$, and the hessian is $h_i = 1$.
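The gain and leaf-value formulas above can be sketched in a few lines of standalone C++. This is an illustration only, not Skigen's internal code: `GradStats`, `squared_error_stats`, `split_gain`, and `leaf_value` are hypothetical names introduced here, and the sketch works on raw vectors rather than binned histograms.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper type: sums of gradients (G) and hessians (H) over one node.
struct GradStats {
    double G = 0.0;
    double H = 0.0;
};

// Accumulate squared-error statistics: g_i = F_i - y_i and h_i = 1.
GradStats squared_error_stats(const std::vector<double>& F,
                              const std::vector<double>& y) {
    assert(F.size() == y.size());
    GradStats s;
    for (std::size_t i = 0; i < F.size(); ++i) {
        s.G += F[i] - y[i];
        s.H += 1.0;
    }
    return s;
}

// Second-order gain of splitting a parent node into left and right children.
// The parent sums are the sums of the two children.
double split_gain(const GradStats& left, const GradStats& right, double lambda) {
    assert(left.H + lambda > 0.0 && right.H + lambda > 0.0);
    const double G = left.G + right.G;
    const double H = left.H + right.H;
    return left.G * left.G / (left.H + lambda)
         + right.G * right.G / (right.H + lambda)
         - G * G / (H + lambda);
}

// Newton-step leaf value: -G / (H + lambda).
double leaf_value(const GradStats& s, double lambda) {
    assert(s.H + lambda > 0.0);
    return -s.G / (s.H + lambda);
}
```

With $\lambda = 0$ and squared error, the leaf value reduces to the mean residual $y_i - F_i$ of the samples in the leaf; a positive $\lambda$ shrinks it towards zero.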

Constructor

Skigen::HistGradientBoostingRegressor<Scalar> model(
    Loss loss = Loss::SquaredError,
    Scalar learning_rate = 0.1,
    int max_iter = 100,
    std::optional<int> max_leaf_nodes = 31,
    std::optional<int> max_depth = std::nullopt,
    int min_samples_leaf = 20,
    Scalar l2_regularization = 0.0,
    int max_bins = 255,
    std::optional<std::vector<int>> categorical_features = std::nullopt,
    std::optional<std::vector<int>> monotonic_cst = std::nullopt,
    bool early_stopping = false,
    Scalar validation_fraction = 0.1,
    int n_iter_no_change = 10,
    Scalar tol = 1e-7,
    std::optional<uint64_t> random_state = std::nullopt);

Parameters

Parameter	Default	Description
`loss`	`Loss::SquaredError`	Loss function to minimise.
`learning_rate`	`0.1`	Shrinkage per iteration.
`max_iter`	`100`	Number of boosting iterations.
`max_leaf_nodes`	`31`	Leaf-wise growth bound (`nullopt` = unbounded).
`max_depth`	`nullopt`	Optional depth cap.
`min_samples_leaf`	`20`	Minimum samples per leaf.
`l2_regularization`	`0.0`	L2 penalty $\lambda$ added to the hessian sum in the split gain and the Newton step.
`max_bins`	`255`	Bin resolution (2–255).
`categorical_features`	`nullopt`	Optional indices of features to treat as categorical.
`monotonic_cst`	`nullopt`	Per-feature `+1` / `-1` / `0` constraint.
`early_stopping`	`false`	Enable holdout-based stopping.
`validation_fraction`	`0.1`	Fraction of the training data held out for early stopping.
`n_iter_no_change`	`10`	Patience before stopping.
`tol`	`1e-7`	Minimum validation improvement.
`random_state`	`nullopt`	Seed for the holdout split.
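To make the interplay of `n_iter_no_change` and `tol` concrete, here is a minimal sketch of one common patience rule. It is an assumption for illustration: the exact stopping criterion Skigen applies is not stated on this page, and `stopping_iteration` is a hypothetical helper, not part of the library.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: given the validation loss after each boosting iteration,
// return how many iterations run before a simple patience rule stops training.
// An iteration counts as an improvement only if it beats the best loss so far
// by more than `tol`; training stops after `n_iter_no_change` consecutive
// iterations without such an improvement.
std::size_t stopping_iteration(const std::vector<double>& val_loss,
                               int n_iter_no_change, double tol) {
    if (val_loss.empty()) return 0;
    double best = val_loss[0];
    int stale = 0;  // consecutive iterations without sufficient improvement
    for (std::size_t i = 1; i < val_loss.size(); ++i) {
        if (val_loss[i] < best - tol) {
            best = val_loss[i];
            stale = 0;
        } else if (++stale >= n_iter_no_change) {
            return i + 1;  // iterations completed, including this one
        }
    }
    return val_loss.size();  // patience never exhausted
}
```

Under this rule, a larger `n_iter_no_change` tolerates longer plateaus in the validation loss, while a larger `tol` makes small improvements count as no improvement.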

Methods

Method	Description
`fit(X, y)`	Bin features, then boost.
`predict(X)`	Boosted prediction.
`score(X, y)`	R².
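The R² reported by `score` is the standard coefficient of determination, $R^2 = 1 - \sum_i (y_i - \hat{y}_i)^2 / \sum_i (y_i - \bar{y})^2$. The standalone sketch below computes it on plain vectors; `r2_score` is a hypothetical free function used for illustration, not a Skigen API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Coefficient of determination: 1 - SS_res / SS_tot.
// Assumes y_true is non-empty and not constant (otherwise SS_tot is zero).
double r2_score(const std::vector<double>& y_true,
                const std::vector<double>& y_pred) {
    assert(!y_true.empty() && y_true.size() == y_pred.size());
    double mean = 0.0;
    for (double v : y_true) mean += v;
    mean /= static_cast<double>(y_true.size());

    double ss_res = 0.0;  // residual sum of squares
    double ss_tot = 0.0;  // total sum of squares around the mean
    for (std::size_t i = 0; i < y_true.size(); ++i) {
        const double r = y_true[i] - y_pred[i];
        const double d = y_true[i] - mean;
        ss_res += r * r;
        ss_tot += d * d;
    }
    return 1.0 - ss_res / ss_tot;
}
```

A perfect fit scores 1, and a model that always predicts the mean of `y_true` scores 0.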

Fitted Attributes

Accessor	Description
`bin_edges()`	Per-feature quantile bin edges.
`train_score()`	Per-iteration training MSE.
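As an illustration of how per-feature bin edges turn raw values into the compact uint8 representation, the sketch below maps one feature column to bin indices. It makes several assumptions not confirmed by this page: the edges are sorted interior thresholds, a value equal to an edge falls into the lower bin, and `bin_index` and `bin_column` are hypothetical helpers rather than Skigen functions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Map a raw value to a bin index, given sorted interior thresholds.
// The index is the number of thresholds strictly less than x, so with
// k thresholds there are k + 1 bins (assumed convention: x <= edge goes left).
std::uint8_t bin_index(double x, const std::vector<double>& edges) {
    assert(edges.size() < 256);  // the index must fit in a uint8
    const auto it = std::lower_bound(edges.begin(), edges.end(), x);
    return static_cast<std::uint8_t>(it - edges.begin());
}

// Bin a whole feature column.
std::vector<std::uint8_t> bin_column(const std::vector<double>& column,
                                     const std::vector<double>& edges) {
    std::vector<std::uint8_t> binned;
    binned.reserve(column.size());
    for (double x : column) binned.push_back(bin_index(x, edges));
    return binned;
}
```

Because every value is reduced to a small integer, the split finder only needs to scan at most `max_bins` histogram entries per feature instead of every distinct raw value.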

Example

Skigen::HistGradientBoostingRegressor<double> gb;
gb.fit(X, y);
auto preds = gb.predict(X_test);

Verified against scikit-learn

This estimator is checked by the parity suite. See the generator `tests/parity/generate_ensemble_reference.py` and the reference fixtures in `tests/parity/data/hist_gradient_boosting_regressor/`, exercised by `tests/parity/parity_ensemble.cpp`.

API Reference

For full signatures see the HistGradientBoostingRegressor API Reference.
