
HistGradientBoostingRegressor

Histogram-based gradient boosting for regression: the fast gradient-boosting path for large datasets.

Algorithm

Features are quantile-binned into a compact uint8 representation, then squared-error boosting proceeds with a native gradient/hessian histogram split finder. Each tree is grown leaf-wise (best-first): the leaf with the highest second-order split gain

$$\text{gain} = \frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda} - \frac{G^2}{H + \lambda}$$

is split next, bounded by max_leaf_nodes. Leaf values are the Newton step $-G/(H + \lambda)$. The grower honours L2 regularisation ($\lambda$), per-feature monotonic constraints, and holdout-based early stopping.
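The gain and leaf-value formulas above can be written out directly. The following is a minimal standalone sketch, not Skigen's internal code: the names `NodeStats`, `score`, `split_gain`, and `leaf_value` are hypothetical and chosen only for illustration.

```cpp
#include <cassert>

// Hypothetical helper type: sufficient statistics of a node.
// G is the sum of per-sample gradients, H the sum of per-sample hessians.
struct NodeStats {
    double G;
    double H;
};

// Score term G^2 / (H + lambda) that appears three times in the gain formula.
inline double score(const NodeStats& s, double lambda) {
    assert(s.H + lambda > 0.0);
    return s.G * s.G / (s.H + lambda);
}

// Second-order split gain: score(left) + score(right) - score(parent).
inline double split_gain(const NodeStats& left, const NodeStats& right, double lambda) {
    const NodeStats parent{left.G + right.G, left.H + right.H};
    return score(left, lambda) + score(right, lambda) - score(parent, lambda);
}

// Newton-step leaf value: -G / (H + lambda).
inline double leaf_value(const NodeStats& s, double lambda) {
    assert(s.H + lambda > 0.0);
    return -s.G / (s.H + lambda);
}
```

A larger `lambda` shrinks both the gain of every candidate split and the magnitude of every leaf value, which is how the L2 penalty regularises the trees.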

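Here $G$ and $H$ are the sums of the per-sample gradients $g_i$ and hessians $h_i$ over the samples in a node, and the subscripts $L$ and $R$ denote its left and right children. For squared error the per-sample gradient is $g_i = F_i - y_i$, where $F_i$ is the current ensemble prediction for sample $i$, and the hessian is $h_i = 1$.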
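To illustrate how a histogram split finder uses these gradients, the sketch below accumulates per-bin gradient and hessian sums for one already-binned feature and then scans the bin boundaries for the best gain. It is an illustrative simplification under stated assumptions: `BinStats`, `BestSplit`, `build_histogram`, and `find_best_split` are hypothetical names, and constraints such as min_samples_leaf and monotonic_cst are ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-bin accumulator: gradient sum G and hessian sum H.
struct BinStats {
    double G = 0.0;
    double H = 0.0;
};

// Hypothetical result type: samples with bin index <= bin go left.
// bin == -1 means no split with positive gain was found.
struct BestSplit {
    int bin = -1;
    double gain = 0.0;
};

// Build the histogram for one feature using squared-error statistics:
// g_i = F_i - y_i and h_i = 1.
std::vector<BinStats> build_histogram(const std::vector<std::uint8_t>& binned,
                                      const std::vector<double>& F,
                                      const std::vector<double>& y,
                                      std::size_t n_bins) {
    assert(binned.size() == F.size() && F.size() == y.size());
    std::vector<BinStats> hist(n_bins);
    for (std::size_t i = 0; i < binned.size(); ++i) {
        assert(binned[i] < n_bins);
        hist[binned[i]].G += F[i] - y[i];
        hist[binned[i]].H += 1.0;
    }
    return hist;
}

// Scan bin boundaries left to right, keeping running left-child sums and
// deriving the right-child sums by subtraction from the node totals.
BestSplit find_best_split(const std::vector<BinStats>& hist, double lambda) {
    double G = 0.0, H = 0.0;
    for (const BinStats& b : hist) {
        G += b.G;
        H += b.H;
    }
    BestSplit best;
    if (H + lambda <= 0.0) return best;
    const double parent_score = G * G / (H + lambda);

    double GL = 0.0, HL = 0.0;
    for (std::size_t b = 0; b + 1 < hist.size(); ++b) {
        GL += hist[b].G;
        HL += hist[b].H;
        const double GR = G - GL;
        const double HR = H - HL;
        if (HL <= 0.0 || HR <= 0.0) continue;  // skip splits with an empty child
        const double gain = GL * GL / (HL + lambda) + GR * GR / (HR + lambda) - parent_score;
        if (gain > best.gain) {
            best.bin = static_cast<int>(b);
            best.gain = gain;
        }
    }
    return best;
}
```

Because the scan touches each bin once, the cost of evaluating a feature depends on the number of bins rather than the number of samples in the node, which is the main source of the speed-up over exact split search.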

Constructor

Skigen::HistGradientBoostingRegressor<Scalar> model(
    Loss loss = Loss::SquaredError,
    Scalar learning_rate = 0.1,
    int max_iter = 100,
    std::optional<int> max_leaf_nodes = 31,
    std::optional<int> max_depth = std::nullopt,
    int min_samples_leaf = 20,
    Scalar l2_regularization = 0.0,
    int max_bins = 255,
    std::optional<std::vector<int>> categorical_features = std::nullopt,
    std::optional<std::vector<int>> monotonic_cst = std::nullopt,
    bool early_stopping = false,
    Scalar validation_fraction = 0.1,
    int n_iter_no_change = 10,
    Scalar tol = 1e-7,
    std::optional<uint64_t> random_state = std::nullopt);

Parameters

| Parameter | Default | Description |
| --- | --- | --- |
| learning_rate | 0.1 | Shrinkage per iteration. |
| max_iter | 100 | Number of boosting iterations. |
| max_leaf_nodes | 31 | Leaf-wise growth bound (nullopt = unbounded). |
| max_depth | nullopt | Optional depth cap. |
| min_samples_leaf | 20 | Minimum samples per leaf. |
| l2_regularization | 0.0 | L2 penalty on the Newton step. |
| max_bins | 255 | Bin resolution (2–255). |
| monotonic_cst | nullopt | Per-feature +1 / -1 / 0 constraint. |
| early_stopping | false | Enable holdout-based stopping. |
| validation_fraction | 0.1 | Holdout size for early stopping. |
| n_iter_no_change | 10 | Patience before stopping. |
| tol | 1e-7 | Minimum validation improvement. |
| random_state | nullopt | Seed for the holdout split. |
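The interaction between n_iter_no_change and tol can be illustrated with a small patience check. This sketch assumes a rule modelled on scikit-learn's histogram gradient boosting (stop when none of the last n_iter_no_change validation losses beats the loss recorded just before them by more than tol). Skigen's exact rule may differ, and `should_stop` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed stopping rule (not confirmed against Skigen's source):
// returns true when none of the last `n_iter_no_change` validation losses
// is lower than the reference loss (the one just before that window) minus `tol`.
bool should_stop(const std::vector<double>& val_loss, int n_iter_no_change, double tol) {
    assert(n_iter_no_change > 0);
    const std::size_t window = static_cast<std::size_t>(n_iter_no_change);
    if (val_loss.size() < window + 1) return false;  // not enough history yet

    const std::size_t ref_idx = val_loss.size() - window - 1;
    const double reference = val_loss[ref_idx] - tol;
    for (std::size_t i = ref_idx + 1; i < val_loss.size(); ++i) {
        if (val_loss[i] < reference) return false;  // a recent iteration improved enough
    }
    return true;
}
```

Under this rule a larger n_iter_no_change tolerates longer plateaus, while a larger tol demands a bigger improvement before an iteration counts as progress.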

Methods

| Method | Description |
| --- | --- |
| fit(X, y) | Bin features, then boost. |
| predict(X) | Boosted prediction. |
| score(X, y) | R². |
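For reference, R² is the coefficient of determination, $1 - SS_{res}/SS_{tot}$. The following standalone sketch shows the computation on plain vectors. `r2_score` is a hypothetical helper, not part of the Skigen API, and this version simply asserts that the targets are not constant rather than defining a fallback value.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Coefficient of determination: 1 - SS_res / SS_tot.
double r2_score(const std::vector<double>& y_true, const std::vector<double>& y_pred) {
    assert(!y_true.empty() && y_true.size() == y_pred.size());

    double mean = 0.0;
    for (double v : y_true) mean += v;
    mean /= static_cast<double>(y_true.size());

    double ss_res = 0.0;  // residual sum of squares
    double ss_tot = 0.0;  // total sum of squares around the mean
    for (std::size_t i = 0; i < y_true.size(); ++i) {
        const double r = y_true[i] - y_pred[i];
        const double d = y_true[i] - mean;
        ss_res += r * r;
        ss_tot += d * d;
    }
    assert(ss_tot > 0.0);  // R² is undefined for constant targets in this sketch
    return 1.0 - ss_res / ss_tot;
}
```

A perfect prediction gives 1, and predicting the mean of the targets for every sample gives 0.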

Fitted Attributes

| Accessor | Description |
| --- | --- |
| bin_edges() | Per-feature quantile bin edges. |
| train_score() | Per-iteration training MSE. |
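To make the idea of quantile bin edges concrete, here is a minimal sketch for a single feature. It uses nearest-rank quantiles and drops duplicate edges. These are simplifying assumptions, so the edges returned by bin_edges() may be computed differently, and `quantile_edges` and `bin_index` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Compute at most max_bins - 1 interior edges at evenly spaced quantiles.
// Nearest-rank selection is a simplification; real implementations often interpolate.
std::vector<double> quantile_edges(std::vector<double> values, int max_bins) {
    assert(!values.empty() && max_bins >= 2 && max_bins <= 255);
    std::sort(values.begin(), values.end());

    std::vector<double> edges;
    const std::size_t n_bins = static_cast<std::size_t>(max_bins);
    for (std::size_t k = 1; k < n_bins; ++k) {
        const std::size_t idx = (values.size() - 1) * k / n_bins;
        const double e = values[idx];
        if (edges.empty() || e > edges.back()) edges.push_back(e);  // keep edges strictly increasing
    }
    return edges;
}

// Map a raw value to its bin: the number of edges strictly below the value.
// Values <= edges[0] land in bin 0; values above the last edge land in the last bin.
std::uint8_t bin_index(double value, const std::vector<double>& edges) {
    assert(edges.size() < 256);
    const auto it = std::lower_bound(edges.begin(), edges.end(), value);
    return static_cast<std::uint8_t>(it - edges.begin());
}
```

With at most 255 bins every index fits in one byte, which is what allows the compact uint8 representation of the binned feature matrix.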

Example

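// X and y hold the training features and targets; X_test holds unseen rows.
Skigen::HistGradientBoostingRegressor<double> gb;  // all defaults
gb.fit(X, y);                                      // bin features, then boost
auto preds = gb.predict(X_test);                   // boosted predictions

Verified against scikit-learn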

This estimator is checked by the parity suite. See the generator tests/parity/generate_ensemble_reference.py and the reference fixtures in tests/parity/data/hist_gradient_boosting_regressor/, exercised by tests/parity/parity_ensemble.cpp.

API Reference

For full signatures see the HistGradientBoostingRegressor API Reference.