SelectKBest

Keeps the top-k features ranked by a univariate score function (f_classif, f_regression, or chi2).

Algorithm

Each feature is scored independently against the target, and the k highest-scoring features are retained. The chi-squared score function is sparse-aware, which suits text pipelines where most entries of the feature matrix are zero.
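The selection step described above can be sketched with the standard library alone. This is a minimal illustration, not Skigen's implementation: the helper name `top_k_mask` is hypothetical, and the tie-breaking rule (lower index wins) is an assumption that the source does not specify.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper, not part of Skigen: returns a boolean mask marking
// the k highest-scoring features. Ties go to the lower feature index
// (an assumption made for this sketch).
std::vector<bool> top_k_mask(const std::vector<double>& scores, std::size_t k) {
    assert(k <= scores.size());

    // order[i] starts out as the identity permutation 0, 1, ..., n-1.
    std::vector<std::size_t> order(scores.size());
    std::iota(order.begin(), order.end(), std::size_t{0});

    // Bring the k best indices to the front, best first.
    std::partial_sort(
        order.begin(),
        order.begin() + static_cast<std::ptrdiff_t>(k),
        order.end(),
        [&scores](std::size_t a, std::size_t b) {
            if (scores[a] != scores[b]) return scores[a] > scores[b];
            return a < b;
        });

    std::vector<bool> mask(scores.size(), false);
    for (std::size_t i = 0; i < k; ++i) mask[order[i]] = true;
    return mask;
}
```

With scores `{0.5, 3.0, 1.0, 2.0}` and `k = 2`, the mask marks features 1 and 3. The sketch does not handle NaN scores, which would violate the ordering the comparator relies on.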

Constructor

Skigen::SelectKBest<Scalar, ScoreFn> model(ScoreFn score, int k);

Parameters

| Parameter | Default | Description |
|---|---|---|
| score_func | | FClassif, FRegression, or Chi2. |
| k | | Number of features to keep. |

Methods

| Method | Description |
|---|---|
| fit(X, y) | Score and rank the features. |
| transform(X) | Project onto the top-k features. |
| get_support_mask() | Boolean mask of selected features. |
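To show how a boolean support mask relates to the projection that transform(X) performs, here is a small sketch on a plain row-major matrix. The function `select_columns` and the nested `std::vector` representation are assumptions made for illustration. Skigen's actual matrix types and internals are not shown in this page.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // rows of feature values

// Hypothetical helper, not part of Skigen: keeps only the columns of X
// whose mask entry is true, preserving the original column order.
Matrix select_columns(const Matrix& X, const std::vector<bool>& mask) {
    Matrix out;
    out.reserve(X.size());
    for (const auto& row : X) {
        assert(row.size() == mask.size());
        std::vector<double> kept;
        for (std::size_t j = 0; j < row.size(); ++j) {
            if (mask[j]) kept.push_back(row[j]);
        }
        out.push_back(kept);
    }
    return out;
}
```

Applying the mask `{true, false, true}` to the rows `{1, 2, 3}` and `{4, 5, 6}` yields `{1, 3}` and `{4, 6}`.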

Fitted Attributes

| Accessor | Description |
|---|---|
| scores() | Per-feature scores. |
| pvalues() | Per-feature p-values. |
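For intuition about what a per-feature score can look like, the sketch below computes a one-way ANOVA F-statistic for a single feature column, which is the statistic scikit-learn's f_classif reports. It assumes Skigen's FClassif follows the same definition; the page does not confirm this, and the helper `anova_f` is hypothetical. The p-value computation is omitted because it needs the F-distribution CDF.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical helper, not part of Skigen: one-way ANOVA F-statistic for a
// single feature column x, grouped by integer class labels y.
// Requires at least two classes and more samples than classes.
double anova_f(const std::vector<double>& x, const std::vector<int>& y) {
    assert(x.size() == y.size() && !x.empty());

    struct Group { double sum = 0.0; std::size_t n = 0; };
    std::map<int, Group> groups;
    double total = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        Group& g = groups[y[i]];
        g.sum += x[i];
        g.n += 1;
        total += x[i];
    }
    assert(groups.size() >= 2 && x.size() > groups.size());

    const double n = static_cast<double>(x.size());
    const double grand_mean = total / n;

    // Between-group sum of squares: class means around the grand mean.
    double ss_between = 0.0;
    for (const auto& kv : groups) {
        const double count = static_cast<double>(kv.second.n);
        const double d = kv.second.sum / count - grand_mean;
        ss_between += count * d * d;
    }

    // Within-group sum of squares: samples around their own class mean.
    double ss_within = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        const Group& g = groups.at(y[i]);
        const double d = x[i] - g.sum / static_cast<double>(g.n);
        ss_within += d * d;
    }

    const double df_between = static_cast<double>(groups.size()) - 1.0;
    const double df_within = n - static_cast<double>(groups.size());
    return (ss_between / df_between) / (ss_within / df_within);
}
```

A feature whose values separate the classes well has a large between-group term relative to the within-group term, so it receives a high score. If every sample equals its class mean, the within-group term is zero and the ratio is not finite; this sketch does not guard against that case.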

Example

Skigen::SelectKBest<double, Skigen::feature_selection::FClassif<double>> sel({}, 5);
sel.fit(X, y);
auto X_top = sel.transform(X);

Verified against scikit-learn

This estimator is checked by the parity suite. See the generator tests/parity/generate_feature_selection_reference.py and the reference fixtures in tests/parity/data/f_classif/, exercised by tests/parity/parity_feature_selection.cpp.

API Reference

For full signatures see the SelectKBest API Reference.