
VarianceThreshold

#include <Skigen/FeatureSelection>

template <typename Scalar = double>
class Skigen::VarianceThreshold(threshold=0)

Feature selector that removes all low-variance features.

This feature selection algorithm looks only at the features (X), not the desired outputs (y), and can thus be used for unsupervised learning.

Mirrors sklearn.feature_selection.VarianceThreshold.


Parameters:

  • threshold : Scalar, default=0 Variance threshold. Features whose variance is less than or equal to this value are dropped, so the features kept satisfy variance > threshold. With the default of 0, only constant (zero-variance) features are removed. Matches sklearn semantics.

Attributes:

  • threshold : Scalar The configured variance threshold.

  • variances : RowVectorType Per-feature variance (1 × n_features).

  • get_support_mask : BoolMaskType Boolean support mask: true for features that are kept.

  • get_support_indices : Eigen::VectorXi Get the integer indices of selected features.


Methods

get_support(indices)

Get the support of the selector.

Parameters:

  • indices : bool If true, return integer indices instead of a boolean mask.

Returns:

  • result : BoolMaskType Either a boolean mask of length n_features (default) or a vector of selected indices.
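The two return forms carry the same information. As a rough illustration, the sketch below converts a boolean mask into ascending integer indices. It uses plain std::vector stand-ins, and the helper name mask_to_indices is hypothetical, not part of Skigen.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not a Skigen function): convert a boolean support
// mask into the ascending indices of the features that are kept.
std::vector<int> mask_to_indices(const std::vector<bool>& mask) {
    std::vector<int> indices;
    for (std::size_t j = 0; j < mask.size(); ++j) {
        if (mask[j]) indices.push_back(static_cast<int>(j));
    }
    return indices;
}
```

For a mask of {false, true, true, false}, this returns the indices 1 and 2.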

fit(X)

Compute per-feature variance and the support mask.
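To make the two steps concrete, here is a minimal sketch of what a dense fit might compute. It assumes the biased (ddof=0) variance used by the sparse overload and the keep rule variance > threshold from the parameter description. The Matrix alias and both function names are illustrative stand-ins built on std::vector, not Skigen's actual types or API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a dense matrix (rows of samples); not a Skigen type.
using Matrix = std::vector<std::vector<double>>;

// Population variance (ddof = 0) of each column, computed in two passes.
std::vector<double> column_variances(const Matrix& X) {
    assert(!X.empty());
    const std::size_t n = X.size();
    const std::size_t p = X[0].size();
    std::vector<double> var(p, 0.0);
    for (std::size_t j = 0; j < p; ++j) {
        double mean = 0.0;
        for (std::size_t i = 0; i < n; ++i) mean += X[i][j];
        mean /= static_cast<double>(n);
        double sum_sq_dev = 0.0;
        for (std::size_t i = 0; i < n; ++i) {
            const double d = X[i][j] - mean;
            sum_sq_dev += d * d;
        }
        var[j] = sum_sq_dev / static_cast<double>(n);
    }
    return var;
}

// A feature is kept only when its variance is strictly greater than the threshold.
std::vector<bool> support_mask(const std::vector<double>& var, double threshold) {
    std::vector<bool> mask(var.size());
    for (std::size_t j = 0; j < var.size(); ++j) mask[j] = var[j] > threshold;
    return mask;
}
```

With threshold 0, a column whose values are all identical has variance 0 and is masked out, while any column with at least two distinct values is kept.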


transform(X)

Reduce X to selected features.


fit(X)

Fit on a sparse matrix without densifying.

Computes the per-column variance directly from the CSC/CSR representation. For column-major (CSC) storage this takes O(nnz) time, where nnz is the number of explicitly stored nonzeros. Implicit zeros contribute to the mean and variance through the row count n but are never materialised.

Variance formula (biased, ddof=0):

\[
\mu_j = \frac{1}{n} \sum_{i \in \text{nz}_j} X_{ij},\qquad \sigma_j^2 = \frac{1}{n} \sum_{i \in \text{nz}_j} X_{ij}^2 - \mu_j^2.
\]

Matches sklearn's VarianceThreshold.fit behaviour on sparse input.
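The formula above can be sketched over a bare CSC layout as follows. The CscMatrix struct and csc_column_variances are hypothetical and written with std::vector only. They are not Skigen or Eigen types, and the library's own implementation may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal CSC container; not a Skigen or Eigen type.
struct CscMatrix {
    std::size_t rows = 0;
    std::size_t cols = 0;
    std::vector<std::size_t> col_ptr;  // size cols + 1, offsets into values
    std::vector<std::size_t> row_idx;  // size nnz, row of each stored entry
    std::vector<double> values;        // size nnz, stored entries
};

// Biased (ddof = 0) per-column variance that visits only stored entries.
std::vector<double> csc_column_variances(const CscMatrix& X) {
    assert(X.rows > 0 && X.col_ptr.size() == X.cols + 1);
    const double n = static_cast<double>(X.rows);
    std::vector<double> var(X.cols, 0.0);
    for (std::size_t j = 0; j < X.cols; ++j) {
        double sum = 0.0;
        double sum_sq = 0.0;
        for (std::size_t k = X.col_ptr[j]; k < X.col_ptr[j + 1]; ++k) {
            sum += X.values[k];
            sum_sq += X.values[k] * X.values[k];
        }
        // Implicit zeros add nothing to the sums but still count in n.
        const double mean = sum / n;
        var[j] = sum_sq / n - mean * mean;
    }
    return var;
}
```

A column with no stored entries yields variance 0. Note that this one-pass form can lose precision through cancellation when a column's mean is large relative to its spread.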


transform(X)

Reduce a sparse X to selected features without densifying.

Returns a sparse matrix with the same number of rows and only the columns where support_mask_(j) is true.
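Column selection on CSC data amounts to copying the stored entries of each kept column and rebuilding the column offsets. The sketch below illustrates the idea under the assumption of a simple CSC layout. CscMatrix and select_columns are hypothetical names, not Skigen's API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal CSC container; not a Skigen or Eigen type.
struct CscMatrix {
    std::size_t rows = 0;
    std::size_t cols = 0;
    std::vector<std::size_t> col_ptr;  // size cols + 1, offsets into values
    std::vector<std::size_t> row_idx;  // size nnz, row of each stored entry
    std::vector<double> values;        // size nnz, stored entries
};

// Keep only the columns whose mask entry is true; the result stays sparse.
CscMatrix select_columns(const CscMatrix& X, const std::vector<bool>& mask) {
    assert(mask.size() == X.cols && X.col_ptr.size() == X.cols + 1);
    CscMatrix out;
    out.rows = X.rows;
    out.col_ptr.push_back(0);
    for (std::size_t j = 0; j < X.cols; ++j) {
        if (!mask[j]) continue;
        for (std::size_t k = X.col_ptr[j]; k < X.col_ptr[j + 1]; ++k) {
            out.row_idx.push_back(X.row_idx[k]);
            out.values.push_back(X.values[k]);
        }
        out.col_ptr.push_back(out.values.size());  // end offset of the new column
        ++out.cols;
    }
    return out;
}
```

Row indices are copied unchanged, so the output has the same number of rows as the input and never allocates a dense buffer.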


inverse_transform(X)

Reverse the transformation by zero-padding removed features.
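Zero-padding means each reduced column is written back to its original position and every removed column is filled with zeros. A minimal dense sketch, assuming a std::vector-based Matrix alias and a free function named inverse_transform for illustration only (neither is Skigen's actual signature):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a dense matrix (rows of samples); not a Skigen type.
using Matrix = std::vector<std::vector<double>>;

// Scatter the reduced columns back to their original positions and
// leave zeros in the columns that the selector removed.
Matrix inverse_transform(const Matrix& X_reduced, const std::vector<bool>& mask) {
    std::size_t kept = 0;
    for (bool m : mask) kept += m ? 1 : 0;
    Matrix X(X_reduced.size(), std::vector<double>(mask.size(), 0.0));
    for (std::size_t i = 0; i < X_reduced.size(); ++i) {
        assert(X_reduced[i].size() == kept);
        std::size_t c = 0;  // next column to read from the reduced row
        for (std::size_t j = 0; j < mask.size(); ++j) {
            if (mask[j]) X[i][j] = X_reduced[i][c++];
        }
    }
    return X;
}
```

The original values of the removed features are not recovered. Only the shape is restored.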