
Lasso

The Lasso (Least Absolute Shrinkage and Selection Operator) is a linear regression model with an $\ell_1$ penalty that induces sparsity: it drives some coefficients exactly to zero, effectively performing feature selection.

Objective Function

$$\min_w \; \frac{1}{2n} \|Xw - y\|_2^2 + \alpha \|w\|_1$$

where $n$ is the number of samples and $\alpha \ge 0$ controls the regularization strength. This formulation matches scikit-learn's Lasso.

Coordinate Descent Algorithm

Skigen solves the Lasso via coordinate descent with soft-thresholding. At each iteration, the algorithm optimizes one coefficient $w_j$ at a time while holding all others fixed.

Define the partial residual correlation:

$$\rho_j = X_j^\top \left( y - X_{\setminus j}\, w_{\setminus j} \right)$$

where $X_{\setminus j}$ denotes all columns of $X$ except column $j$. The closed-form update for coordinate $j$ is:

$$w_j \leftarrow \frac{S(\rho_j,\; n\alpha)}{\|X_j\|_2^2}$$

where $S(z, \gamma) = \operatorname{sign}(z)\,\max(|z| - \gamma,\, 0)$ is the soft-thresholding operator.

In practice, $\rho_j$ is computed efficiently using the running residual $r = y - Xw$:

$$\rho_j = X_j^\top r + \|X_j\|_2^2 \, w_j^{\text{old}}$$

The algorithm cycles over all features until convergence (maximum coefficient change $< \texttt{tol}$) or until max_iter iterations.

Sparsity and Feature Selection

The $\ell_1$ penalty produces sparse solutions: at convergence, any coefficient whose partial correlation $|\rho_j|$ does not exceed $n\alpha$ is set exactly to zero. This makes the Lasso a natural tool for feature selection in high-dimensional settings.

When to Use

  • Feature selection: When you expect many features to be irrelevant.
  • Sparse signals: When the true underlying model involves only a few non-zero coefficients.
  • Correlated features: Consider ElasticNet, which combines $\ell_1$ and $\ell_2$ penalties for stability.

Constructor

Skigen::Lasso<Scalar> model(Scalar alpha = 1, bool fit_intercept = true,
                            int max_iter = 1000, Scalar tol = 1e-4);

| Parameter | Default | Description |
| --- | --- | --- |
| `alpha` | 1 | Regularization strength ($\alpha \ge 0$) |
| `fit_intercept` | true | Whether to center the data and compute an intercept |
| `max_iter` | 1000 | Maximum coordinate descent iterations |
| `tol` | 1e-4 | Convergence tolerance on coefficient updates |

Methods

| Method | Description |
| --- | --- |
| `fit(X, y)` | Fit the model via coordinate descent |
| `predict(X)` | Predict $\hat{y} = Xw + b$ |
| `score(X, y)` | Return the $R^2$ coefficient of determination |

Fitted Attributes

| Accessor | Type | Description |
| --- | --- | --- |
| `coef()` | RowVectorType | Estimated coefficients (typically sparse) |
| `intercept()` | Scalar | Intercept term |

Example

#include <Skigen/LinearModel>
#include <iostream>

Skigen::Lasso<double> model(/*alpha=*/0.1);
model.fit(X, y);  // X, y: training matrix and target vector
std::cout << "Non-zero coefs: "
          << (model.coef().array().abs() > 1e-10).count() << "\n";

API Reference

For full parameter details and method signatures, see the auto-generated Lasso API Reference.