ElasticNet

ElasticNet combines $\ell_1$ (Lasso) and $\ell_2$ (Ridge) regularization, inheriting sparsity from Lasso and stability from Ridge. It is particularly useful when features are correlated: Lasso tends to arbitrarily select one feature from a correlated group, while ElasticNet distributes weight across them.

Objective Function

$$\min_w \frac{1}{2n} \|Xw - y\|_2^2 + \alpha \left( \rho \|w\|_1 + \frac{1-\rho}{2} \|w\|_2^2 \right)$$

where $\rho \in [0, 1]$ is the `l1_ratio` controlling the mix between the $\ell_1$ and $\ell_2$ penalties:

  • $\rho = 1$: pure Lasso
  • $\rho = 0$: pure Ridge (scaled by $1/(2n)$)

This formulation matches scikit-learn's ElasticNet.
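For concreteness, the objective can be evaluated directly. The sketch below is a hypothetical standalone helper (not part of Skigen), using plain `std::vector` rather than the library's matrix types:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Evaluate the ElasticNet objective for a candidate weight vector w.
// X is row-major (n samples x p features). Illustrative only.
double elastic_net_objective(const std::vector<std::vector<double>>& X,
                             const std::vector<double>& y,
                             const std::vector<double>& w,
                             double alpha, double rho) {
    const std::size_t n = X.size();
    double rss = 0.0;  // ||Xw - y||_2^2
    for (std::size_t i = 0; i < n; ++i) {
        double pred = 0.0;
        for (std::size_t j = 0; j < w.size(); ++j) pred += X[i][j] * w[j];
        const double r = pred - y[i];
        rss += r * r;
    }
    double l1 = 0.0, l2 = 0.0;  // ||w||_1 and ||w||_2^2
    for (double wj : w) { l1 += std::fabs(wj); l2 += wj * wj; }
    return rss / (2.0 * n) + alpha * (rho * l1 + 0.5 * (1.0 - rho) * l2);
}
```

Setting `rho = 1` or `rho = 0` recovers the Lasso and (scaled) Ridge objectives respectively.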

Coordinate Descent

Like Lasso, ElasticNet is solved via coordinate descent. The update for coefficient $j$ is:

$$w_j \leftarrow \frac{S(\rho_j,\; n \alpha \rho)}{\|X_j\|_2^2 + n \alpha (1 - \rho)}$$

where $\rho_j = X_j^\top\bigl(y - X_{\setminus j}\, w_{\setminus j}\bigr)$ and $S$ is the soft-thresholding operator. The $\ell_2$ term adds $n\alpha(1-\rho)$ to the denominator, preventing the instability that Lasso can exhibit with correlated features.
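A single coordinate update is short to state in code. The helpers below are an illustrative sketch, not Skigen's internals; $\rho_j$ and $\|X_j\|_2^2$ are assumed to be precomputed by the caller:

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>

// S(z, t) = sign(z) * max(|z| - t, 0), the soft-thresholding operator.
double soft_threshold(double z, double t) {
    return std::copysign(std::max(std::fabs(z) - t, 0.0), z);
}

// One coordinate-descent update for w_j.
// rho_j  = X_j^T (y - X_{\j} w_{\j})  (partial residual correlation)
// col_sq = ||X_j||_2^2
double update_coordinate(double rho_j, double col_sq,
                         std::size_t n, double alpha, double l1_ratio) {
    return soft_threshold(rho_j, n * alpha * l1_ratio)
         / (col_sq + n * alpha * (1.0 - l1_ratio));
}
```

With `l1_ratio = 1` this reduces to the Lasso update; with `l1_ratio = 0` the threshold vanishes and only the Ridge term in the denominator remains.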

When to Use

  • Correlated features: ElasticNet is preferred over Lasso when features are highly correlated.
  • Grouped selection: The $\ell_2$ component encourages correlated features to be selected together.
  • Regularization path: ElasticNet's convex combination provides a smooth transition between Ridge and Lasso behavior.

Constructor

```cpp
Skigen::ElasticNet<Scalar> model(Scalar alpha = 1, Scalar l1_ratio = 0.5,
                                 bool fit_intercept = true,
                                 int max_iter = 1000, Scalar tol = 1e-4);
```
| Parameter | Default | Description |
| --- | --- | --- |
| `alpha` | `1` | Overall regularization strength ($\alpha \ge 0$) |
| `l1_ratio` | `0.5` | Mix ratio $\rho$ ($1$ = Lasso, $0$ = Ridge) |
| `fit_intercept` | `true` | Whether to center the data and compute an intercept |
| `max_iter` | `1000` | Maximum coordinate descent iterations |
| `tol` | `1e-4` | Convergence tolerance |

Methods

| Method | Description |
| --- | --- |
| `fit(X, y)` | Fit the model via coordinate descent |
| `predict(X)` | Predict $\hat{y} = Xw + b$ |
| `score(X, y)` | Return the $R^2$ coefficient of determination |
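`score(X, y)` follows the usual coefficient-of-determination definition, $R^2 = 1 - \mathrm{SS}_{\mathrm{res}} / \mathrm{SS}_{\mathrm{tot}}$. A minimal standalone sketch of that computation (a hypothetical helper, not Skigen's implementation):

```cpp
#include <cstddef>
#include <vector>

// R^2 = 1 - sum (y_i - yhat_i)^2 / sum (y_i - mean(y))^2
double r_squared(const std::vector<double>& y_true,
                 const std::vector<double>& y_pred) {
    const std::size_t n = y_true.size();
    double mean = 0.0;
    for (double yi : y_true) mean += yi;
    mean /= static_cast<double>(n);
    double ss_res = 0.0, ss_tot = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        ss_res += (y_true[i] - y_pred[i]) * (y_true[i] - y_pred[i]);
        ss_tot += (y_true[i] - mean) * (y_true[i] - mean);
    }
    return 1.0 - ss_res / ss_tot;
}
```

Perfect predictions give $R^2 = 1$; always predicting the mean of `y` gives $R^2 = 0$.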

Fitted Attributes

| Accessor | Type | Description |
| --- | --- | --- |
| `coef()` | `RowVectorType` | Estimated coefficients |
| `intercept()` | `Scalar` | Intercept term |

Example

```cpp
#include <Skigen/LinearModel>

Skigen::ElasticNet<double> model(/*alpha=*/0.5, /*l1_ratio=*/0.7);
model.fit(X, y);
auto predictions = model.predict(X_test);
```

References

  • Zou, H. and Hastie, T. (2005). "Regularization and variable selection via the elastic net." Journal of the Royal Statistical Society: Series B, 67(2), 301–320.
API Reference

For full parameter details and method signatures, see the ElasticNet API Reference.