Ridge

Ridge regression addresses the instability of Ordinary Least Squares when features are correlated or when the design matrix is near-singular. It does so by adding an $\ell_2$ penalty on the coefficient vector, shrinking all coefficients toward zero and improving the conditioning of the problem.

Objective Function

$$\min_w \|Xw - y\|_2^2 + \alpha \|w\|_2^2$$

where $\alpha \ge 0$ controls the regularization strength: larger $\alpha$ increases shrinkage and reduces variance at the cost of higher bias. This formulation matches scikit-learn's Ridge, which minimizes $\|y - Xw\|_2^2 + \alpha\|w\|_2^2$.

Closed-Form Solution

Setting the gradient to zero yields the normal equation:

$$(X^\top X + \alpha I)\, w = X^\top y$$

Skigen solves this system via Cholesky decomposition (Eigen::LLT), which exploits the fact that $X^\top X + \alpha I$ is symmetric positive definite for $\alpha > 0$. The solution is:

$$\hat{w} = (X^\top X + \alpha I)^{-1} X^\top y$$

When fit_intercept is enabled, the data is centered before fitting: $X_c = X - \mathbf{1}\bar{x}^\top$ and $y_c = y - \bar{y}$. The intercept is then recovered as $b = \bar{y} - \bar{x}^\top \hat{w}$.

Computational Complexity

Forming $X^\top X$ costs $O(np^2)$, and the Cholesky solve costs $O(p^3)$, giving an overall complexity of $O(np^2 + p^3)$ — the same order as OLS.

When to Use

  • Multicollinearity: When features are correlated and OLS produces unstable estimates with large variance.
  • Ill-conditioned problems: When $X^\top X$ is near-singular, Ridge stabilizes the solution by shifting eigenvalues away from zero.
  • $\ell_2$ preference: When all features are expected to contribute and full sparsity (zeroing coefficients) is not desired — use Lasso or ElasticNet for that.

Constructor

Skigen::Ridge<Scalar> model(Scalar alpha = 1, bool fit_intercept = true);
| Parameter     | Default | Description                                         |
| ------------- | ------- | --------------------------------------------------- |
| alpha         | 1       | Regularization strength ($\alpha \ge 0$)            |
| fit_intercept | true    | Whether to center the data and compute an intercept |

Methods

| Method      | Description                                                  |
| ----------- | ------------------------------------------------------------ |
| fit(X, y)   | Fit the model by solving $(X^\top X + \alpha I)w = X^\top y$ |
| predict(X)  | Predict $\hat{y} = Xw + b$                                   |
| score(X, y) | Return the $R^2$ coefficient of determination                |

Fitted Attributes

| Accessor    | Type          | Description                            |
| ----------- | ------------- | -------------------------------------- |
| coef()      | RowVectorType | Estimated coefficient vector $\hat{w}$ |
| intercept() | Scalar        | Intercept term $b$                     |

Example

#include <Skigen/LinearModel>

Skigen::Ridge<double> model(/*alpha=*/0.5);
model.fit(X, y);
auto predictions = model.predict(X_test);

API Reference

For full parameter details and method signatures, see the auto-generated Ridge API Reference.