
SGDClassifier / SGDRegressor

Linear models fitted via Stochastic Gradient Descent, well-suited for large-scale learning where batch solvers are prohibitively expensive. SGD processes one sample at a time, updating the weight vector with a noisy gradient estimate.

Loss Functions

Classifier

Hinge loss (linear SVM) — for maximum-margin classification with labels $y_i \in \{-1, +1\}$:

$$L_i = \max(0,\; 1 - y_i \, w^\top x_i)$$
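As a sketch, the per-sample hinge loss can be computed directly from the definition above. The helper names here (`dot`, `hinge_loss`) are illustrative, not part of the Skigen API:

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Dot product w^T x over plain vectors (stand-in for the library's linear algebra).
double dot(const std::vector<double>& w, const std::vector<double>& x) {
    return std::inner_product(w.begin(), w.end(), x.begin(), 0.0);
}

// Per-sample hinge loss: L_i = max(0, 1 - y_i * w^T x_i), with y in {-1, +1}.
double hinge_loss(const std::vector<double>& w,
                  const std::vector<double>& x, double y) {
    double margin = y * dot(w, x);       // y_i * w^T x_i
    return std::max(0.0, 1.0 - margin);  // zero once the margin exceeds 1
}
```

Samples classified correctly with a margin above 1 contribute zero loss, which is what makes the hinge loss margin-maximizing.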

Log loss (logistic regression) — for probabilistic binary classification with labels $y_i \in \{0, 1\}$:

$$L_i = -\left[ y_i \log \sigma(w^\top x_i) + (1 - y_i) \log\left(1 - \sigma(w^\top x_i)\right) \right]$$

where $\sigma(z) = 1/(1 + e^{-z})$ is the sigmoid function.
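A minimal sketch of this loss, using the sigmoid as defined above; the function names are again illustrative rather than Skigen internals:

```cpp
#include <cmath>
#include <numeric>
#include <vector>

// Sigmoid: sigma(z) = 1 / (1 + e^{-z}).
double sigmoid(double z) { return 1.0 / (1.0 + std::exp(-z)); }

// Per-sample log loss for labels y in {0, 1}.
double log_loss(const std::vector<double>& w,
                const std::vector<double>& x, double y) {
    double z = std::inner_product(w.begin(), w.end(), x.begin(), 0.0);
    double p = sigmoid(z);  // predicted P(y = 1 | x)
    return -(y * std::log(p) + (1.0 - y) * std::log(1.0 - p));
}
```

With $w = 0$ the model predicts $p = 0.5$ for every sample, giving a loss of $\log 2 \approx 0.693$ per sample — a useful sanity check for an untrained classifier.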

Regressor

Squared error loss:

$$L_i = \frac{1}{2}\left(y_i - w^\top x_i\right)^2$$
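The $\tfrac{1}{2}$ factor is conventional: it cancels when differentiating, so the gradient with respect to the prediction is simply $w^\top x_i - y_i$. A sketch (the name `squared_loss` is illustrative):

```cpp
// Per-sample squared error with the 1/2 factor, so d(loss)/d(y_hat) = y_hat - y.
double squared_loss(double y, double y_hat) {
    double r = y - y_hat;  // residual
    return 0.5 * r * r;
}
```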

Weight Update Rule

For each randomly selected sample $(x_i, y_i)$, the weight vector is updated as:

$$w \leftarrow w - \eta_t \left( \nabla_w L_i + \alpha\, w \right)$$

where $\alpha\, w$ is the gradient of the $\ell_2$ penalty and $\eta_t$ is the learning rate at step $t$.
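One update step for the hinge loss can be sketched as follows. The hinge subgradient is $-y_i x_i$ when the margin $y_i w^\top x_i < 1$ and zero otherwise; `sgd_step` is an illustrative name, not the Skigen API:

```cpp
#include <numeric>
#include <vector>

// One SGD step on the hinge loss with l2 regularization:
//   w <- w - eta * (grad_w L_i + alpha * w)
void sgd_step(std::vector<double>& w, const std::vector<double>& x,
              double y, double eta, double alpha) {
    double margin =
        y * std::inner_product(w.begin(), w.end(), x.begin(), 0.0);
    for (std::size_t j = 0; j < w.size(); ++j) {
        double grad = alpha * w[j];          // l2 penalty gradient
        if (margin < 1.0) grad -= y * x[j];  // hinge subgradient (margin violated)
        w[j] -= eta * grad;
    }
}
```

Note the regularization term shrinks every weight toward zero on every step, even when the sample contributes no hinge gradient.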

Learning Rate Schedule

Skigen decays the step size with an inverse scaling schedule:

$$\eta_t = \frac{\eta_0}{1 + \eta_0 \cdot \alpha \cdot t}$$

This schedule gradually decreases the step size, damping the noise in the gradient estimates; for convex losses it drives the iterates toward the optimum.
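The schedule is a one-liner; a sketch with an illustrative name:

```cpp
// Inverse scaling schedule: eta_t = eta0 / (1 + eta0 * alpha * t).
// At t = 0 this is exactly eta0, then it decays monotonically in t.
double learning_rate(double eta0, double alpha, long t) {
    return eta0 / (1.0 + eta0 * alpha * t);
}
```

With the defaults `eta0 = 0.01` and `alpha = 1e-4`, the step size halves after $t = 1/(\eta_0 \alpha) = 10^6$ updates.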

Convergence

Training runs for at most max_iter epochs. At the end of each epoch, convergence is checked: training stops when the loss improvement drops below tol. Shuffling the data each epoch is recommended for faster convergence.
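The epoch loop described above (shuffle, update on each sample, stop when the loss improvement falls below `tol`) can be sketched generically. The callables stand in for the per-sample update and epoch-loss computation; none of these names are the Skigen API:

```cpp
#include <algorithm>
#include <limits>
#include <random>
#include <vector>

// Generic SGD epoch loop: shuffle sample indices each epoch, apply the
// per-sample update, then check the stopping criterion. Returns the number
// of completed epochs.
template <class Update, class LossFn>
int train(std::vector<int>& idx, Update update, LossFn epoch_loss,
          int max_iter, double tol, unsigned seed) {
    std::mt19937 rng(seed);  // seeded like random_state for reproducible shuffles
    double prev = std::numeric_limits<double>::infinity();
    int epoch = 0;
    for (; epoch < max_iter; ++epoch) {
        std::shuffle(idx.begin(), idx.end(), rng);  // visit samples in random order
        for (int i : idx) update(i);
        double loss = epoch_loss();
        if (prev - loss < tol) break;  // improvement too small: converged
        prev = loss;
    }
    return epoch;
}
```

Shuffling matters because cycling through samples in a fixed order correlates consecutive gradient estimates and can slow or bias convergence.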

Mirrors sklearn.linear_model.SGDClassifier and SGDRegressor.

Constructor

Skigen::SGDClassifier<Scalar> clf(
    Loss loss = Loss::Hinge,
    Scalar alpha = 1e-4, int max_iter = 1000, Scalar tol = 1e-3,
    Scalar eta0 = 0.01, unsigned int random_state = 42);

Skigen::SGDRegressor<Scalar> reg(
    Scalar alpha = 1e-4, int max_iter = 1000, Scalar tol = 1e-3,
    Scalar eta0 = 0.01, unsigned int random_state = 42);
| Parameter | Default | Description |
| --- | --- | --- |
| loss | Hinge | Loss function: Hinge or Log (classifier only) |
| alpha | 1e-4 | $\ell_2$ regularization strength |
| max_iter | 1000 | Maximum number of epochs |
| tol | 1e-3 | Convergence tolerance |
| eta0 | 0.01 | Initial learning rate $\eta_0$ |
| random_state | 42 | Random seed for shuffling |

Methods

| Method | Description |
| --- | --- |
| fit(X, y) | Fit the model via stochastic gradient descent |
| predict(X) | Predict class labels (classifier) or values (regressor) |
| score(X, y) | Accuracy (classifier) or $R^2$ (regressor) |

Example

#include <Skigen/LinearModel>
#include <iostream>

// Assumes X_train, y_train, X_test, y_test have been loaded elsewhere.

// SGD with log loss (equivalent to logistic regression)
Skigen::SGDClassifier<> clf(Skigen::SGDClassifier<>::Loss::Log);
clf.fit(X_train, y_train);
std::cout << "Accuracy: " << clf.score(X_test, y_test) << "\n";

// SGD for regression
Skigen::SGDRegressor<> reg;
reg.fit(X_train, y_train);
std::cout << "R²: " << reg.score(X_test, y_test) << "\n";