SGDClassifier

#include <Skigen/LinearModel>

template <typename Scalar = double>
class Skigen::SGDClassifier(loss=Loss::Hinge, alpha=1e-4, max_iter=1000, tol=1e-3, eta0=0.01, random_state=42)

Linear classifier fitted by minimizing a regularized empirical loss with SGD.

SGDClassifier implements a plain Stochastic Gradient Descent learning routine that supports hinge loss (linear SVM), log loss (logistic regression), and perceptron loss. Binary classification uses a single weight vector; multiclass is handled via one-vs-rest.

Mirrors sklearn.linear_model.SGDClassifier.
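The three losses named above differ only in how they penalize the margin y·(w·x + b). The sketch below shows one per-sample SGD update for each of them in plain C++. It is an illustration, not Skigen's implementation: the names `dloss` and `sgd_step` are hypothetical, labels are assumed to be encoded as -1/+1, the penalty is assumed to be L2, and the step size is held constant, none of which this page confirms.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical names for illustration; not Skigen's internals.
enum class Loss { Hinge, Log, Perceptron };

// Derivative of the loss with respect to the decision value p = w.x + b,
// for a label y in {-1, +1}.
double dloss(Loss loss, double p, double y) {
    const double z = y * p;  // signed margin
    switch (loss) {
        case Loss::Hinge:      return z < 1.0 ? -y : 0.0;        // max(0, 1 - z)
        case Loss::Perceptron: return z <= 0.0 ? -y : 0.0;       // max(0, -z)
        case Loss::Log:        return -y / (1.0 + std::exp(z));  // log(1 + exp(-z))
    }
    return 0.0;
}

// One SGD update on a single sample, assuming an L2 penalty
// (alpha / 2) * ||w||^2 and a constant step size eta.
void sgd_step(Loss loss, std::vector<double>& w, double& b,
              const std::vector<double>& x, double y,
              double alpha, double eta) {
    assert(w.size() == x.size());
    double p = b;
    for (std::size_t j = 0; j < w.size(); ++j) p += w[j] * x[j];
    const double g = dloss(loss, p, y);
    for (std::size_t j = 0; j < w.size(); ++j)
        w[j] -= eta * (g * x[j] + alpha * w[j]);
    b -= eta * g;  // the intercept is left unregularized in this sketch
}
```

Hinge and perceptron updates vanish once a sample is classified with enough margin, whereas the log-loss gradient only shrinks towards zero, so every sample keeps contributing a small update.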


Parameters:

  • loss : Loss, default=Loss::Hinge The loss function: Loss::Hinge, Loss::Log, or Loss::Perceptron.

  • alpha : Scalar, default=1e-4 Regularization constant.

  • max_iter : int, default=1000 Maximum number of epochs.

  • tol : Scalar, default=1e-3 Stopping tolerance.

  • eta0 : Scalar, default=0.01 Initial learning rate.

  • random_state : unsigned int, default=42 RNG seed.


Attributes:

  • coef : MatrixType Coefficient matrix (n_classes × n_features or 1 × n_features).

  • intercept : VectorType Intercept (bias) vector of shape (n_classes,) or (1,).


Methods

fit(X, y)

Fit the linear model with SGD.

Discovers the unique classes in y, then trains with stochastic gradient descent and the chosen loss function: a single binary classifier when y contains two classes, or one binary classifier per class (one-vs-rest, OvR) otherwise.

Parameters:

  • X Training matrix of shape (n_samples, n_features).

  • y Target vector of shape (n_samples,) with integer class labels.

Returns:

  • result Reference to the fitted estimator (*this).

Throws:

  • std::invalid_argument — if X and y have inconsistent lengths.
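The preprocessing that fit describes (a length check, class discovery, and one-vs-rest relabelling) can be sketched without any library dependency. The helpers below are hypothetical and use std::vector in place of Eigen types. The -1/+1 target encoding is an assumption, chosen because it is the usual convention for margin-based losses.

```cpp
#include <cstddef>
#include <set>
#include <stdexcept>
#include <vector>

// Reject inputs whose row count and label count disagree.
void check_lengths(std::size_t n_rows_X, const std::vector<int>& y) {
    if (n_rows_X != y.size())
        throw std::invalid_argument("X and y have inconsistent lengths");
}

// Sorted unique labels found in y.
std::vector<int> unique_classes(const std::vector<int>& y) {
    const std::set<int> s(y.begin(), y.end());
    return std::vector<int>(s.begin(), s.end());
}

// +1 / -1 targets for the "class k versus the rest" subproblem.
std::vector<double> ovr_targets(const std::vector<int>& y, int k) {
    std::vector<double> t(y.size());
    for (std::size_t i = 0; i < y.size(); ++i)
        t[i] = (y[i] == k) ? 1.0 : -1.0;
    return t;
}
```

Under these assumptions, each class k yields its own target vector, and each target vector would drive the training of one row of the coefficient matrix.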

predict(X)

Predict class labels for samples in X.
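This page does not spell out the decision rule, so the following is only a sketch of the conventional rule for a linear model with the coefficient shapes listed under Attributes. With a single coefficient row, the sign of the score picks between the two classes. With one row per class, the highest-scoring row wins. `predict_one` and the `Matrix` alias are hypothetical stand-ins, and the mapping of a positive score to the second class is an assumption borrowed from sklearn.

```cpp
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // row-major stand-in for MatrixType

// Assumed rule: with one coefficient row, a positive score selects classes[1];
// with one row per class, the highest-scoring row wins.
int predict_one(const Matrix& coef, const std::vector<double>& intercept,
                const std::vector<int>& classes, const std::vector<double>& x) {
    std::vector<double> score(coef.size());
    for (std::size_t k = 0; k < coef.size(); ++k) {
        score[k] = intercept[k];
        for (std::size_t j = 0; j < x.size(); ++j)
            score[k] += coef[k][j] * x[j];
    }
    if (coef.size() == 1)
        return score[0] > 0.0 ? classes[1] : classes[0];
    std::size_t best = 0;
    for (std::size_t k = 1; k < score.size(); ++k)
        if (score[k] > score[best]) best = k;
    return classes[best];
}
```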


partial_fit(X, y, classes)

Online SGD update.

Runs one epoch of SGD on the supplied (X, y) batch starting from the current coef_ / intercept_ (matching sklearn's SGDClassifier.partial_fit contract).

The first call requires classes, an Eigen::VectorXi listing every label that can appear (the same convention sklearn uses for its classes argument). Subsequent calls may pass an empty classes vector and reuse the class set recorded on the first call.

Throws:

  • std::invalid_argument — when the feature count differs from the first batch, or when the first call omits classes.
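The calling contract described above can be modelled with a small stand-in class. `PartialFitContract` is hypothetical and performs no learning. It reproduces only the argument checks, and it assumes that a non-empty classes argument on later calls is silently ignored, which this page does not state.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in that models only the argument checks; no learning.
class PartialFitContract {
public:
    void partial_fit(std::size_t n_features,
                     const std::vector<int>& classes = {}) {
        if (!initialized_) {
            if (classes.empty())
                throw std::invalid_argument("first call must supply classes");
            classes_ = classes;        // remember the full label set
            n_features_ = n_features;  // remember the feature count
            initialized_ = true;
            return;
        }
        if (n_features != n_features_)
            throw std::invalid_argument("feature count differs from first batch");
        // Assumption: a non-empty classes argument is ignored on later calls.
    }

    const std::vector<int>& classes() const { return classes_; }

private:
    bool initialized_ = false;
    std::size_t n_features_ = 0;
    std::vector<int> classes_;
};
```

Supplying the full label set up front matters for streaming data, because an early batch may not contain every class and the coefficient matrix still needs one row per class.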

Example

// Assumes `split` already holds X_train, X_test, y_train, and y_test.

// SGD with hinge loss (SVM-like)
Skigen::SGDClassifier<double> svm(Skigen::SGDClassifier<double>::Loss::Hinge);
svm.fit(split.X_train, split.y_train);
auto svm_pred = svm.predict(split.X_test);

std::cout << "=== SGD Classifier (Hinge Loss) ===\n";
std::cout << "Accuracy: " << Skigen::Metrics::accuracy_score(split.y_test, svm_pred) << "\n\n";

// SGD with log loss (logistic regression-like)
Skigen::SGDClassifier<double> log_clf(Skigen::SGDClassifier<double>::Loss::Log);
log_clf.fit(split.X_train, split.y_train);
auto log_pred = log_clf.predict(split.X_test);

std::cout << "=== SGD Classifier (Log Loss) ===\n";
std::cout << "Accuracy: " << Skigen::Metrics::accuracy_score(split.y_test, log_pred) << "\n\n";