KNeighborsClassifier / KNeighborsRegressor

Instance-based learning methods that make predictions by finding the $k$ closest training samples in feature space. No explicit model is trained — the training data itself serves as the model.

Distance Metric

Distances are computed using the Euclidean ($\ell_2$) metric:

$$d(x, x') = \sqrt{\sum_{j=1}^{p} (x_j - x'_j)^2}$$
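As a minimal sketch of this metric in plain C++ (illustrative only, not Skigen's internals):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Euclidean distance between two p-dimensional points:
// sum the squared per-feature differences, then take the square root.
double euclidean(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    double s = 0.0;
    for (std::size_t j = 0; j < a.size(); ++j) {
        double d = a[j] - b[j];
        s += d * d;
    }
    return std::sqrt(s);
}
```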

Classifier — Majority Vote

Given a query point $x$, the classifier identifies the $k$ nearest neighbors $\mathcal{N}_k(x)$ and predicts the most frequent class label among them:

$$\hat{y} = \arg\max_c \sum_{x_i \in \mathcal{N}_k(x)} \mathbf{1}(y_i = c)$$

In case of a tie, the class with the smallest index is selected.
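The vote and its tie-breaking rule can be sketched as follows (a plain-C++ illustration, not the library's code; iterating an ordered map in ascending key order and using a strict `>` comparison keeps the smallest class index on ties):

```cpp
#include <map>
#include <vector>

// Majority vote over the labels of the k nearest neighbors.
// Ties resolve to the smallest class index because std::map iterates
// keys in ascending order and only a strictly larger count wins.
int majority_vote(const std::vector<int>& neighbor_labels) {
    std::map<int, int> counts;                 // class label -> occurrence count
    for (int y : neighbor_labels) ++counts[y];
    int best_label = counts.begin()->first;
    int best_count = counts.begin()->second;
    for (const auto& [label, count] : counts)  // ascending label order
        if (count > best_count) { best_count = count; best_label = label; }
    return best_label;
}
```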

Regressor — Averaging

The regressor predicts the mean target value of the $k$ nearest neighbors:

$$\hat{y} = \frac{1}{k} \sum_{x_i \in \mathcal{N}_k(x)} y_i$$
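The averaging step itself is a one-liner in spirit; as a standalone sketch (not the Skigen API):

```cpp
#include <vector>

// kNN regression prediction: the arithmetic mean of the
// target values of the k nearest neighbors.
double knn_regress(const std::vector<double>& neighbor_targets) {
    double s = 0.0;
    for (double y : neighbor_targets) s += y;
    return s / static_cast<double>(neighbor_targets.size());
}
```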

Computational Complexity

Skigen uses brute-force search: prediction costs $O(np)$ per query point, where $n$ is the number of training samples and $p$ is the number of features. This is practical for small to moderate datasets.
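A brute-force query under this cost model could look like the following sketch (illustrative, not Skigen's actual code): one pass over all $n$ training rows computes the distances in $O(np)$, then a partial sort extracts the $k$ nearest indices.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Return the indices of the k training rows nearest to query q.
// Squared distances are compared directly: sqrt is monotone, so it
// never changes the ordering and can be skipped.
std::vector<std::size_t> k_nearest(const std::vector<std::vector<double>>& X,
                                   const std::vector<double>& q,
                                   std::size_t k) {
    assert(k <= X.size());
    std::vector<std::pair<double, std::size_t>> dist(X.size());
    for (std::size_t i = 0; i < X.size(); ++i) {        // O(np) distance pass
        double s = 0.0;
        for (std::size_t j = 0; j < q.size(); ++j) {
            double d = X[i][j] - q[j];
            s += d * d;
        }
        dist[i] = {s, i};
    }
    // Only the k smallest entries need to be ordered: O(n log k).
    std::partial_sort(dist.begin(), dist.begin() + k, dist.end());
    std::vector<std::size_t> idx(k);
    for (std::size_t i = 0; i < k; ++i) idx[i] = dist[i].second;
    return idx;
}
```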

When to Use

  • Non-linear boundaries: KNN naturally captures complex decision boundaries without assuming a parametric form.
  • Small datasets: works well when the dataset fits in memory and $n$ is not too large.
  • Feature scaling: Euclidean distance is sensitive to feature scales, so standardize features before use (e.g., with StandardScaler).
  • Choice of $k$: trades off bias and variance. Small $k$ gives a flexible but noisy fit (low bias, high variance); large $k$ gives a smoother fit that may over-smooth (high bias, low variance).
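The scaling point matters in practice: a feature measured in thousands will dominate one measured in fractions. A minimal per-feature z-score standardization looks like this (a hypothetical `standardize` helper, not Skigen's StandardScaler API):

```cpp
#include <cmath>
#include <vector>

// Shift each column to zero mean and scale it to unit variance so that
// no single feature dominates the Euclidean distance.
void standardize(std::vector<std::vector<double>>& X) {
    if (X.empty()) return;
    const std::size_t n = X.size(), p = X[0].size();
    for (std::size_t j = 0; j < p; ++j) {
        double mean = 0.0;
        for (std::size_t i = 0; i < n; ++i) mean += X[i][j];
        mean /= static_cast<double>(n);
        double var = 0.0;
        for (std::size_t i = 0; i < n; ++i) {
            double d = X[i][j] - mean;
            var += d * d;
        }
        double sd = std::sqrt(var / static_cast<double>(n));
        if (sd == 0.0) sd = 1.0;  // constant feature: leave it centered
        for (std::size_t i = 0; i < n; ++i) X[i][j] = (X[i][j] - mean) / sd;
    }
}
```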

Mirrors sklearn.neighbors.KNeighborsClassifier and KNeighborsRegressor.

Constructor

Skigen::KNeighborsClassifier<Scalar> clf(int n_neighbors = 5);
Skigen::KNeighborsRegressor<Scalar> reg(int n_neighbors = 5);
| Parameter | Default | Description |
| --- | --- | --- |
| n_neighbors | 5 | Number of neighbors $k$ |

Methods (Classifier)

| Method | Description |
| --- | --- |
| fit(X, y) | Store training data |
| predict(X) | Predict class labels by majority vote |
| score(X, y) | Return classification accuracy |

Methods (Regressor)

| Method | Description |
| --- | --- |
| fit(X, y) | Store training data |
| predict(X) | Predict by averaging neighbor targets |

Example

#include <Skigen/Neighbors>
#include <iostream>

// The Scalar template argument is required by the constructor signature above;
// double is used here for illustration.
Skigen::KNeighborsClassifier<double> clf(3);
clf.fit(X_train, y_train);
std::cout << "Accuracy: " << clf.score(X_test, y_test) << "\n";

Skigen::KNeighborsRegressor<double> reg(5);
reg.fit(X_train, y_train);
auto predictions = reg.predict(X_test);