KNeighborsClassifier / KNeighborsRegressor

Instance-based learning methods that make predictions by finding the $k$ closest training samples in feature space. No explicit model is trained — the training data itself serves as the model.

Distance Metric

Distances are computed using the Euclidean ($\ell_2$) metric:

$$d(x, x') = \sqrt{\sum_{j=1}^{p} (x_j - x'_j)^2}$$
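As a minimal sketch of this metric in plain C++ (illustrative only, not Skigen's internals):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Euclidean distance between two p-dimensional points:
// sum the squared per-feature differences, then take the square root.
double euclidean(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    double s = 0.0;
    for (std::size_t j = 0; j < a.size(); ++j) {
        double d = a[j] - b[j];
        s += d * d;
    }
    return std::sqrt(s);
}
```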

Classifier — Majority Vote

Given a query point $x$, the classifier identifies the $k$ nearest neighbors $\mathcal{N}_k(x)$ and predicts the most frequent class label among them:

$$\hat{y} = \arg\max_c \sum_{x_i \in \mathcal{N}_k(x)} \mathbf{1}(y_i = c)$$

In case of a tie, the class with the smallest index is selected.
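The vote and its tie-breaking rule can be sketched as follows (a plain-C++ illustration, not the library's code; iterating an ordered map in ascending key order and using a strict `>` comparison keeps the smallest class index on ties):

```cpp
#include <map>
#include <vector>

// Majority vote over the labels of the k nearest neighbors.
// Ties resolve to the smallest class index because std::map iterates
// keys in ascending order and only a strictly larger count wins.
int majority_vote(const std::vector<int>& neighbor_labels) {
    std::map<int, int> counts;                 // class label -> occurrence count
    for (int y : neighbor_labels) ++counts[y];
    int best_label = counts.begin()->first;
    int best_count = counts.begin()->second;
    for (const auto& [label, count] : counts)  // ascending label order
        if (count > best_count) { best_count = count; best_label = label; }
    return best_label;
}
```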

Regressor — Averaging

The regressor predicts the mean target value of the $k$ nearest neighbors:

$$\hat{y} = \frac{1}{k} \sum_{x_i \in \mathcal{N}_k(x)} y_i$$
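The averaging step itself is a one-liner in spirit; as a standalone sketch (not the Skigen API):

```cpp
#include <vector>

// kNN regression prediction: the arithmetic mean of the
// target values of the k nearest neighbors.
double knn_regress(const std::vector<double>& neighbor_targets) {
    double s = 0.0;
    for (double y : neighbor_targets) s += y;
    return s / static_cast<double>(neighbor_targets.size());
}
```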

Computational Complexity

Skigen uses brute-force search: prediction costs $O(np)$ per query point, where $n$ is the number of training samples and $p$ is the number of features. This is practical for small to moderate datasets.
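A brute-force query under this cost model could look like the following sketch (illustrative, not Skigen's actual code): one pass over all $n$ training rows computes the distances in $O(np)$, then a partial sort extracts the $k$ nearest indices.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Return the indices of the k training rows nearest to query q.
// Squared distances are compared directly: sqrt is monotone, so it
// never changes the ordering and can be skipped.
std::vector<std::size_t> k_nearest(const std::vector<std::vector<double>>& X,
                                   const std::vector<double>& q,
                                   std::size_t k) {
    assert(k <= X.size());
    std::vector<std::pair<double, std::size_t>> dist(X.size());
    for (std::size_t i = 0; i < X.size(); ++i) {        // O(np) distance pass
        double s = 0.0;
        for (std::size_t j = 0; j < q.size(); ++j) {
            double d = X[i][j] - q[j];
            s += d * d;
        }
        dist[i] = {s, i};
    }
    // Only the k smallest entries need to be ordered: O(n log k).
    std::partial_sort(dist.begin(), dist.begin() + k, dist.end());
    std::vector<std::size_t> idx(k);
    for (std::size_t i = 0; i < k; ++i) idx[i] = dist[i].second;
    return idx;
}
```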

When to Use

  • Non-linear boundaries: KNN naturally captures complex decision boundaries without assuming a parametric form.
  • Small datasets: works well when the dataset fits in memory and $n$ is not too large.
  • Feature scaling: Euclidean distance is sensitive to feature scales, so standardize features before use (e.g., with StandardScaler).
  • Choice of $k$: trades off bias and variance. Small $k$ gives a flexible but noisy fit (low bias, high variance); large $k$ gives a smoother fit that may over-smooth (high bias, low variance).
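The scaling point matters in practice: a feature measured in thousands will dominate one measured in fractions. A minimal per-feature z-score standardization looks like this (a hypothetical `standardize` helper, not Skigen's StandardScaler API):

```cpp
#include <cmath>
#include <vector>

// Shift each column to zero mean and scale it to unit variance so that
// no single feature dominates the Euclidean distance.
void standardize(std::vector<std::vector<double>>& X) {
    if (X.empty()) return;
    const std::size_t n = X.size(), p = X[0].size();
    for (std::size_t j = 0; j < p; ++j) {
        double mean = 0.0;
        for (std::size_t i = 0; i < n; ++i) mean += X[i][j];
        mean /= static_cast<double>(n);
        double var = 0.0;
        for (std::size_t i = 0; i < n; ++i) {
            double d = X[i][j] - mean;
            var += d * d;
        }
        double sd = std::sqrt(var / static_cast<double>(n));
        if (sd == 0.0) sd = 1.0;  // constant feature: leave it centered
        for (std::size_t i = 0; i < n; ++i) X[i][j] = (X[i][j] - mean) / sd;
    }
}
```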

Mirrors sklearn.neighbors.KNeighborsClassifier and KNeighborsRegressor.

Constructor

Skigen::KNeighborsClassifier<Scalar> clf(int n_neighbors = 5);
Skigen::KNeighborsRegressor<Scalar> reg(int n_neighbors = 5);
| Parameter | Default | Description |
| --- | --- | --- |
| n_neighbors | 5 | Number of neighbors $k$ |

Methods (Classifier)

| Method | Description |
| --- | --- |
| fit(X, y) | Store training data |
| predict(X) | Predict class labels by majority vote |
| score(X, y) | Return classification accuracy |

Methods (Regressor)

| Method | Description |
| --- | --- |
| fit(X, y) | Store training data |
| predict(X) | Predict by averaging neighbor targets |

Example

#include <Skigen/Neighbors>
#include <iostream>

// The Scalar template argument is required by the constructor signature above;
// double is used here for illustration.
Skigen::KNeighborsClassifier<double> clf(3);
clf.fit(X_train, y_train);
std::cout << "Accuracy: " << clf.score(X_test, y_test) << "\n";

Skigen::KNeighborsRegressor<double> reg(5);
reg.fit(X_train, y_train);
auto predictions = reg.predict(X_test);