KNeighborsClassifier / KNeighborsRegressor
Instance-based learning methods that make predictions by finding the closest training samples in feature space. No explicit model is trained — the training data itself serves as the model.
Distance Metric
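Distances are computed using the Euclidean ($L_2$) metric:

$$\lVert x - x' \rVert_2 = \sqrt{\sum_{j} (x_j - x'_j)^2}$$

where $j$ ranges over the features.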
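As an illustration, the Euclidean distance can be sketched with the standard library alone. The function name `euclidean_distance` and the use of `std::vector<double>` for a point are assumptions made for this example; they are not part of the Skigen API.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of Skigen): Euclidean (L2) distance between
// two points. Both vectors are assumed to have the same number of features.
double euclidean_distance(const std::vector<double>& a,
                          const std::vector<double>& b) {
    double sum_sq = 0.0;
    for (std::size_t j = 0; j < a.size(); ++j) {
        const double diff = a[j] - b[j];
        sum_sq += diff * diff;  // accumulate squared per-feature differences
    }
    return std::sqrt(sum_sq);
}
```

When only the ordering of neighbors matters, the square root can be skipped, since comparing squared distances gives the same ranking.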
Classifier — Majority Vote
Given a query point $x$, the classifier identifies the $k$ nearest neighbors $N_k(x)$ and predicts the most frequent class label among them:

$$\hat{y} = \arg\max_{c} \sum_{i \in N_k(x)} \mathbb{1}[y_i = c]$$
In case of a tie, the class with the smallest index is selected.
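The voting rule, including the tie-break toward the smallest class index, might look roughly like the sketch below. The helper `majority_vote` is hypothetical, and it assumes labels are already encoded as integer class indices in the range 0 to n_classes - 1; Skigen's internal implementation may differ.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of Skigen): majority vote over the labels of
// the k nearest neighbors. Labels are assumed to be class indices in
// [0, n_classes), and n_classes is assumed to be at least 1.
int majority_vote(const std::vector<int>& neighbor_labels, int n_classes) {
    std::vector<int> counts(static_cast<std::size_t>(n_classes), 0);
    for (int label : neighbor_labels) {
        ++counts[static_cast<std::size_t>(label)];
    }
    std::size_t best = 0;
    for (std::size_t c = 1; c < counts.size(); ++c) {
        // Strict '>' keeps the smallest class index when counts are tied.
        if (counts[c] > counts[best]) {
            best = c;
        }
    }
    return static_cast<int>(best);
}
```

Scanning classes in increasing order with a strict comparison is what makes the smallest index win a tie.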
Regressor — Averaging
The regressor predicts the mean target value of the $k$ nearest neighbors $N_k(x)$ of the query point $x$:

$$\hat{y} = \frac{1}{k} \sum_{i \in N_k(x)} y_i$$
Computational Complexity
Skigen uses brute-force search: prediction costs $O(nd)$ per query point, where $n$ is the number of training samples and $d$ is the number of features. This is practical for small to moderate datasets.
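To make the cost concrete, here is a minimal sketch of a brute-force neighbor search over plain `std::vector` data. The names `Matrix` and `k_nearest`, the row-per-sample layout, and the tie-break by training index are assumptions for this example and are not taken from Skigen. The distance loop touches every feature of every training sample once; selecting the k smallest with `std::partial_sort` adds a further O(n log k) term.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // one row per sample

// Hypothetical helper (not part of Skigen): indices of the k training rows
// closest to `query`, ordered from nearest to farthest.
std::vector<std::size_t> k_nearest(const Matrix& X_train,
                                   const std::vector<double>& query,
                                   std::size_t k) {
    const std::size_t n = X_train.size();
    std::vector<double> dist2(n, 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < query.size(); ++j) {
            const double diff = X_train[i][j] - query[j];
            dist2[i] += diff * diff;  // squared distance preserves ordering
        }
    }
    std::vector<std::size_t> idx(n);
    std::iota(idx.begin(), idx.end(), std::size_t{0});
    k = std::min(k, n);
    std::partial_sort(idx.begin(),
                      idx.begin() + static_cast<std::ptrdiff_t>(k),
                      idx.end(),
                      [&dist2](std::size_t a, std::size_t b) {
                          // Assumption: equal distances fall back to index order.
                          if (dist2[a] != dist2[b]) return dist2[a] < dist2[b];
                          return a < b;
                      });
    idx.resize(k);
    return idx;
}
```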
When to Use
- Non-linear boundaries: KNN naturally captures complex decision boundaries without assuming a parametric form.
- Small datasets: Works well when the training set fits in memory and a brute-force scan over it at prediction time is affordable.
- Feature scaling: Euclidean distance is sensitive to feature scales — standardize features before use (e.g., with StandardScaler).
- The choice of $k$ trades off bias and variance: small $k$ is flexible but noisy; large $k$ is smoother but may over-smooth.
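To illustrate the feature-scaling point, the sketch below standardizes each column to zero mean and unit variance using statistics computed on the training data. It is a standalone illustration, not Skigen's StandardScaler; the names `ColumnStats`, `fit_stats`, and `standardize` are hypothetical, and the population standard deviation (dividing by n) is an assumption.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // one row per sample

// Hypothetical container for per-feature statistics (not part of Skigen).
struct ColumnStats {
    std::vector<double> mean;
    std::vector<double> stddev;
};

// Compute per-column mean and population standard deviation.
ColumnStats fit_stats(const Matrix& X) {
    const std::size_t d = X.empty() ? 0 : X[0].size();
    const double n = static_cast<double>(X.size());
    ColumnStats s{std::vector<double>(d, 0.0), std::vector<double>(d, 0.0)};
    for (const auto& row : X) {
        for (std::size_t j = 0; j < d; ++j) s.mean[j] += row[j] / n;
    }
    for (const auto& row : X) {
        for (std::size_t j = 0; j < d; ++j) {
            const double diff = row[j] - s.mean[j];
            s.stddev[j] += diff * diff / n;  // accumulates the variance
        }
    }
    for (std::size_t j = 0; j < d; ++j) s.stddev[j] = std::sqrt(s.stddev[j]);
    return s;
}

// Apply (x - mean) / stddev column by column.
Matrix standardize(const Matrix& X, const ColumnStats& s) {
    Matrix out = X;
    for (auto& row : out) {
        for (std::size_t j = 0; j < row.size(); ++j) {
            // Constant columns have zero spread; leave them centered only.
            const double scale = s.stddev[j] > 0.0 ? s.stddev[j] : 1.0;
            row[j] = (row[j] - s.mean[j]) / scale;
        }
    }
    return out;
}
```

The statistics should come from the training set and then be reused for test data, so that both are mapped into the same scaled space before distances are computed.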
Mirrors sklearn.neighbors.KNeighborsClassifier and KNeighborsRegressor.
Constructor
```cpp
Skigen::KNeighborsClassifier<Scalar> clf(int n_neighbors = 5);
Skigen::KNeighborsRegressor<Scalar> reg(int n_neighbors = 5);
```
| Parameter | Default | Description |
|---|---|---|
| n_neighbors | 5 | Number of neighbors |
Methods (Classifier)
| Method | Description |
|---|---|
| fit(X, y) | Store training data |
| predict(X) | Predict class labels by majority vote |
| score(X, y) | Return classification accuracy |
Methods (Regressor)
| Method | Description |
|---|---|
| fit(X, y) | Store training data |
| predict(X) | Predict by averaging neighbor targets |
Example
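```cpp
#include <Skigen/Neighbors>
#include <iostream>

// X_train, y_train, X_test, and y_test are assumed to be defined elsewhere.

// Classification with 3 neighbors
Skigen::KNeighborsClassifier clf(3);
clf.fit(X_train, y_train);
std::cout << "Accuracy: " << clf.score(X_test, y_test) << "\n";

// Regression with 5 neighbors
Skigen::KNeighborsRegressor reg(5);
reg.fit(X_train, y_train);
auto predictions = reg.predict(X_test);
```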