KMeans
#include <Skigen/Cluster>
template <typename Scalar = double>
class Skigen::KMeans(n_clusters=8, max_iter=300, n_init=10, random_state=42)
K-Means clustering.
The KMeans algorithm clusters data by separating samples into n_clusters groups of equal variance, minimizing the within-cluster sum of squares (inertia). It uses k-means++ initialization and Lloyd's iterative algorithm.
Mirrors sklearn.cluster.KMeans.
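To make the inertia criterion concrete, here is a minimal standard-library sketch of the within-cluster sum of squares. It is not Skigen's implementation: the `Point` alias and the `squared_distance` and `inertia` helpers are hypothetical names introduced only for illustration, and the samples are restricted to two dimensions for brevity.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative 2-D sample type (an assumption of this sketch, not a Skigen type).
using Point = std::array<double, 2>;

// Squared Euclidean distance between two points.
double squared_distance(const Point& a, const Point& b) {
    const double dx = a[0] - b[0];
    const double dy = a[1] - b[1];
    return dx * dx + dy * dy;
}

// Inertia: sum over all samples of the squared distance to the assigned center.
double inertia(const std::vector<Point>& samples,
               const std::vector<Point>& centers,
               const std::vector<std::size_t>& labels) {
    assert(samples.size() == labels.size());
    double total = 0.0;
    for (std::size_t i = 0; i < samples.size(); ++i) {
        assert(labels[i] < centers.size());
        total += squared_distance(samples[i], centers[labels[i]]);
    }
    return total;
}
```

A lower value means the samples sit closer to their assigned centers, which is the quantity the algorithm tries to minimize.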
Parameters:
- n_clusters : int, default=8 The number of clusters.
- max_iter : int, default=300 Maximum iterations per run.
- n_init : int, default=10 Number of runs with different seeds.
- random_state : unsigned int, default=42 RNG seed.
Attributes:
- is_fitted : bool Whether the estimator has been fitted.
- n_clusters : int The number of clusters.
- cluster_centers : MatrixType Cluster centers (n_clusters × n_features).
- labels : IndexVector Labels of each training point from the best run.
- inertia : Scalar Sum of squared distances to closest cluster center.
- n_iter : int Number of iterations in the best run.
Methods
fit(X)
Compute k-means clustering.
Runs n_init trials of Lloyd's algorithm with k-means++ initialization, keeping the result with the lowest inertia.
Parameters:
- X : MatrixType Training data of shape (n_samples, n_features).
Returns:
- result : KMeans Reference to the fitted estimator (*this).
Throws:
- std::invalid_argument : if n_samples < n_clusters.
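The fitting procedure described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not Skigen's code: `Point`, `LloydResult`, `lloyd`, and `best_of` are hypothetical names, samples are 2-D, and the starting centers are passed in by the caller rather than drawn by k-means++, which is not shown here.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

using Point = std::array<double, 2>;  // illustrative 2-D sample type

struct LloydResult {                  // hypothetical result bundle for this sketch
    std::vector<Point> centers;
    std::vector<std::size_t> labels;
    double inertia = 0.0;             // value from the last assignment step
    int n_iter = 0;
};

double squared_distance(const Point& a, const Point& b) {
    const double dx = a[0] - b[0];
    const double dy = a[1] - b[1];
    return dx * dx + dy * dy;
}

// One run of Lloyd's algorithm from caller-supplied starting centers.
LloydResult lloyd(const std::vector<Point>& samples,
                  std::vector<Point> centers, int max_iter) {
    assert(max_iter >= 1 && !centers.empty());
    assert(samples.size() >= centers.size());
    LloydResult r;
    r.labels.assign(samples.size(), 0);
    for (int iter = 1; iter <= max_iter; ++iter) {
        r.n_iter = iter;
        r.inertia = 0.0;
        bool changed = false;
        // Assignment step: attach each sample to its nearest center.
        for (std::size_t i = 0; i < samples.size(); ++i) {
            std::size_t best = 0;
            double best_d = std::numeric_limits<double>::infinity();
            for (std::size_t k = 0; k < centers.size(); ++k) {
                const double d = squared_distance(samples[i], centers[k]);
                if (d < best_d) { best_d = d; best = k; }
            }
            changed = changed || (best != r.labels[i]);
            r.labels[i] = best;
            r.inertia += best_d;
        }
        if (!changed && iter > 1) break;  // labels are stable: converged
        // Update step: move each center to the mean of its assigned samples.
        std::vector<Point> sums(centers.size(), Point{0.0, 0.0});
        std::vector<std::size_t> counts(centers.size(), 0);
        for (std::size_t i = 0; i < samples.size(); ++i) {
            sums[r.labels[i]][0] += samples[i][0];
            sums[r.labels[i]][1] += samples[i][1];
            ++counts[r.labels[i]];
        }
        for (std::size_t k = 0; k < centers.size(); ++k) {
            if (counts[k] == 0) continue;  // an empty cluster keeps its old center
            const double n = static_cast<double>(counts[k]);
            centers[k] = Point{sums[k][0] / n, sums[k][1] / n};
        }
    }
    r.centers = centers;
    return r;
}

// Run once per candidate initialization and keep the lowest-inertia result.
LloydResult best_of(const std::vector<Point>& samples,
                    const std::vector<std::vector<Point>>& inits, int max_iter) {
    assert(!inits.empty());
    LloydResult best = lloyd(samples, inits[0], max_iter);
    for (std::size_t t = 1; t < inits.size(); ++t) {
        LloydResult r = lloyd(samples, inits[t], max_iter);
        if (r.inertia < best.inertia) best = r;
    }
    return best;
}
```

Keeping the best of several runs matters because Lloyd's algorithm only reaches a local minimum, so different starting centers can end with different inertia.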
predict(X)
Predict the closest cluster each sample belongs to.
Parameters:
- X : MatrixType New data of shape (n_samples, n_features).
Returns:
- result : IndexVector Index of the closest cluster for each sample.
Throws:
- std::runtime_error : if the model has not been fitted.
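Conceptually, prediction is a nearest-center lookup. The sketch below shows that idea with the standard library only. The `Point` alias and the `nearest_center` function are hypothetical names for illustration and are not part of Skigen.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

using Point = std::array<double, 2>;  // illustrative 2-D sample type

// For each sample, return the index of the center with the smallest
// squared Euclidean distance (ties resolve to the lowest index).
std::vector<std::size_t> nearest_center(const std::vector<Point>& samples,
                                        const std::vector<Point>& centers) {
    assert(!centers.empty());
    std::vector<std::size_t> labels(samples.size(), 0);
    for (std::size_t i = 0; i < samples.size(); ++i) {
        double best_d = std::numeric_limits<double>::infinity();
        for (std::size_t k = 0; k < centers.size(); ++k) {
            const double dx = samples[i][0] - centers[k][0];
            const double dy = samples[i][1] - centers[k][1];
            const double d = dx * dx + dy * dy;
            if (d < best_d) { best_d = d; labels[i] = k; }
        }
    }
    return labels;
}
```

Comparing squared distances is enough here, since the square root does not change which center is closest.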
transform(X)
Transform X to a cluster-distance space.
Returns the Euclidean distance from each sample to each cluster center.
Parameters:
- X : MatrixType Data of shape (n_samples, n_features).
Returns:
- result : MatrixType Distance matrix of shape (n_samples, n_clusters).
Throws:
- std::runtime_error : if the model has not been fitted.
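The cluster-distance space can be pictured as one row per sample and one column per center. The following standard-library sketch builds such a matrix. `Point` and `distances_to_centers` are hypothetical names used only for illustration, and a nested `std::vector` stands in for the library's MatrixType.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Point = std::array<double, 2>;  // illustrative 2-D sample type

// Returns out[i][k] = Euclidean distance from samples[i] to centers[k],
// i.e. a matrix of shape (n_samples, n_clusters).
std::vector<std::vector<double>> distances_to_centers(
        const std::vector<Point>& samples, const std::vector<Point>& centers) {
    assert(!centers.empty());
    std::vector<std::vector<double>> out(
        samples.size(), std::vector<double>(centers.size(), 0.0));
    for (std::size_t i = 0; i < samples.size(); ++i) {
        for (std::size_t k = 0; k < centers.size(); ++k) {
            const double dx = samples[i][0] - centers[k][0];
            const double dy = samples[i][1] - centers[k][1];
            out[i][k] = std::sqrt(dx * dx + dy * dy);
        }
    }
    return out;
}
```

Taking the index of the smallest entry in each row gives the same labels as a nearest-center prediction.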
Example
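// KMeans
#include <Skigen/Cluster>
#include <Eigen/Dense>
#include <iostream>

int main() {
    // Illustrative training data (assumed here): six 2-D points in three loose groups.
    Eigen::MatrixXd X(6, 2);
    X << -4.0,  0.2,
         -4.2, -0.1,
          4.0,  0.1,
          4.1, -0.2,
          0.0,  5.0,
          0.2,  5.1;
    Skigen::KMeans<double> km(3, /*max_iter=*/300, /*n_init=*/10, /*random_state=*/42);
    km.fit(X);
    std::cout << "=== KMeans (k=3) ===\n";
    std::cout << "Inertia: " << km.inertia() << "\n";
    std::cout << "Iterations: " << km.n_iter() << "\n";
    std::cout << "Centers:\n" << km.cluster_centers() << "\n\n";
    // Predict on new points
    Eigen::MatrixXd X_new(3, 2);
    X_new << -4.0, 0.0,
              4.0, 0.0,
              0.0, 5.0;
    auto labels = km.predict(X_new);
    std::cout << "New point labels: " << labels.transpose() << "\n\n";
}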