MiniBatchKMeans
#include <Skigen/Cluster>
template <typename Scalar = double>
class Skigen::MiniBatchKMeans(n_clusters=8, batch_size=100, max_iter=100, random_state=42)
Mini-Batch K-Means clustering.
Alternative online implementation of KMeans that uses mini-batches to reduce the computation time, while still attempting to optimise the same objective function.
Mirrors sklearn.cluster.MiniBatchKMeans.
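The objective that KMeans and MiniBatchKMeans both try to minimise is the within-cluster sum of squared distances, which is the quantity reported by the inertia attribute. For training samples x_1 to x_n and centers μ_1 to μ_k:

```latex
J(\mu_1, \dots, \mu_k) = \sum_{i=1}^{n} \min_{j \in \{1, \dots, k\}} \lVert x_i - \mu_j \rVert_2^2
```

Each mini-batch step only sees a random subset of the samples, so it approximates a step on J and need not decrease J at every iteration.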
Parameters:
- n_clusters : int, default=8 The number of clusters.
- batch_size : int, default=100 Size of the mini-batches.
- max_iter : int, default=100 Maximum number of iterations.
- random_state : unsigned int, default=42 Seed for the random number generator.
Attributes:
- is_fitted : bool Whether the estimator has been fitted.
- n_clusters : int The number of clusters.
- cluster_centers : MatrixType Cluster centers (n_clusters × n_features).
- labels : IndexVector Cluster label of each training point.
- inertia : Scalar Sum of squared distances of the training points to their closest cluster center.
Methods
fit(X)
Fit the MiniBatchKMeans model.
Uses k-means++ initialization on the first batch, then performs mini-batch stochastic updates to cluster centers.
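The seeding step can be sketched as plain k-means++ in standard C++. This is a minimal illustration, not Skigen's code: it assumes points stored as std::vector<double>, and the names Point, sq_dist and kmeans_pp_init are hypothetical. Skigen may differ in details such as how candidates are sampled (scikit-learn, for instance, uses a greedy variant that tries several candidates per step).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <random>
#include <vector>

using Point = std::vector<double>;  // hypothetical point type

// Squared Euclidean distance between two points of the same dimension.
double sq_dist(const Point& a, const Point& b) {
    double s = 0.0;
    for (std::size_t d = 0; d < a.size(); ++d) {
        const double diff = a[d] - b[d];
        s += diff * diff;
    }
    return s;
}

// Plain k-means++ seeding on one batch: the first center is drawn uniformly,
// each further center with probability proportional to its squared distance
// to the nearest center chosen so far.
std::vector<Point> kmeans_pp_init(const std::vector<Point>& batch, std::size_t k,
                                  std::mt19937& rng) {
    assert(k > 0 && batch.size() >= k);
    std::uniform_int_distribution<std::size_t> uniform(0, batch.size() - 1);
    std::vector<Point> centers;
    centers.push_back(batch[uniform(rng)]);

    // min_d2[i]: squared distance from batch[i] to its closest chosen center.
    std::vector<double> min_d2(batch.size(), std::numeric_limits<double>::max());
    while (centers.size() < k) {
        double total = 0.0;
        for (std::size_t i = 0; i < batch.size(); ++i) {
            min_d2[i] = std::min(min_d2[i], sq_dist(batch[i], centers.back()));
            total += min_d2[i];
        }
        if (total == 0.0) {
            // Every point coincides with a chosen center: fall back to uniform.
            centers.push_back(batch[uniform(rng)]);
        } else {
            std::discrete_distribution<std::size_t> weighted(min_d2.begin(), min_d2.end());
            centers.push_back(batch[weighted(rng)]);
        }
    }
    return centers;
}
```

Because an already chosen point has weight zero, distinct input points never yield duplicate centers.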
Parameters:
- X : MatrixType Training data of shape (n_samples, n_features).
Returns:
- result : MiniBatchKMeans Reference to the fitted estimator (*this).
Throws:
std::invalid_argument if n_samples < n_clusters.
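The mini-batch stochastic updates performed by fit can be illustrated with the per-center learning-rate rule from Sculley's "Web-Scale K-Means Clustering" (2010), the scheme scikit-learn's MiniBatchKMeans builds on. This is a sketch under assumptions, not Skigen's implementation: whether Skigen uses exactly this rule, samples batches with replacement, or stops early is not stated above. The names nearest, minibatch_step and minibatch_fit are hypothetical, and the init argument stands in for the k-means++ seeding step. The check mirrors the documented std::invalid_argument.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <random>
#include <stdexcept>
#include <vector>

using Point = std::vector<double>;  // hypothetical point type

// Index of the center closest to x (squared Euclidean distance).
std::size_t nearest(const Point& x, const std::vector<Point>& centers) {
    std::size_t best = 0;
    double best_d2 = std::numeric_limits<double>::max();
    for (std::size_t j = 0; j < centers.size(); ++j) {
        assert(x.size() == centers[j].size());
        double d2 = 0.0;
        for (std::size_t d = 0; d < x.size(); ++d) {
            const double diff = x[d] - centers[j][d];
            d2 += diff * diff;
        }
        if (d2 < best_d2) { best_d2 = d2; best = j; }
    }
    return best;
}

// One mini-batch update: assign every batch sample first, then move each
// assigned center towards its sample with learning rate 1 / counts[j],
// where counts[j] is the number of samples center j has absorbed so far.
void minibatch_step(const std::vector<Point>& batch, std::vector<Point>& centers,
                    std::vector<std::size_t>& counts) {
    std::vector<std::size_t> assign(batch.size());
    for (std::size_t i = 0; i < batch.size(); ++i) assign[i] = nearest(batch[i], centers);
    for (std::size_t i = 0; i < batch.size(); ++i) {
        const std::size_t j = assign[i];
        counts[j] += 1;
        const double eta = 1.0 / static_cast<double>(counts[j]);
        for (std::size_t d = 0; d < centers[j].size(); ++d)
            centers[j][d] = (1.0 - eta) * centers[j][d] + eta * batch[i][d];
    }
}

// Driver: each iteration draws batch_size samples (with replacement, an
// assumption) and applies one update to the seeded centers in `init`.
std::vector<Point> minibatch_fit(const std::vector<Point>& X, std::vector<Point> init,
                                 std::size_t batch_size, std::size_t max_iter,
                                 unsigned random_state) {
    if (init.empty() || X.size() < init.size())
        throw std::invalid_argument("need n_clusters > 0 and n_samples >= n_clusters");
    std::mt19937 rng(random_state);
    std::uniform_int_distribution<std::size_t> pick(0, X.size() - 1);
    std::vector<std::size_t> counts(init.size(), 0);
    for (std::size_t it = 0; it < max_iter; ++it) {
        std::vector<Point> batch;
        for (std::size_t b = 0; b < batch_size; ++b) batch.push_back(X[pick(rng)]);
        minibatch_step(batch, init, counts);
    }
    return init;
}
```

With a 1 / count learning rate, each center is the running mean of every sample ever assigned to it, so later batches move it less and less.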
predict(X)
Predict the closest cluster each sample belongs to.
Parameters:
- X : MatrixType New data of shape (n_samples, n_features).
Returns:
- result : IndexVector Index of the closest cluster for each sample.
Throws:
std::runtime_error if the model has not been fitted.
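Prediction reduces to a nearest-center search, and summing the winning squared distances over the training data gives the inertia attribute. The following is a minimal standard-C++ sketch assuming points stored as std::vector<double>. The function name predict_with_inertia is hypothetical, and the tie-breaking rule (lowest center index wins) is an assumption that Skigen may not share.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

using Point = std::vector<double>;  // hypothetical point type

// Assigns each row of X to its closest center (squared Euclidean distance)
// and returns the labels together with the summed squared distances.
// Ties go to the lower center index.
std::pair<std::vector<std::size_t>, double>
predict_with_inertia(const std::vector<Point>& X, const std::vector<Point>& centers) {
    assert(!centers.empty());
    std::vector<std::size_t> labels(X.size(), 0);
    double inertia = 0.0;
    for (std::size_t i = 0; i < X.size(); ++i) {
        double best = std::numeric_limits<double>::max();
        for (std::size_t j = 0; j < centers.size(); ++j) {
            double d2 = 0.0;
            for (std::size_t d = 0; d < X[i].size(); ++d) {
                const double diff = X[i][d] - centers[j][d];
                d2 += diff * diff;
            }
            if (d2 < best) { best = d2; labels[i] = j; }
        }
        inertia += best;
    }
    return {labels, inertia};
}
```

For centers (0, 0) and (10, 10), the points (1, 0), (9, 10) and (0, 2) get labels 0, 1, 0 with squared distances 1, 1 and 4, so the inertia is 6.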
Example
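#include <Skigen/Cluster>
#include <iostream>

// MiniBatchKMeans: typically faster than KMeans on large datasets.
// X: training data of shape (n_samples, n_features), assumed to be defined earlier.
Skigen::MiniBatchKMeans<double> mbk(/*n_clusters=*/3, /*batch_size=*/30, /*max_iter=*/100, /*random_state=*/42);
mbk.fit(X);
std::cout << "=== MiniBatchKMeans (k=3, batch=30) ===\n";
std::cout << "Inertia: " << mbk.inertia() << "\n";
std::cout << "Centers:\n" << mbk.cluster_centers() << "\n";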