
UMAP

#include <Skigen/Manifold>

template <typename Scalar = double>
class Skigen::UMAP(n_components=2, n_neighbors=15, min_dist=0.1, learning_rate=1, n_epochs=200, negative_sample_rate=5, random_state=std::nullopt)

Uniform Manifold Approximation and Projection (UMAP).

Non-linear dimensionality reduction that aims to preserve local neighborhood structure while retaining much of the global structure. Constructs a weighted KNN graph in the high-dimensional space, then optimises a low-dimensional layout of the same points via stochastic gradient descent on a cross-entropy objective.
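The cross-entropy objective mentioned above can be sketched per edge as follows. This is a minimal illustration of the standard UMAP formulation, not Skigen's internal code: the function names `fuzzy_union` and `edge_cross_entropy` are hypothetical, and whether Skigen symmetrises the graph and evaluates the loss in exactly this way is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Fuzzy set union, commonly used to symmetrise the directed KNN weights:
// w = w_ij + w_ji - w_ij * w_ji. (Hypothetical helper, not a Skigen API.)
double fuzzy_union(double w_ij, double w_ji) {
    return w_ij + w_ji - w_ij * w_ji;
}

// Cross-entropy contribution of one edge, where w is the high-dimensional
// membership strength and q the low-dimensional similarity, both in [0, 1].
// The first term pulls connected points together, the second pushes
// unconnected points apart.
double edge_cross_entropy(double w, double q) {
    assert(w >= 0.0 && w <= 1.0);
    const double eps = 1e-12;
    q = std::min(std::max(q, eps), 1.0 - eps);  // keep the logarithms finite
    const double attract = (w > 0.0) ? w * std::log(w / q) : 0.0;
    const double repel =
        (w < 1.0) ? (1.0 - w) * std::log((1.0 - w) / (1.0 - q)) : 0.0;
    return attract + repel;
}
```

The contribution is zero when `q` equals `w` and grows as the low-dimensional similarity drifts away from the graph weight, which is what the SGD layout step tries to reduce.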

Mirrors umap-learn.


Parameters:

  • n_components : int, default=2 Embedding dimension.

  • n_neighbors : int, default=15 Local neighborhood size.

  • min_dist : Scalar, default=0.1 Minimum distance in the embedding.

  • learning_rate : Scalar, default=1 Initial SGD learning rate.

  • n_epochs : int, default=200 Number of optimisation epochs.

  • negative_sample_rate : int, default=5 Negative samples per positive edge.

  • random_state : std::optional< uint64_t >, default=std::nullopt Optional RNG seed.
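To make the roles of min_dist, learning_rate and n_epochs more concrete, the sketch below shows how these quantities enter the reference umap-learn formulation. It is an illustration under stated assumptions, not Skigen's implementation: the function names are hypothetical, the `spread` argument is not a Skigen parameter (1.0 is assumed here), and the linear learning-rate decay is the schedule used by umap-learn, which Skigen may or may not follow.

```cpp
#include <cassert>
#include <cmath>

// Target similarity as a function of embedding distance d: points closer
// than min_dist are treated as fully similar, beyond that the similarity
// decays exponentially. `spread` is an assumed extra knob (not a Skigen
// parameter).
double target_similarity(double d, double min_dist, double spread = 1.0) {
    assert(d >= 0.0 && spread > 0.0);
    return d <= min_dist ? 1.0 : std::exp(-(d - min_dist) / spread);
}

// Smooth curve q(d) = 1 / (1 + a * d^(2b)) that is typically fitted to the
// target above; a and b are the fitted coefficients.
double low_dim_similarity(double d, double a, double b) {
    return 1.0 / (1.0 + a * std::pow(d, 2.0 * b));
}

// Linear decay of the SGD step size from its initial value towards zero
// over n_epochs (assumed schedule).
double learning_rate_at(double initial, int epoch, int n_epochs) {
    assert(n_epochs > 0 && epoch >= 0 && epoch <= n_epochs);
    return initial * (1.0 - static_cast<double>(epoch) / n_epochs);
}
```

With the defaults listed above, `learning_rate_at(1.0, 100, 200)` gives half the initial step size, and a smaller min_dist shrinks the plateau of `target_similarity`, which allows tighter clusters in the embedding.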


Attributes:

  • embedding : MatrixType Low-dimensional embedding (n_samples x n_components).

Methods

fit()

Fit the UMAP model to training data X.

Builds the fuzzy KNN graph, computes membership strengths, then runs SGD to optimise the low-dimensional layout.

Parameters:

  • X Training data of shape (n_samples, n_features).

Returns:

  • result Reference to the fitted transformer (*this).
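The "membership strengths" step described above can be sketched as follows. This is a simplified version of the smooth-KNN normalisation used by umap-learn, assuming one bandwidth per point found by binary search. The names `membership_strength` and `find_sigma` are hypothetical, details such as whether the point itself counts towards k are simplified, and Skigen's actual implementation may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Membership strength of a neighbour at distance d, given the distance rho
// to the nearest neighbour and the per-point bandwidth sigma. The nearest
// neighbour always receives strength 1.
double membership_strength(double d, double rho, double sigma) {
    return std::exp(-std::max(0.0, d - rho) / sigma);
}

// Binary-search the bandwidth sigma so that the strengths over the k
// neighbour distances of one point sum to log2(k).
double find_sigma(const std::vector<double>& distances, int n_iter = 64) {
    assert(distances.size() >= 3);
    const double rho = *std::min_element(distances.begin(), distances.end());
    const double target = std::log2(static_cast<double>(distances.size()));
    auto total = [&](double sigma) {
        double sum = 0.0;
        for (double d : distances) sum += membership_strength(d, rho, sigma);
        return sum;
    };
    double lo = 0.0, hi = 1.0;
    while (total(hi) < target) hi *= 2.0;  // the sum increases with sigma
    for (int i = 0; i < n_iter; ++i) {
        const double mid = 0.5 * (lo + hi);
        if (total(mid) < target) lo = mid; else hi = mid;
    }
    return 0.5 * (lo + hi);
}
```

Normalising each point's strengths to the same total adapts the bandwidth to the local density, so that points in sparse and dense regions end up with comparably weighted neighbourhoods before the graph is symmetrised.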

transform()

Return the stored embedding for the training data.

Parameters:

  • X Data matrix of shape (n_samples, n_features).

Returns:

  • result : MatrixType Embedding of shape (n_samples, n_components).

Example

Plotting

The figure below is rendered from a registered SkigenPlot-enabled example during the documentation build.

Source example: examples/manifold/umap.cpp

Figure: UMAP embedding
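// Excerpt from examples/manifold/umap.cpp. `n`, `n_per`, `argc`/`argv` and
// `Y` (presumably the fitted embedding) are defined earlier in the example
// and are not shown here.

// Assign one integer label per sample: three clusters of n_per samples each.
Eigen::VectorXi labels(n);
for (int cluster = 0; cluster < 3; ++cluster) {
    for (int sample = 0; sample < n_per; ++sample) {
        labels(cluster * n_per + sample) = cluster;
    }
}

// Scatter plot of the embedding, coloured by cluster label.
Skigen::Plot::Figure fig;
fig.title("UMAP Embedding")
    .caption("Three 3-D Gaussian clusters embedded into 2-D by Skigen::UMAP")
    .xlabel("UMAP 1")
    .ylabel("UMAP 2")
    .scatter(Y, labels);

// Save to the path given on the command line, otherwise show the figure.
return argc > 1 ? (fig.saveThemed(argv[1]) ? 0 : 1) : fig.show();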