TSNE

t-distributed Stochastic Neighbor Embedding: a popular non-linear method for visualising high-dimensional data in 2-D/3-D.

The examples/manifold/tsne.cpp program embeds three 4-D Gaussian clusters into 2-D and renders them with SkigenPlot:

Figure: Three 4-D Gaussian clusters embedded into 2-D by exact Skigen::TSNE.

Algorithm

t-SNE models pairwise similarities as probabilities in both the input space and the embedding, and minimises the KL divergence between the two distributions by gradient descent. A heavy-tailed Student-t kernel in the embedding mitigates the crowding problem.
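For reference, the standard t-SNE formulation (van der Maaten and Hinton, 2008) can be written as below. The symbols x_i (input points), y_i (embedded points), sigma_i (per-point Gaussian bandwidth, set from the perplexity) and n (number of samples) are notation introduced here, and it is an assumption that Skigen::TSNE follows this formulation term by term.

```latex
\begin{align}
p_{j|i} &= \frac{\exp\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}
               {\sum_{k \neq i} \exp\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)},
&
p_{ij} &= \frac{p_{j|i} + p_{i|j}}{2n}, \\
q_{ij} &= \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}
              {\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}},
&
C &= \mathrm{KL}(P \,\Vert\, Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}.
\end{align}
```

The denominator of q_ij is the Student-t normalisation constant: it sums over all pairs of embedded points, which is what makes the exact method quadratic in n.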

Two optimisation methods are available:

  • "exact": the dense O(n²) gradient. Most accurate, suitable for small datasets.
  • "barnes_hut" (default): for 2-D embeddings, the repulsive forces and the Student-t normalisation constant are approximated with a quadtree (van der Maaten, JMLR 2014), reducing the per-iteration cost to O(n log n). The angle parameter trades accuracy for speed: a cell is summarised by its centre of mass when (cell width / distance) < angle. For n_components != 2 the estimator automatically falls back to the exact method.
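The cell-summarisation test described above can be sketched as a small predicate. This is an illustration only: can_summarise and its parameter names are hypothetical and not part of Skigen, and the handling of a zero distance is an assumption about what a reasonable implementation would do.

```cpp
#include <cassert>

// Hypothetical helper (not part of Skigen): decides whether a quadtree cell
// may be replaced by its centre of mass when accumulating the repulsive
// force on one point.
//   cell_width: side length of the square cell
//   distance:   distance from the point to the cell's centre of mass
//   angle:      accuracy/speed trade-off (the text gives 0.5 as the default)
inline bool can_summarise(double cell_width, double distance, double angle) {
    assert(cell_width >= 0.0 && distance >= 0.0 && angle >= 0.0);
    if (distance == 0.0) {
        return false;  // assumption: a coincident centre is never summarised
    }
    return cell_width / distance < angle;
}
```

With angle = 0.5, a cell of width 1 at distance 4 is summarised (0.25 < 0.5), while the same cell at distance 1.5 is not and would be descended into. With angle = 0 the predicate is never true, so every cell is opened down to its individual points.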

Constructor

Skigen::TSNE<Scalar> model(
    int n_components = 2,
    Scalar perplexity = 30.0,
    Scalar learning_rate = 200.0,
    int n_iter = 1000,
    std::string method = "barnes_hut",
    Scalar angle = 0.5,
    Scalar early_exaggeration = 12.0,
    std::optional<uint64_t> random_state = std::nullopt);

Parameters

| Parameter | Default | Description |
|---|---|---|
| n_components | 2 | Embedding dimensionality. |
| perplexity | 30.0 | Effective neighbourhood size. |
| learning_rate | 200.0 | Gradient-descent step size. |
| n_iter | 1000 | Optimisation iterations. |
| method | "barnes_hut" | "barnes_hut" (O(n log n), 2-D) or "exact" (O(n²)). |
| angle | 0.5 | Barnes-Hut accuracy/speed trade-off. |
| early_exaggeration | 12.0 | P-scaling during the first 250 iterations. |
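To make "effective neighbourhood size" concrete, the sketch below computes the perplexity of a distribution as 2 raised to its Shannon entropy, and bisects for the Gaussian precision that gives one point's neighbour distribution a target perplexity. This is the usual t-SNE calibration step, not necessarily how Skigen implements it; perplexity_of, conditional_row, find_beta and beta (standing for 1 / (2 sigma²)) are hypothetical names introduced here.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Perplexity of a discrete distribution: 2^H(p), with H in bits.
inline double perplexity_of(const std::vector<double>& p) {
    double entropy_bits = 0.0;
    for (double pi : p) {
        assert(pi >= 0.0);
        if (pi > 0.0) entropy_bits -= pi * std::log2(pi);
    }
    return std::exp2(entropy_bits);
}

// Neighbour probabilities of one point, given the squared distances to all
// other points and a Gaussian precision beta = 1 / (2 * sigma^2).
inline std::vector<double> conditional_row(const std::vector<double>& sq_dist,
                                           double beta) {
    assert(!sq_dist.empty());
    // Shift by the smallest distance for numerical stability.
    const double d_min = *std::min_element(sq_dist.begin(), sq_dist.end());
    std::vector<double> p(sq_dist.size());
    double sum = 0.0;
    for (std::size_t j = 0; j < p.size(); ++j) {
        p[j] = std::exp(-beta * (sq_dist[j] - d_min));
        sum += p[j];
    }
    for (double& v : p) v /= sum;
    return p;
}

// Bisection for the beta whose neighbour distribution has the target
// perplexity. Perplexity falls from n (beta = 0) as beta grows.
inline double find_beta(const std::vector<double>& sq_dist, double target) {
    assert(target > 1.0 && target < static_cast<double>(sq_dist.size()));
    double lo = 0.0, hi = 1.0;
    // Grow the bracket until the perplexity at hi is at or below the target.
    for (int k = 0; k < 64 && perplexity_of(conditional_row(sq_dist, hi)) > target; ++k) {
        lo = hi;
        hi *= 2.0;
    }
    for (int step = 0; step < 100; ++step) {
        const double mid = 0.5 * (lo + hi);
        if (perplexity_of(conditional_row(sq_dist, mid)) > target) lo = mid;
        else hi = mid;
    }
    return 0.5 * (lo + hi);
}
```

A uniform distribution over 30 neighbours has perplexity 30, which is why the parameter is read as a soft neighbour count: a larger perplexity widens each point's Gaussian so that more neighbours carry weight.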

Methods

| Method | Description |
|---|---|
| fit_transform(X) | Return the embedding. |

Fitted Attributes

| Accessor | Description |
|---|---|
| kl_divergence() | Final KL divergence. |
| method() | Method actually used ("barnes_hut" or "exact"). |
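As a rough illustration of the quantity that kl_divergence() reports, the sketch below builds the Student-t similarities of a 2-D embedding with the dense O(n²) loop and evaluates KL(P || Q). The function names, the flattened row-major n × n layout and the use of the natural logarithm are assumptions made for this example, not details taken from Skigen.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Student-t similarities q_ij of a 2-D embedding, stored row-major in an
// n*n vector with a zero diagonal. The entries sum to 1 over all i != j.
inline std::vector<double> student_t_q(
        const std::vector<std::array<double, 2>>& y) {
    const std::size_t n = y.size();
    std::vector<double> q(n * n, 0.0);
    double z = 0.0;  // normalisation constant over all ordered pairs
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            if (i == j) continue;
            const double dx = y[i][0] - y[j][0];
            const double dy = y[i][1] - y[j][1];
            const double w = 1.0 / (1.0 + dx * dx + dy * dy);
            q[i * n + j] = w;
            z += w;
        }
    }
    for (double& v : q) v /= z;
    return q;
}

// KL(P || Q) in nats. Terms with p == 0 contribute nothing; a zero q where
// p > 0 yields infinity, as the definition requires.
inline double kl_divergence(const std::vector<double>& p,
                            const std::vector<double>& q) {
    assert(p.size() == q.size());
    double kl = 0.0;
    for (std::size_t k = 0; k < p.size(); ++k) {
        if (p[k] > 0.0) kl += p[k] * std::log(p[k] / q[k]);
    }
    return kl;
}
```

The divergence is zero only when the two distributions coincide, so a lower final value indicates an embedding whose Student-t similarities match the input similarities more closely. Values are only comparable between runs that use the same data and perplexity.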

Example

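// Embed into 2-D with perplexity 30; all other parameters keep their defaults.
Skigen::TSNE<double> tsne(2, 30.0);
// X is the input data, defined elsewhere.
auto Y = tsne.fit_transform(X);

Verified against scikit-learn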

This estimator is checked by the parity suite. See the generator tests/parity/generate_manifold_reference.py and the reference fixtures in tests/parity/data/tsne/, exercised by tests/parity/parity_manifold.cpp.

API Reference

For full signatures see the TSNE API Reference.