
TruncatedSVD

Dimensionality reduction via truncated Singular Value Decomposition. Unlike PCA, TruncatedSVD does not center the data before decomposition, making it suitable for sparse matrices where centering would destroy sparsity.

Algorithm

Given an $n \times p$ data matrix $X$ (not centered), compute the rank-$k$ SVD approximation:

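$$X \approx U_k \Sigma_k V_k^\top$$

where $U_k \in \mathbb{R}^{n \times k}$ and $V_k \in \mathbb{R}^{p \times k}$ have orthonormal columns (the leading left and right singular vectors), and $\Sigma_k$ is the $k \times k$ diagonal matrix of the $k$ largest singular values.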

The projection to $k$ components is:

$$Z = X V_k = U_k \Sigma_k \in \mathbb{R}^{n \times k}$$
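The fit and projection steps above can be sketched in plain standard C++ so the math is visible without Skigen or Eigen. This is a minimal illustration under stated assumptions, not Skigen's implementation: it assumes dense row-major `std::vector` storage, $k \le \operatorname{rank}(X)$, and distinct leading singular values, and it finds $V_k$ by power iteration with deflation on the uncentered Gram matrix $X^\top X$. That solver is a swapped-in teaching technique (scikit-learn, for instance, uses randomized SVD or ARPACK). `Matrix`, `gram`, `top_right_singular_vectors`, and `project` are hypothetical names.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical dense row-major storage: M[i] is row i.
using Matrix = std::vector<std::vector<double>>;

// Uncentered Gram matrix G = X^T X (p x p); no column means are subtracted.
Matrix gram(const Matrix& X) {
  const std::size_t p = X[0].size();
  Matrix G(p, std::vector<double>(p, 0.0));
  for (const auto& row : X)
    for (std::size_t a = 0; a < p; ++a)
      for (std::size_t b = 0; b < p; ++b) G[a][b] += row[a] * row[b];
  return G;
}

// Top-k right singular vectors of X (leading eigenvectors of X^T X) via power
// iteration with deflation. Returns k vectors of length p, i.e. the rows of
// V_k^T. Assumes k <= rank(X) and distinct leading singular values.
Matrix top_right_singular_vectors(const Matrix& X, std::size_t k, int iters = 1000) {
  Matrix G = gram(X);
  const std::size_t p = G.size();
  Matrix V;
  for (std::size_t j = 0; j < k; ++j) {
    std::vector<double> v(p, 1.0);
    v[j % p] += 1.0;      // arbitrary start vector
    double lambda = 0.0;  // eigenvalue estimate; converges to sigma_j^2
    for (int it = 0; it < iters; ++it) {
      std::vector<double> w(p, 0.0);
      for (std::size_t a = 0; a < p; ++a)
        for (std::size_t b = 0; b < p; ++b) w[a] += G[a][b] * v[b];
      double norm = 0.0;
      for (double x : w) norm += x * x;
      norm = std::sqrt(norm);
      if (norm == 0.0) break;  // guard against dividing by zero
      for (std::size_t a = 0; a < p; ++a) v[a] = w[a] / norm;
      lambda = norm;
    }
    // Deflate: G <- G - lambda v v^T so the next pass finds the next direction.
    for (std::size_t a = 0; a < p; ++a)
      for (std::size_t b = 0; b < p; ++b) G[a][b] -= lambda * v[a] * v[b];
    V.push_back(v);
  }
  return V;
}

// Z = X V_k (n x k): entry (i, j) is the dot product of row i of X with v_j.
Matrix project(const Matrix& X, const Matrix& V) {
  Matrix Z(X.size(), std::vector<double>(V.size(), 0.0));
  for (std::size_t i = 0; i < X.size(); ++i)
    for (std::size_t j = 0; j < V.size(); ++j)
      for (std::size_t a = 0; a < X[i].size(); ++a) Z[i][j] += X[i][a] * V[j][a];
  return Z;
}
```

The Gram-matrix route is only a teaching device: forming $X^\top X$ squares the condition number, and for sparse $X$ the product can be far denser than $X$ itself.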

The explained variance for component $j$ is the variance of the $j$-th column of $Z$, and the explained variance ratio is that variance divided by the total variance of $X$ (the sum of the variances of its $p$ columns).
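A minimal sketch of that computation, assuming the same hypothetical dense `Matrix` layout and taking $Z$ as already computed. It uses the population variance (dividing by $n$); the ratio is unchanged with $n - 1$ because numerator and denominator scale alike. `column_variance` and `explained_variance_ratio` are illustrative free functions, not Skigen's accessors.

```cpp
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // hypothetical dense row-major storage

// Population variance of column j, taken around that column's mean.
double column_variance(const Matrix& M, std::size_t j) {
  double mean = 0.0;
  for (const auto& row : M) mean += row[j];
  mean /= static_cast<double>(M.size());
  double var = 0.0;
  for (const auto& row : M) var += (row[j] - mean) * (row[j] - mean);
  return var / static_cast<double>(M.size());
}

// ratio[j] = Var(Z[:, j]) / sum_a Var(X[:, a]). Variances are taken around
// column means even though the decomposition itself is uncentered.
std::vector<double> explained_variance_ratio(const Matrix& X, const Matrix& Z) {
  double total = 0.0;
  for (std::size_t a = 0; a < X[0].size(); ++a) total += column_variance(X, a);
  std::vector<double> ratio;
  for (std::size_t j = 0; j < Z[0].size(); ++j)
    ratio.push_back(column_variance(Z, j) / total);
  return ratio;
}
```

Because the data are not centered, the first component can point toward the mean of the data and have a smaller variance than later components, so these ratios need not decrease with $j$.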

Difference from PCA

| | PCA | TruncatedSVD |
|---|---|---|
| Centering | Centers data ($X - \bar{x}$) | No centering |
| Sparse data | Destroys sparsity | Preserves sparsity |
| Use case | Dense data, general purpose | TF-IDF, count matrices, LSA |
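To see why centering destroys sparsity, the sketch below counts nonzero entries before and after subtracting column means. It assumes the same hypothetical dense `Matrix` layout, and `count_nonzeros` and `center_columns` are illustrative helpers. In a $4 \times 3$ matrix with 3 nonzeros, centering makes all 12 entries nonzero, since every zero in a column with a nonzero mean becomes minus that mean.

```cpp
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // hypothetical dense row-major storage

// Number of nonzero entries, i.e. what a sparse format would have to store.
std::size_t count_nonzeros(const Matrix& M) {
  std::size_t nnz = 0;
  for (const auto& row : M)
    for (double x : row)
      if (x != 0.0) ++nnz;
  return nnz;
}

// Subtract each column's mean, as PCA does before decomposing.
Matrix center_columns(const Matrix& M) {
  const std::size_t n = M.size(), p = M[0].size();
  std::vector<double> mean(p, 0.0);
  for (const auto& row : M)
    for (std::size_t a = 0; a < p; ++a) mean[a] += row[a] / static_cast<double>(n);
  Matrix C = M;
  for (auto& row : C)
    for (std::size_t a = 0; a < p; ++a) row[a] -= mean[a];
  return C;
}
```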

TruncatedSVD applied to a TF-IDF matrix is known as Latent Semantic Analysis (LSA), a classic technique in information retrieval and topic modeling.

When to Use

  • Text data: When working with sparse TF-IDF or bag-of-words representations.
  • Non-negative or sparse data: When centering would be inappropriate or computationally expensive.
  • For dense data where centering is desired, prefer PCA.

Mirrors sklearn.decomposition.TruncatedSVD.

Constructor

```cpp
Skigen::TruncatedSVD<Scalar> svd(Eigen::Index n_components = 2);
```
| Parameter | Default | Description |
|---|---|---|
| `n_components` | `2` | Number of components to keep ($k$) |

Methods

| Method | Description |
|---|---|
| `fit(X)` | Compute the truncated SVD of $X$ |
| `transform(X)` | Project $X$ onto the $k$ components |
| `fit_transform(X)` | Fit and project in one call |
| `inverse_transform(Z)` | Approximate reconstruction: $\hat{X} = Z V_k^\top$ |
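A round trip through the projection and its approximate inverse can be sketched as follows, assuming the $k$ component directions are stored as rows of `V` (the layout documented for `components()`). The free functions `transform` and `inverse_transform` below are hypothetical plain-C++ stand-ins, not the Skigen methods. With $V_k$ spanning only the first two axes of $\mathbb{R}^3$, reconstruction keeps those coordinates and zeroes the third.

```cpp
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // hypothetical dense row-major storage

// Z = X V_k, where V holds the k component directions as rows.
Matrix transform(const Matrix& X, const Matrix& V) {
  Matrix Z(X.size(), std::vector<double>(V.size(), 0.0));
  for (std::size_t i = 0; i < X.size(); ++i)
    for (std::size_t j = 0; j < V.size(); ++j)
      for (std::size_t a = 0; a < V[j].size(); ++a) Z[i][j] += X[i][a] * V[j][a];
  return Z;
}

// X_hat = Z V_k^T. No mean is added back, because none was subtracted.
Matrix inverse_transform(const Matrix& Z, const Matrix& V) {
  const std::size_t p = V[0].size();
  Matrix Xh(Z.size(), std::vector<double>(p, 0.0));
  for (std::size_t i = 0; i < Z.size(); ++i)
    for (std::size_t j = 0; j < V.size(); ++j)
      for (std::size_t a = 0; a < p; ++a) Xh[i][a] += Z[i][j] * V[j][a];
  return Xh;
}
```

This is where the inverse differs from PCA's, which adds the stored column means back after multiplying by $V_k^\top$.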

Fitted Attributes

| Accessor | Type | Description |
|---|---|---|
| `components()` | `MatrixType` | Component directions, one per row ($V_k^\top$, shape $k \times p$) |
| `explained_variance()` | `VectorType` | Variance explained by each component |
| `explained_variance_ratio()` | `VectorType` | Fraction of total variance explained by each component |
| `singular_values()` | `VectorType` | Singular values $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_k$ |
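Because $Z = X V_k = U_k \Sigma_k$ and the columns of $U_k$ are orthonormal, the Euclidean norm of the $j$-th column of the projected data equals $\sigma_j$. The sketch below recovers singular values that way from an already-projected $Z$; the `Matrix` alias and `singular_values_from_projection` are hypothetical, and it assumes $Z$ came from exact leading right singular vectors.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // hypothetical dense row-major storage

// sigma_j = || Z[:, j] ||_2, valid when Z = X V_k with V_k the leading right
// singular vectors of X.
std::vector<double> singular_values_from_projection(const Matrix& Z) {
  std::vector<double> sigma(Z[0].size(), 0.0);
  for (const auto& row : Z)
    for (std::size_t j = 0; j < row.size(); ++j) sigma[j] += row[j] * row[j];
  for (double& s : sigma) s = std::sqrt(s);
  return sigma;
}
```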

Example

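```cpp
#include <Skigen/Decomposition>
#include <Eigen/Dense>
#include <iostream>

int main() {
  Eigen::MatrixXd X = Eigen::MatrixXd::Random(100, 20);  // 100 samples, 20 features
  Skigen::TruncatedSVD<double> svd(5);
  auto X_reduced = svd.fit_transform(X);  // 100 x 5: project to 5 dimensions
  std::cout << "Explained variance ratio: "
            << svd.explained_variance_ratio().transpose() << "\n";
}
```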