TruncatedSVD
Dimensionality reduction via truncated Singular Value Decomposition. Unlike PCA, TruncatedSVD does not center the data before decomposition, making it suitable for sparse matrices where centering would destroy sparsity.
Algorithm
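Given an $n \times d$ data matrix $X$ (not centered), compute the rank-$k$ SVD approximation:

$$X \approx U_k \Sigma_k V_k^\top$$

where $U_k \in \mathbb{R}^{n \times k}$ and $V_k \in \mathbb{R}^{d \times k}$ have orthonormal columns and $\Sigma_k = \operatorname{diag}(\sigma_1, \dots, \sigma_k)$ holds the $k$ largest singular values.
The projection onto the $k$ components is:

$$Z = X V_k = U_k \Sigma_k$$

The explained variance for component $j$ is the variance of the $j$-th column of $Z$. The explained variance ratio is that variance divided by the total variance of $X$, which is the sum of the variances of its $d$ columns.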
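The steps above can be sketched without any library. The following is a minimal, dependency-free C++ illustration, not Skigen's implementation. It finds the top-$k$ right singular vectors by power iteration with deflation on $X^\top X$. This solver is swapped in for brevity; scikit-learn, for instance, defaults to a randomized solver. The sketch then forms $Z = X V_k$ and computes explained variance using the population (divide-by-$n$) variance. The names `truncated_svd`, `TruncatedSVDResult`, and `column_variance` are hypothetical. The sketch also assumes that $X$ has rank at least $k$ and reasonably well-separated singular values.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Dense row-major matrix: X[i][c] is sample i, feature c.
using Matrix = std::vector<std::vector<double>>;

struct TruncatedSVDResult {
    Matrix components;                             // k x d; row j is the right singular vector v_j
    std::vector<double> singular_values;           // sigma_j, found largest first
    Matrix Z;                                      // n x k projection Z = X V_k
    std::vector<double> explained_variance;        // variance of column j of Z
    std::vector<double> explained_variance_ratio;  // explained_variance / total variance of X
};

// Population variance (ddof = 0) of column `col` of M.
double column_variance(const Matrix& M, std::size_t col) {
    double mean = 0.0;
    for (const auto& row : M) mean += row[col];
    mean /= static_cast<double>(M.size());
    double var = 0.0;
    for (const auto& row : M) var += (row[col] - mean) * (row[col] - mean);
    return var / static_cast<double>(M.size());
}

// Hypothetical sketch: rank-k truncated SVD of an UNcentered X via power
// iteration with deflation on G = X^T X. Assumes rank(X) >= k.
TruncatedSVDResult truncated_svd(const Matrix& X, std::size_t k, int iters = 1000) {
    const std::size_t n = X.size(), d = X.at(0).size();
    assert(k >= 1 && k <= d);

    // Gram matrix G = X^T X; no column means are subtracted.
    Matrix G(d, std::vector<double>(d, 0.0));
    for (const auto& row : X)
        for (std::size_t a = 0; a < d; ++a)
            for (std::size_t b = 0; b < d; ++b) G[a][b] += row[a] * row[b];

    // w = G v (captures G by reference, so it sees the deflated G).
    auto mat_vec = [&](const std::vector<double>& v) {
        std::vector<double> w(d, 0.0);
        for (std::size_t a = 0; a < d; ++a)
            for (std::size_t b = 0; b < d; ++b) w[a] += G[a][b] * v[b];
        return w;
    };

    TruncatedSVDResult r;
    for (std::size_t j = 0; j < k; ++j) {
        // Deterministic start vector, normalized to unit length.
        std::vector<double> v(d);
        double norm = 0.0;
        for (std::size_t a = 0; a < d; ++a) {
            v[a] = 1.0 + 0.1 * static_cast<double>(a);
            norm += v[a] * v[a];
        }
        for (double& x : v) x /= std::sqrt(norm);

        // Power iteration: v converges to the top eigenvector of the current G.
        for (int it = 0; it < iters; ++it) {
            std::vector<double> w = mat_vec(v);
            double wn = 0.0;
            for (double x : w) wn += x * x;
            wn = std::sqrt(wn);
            if (wn == 0.0) break;  // remaining spectrum is zero (rank-deficient X)
            for (std::size_t a = 0; a < d; ++a) v[a] = w[a] / wn;
        }

        // Rayleigh quotient v^T G v estimates the eigenvalue sigma_j^2.
        std::vector<double> Gv = mat_vec(v);
        double lambda = 0.0;
        for (std::size_t a = 0; a < d; ++a) lambda += v[a] * Gv[a];
        lambda = std::max(lambda, 0.0);

        // Deflate so the next pass finds the next-largest singular value.
        for (std::size_t a = 0; a < d; ++a)
            for (std::size_t b = 0; b < d; ++b) G[a][b] -= lambda * v[a] * v[b];

        r.components.push_back(v);
        r.singular_values.push_back(std::sqrt(lambda));
    }

    // Z = X V_k: project each sample onto the k component directions.
    r.Z.assign(n, std::vector<double>(k, 0.0));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < k; ++j)
            for (std::size_t c = 0; c < d; ++c) r.Z[i][j] += X[i][c] * r.components[j][c];

    // Explained variance = variance of each column of Z; ratio is relative to X.
    double total_var = 0.0;
    for (std::size_t c = 0; c < d; ++c) total_var += column_variance(X, c);
    for (std::size_t j = 0; j < k; ++j) {
        r.explained_variance.push_back(column_variance(r.Z, j));
        r.explained_variance_ratio.push_back(r.explained_variance.back() / total_var);
    }
    return r;
}
```

For example, `truncated_svd({{1, 2}, {3, 4}, {5, 6}}, 2)` yields singular values $\sqrt{(91 \pm \sqrt{8185})/2}$, and its explained variance ratios sum to 1. This happens because keeping all $d$ components amounts to an orthogonal rotation, which preserves the total variance.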
Difference from PCA
|  | PCA | TruncatedSVD |
|---|---|---|
| Centering | Centers data ($X - \bar{x}$, per-feature means removed) | No centering |
| Sparse data | Destroys sparsity | Preserves sparsity |
| Use case | Dense data, general purpose | TF-IDF, count matrices, LSA |
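To see why centering destroys sparsity, the small sketch below subtracts each column's mean, as PCA does before its SVD, and counts exact zeros before and after. It is plain C++, and the helper names `count_zeros` and `center_columns` are hypothetical. Every zero in a column with mean $\mu_c \neq 0$ becomes $-\mu_c$, so a mostly-zero matrix can become fully dense. A real implementation would use a sparse storage format; the dense `std::vector` layout here is only for clarity.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Dense row-major matrix (a sparse format would store only the nonzeros).
using Matrix = std::vector<std::vector<double>>;

// Count entries that are exactly zero, i.e. what a sparse format could omit.
std::size_t count_zeros(const Matrix& M) {
    std::size_t zeros = 0;
    for (const auto& row : M)
        for (double x : row)
            if (x == 0.0) ++zeros;
    return zeros;
}

// Subtract each column's mean: the step PCA performs and TruncatedSVD skips.
Matrix center_columns(const Matrix& X) {
    assert(!X.empty());
    const std::size_t n = X.size(), d = X[0].size();
    std::vector<double> mean(d, 0.0);
    for (const auto& row : X)
        for (std::size_t c = 0; c < d; ++c) mean[c] += row[c];
    for (double& m : mean) m /= static_cast<double>(n);

    Matrix centered = X;
    for (auto& row : centered)
        for (std::size_t c = 0; c < d; ++c) row[c] -= mean[c];
    return centered;
}
```

On the 4 × 3 matrix `{{1, 0, 0}, {0, 0, 2}, {0, 3, 0}, {0, 0, 0}}`, `count_zeros` drops from 9 to 0 after centering.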
TruncatedSVD applied to a TF-IDF matrix is known as Latent Semantic Analysis (LSA), a classic technique in information retrieval and topic modeling.
When to Use
- Text data: When working with sparse TF-IDF or bag-of-words representations.
- Non-negative or sparse data: When centering would be inappropriate or computationally expensive.
- Dense data: When centering is desired, prefer PCA.
The API mirrors `sklearn.decomposition.TruncatedSVD`.
Constructor
```cpp
Skigen::TruncatedSVD<Scalar> svd(Eigen::Index n_components = 2);
```
| Parameter | Default | Description |
|---|---|---|
| `n_components` | 2 | Number of components $k$ to keep |
Methods
| Method | Description |
|---|---|
| `fit(X)` | Compute the rank-$k$ truncated SVD of $X$ |
| `transform(X)` | Project $X$ onto the components: $Z = X V_k$ |
| `fit_transform(X)` | Fit and project in one call |
| `inverse_transform(Z)` | Approximate reconstruction: $\hat{X} = Z V_k^\top$ |
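The `transform` / `inverse_transform` pair reduces to two matrix products with the component matrix $C = V_k^\top$, which has one component per row as in `components()`. The sketch below uses hypothetical free functions `project` and `reconstruct` on plain `std::vector` matrices, not Skigen's member functions. It shows that reconstruction keeps only the part of each sample that lies in the span of the components.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Dense row-major matrix.
using Matrix = std::vector<std::vector<double>>;

// Hypothetical transform: Z = X C^T, where C is k x d with one unit-length component per row.
Matrix project(const Matrix& X, const Matrix& C) {
    const std::size_t k = C.size(), d = C.at(0).size();
    Matrix Z(X.size(), std::vector<double>(k, 0.0));
    for (std::size_t i = 0; i < X.size(); ++i) {
        assert(X[i].size() == d);  // each sample must have d features
        for (std::size_t j = 0; j < k; ++j)
            for (std::size_t c = 0; c < d; ++c) Z[i][j] += X[i][c] * C[j][c];
    }
    return Z;
}

// Hypothetical inverse_transform: X_hat = Z C, mapping k coordinates back to d features.
Matrix reconstruct(const Matrix& Z, const Matrix& C) {
    const std::size_t k = C.size(), d = C.at(0).size();
    Matrix X_hat(Z.size(), std::vector<double>(d, 0.0));
    for (std::size_t i = 0; i < Z.size(); ++i) {
        assert(Z[i].size() == k);  // each row of Z must have k coordinates
        for (std::size_t c = 0; c < d; ++c)
            for (std::size_t j = 0; j < k; ++j) X_hat[i][c] += Z[i][j] * C[j][c];
    }
    return X_hat;
}
```

Take components $e_1, e_2$ in $\mathbb{R}^3$. The sample $(3, 4, 5)$ projects to $Z = (3, 4)$ and reconstructs as $(3, 4, 0)$. The part along $e_3$ is lost, which is why the reconstruction is only approximate when $k < d$.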
Fitted Attributes
| Accessor | Type | Description |
|---|---|---|
| `components()` | MatrixType | Component directions $V_k^\top$ ($k \times d$, one component per row) |
| `explained_variance()` | VectorType | Variance explained by each component |
| `explained_variance_ratio()` | VectorType | Fraction of total variance per component |
| `singular_values()` | VectorType | Singular values $\sigma_1, \dots, \sigma_k$ |
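Because TruncatedSVD does not center $X$, the explained variance is not a simple function of the singular values. The derivation below assumes the population (divide-by-$n$) variance, which scikit-learn uses. Write $z_j = X v_j = \sigma_j u_j$ for the $j$-th column of $Z$, with $\lVert u_j \rVert = 1$:

$$
\operatorname{Var}(z_j) = \frac{1}{n}\sum_{i=1}^{n} z_{ij}^2 - \bar{z}_j^2
= \frac{\sigma_j^2 \lVert u_j \rVert^2}{n} - \bar{z}_j^2
= \frac{\sigma_j^2}{n} - \bar{z}_j^2,
\qquad \bar{z}_j = \frac{1}{n}\sum_{i=1}^{n} z_{ij}.
$$

So under this assumption, `explained_variance()[j]` is at most $\sigma_j^2 / n$. Equality holds only when $z_j$ has zero mean, for example when $X$ happens to be centered already. Since the correction $\bar{z}_j^2$ differs between components, `explained_variance()` need not decrease in the same order as `singular_values()`.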
Example
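```cpp
#include <Skigen/Decomposition>
#include <Eigen/Dense>
#include <iostream>
int main() {
    Eigen::MatrixXd X = Eigen::MatrixXd::Random(100, 20);  // placeholder data: 100 samples, 20 features
    Skigen::TruncatedSVD<double> svd(5);
    auto X_reduced = svd.fit_transform(X);  // Project to 5 dimensions (100 x 5)
    std::cout << "Explained variance ratio: "
              << svd.explained_variance_ratio().transpose() << "\n";
}
```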