PCA
Principal Component Analysis finds orthogonal directions of maximum variance in the data, enabling dimensionality reduction while retaining as much information as possible.
Algorithm
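Given an $n \times p$ data matrix $X$:
- Center the data: $X_c = X - \mathbf{1}\mu^\top$, where $\mu$ is the column-wise mean.
- Compute the SVD of $X_c$: $X_c = U \Sigma V^\top$.
- Truncate to $k$ components by keeping only the first $k$ columns of $V$ (right singular vectors), denoted $V_k$.
The projection onto the $k$-dimensional subspace is:
$$Z = X_c V_k$$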
Explained Variance
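The explained variance of the $i$-th component is derived from the singular values:
$$\lambda_i = \frac{\sigma_i^2}{n - 1}$$
where $\sigma_i$ is the $i$-th singular value and $n$ is the number of samples, using $n - 1$ degrees of freedom (Bessel's correction), consistent with scikit-learn. The explained variance ratio measures the proportion of total variance captured by each component:
$$r_i = \frac{\lambda_i}{\sum_j \lambda_j}$$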
Key Properties
- PCA always centers the data before decomposition. For data that should not be centered (e.g., sparse matrices, TF-IDF), use TruncatedSVD instead.
- The components are ordered by decreasing explained variance.
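- `inverse_transform` reconstructs an approximation: $\hat{X} = Z V_k^\top + \mathbf{1}\mu^\top$, where $Z$ is the reduced representation, $V_k$ holds the retained components as columns, and $\mu$ is the per-feature mean.
- Skigen uses `Eigen::JacobiSVD` for full SVD.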
Mirrors sklearn.decomposition.PCA.
Constructor
```cpp
Skigen::PCA<Scalar> pca(Eigen::Index n_components = 0);
```
| Parameter | Default | Description |
|---|---|---|
| `n_components` | 0 | Number of components to keep (0 = all) |
Methods
| Method | Description |
|---|---|
| `fit(X)` | Compute the SVD of the centered data |
| `transform(X)` | Project onto the principal components |
| `fit_transform(X)` | Fit and project in one call |
| `inverse_transform(Z)` | Reconstruct from the reduced representation |
Fitted Attributes
| Accessor | Type | Description |
|---|---|---|
| `components()` | `MatrixType` | Principal axes (rows = components) |
| `explained_variance()` | `VectorType` | Variance explained by each component ($\sigma_i^2 / (n - 1)$) |
| `explained_variance_ratio()` | `VectorType` | Fraction of total variance per component |
| `singular_values()` | `VectorType` | Singular values |
| `mean()` | `RowVectorType` | Per-feature mean |
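One common use of the explained variance ratio is to decide how many components to keep. The helper below is a hypothetical sketch and not part of Skigen: it takes the ratios as a plain `std::vector<double>`, assumes they come from a fit that kept all components and are sorted in decreasing order, and returns the smallest count whose cumulative ratio reaches a chosen threshold.

```cpp
#include <cstddef>
#include <vector>

// Smallest k such that the first k ratios sum to at least `threshold`.
// Returns ratio.size() if the threshold is never reached.
std::size_t components_for_threshold(const std::vector<double>& ratio,
                                     double threshold) {
    double cumulative = 0.0;
    for (std::size_t i = 0; i < ratio.size(); ++i) {
        cumulative += ratio[i];
        if (cumulative >= threshold) return i + 1;
    }
    return ratio.size();
}
```

For ratios of 0.6, 0.3, and 0.1, a threshold of 0.85 selects two components, since the first two together account for about 90% of the variance.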
Example
```cpp
#include <Skigen/Decomposition>
#include <Eigen/Dense>
#include <iostream>

int main() {
    Eigen::MatrixXd X = Eigen::MatrixXd::Random(100, 10);

    Skigen::PCA pca(3);  // keep 3 components
    pca.fit(X);

    Eigen::MatrixXd X_reduced = pca.transform(X);  // 100 x 3

    std::cout << "Explained variance ratio: "
              << pca.explained_variance_ratio().transpose() << "\n";

    // Approximate reconstruction
    Eigen::MatrixXd X_approx = pca.inverse_transform(X_reduced);
}
```