
PCA

Principal Component Analysis finds orthogonal directions of maximum variance in the data, enabling dimensionality reduction while retaining as much information as possible.

Algorithm

Given an $n \times p$ data matrix $X$:

  1. Center the data: $X_c = X - \mathbf{1}\bar{x}^\top$, where $\bar{x}$ is the column-wise mean.
  2. Compute the SVD of $X_c$: $X_c = U \Sigma V^\top$.
  3. Truncate to $k$ components by keeping only the first $k$ columns of $V$ (right singular vectors).

The projection onto the kk-dimensional subspace is:

$$Z = X_c \, V_k \in \mathbb{R}^{n \times k}$$

Explained Variance

The explained variance of the $j$-th component is derived from the singular values:

$$\text{explained\_variance}_j = \frac{\sigma_j^2}{n - 1}$$

using $n-1$ degrees of freedom (Bessel's correction), consistent with scikit-learn. The explained variance ratio measures the proportion of total variance captured by each component:

$$\text{explained\_variance\_ratio}_j = \frac{\sigma_j^2}{\sum_{i=1}^{p} \sigma_i^2}$$

Key Properties

  • PCA always centers the data before decomposition. For data that should not be centered (e.g., sparse matrices, TF-IDF), use TruncatedSVD instead.
  • The components are ordered by decreasing explained variance.
  • inverse_transform reconstructs an approximation: $\hat{X} = Z V_k^\top + \mathbf{1}\bar{x}^\top$.
  • Skigen uses Eigen::JacobiSVD for full SVD.

Mirrors sklearn.decomposition.PCA.

Constructor

Skigen::PCA<Scalar> pca(Eigen::Index n_components = 0);
| Parameter      | Default | Description                              |
| -------------- | ------- | ---------------------------------------- |
| `n_components` | `0`     | Number of components to keep (`0` = all) |

Methods

| Method                 | Description                                 |
| ---------------------- | ------------------------------------------- |
| `fit(X)`               | Compute the SVD of the centered data        |
| `transform(X)`         | Project $X$ onto the principal components   |
| `fit_transform(X)`     | Fit and project in one call                 |
| `inverse_transform(Z)` | Reconstruct from the reduced representation |

Fitted Attributes

| Accessor                     | Type            | Description                                                      |
| ---------------------------- | --------------- | ---------------------------------------------------------------- |
| `components()`               | `MatrixType`    | Principal axes (rows = components)                               |
| `explained_variance()`       | `VectorType`    | Variance explained by each component ($\sigma_j^2 / (n-1)$)      |
| `explained_variance_ratio()` | `VectorType`    | Fraction of total variance per component                         |
| `singular_values()`          | `VectorType`    | Singular values $\sigma_1 \ge \sigma_2 \ge \cdots$               |
| `mean()`                     | `RowVectorType` | Per-feature mean $\bar{x}$                                       |

Example

#include <Skigen/Decomposition>
#include <Eigen/Dense>
#include <iostream>

int main() {
    Eigen::MatrixXd X = Eigen::MatrixXd::Random(100, 10);

    Skigen::PCA<double> pca(3); // keep 3 components
    pca.fit(X);

    Eigen::MatrixXd X_reduced = pca.transform(X); // 100 x 3
    std::cout << "Explained variance ratio: "
              << pca.explained_variance_ratio().transpose() << "\n";

    // Approximate reconstruction
    Eigen::MatrixXd X_approx = pca.inverse_transform(X_reduced);
    return 0;
}