
PCA

Principal Component Analysis finds orthogonal directions of maximum variance in the data, enabling dimensionality reduction while retaining as much information as possible.

Algorithm

Given an $n \times p$ data matrix $X$:

  1. Center the data: $X_c = X - \mathbf{1}\bar{x}^\top$, where $\bar{x}$ is the column-wise mean.
  2. Compute the SVD of $X_c$: $X_c = U \Sigma V^\top$.
  3. Truncate to $k$ components by keeping only the first $k$ columns of $V$ (right singular vectors).

The projection onto the kk-dimensional subspace is:

$$Z = X_c \, V_k \in \mathbb{R}^{n \times k}$$

Explained Variance

The explained variance of the $j$-th component is derived from the singular values:

$$\text{explained\_variance}_j = \frac{\sigma_j^2}{n - 1}$$

using $n-1$ degrees of freedom (Bessel's correction), consistent with scikit-learn. The explained variance ratio measures the proportion of total variance captured by each component:

$$\text{explained\_variance\_ratio}_j = \frac{\sigma_j^2}{\sum_{i=1}^{p} \sigma_i^2}$$

Key Properties

  • PCA always centers the data before decomposition. For data that should not be centered (e.g., sparse matrices, TF-IDF), use TruncatedSVD instead.
  • The components are ordered by decreasing explained variance.
  • inverse_transform reconstructs an approximation: $\hat{X} = Z V_k^\top + \mathbf{1}\bar{x}^\top$.
  • Skigen uses Eigen::JacobiSVD for full SVD.

Mirrors sklearn.decomposition.PCA.

Constructor

Skigen::PCA<Scalar> pca(Eigen::Index n_components = 0);
| Parameter      | Default | Description                              |
| -------------- | ------- | ---------------------------------------- |
| `n_components` | `0`     | Number of components to keep (`0` = all) |

Methods

| Method                 | Description                                 |
| ---------------------- | ------------------------------------------- |
| `fit(X)`               | Compute the SVD of the centered data        |
| `transform(X)`         | Project $X$ onto the principal components   |
| `fit_transform(X)`     | Fit and project in one call                 |
| `inverse_transform(Z)` | Reconstruct from the reduced representation |

Fitted Attributes

| Accessor                     | Type            | Description                                                      |
| ---------------------------- | --------------- | ---------------------------------------------------------------- |
| `components()`               | `MatrixType`    | Principal axes (rows = components)                               |
| `explained_variance()`       | `VectorType`    | Variance explained by each component ($\sigma_j^2 / (n-1)$)      |
| `explained_variance_ratio()` | `VectorType`    | Fraction of total variance per component                         |
| `singular_values()`          | `VectorType`    | Singular values $\sigma_1 \ge \sigma_2 \ge \cdots$               |
| `mean()`                     | `RowVectorType` | Per-feature mean $\bar{x}$                                       |

Example

#include <Skigen/Decomposition>
#include <Eigen/Dense>
#include <iostream>

int main() {
    Eigen::MatrixXd X = Eigen::MatrixXd::Random(100, 10);

    Skigen::PCA<double> pca(3); // keep 3 components
    pca.fit(X);

    Eigen::MatrixXd X_reduced = pca.transform(X); // 100 x 3
    std::cout << "Explained variance ratio: "
              << pca.explained_variance_ratio().transpose() << "\n";

    // Approximate reconstruction
    Eigen::MatrixXd X_approx = pca.inverse_transform(X_reduced);
    return 0;
}