Principal components analysis

In classical statistics, Principal Component Analysis (PCA) is a technique used to represent multidimensional data in a more convenient basis. The goal is to reduce dimensionality while preserving as much variance (information) as possible.

Example: Football Players' Attributes
Imagine we have a dataset consisting of features of 100 football players, measured with different tests:

$$D = \begin{bmatrix} \text{speed values} \\ \text{strength values} \end{bmatrix}$$

It's possible that some features are redundant. For example, speed and strength might be highly correlated — meaning they carry overlapping information. Maybe a linear combination of them can summarize both features. In such cases, we might be able to reduce the dimensionality of the dataset by combining them or even dropping one.
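To make this concrete, here is a minimal sketch with made-up numbers (the values, units and random seed are hypothetical, not real measurements): strength is generated as a noisy copy of speed, so the two features end up highly correlated and largely redundant.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical measurements for 100 players (all numbers are made up):
speed = rng.normal(loc=7.0, scale=1.0, size=100)            # e.g. a sprint-test score
strength = 0.9 * speed + rng.normal(scale=0.3, size=100)    # roughly a noisy copy of speed

D = np.vstack([speed, strength])    # 2 x 100 data matrix, one row per feature

# The two features are highly correlated, i.e. largely redundant:
print(np.corrcoef(speed, strength)[0, 1])   # close to 1
```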

Covariance Matrix
To detect redundancy and correlations, we use the covariance matrix. For two variables x and y, the covariance matrix is defined as:

$$M = \operatorname{Cov}(x,y) = \begin{bmatrix} \sigma(x,x) & \sigma(x,y) \\ \sigma(y,x) & \sigma(y,y) \end{bmatrix}$$

where

$$\sigma(x,y) = E\big[(x - E(x))\,(y - E(y))\big].$$

If the mean-centered data is stored in the matrix D, then $M = D D^\top$ (up to a normalization factor of $1/n$, where $n$ is the number of samples). This has to do with [[dot product#Dot product as correlation]].
Pasted image 20250617163006.png
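As a sanity check, the covariance matrix can be computed directly from the data matrix. The sketch below reuses the same kind of toy speed/strength data as above (again, made-up numbers), mean-centers it, and compares $D D^\top / n$ with NumPy's `np.cov`.

```python
import numpy as np

rng = np.random.default_rng(0)
speed = rng.normal(7.0, 1.0, size=100)                      # same toy data as above
strength = 0.9 * speed + rng.normal(scale=0.3, size=100)
D = np.vstack([speed, strength])                            # 2 x 100, one row per feature

# Each entry of M is a dot product between two mean-centered features,
# normalized by the number of samples n.
n = D.shape[1]
Dc = D - D.mean(axis=1, keepdims=True)                      # subtract E(x) and E(y)
M = Dc @ Dc.T / n

print(M)
print(np.cov(D, bias=True))                                 # NumPy agrees
```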

Whitening and Linear Transformation
Suppose the data is stored in a matrix D, and that its point cloud has the shape of an ellipsoid. Consider also that the data originates from white data $D_w$, meaning data that is uncorrelated and has unit variance. Then we have:

$$M_w = D_w D_w^\top = I$$

and we have

$$D = T D_w,$$

for some linear transformation T. Consider the singular value decomposition of T:

$$T = R_2 S R_1$$

where $R_1$ and $R_2$ are rotation matrices and $S$ is a diagonal scaling matrix. The covariance matrix of the observed data is then:

$$M = D D^\top = (R_2 S R_1 D_w)(R_2 S R_1 D_w)^\top = R_2 S R_1 D_w D_w^\top R_1^\top S^\top R_2^\top$$

Since $D_w D_w^\top = I$, this simplifies to:

$$M = R_2 S S^\top R_2^\top$$

So the covariance matrix is diagonalizable (no surprise, since it is symmetric). Its eigenvectors are the columns of $R_2$, its eigenvalues are the squared singular values on the diagonal of $S S^\top$, and, since $T = R_2 S R_1$ maps the spherical white point cloud onto the observed one, those eigenvectors point along the axes of the ellipsoid.
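Below is a small numerical check of this derivation, under toy assumptions (an arbitrary transformation T and a fixed random seed chosen only for illustration): generate white data, transform it, and compare the covariance of the result with $R_2 S S^\top R_2^\top$ and with its own eigendecomposition.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000                                   # many samples, so that Dw Dw^T / n ≈ I

Dw = rng.standard_normal((2, n))              # white data: uncorrelated, unit variance
T = np.array([[2.0, 1.0],
              [0.5, 1.5]])                    # an arbitrary linear transformation
D = T @ Dw                                    # observed (correlated) data

M = D @ D.T / n                               # covariance of the observed data

U, s, Vt = np.linalg.svd(T)                   # T = R2 S R1  with  R2 = U, S = diag(s), R1 = Vt
R2, S = U, np.diag(s)

print(M)
print(R2 @ S @ S.T @ R2.T)                    # ≈ M, since Dw Dw^T / n ≈ I

# The eigenvectors of M are (up to sign and ordering) the columns of R2,
# and its eigenvalues are the squared singular values.
eigvals, eigvecs = np.linalg.eigh(M)
print(eigvecs, eigvals)
print(R2, s**2)
```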

These eigenvectors are linear combinations of the original features, that is, new features, which let us analyze the data more easily. For example, instead of strength and speed, we get "strenspeed" and "noise". The first would have a large eigenvalue and the second a small one. If we undo the rotation $R_2$, we express the data D in this new basis:

$$D' = R_2^{-1} D.$$

Pasted image 20250617163054.png
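A minimal sketch of this change of basis, again on the toy speed/strength data (hypothetical numbers): in the eigenvector basis the covariance matrix becomes diagonal, i.e. the new features are uncorrelated.

```python
import numpy as np

rng = np.random.default_rng(0)
speed = rng.normal(7.0, 1.0, size=100)                    # hypothetical measurements
strength = 0.9 * speed + rng.normal(scale=0.3, size=100)
D = np.vstack([speed, strength])
Dc = D - D.mean(axis=1, keepdims=True)                    # mean-centered data

M = Dc @ Dc.T / Dc.shape[1]
eigvals, R2 = np.linalg.eigh(M)                           # columns of R2 = eigenvectors of M

# Undo the rotation: since R2 is orthogonal, R2^{-1} = R2^T.
D_prime = R2.T @ Dc

print(np.cov(D_prime, bias=True))                         # ≈ diagonal: new features are uncorrelated
```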
Since the "Strenspeed" axis captures most of the information, and the "Noise" axis captures very little, we can often discard the "Noise" axis without losing much important information. By doing this, we reduce the dimensionality of our dataset from 2D to 1D. Our football players are now primarily described by a single, powerful feature: their "Strenspeed". This makes analysis, visualization, and machine learning models much simpler and more efficient.
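To make the 2D-to-1D reduction concrete, the sketch below (same toy data and assumptions as before) keeps only the eigenvector with the largest eigenvalue, projects every player onto it, and reports how much variance survives after dropping the "noise" axis.

```python
import numpy as np

rng = np.random.default_rng(0)
speed = rng.normal(7.0, 1.0, size=100)                    # hypothetical measurements
strength = 0.9 * speed + rng.normal(scale=0.3, size=100)
D = np.vstack([speed, strength])
Dc = D - D.mean(axis=1, keepdims=True)

M = Dc @ Dc.T / Dc.shape[1]
eigvals, eigvecs = np.linalg.eigh(M)                      # eigenvalues in ascending order

strenspeed_axis = eigvecs[:, -1]                          # direction with the largest eigenvalue
scores = strenspeed_axis @ Dc                             # one "strenspeed" number per player

explained = eigvals[-1] / eigvals.sum()
print(scores[:5])
print(f"variance kept after dropping the 'noise' axis: {explained:.1%}")
```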