Mic at The Beginner Programmer shows us how to create creepy PCA diagrams with human faces:
PCA looks for a new reference system to describe your data. This new reference system is designed to maximize the variance of the data along the new axes. The first principal component accounts for as much variance as possible, the second for as much of the remaining variance as possible, and so on. PCA transforms a set of (typically) correlated variables into a set of uncorrelated variables called principal components. The hope is that a smaller number of PCs can be used to summarise the whole dataset. Note that each PC is a linear combination of the original variables.
The procedure simply boils down to the following steps:
- Scale (normalize) the data (not necessary, but suggested, especially when variables are not homogeneous).
- Calculate the covariance matrix of the data.
- Calculate eigenvectors (also, perhaps confusingly, called “loadings”) and eigenvalues of the covariance matrix.
- Keep only the N largest eigenvalues, choosing N according to one of the many criteria available in the literature.
- Project your data into the new frame of reference by multiplying your data matrix by a matrix whose columns are the N eigenvectors associated with the N largest eigenvalues.
- Use the projected data (very confusingly called “scores”) as your new variables for further analysis.
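To make those steps concrete, here is a minimal sketch in Python with NumPy. It is not the code from the linked post: the `pca` helper, its parameters, and the synthetic three-column data set are all made up for illustration, and it assumes rows are observations, columns are variables, and no column is constant.

```python
import numpy as np


def pca(data, n_components, scale=True):
    """Hypothetical helper: follow the listed steps on a 2-D array (rows = observations)."""
    X = np.asarray(data, dtype=float)

    # Step 1: centre each variable and, optionally, scale it to unit variance.
    # Assumes no column is constant (a zero standard deviation would divide by zero).
    X = X - X.mean(axis=0)
    if scale:
        X = X / X.std(axis=0, ddof=1)

    # Step 2: covariance matrix of the variables (columns).
    cov = np.cov(X, rowvar=False)

    # Step 3: eigenvalues and eigenvectors. eigh is meant for symmetric matrices
    # and returns eigenvalues in ascending order, so reorder them to descending.
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]

    # Step 4: keep the eigenvectors ("loadings") of the N largest eigenvalues.
    loadings = eigenvectors[:, :n_components]

    # Step 5: project the data onto the new axes to get the "scores".
    scores = X @ loadings

    # Share of total variance captured by each retained component.
    explained = eigenvalues[:n_components] / eigenvalues.sum()
    return scores, loadings, explained


# Synthetic example: two strongly correlated columns plus one independent column.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
data = np.column_stack([
    x,
    2 * x + rng.normal(scale=0.1, size=200),
    rng.normal(size=200),
])

scores, loadings, explained = pca(data, n_components=2)
print(explained)
```

Because two of the three columns carry nearly the same information, two components should capture almost all of the variance here. The choice of N is fixed by hand in this sketch; in practice it would come from one of the selection criteria mentioned in the list.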
I like the explanations provided, and the data set is definitely something I’m not used to seeing with PCA. H/T R-bloggers