Principal Component Analysis With Faces

Mic at The Beginner Programmer shows us how to creepy PCA diagrams with human faces:

PCA looks for a new the reference system to describe your data. This new reference system is designed in such a way to maximize the variance of the data across the new axis. The first principal component accounts for as much variance as possible, as does the second and so on. PCA transforms a set of (tipically) correlated variables into a set of uncorrelated variables called principal components. By design, each principal component will account for as much variance as possible. The hope is that a fewer number of PCs can be used to summarise the whole dataset. Note that PCs are a linear combination of the original data.

The procedure simply boils down to the following steps

  1. Scale (normalize) the data (not necessary but suggested especially when variables are not homogeneous).

  2. Calculate the covariance matrix of the data.

  3. Calculate eigenvectors (also, perhaps confusingly, called “loadings”) and eigenvalues of the covariance matrix.

  4. Choose only the first N biggest eigenvalues according to one of the many criteria available in the literature.

  5. Project your data in the new frame of reference by multipliying your data matrix by a matrix whose columns are the N eigenvectors associated with the N biggest eigenvalues.

  6. Use the projected data (very confusingly called “scores”) as your new variables for further analysis.

I like the explanations provided, and the data set is definitely something I’m not used to seeing with PCA.  H/T R-bloggers

Related Posts

Calculating Lifetime Value With R

Sergey Bryl shows how to calculate the lifetime value of a subscription service: Predicting LTV is a common issue for a new, recently launched product/service/application when we don’t have a lot of historical data but want to calculate LTV as soon as possible. Even though we may have a lot of historical data on customer […]

Read More

Interpreting The Area Under The Receiver Operating Characteristic Curve

Roos Colman explains what a Receiver Operating Characteristic (ROC) curve is and how we interpret the Area Under the Curve (AUC): The AUC can be defined as “The probability that a randomly selected case will have a higher test result than a randomly selected control”. Let’s use this definition to calculate and visualize the estimated […]

Read More


August 2018
« Jul Sep »