The Basics Of PCA In R

Prashant Shekhar gives us an overview of Principal Component Analysis using R:

PCA changes the axis towards the direction of maximum variance and then takes projection on this new axis. The direction of maximum variance is represented by Principal Components (PC1). There are multiple principal components depending on the number of dimensions (features) in the dataset and they are orthogonal to each other. The maximum number of principal component is same as a number of dimension of data. For example, in the above figure, for two-dimension data, there will be max of two principal components (PC1 & PC2). The first principal component defines the most of the variance, followed by second principal component, third principal component and so on. Dimension reduction comes from the fact that it is possible to discard last few principal components as they will not capture much variance in the data.

PCA is a useful technique for reducing dimensionality and removing covariance.

Related Posts

Wasting Money With Data Science

Giovanni Lanzani has a post with the controversial title above: Some data is gathered, given to data scientists, and — after two weeks — the first demo takes place. The results are promising, but they need a bit more time. Fine. After all, the data was messy: they had to clean it up and go […]

Read More

Be Careful Of P-Hacking

Vincent Granville discusses the problem of p-hacking: I read an article this morning, about a top Cornell food researcher having 13 studies retracted, see here. It prompted me to write this blog. It is about data science charlatans and unethical researchers in the Academia, destroying the value of p-values again, using a well known trick called p-hacking, to get […]

Read More


February 2018
« Jan Mar »