Principal Component Analysis Using R

Kevin Feasel



Nina Zumel delves into principal component regression using R (via R Bloggers):

Data tends to come from databases that must support many different tasks, so it is exactly the case that there may be columns or variables that are correlated to unknown and unwanted additional processes. The reason PCA can’t filter out these noise variables is that without use of y, standard PCA has no way of knowing what portion of the variation in each variable is important to the problem at hand and should be preserved. This can be fixed through domain knowledge (knowing which variables to use), variable pruning and y-aware scaling. Our next article will discuss these procedures; in this article we will orient ourselves with a demonstration of both what a good analysis and what a bad analysis looks like.

All the variables are also deliberately mis-scaled to model some of the difficulties of working with under-curated real world data.

This does read like an academic paper, so it’s pretty heavy reading. It’s also very good reading from a great writer, so take some time and give it a read if you do data analysis.
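
To make the quoted point concrete, here is a minimal sketch in base R. It is not code from the linked article: the simulated data and the names `signal`, `noise`, and `y` are hypothetical, chosen only to show how standard PCA, which never looks at y, tends to follow whichever variable has the most raw variance.

```r
# Minimal sketch on simulated (hypothetical) data, not the article's dataset.
set.seed(2016)  # arbitrary seed so the run is repeatable
n <- 1000

# One small-scale variable that drives y, one large-scale variable that does not.
signal <- rnorm(n, sd = 1)
noise  <- rnorm(n, sd = 100)
y      <- 2 * signal + rnorm(n, sd = 0.5)

x <- data.frame(signal = signal, noise = noise)

# Standard PCA on the mis-scaled data: PC1 is driven by raw variance alone.
pca_raw <- prcomp(x, center = TRUE, scale. = FALSE)
print(abs(pca_raw$rotation[, "PC1"]))   # loading on noise dominates
print(cor(pca_raw$x[, "PC1"], y))       # PC1 is nearly unrelated to y

# Unit-variance scaling removes the mis-scaling but still ignores y,
# so PC1 weights the signal and the noise variable equally.
pca_scaled <- prcomp(x, center = TRUE, scale. = TRUE)
print(abs(pca_scaled$rotation[, "PC1"]))
```

With only two variables, the scaled PCA gives both the same absolute loading on the first component, so ordinary x-only scaling cannot tell the useful variable from the noisy one. That is the gap that the domain knowledge, variable pruning, and y-aware scaling mentioned in the excerpt are meant to address.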


May 2016