Principal Component Analysis Using R

Kevin Feasel



Nina Zumel delves into principal component regression using R (via R Bloggers):

Data tends to come from databases that must support many different tasks, so it is exactly the case that there may be columns or variables that are correlated to unknown and unwanted additional processes. The reason PCA can’t filter out these noise variables is that without use of y, standard PCA has no way of knowing what portion of the variation in each variable is important to the problem at hand and should be preserved. This can be fixed through domain knowledge (knowing which variables to use), variable pruning and y-aware scaling. Our next article will discuss these procedures; in this article we will orient ourselves with a demonstration of both what a good analysis and what a bad analysis looks like.

All the variables are also deliberately mis-scaled to model some of the difficulties of working with under-curated real world data.

This does read like an academic paper, so it’s pretty heavy reading.  It’s also very good reading from a great writer, so take some time and give it a read if you do data analysis.

Related Posts


Nina Zumel announces a new version of WVPlots on CRAN: WVPlots was originally a catch-all package of ggplot2 visualizations that we at Win-Vector tended to use repeatedly, and wanted to turn into “one-liners.” A consequence of this is that the older visualizations had our preferred color schemes hard-coded in. More recent additions to the package sometimes had palette […]

Read More

Icon Maps in R

Laura Ellis shows how you can build maps full of little icons: That was ok, but we should try to make the images more aesthetically pleasing using the magick package. We make each image transparent with the image_transparent() function. We can also make the resulting image a specific color with image_colorize(). I then saved the […]

Read More


May 2016
« Apr Jun »