Data Set Robustness

Tomaz Kastrun shows how robust the iris data set is:

Conclusion, IRIS dataset is – due to the nature of the measurments and observations – robust and rigid; one can get very good accuracy results on a small training set. Everything beyond 30% for training the model, is for this particular case, just additional overload.

The general concept here is, how small can you arbitrarily slice the data and still come up with the same result as the overall data set?  Or, phrased differently, how much data do you need to collect before predictions stabilize?  Read on to see how Tomaz solves the problem.

Related Posts

The Basics Of PCA In R

Prashant Shekhar gives us an overview of Principal Component Analysis using R: PCA changes the axis towards the direction of maximum variance and then takes projection on this new axis. The direction of maximum variance is represented by Principal Components (PC1). There are multiple principal components depending on the number of dimensions (features) in the […]

Read More

Tidy Data Is Normalized Data

I emphasize the link between a tidy dataframe and a normalized data structure: The kicker, as Wickham describes on pages 4-5, is that normalization is a critical part of tidying data.  Specifically, Wickham argues that tidy data should achieve third normal form. Now, in practice, Wickham argues, we tend to need to denormalize data because […]

Read More


October 2017
« Sep Nov »