Data Set Robustness

Tomaz Kastrun shows how robust the iris data set is:

Conclusion, IRIS dataset is – due to the nature of the measurments and observations – robust and rigid; one can get very good accuracy results on a small training set. Everything beyond 30% for training the model, is for this particular case, just additional overload.

The general concept here is, how small can you arbitrarily slice the data and still come up with the same result as the overall data set?  Or, phrased differently, how much data do you need to collect before predictions stabilize?  Read on to see how Tomaz solves the problem.

Related Posts

Using cdata To Created Faceted Plots

Nina Zumel shows how to use the cdata package to create faceted ggplot2 plots: First, load the packages and data: library("ggplot2") library("cdata") iris <- data.frame(iris) Now define the data-shaping transform, or control table. The control table is basically a picture that sketches out the final data shape that I want. I want to specify the x and y columns of the plot […]

Read More

Comparing TensorFlow Versus PyTorch

Anirudh Rao compares PyTorch to TensorFlow: For small-scale server-side deployments both frameworks are easy to wrap in e.g. a Flask web server. For mobile and embedded deployments, TensorFlow works really well. This is more than what can be said of most other deep learning frameworks including PyTorch. Deploying to Android or iOS does require a non-trivial amount of work in TensorFlow. You don’t have to rewrite the entire inference portion of your model in Java or C++. […]

Read More


October 2017
« Sep Nov »