John Mount introduces vtreat, an R package for data preparation:
Our group is distributing a detailed write up of the theory and operation behind our R realization of a set of sound data preparation and cleaning procedures called vtreat here: arXiv:1611.09477 [stat.AP]. This is where you can find out what vtreat does, decide if it is appropriate for your problem, or even find a specification allowing the use of the techniques in non-R environments (such as Python/Pandas/scikit-learn, Spark, and many others).
We have submitted this article for formal publication, so it is our intent you can cite this article (as it stands) in scientific work as a pre-print, and later cite it from a formally refereed source.
Or alternately, below is the tl;dr (“too long; didn’t read”) form.
Read more about vtreat on the package page or the vtreat vignette.