Our group is distributing a detailed write-up of the theory and operation behind our R realization of a set of sound data preparation and cleaning procedures called vtreat here: arXiv:1611.09477 [stat.AP]. This is where you can find out what vtreat does, decide whether it is appropriate for your problem, or even find a specification that allows the techniques to be used in non-R environments (such as Spark, and many others).
We have submitted this article for formal publication, so our intent is that you can cite this article (as it stands) in scientific work as a pre-print, and later cite it from a formally refereed source.
Or, alternatively, below is the tl;dr (“too long; didn’t read”) form.