Scalable Data Analytics

Kevin Feasel


Cloud, R

David Smith covers a recent Microsoft Data Science team talk at Strata:

The tutorial covers many different techniques for training predictive models at scale, and deploying the trained models as predictive engines within production environments. Among the technologies you’ll use are Microsoft R Server running on Spark, the SparkR package, the sparklyr package and H2O (via the rsparkling package). It also touches on some non-Spark methods, like the bigmemory and ff packages for R (and various other packages that make use of them), and using the foreach package for coarse-grained parallel computations. You’ll also learn how to create prediction engines from these trained models using the mrsdeploy package.

Check out the post as well as the tutorial David links.

Related Posts

Loops Versus Apply: Speed Comparison

Mike Spencer compares lapply (single core and its multi-core version) versus a for loop in R: But how fast were they? Can we get faster? Thankfully R provides `system.time()` for timing code execution. In order to get faster, it makes sense to use all the processing power our machines have. The ‘parallel’ library has some […]


Quoted Concatenation In R

John Mount has a quick tip for R users: Here is an R tip. Need to quote a lot of names at once? Use qc(). This function is part of wrapr.


