Scalable Data Analytics

Kevin Feasel


Cloud, R

David Smith covers a recent Microsoft Data Science team talk at Strata:

The tutorial covers many different techniques for training predictive models at scale, and deploying the trained models as predictive engines within production environments. Among the technologies you’ll use are Microsoft R Server running on Spark, the SparkR package, the sparklyr package and H2O (via the rsparkling package). It also touches on some non-Spark methods, like the bigmemory and ff packages for R (and various other packages that make use of them), and using the foreach package for coarse-grained parallel computations. You’ll also learn how to create prediction engines from these trained models using the mrsdeploy package.

Check out the post, as well as the tutorial David links to.


April 2017