Scalable Data Analytics

Kevin Feasel


Cloud, R

David Smith covers a recent Microsoft Data Science team talk at Strata:

The tutorial covers many different techniques for training predictive models at scale, and deploying the trained models as predictive engines within production environments. Among the technologies you’ll use are Microsoft R Server running on Spark, the SparkR package, the sparklyr package and H20 (via the rsparkling package). It also touches on some non-Spark methods, like the bigmemory and ff packages for R (and various other packages that make use of them), and using the foreach package for coarse-grained parallel computations. You’ll also learn how to create prediction engines from these trained models using the mrsdeploy package.

Check out the post as well as the tutorial David links.

Related Posts

Using wrapr For A Consistent Pipe With ggplot2

John Mount shows how you can use the wrapr pipe to perform data processing and building a ggplot2 visual: Now we can run a single pipeline that combines data processing steps and ggplot plot construction. data.frame(x = 1:20) %.>% mutate(., y = cos(3*x)) %.>% ggplot(., aes(x = x, y = y)) %.>% geom_point() %.>% geom_line() %.>% ggtitle("piped ggplot2") Check […]

Read More

Using R To Hit Azure ML From Power BI

Leila Etaati shows how you can use R to hit an Azure ML endpoint to populate a data set in Power BI: You need to create a model in Azure ML Studio and create a web service for it. The traditional example in Predict a passenger on Titanic ship is going to survived or not? […]

Read More


April 2017
« Mar May »