Scalable Data Analytics

Kevin Feasel

2017-04-03

Cloud, R

David Smith covers a recent Microsoft Data Science team talk at Strata:

The tutorial covers many different techniques for training predictive models at scale, and deploying the trained models as predictive engines within production environments. Among the technologies you’ll use are Microsoft R Server running on Spark, the SparkR package, the sparklyr package and H2O (via the rsparkling package). It also touches on some non-Spark methods, like the bigmemory and ff packages for R (and various other packages that make use of them), and using the foreach package for coarse-grained parallel computations. You’ll also learn how to create prediction engines from these trained models using the mrsdeploy package.

Check out the post, as well as the tutorial David links to.
