Scalable Data Analytics

David Smith covers a recent Microsoft Data Science team talk at Strata:

The tutorial covers many different techniques for training predictive models at scale, and deploying the trained models as predictive engines within production environments. Among the technologies you’ll use are Microsoft R Server running on Spark, the SparkR package, the sparklyr package and H20 (via the rsparkling package). It also touches on some non-Spark methods, like the bigmemory and ff packages for R (and various other packages that make use of them), and using the foreach package for coarse-grained parallel computations. You’ll also learn how to create prediction engines from these trained models using the mrsdeploy package.

Check out the post as well as the tutorial David links.

Related Posts

Logistic Regression In R

Steph Locke has a presentation on performing logistic regression using R: Logistic regressions are a great tool for predicting outcomes that are categorical. They use a transformation function based on probability to perform a linear regression. This makes them easy to interpret and implement in other systems. Logistic regressions can be used to perform a classification […]

Read More

Feature Improvements In Microsoft R Server 9.1

David Smith gives us a nice roundup of feature improvements in Microsoft R Server 9.1: Interoperability between Microsoft R Server and sparklyr. You can now use RStudio’s sparklyr package in tandem with Microsoft R Server in a single Spark session New machine learning models in Hadoop and Spark. The new machine learning functions introduced with Version 9.0 […]

Read More

Leave a Reply

Your email address will not be published. Required fields are marked *

Categories

April 2017
MTWTFSS
« Mar  
 12
3456789
10111213141516
17181920212223
24252627282930