Sparklyr On HDInsight

Kevin Feasel


Cloud, Hadoop, R

Ali Zaidi has a walkthrough on using sparklyr on HDInsight:

The majority of Spark is written in Scala (~80% of Spark core), which is a functional programming language. Functional programming languages emphasize functional purity (the output only depends on the inputs) and strive to avoid side-effects. One important component of most functional programming languages is their lazy evaluation. While it might seem odd that we would appreciate laziness from our computing tools, lazy evaluation is an effective way of ensuring computations are evaluated in the most efficient manner possible.

Lazy evaluation allows Spark SQL to highly optimize the queries. When a user submits a query to Spark SQL, Spark composes the components of the SQL query into a logical plan. The logical plan is basically a recipe Spark SQL creates in order to evaluate the desired query. Spark SQL then submits the logical plan to its highly optimized engine called Catalyst, which optimizes this plan into a physical plan of action that is executed inside Spark computation engine (a series of coordinating JVMs).

Read on for more description and code.

Related Posts

Hyperparameter Tuning with MLflow

Joseph Bradley shows how you can perform hyperparameter tuning of an MLlib model with MLflow: Apache Spark MLlib users often tune hyperparameters using MLlib’s built-in tools CrossValidator and TrainValidationSplit.  These use grid search to try out a user-specified set of hyperparameter values; see the Spark docs on tuning for more info. Databricks Runtime 5.3 and 5.3 ML and above support […]

Read More

Predicting Intermittent Demand

Bruno Rodrigues shows one technique for forecasting intermittent data: Now, it is clear that this will be tricky to forecast. There is no discernible pattern, no trend, no seasonality… nothing that would make it “easy” for a model to learn how to forecast such data. This is typical intermittent demand data. Specific methods have been […]

Read More


November 2016
« Oct Dec »