sparklyr 0.6 Released

Kevin Feasel

2017-08-01

R, Spark

Javier Luraschi announces sparklyr 0.6:

We’re excited to announce a new release of the sparklyr package, available in CRAN today! sparklyr 0.6 introduces new features to:

  • Distribute R computations using spark_apply() to execute arbitrary R code across your Spark cluster. You can now use all of your favorite R packages and functions in a distributed context.

  • Connect to External Data Sources using spark_read_source()spark_write_source()spark_read_jdbc() and spark_write_jdbc().

  • Use the Latest Frameworks including dplyr 0.7DBI 0.7RStudio 1.1and Spark 2.2.

I’ve been impressed with sparklyr so far.

Related Posts

Housing Prices In Ames, Iowa: A Kaggle Competition

Kathryn Bryant and M. Aaron Owen share their Kaggle experiences.  First, Kathryn, et al: The lifecycle of our project was a typical one. We started with data cleaning and basic exploratory data analysis, then proceeded to feature engineering, individual model training, and ensembling/stacking. Of course, the process in practice was not quite so linear and […]

Read More

Data Wrangling At Scale

John Mount has a short article showing off the cdata package: Suppose we needed to un-pivot this data into a row oriented representation. Often big data transform steps can achieve a much higher degree of parallelization with “tall data”. With the cdata package this transform is easy and performant, as we show below. Read the whole thing.

Read More

Categories

August 2017
MTWTFSS
« Jul Sep »
 123456
78910111213
14151617181920
21222324252627
28293031