Data Wrangling At Scale

Kevin Feasel

2017-11-21

R, Spark

John Mount has a short article showing off the cdata package:

Suppose we needed to un-pivot this data into a row oriented representation. Often big data transform steps can achieve a much higher degree of parallelization with “tall data”. With the cdata package this transform is easy and performant, as we show below.
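To make the "tall data" idea concrete, here is a minimal sketch of an un-pivot in plain Python. It is not the cdata API and not John Mount's code; the function name `unpivot`, the sample column names, and the `measurement`/`value` output columns are all hypothetical, chosen only for illustration.

```python
def unpivot(rows, id_cols, value_cols, key_name="measurement", value_name="value"):
    """Turn each wide row into one tall row per value column.

    Hypothetical helper for illustration only; it does not mirror cdata's interface.
    """
    tall = []
    for row in rows:
        # Identifier columns are repeated on every output row.
        ids = {col: row[col] for col in id_cols}
        for col in value_cols:
            # Each value column becomes its own (key, value) row.
            tall.append({**ids, key_name: col, value_name: row[col]})
    return tall


# Made-up wide data: one row per id, one column per measurement.
wide = [
    {"id": 1, "col_a": 10, "col_b": 20},
    {"id": 2, "col_a": 30, "col_b": 40},
]

tall = unpivot(wide, id_cols=["id"], value_cols=["col_a", "col_b"])
for record in tall:
    print(record)
```

Each wide row yields one tall row per value column, so the output rows can be handled independently of one another, which is the property the quote ties to parallelization.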

Read the whole thing.

