Press "Enter" to skip to content

Data Reading and Writing with arrow

Colin Gillespie performs two of the three R’s:

Apache Arrow is a cross-language development platform for in-memory data. As it’s in-memory (as opposed to data stored on disk), it provides additional speed boosts. It’s designed for efficient analytic operations, and uses a standardised language-independent columnar memory format for flat and hierarchical data. The {arrow} R package provides an interface to the ‘Arrow C++’ library – an efficient package for analytic operations on modern hardware.

There are many great tutorials on using {arrow} (see the links at the bottom of the post for example). The purpose of this blog post isn’t to simply reproduce a few examples, but to understand some of what’s happening behind the scenes. In this particular post, we’re interested in understanding the reading/writing aspects of {arrow}.

Read on to see it in action in R.