Steven Sanderson performs some tests:
We can save the generated matrix in different file formats using different functions in R. Here are the functions we will use for each file format:
- CSV: write.csv()
- RDS: saveRDS()
- FST: write_fst()
- Arrow: write_feather()
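As a rough illustration of the four writers listed above, here is a minimal sketch. It is not Steven's benchmark code: the matrix dimensions, the file names, and the conversion to a data.frame are assumptions made for the example, and it assumes write_fst() comes from the fst package and write_feather() from the arrow package. Those two calls are skipped when the package is not installed.

```r
# Hypothetical stand-in for the post's generated matrix (size is an assumption)
set.seed(123)
mat <- matrix(rnorm(1000 * 10), nrow = 1000, ncol = 10)

# fst and arrow both work on data frames, so convert once up front
df <- as.data.frame(mat)

# Illustrative output paths in the session's temporary directory
out_dir  <- tempdir()
csv_path <- file.path(out_dir, "mat.csv")
rds_path <- file.path(out_dir, "mat.rds")
fst_path <- file.path(out_dir, "mat.fst")
fea_path <- file.path(out_dir, "mat.feather")

# Base R writers
write.csv(df, csv_path, row.names = FALSE)
saveRDS(df, rds_path)

# Optional writers, run only when the package is available
if (requireNamespace("fst", quietly = TRUE)) {
  fst::write_fst(df, fst_path)
}
if (requireNamespace("arrow", quietly = TRUE)) {
  arrow::write_feather(df, fea_path)
}

# Compare on-disk sizes for whichever files were actually written
paths <- c(csv_path, rds_path, fst_path, fea_path)
paths <- paths[file.exists(paths)]
print(setNames(file.size(paths), basename(paths)))
```

Wrapping each call in system.time() would give a crude version of the write-speed comparison.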
Steve then has a follow-up around compressed data:
In this post I create a square matrix and then convert it to a data.frame (2,000 rows by 2,000 columns) and then saved it as a gz compressed csv file. The benchmark compares different R packages and functions, including base R, data.table, vroom, and readr, and measures their relative speeds based on the time it takes to read in the .csv.gz file.
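To make the setup concrete, here is a small sketch of that kind of comparison. It is an assumption-laden stand-in rather than the benchmark from the post: it uses a 200 by 200 matrix to stay quick, a made-up file name, and single system.time() calls instead of a proper benchmarking harness. The readers outside base R are skipped if their packages are missing, and data.table::fread() is only tried when R.utils is also installed, since fread relies on it for .gz input.

```r
# Smaller than the post's 2,000 x 2,000 so the sketch runs quickly (assumption)
set.seed(42)
n  <- 200
df <- as.data.frame(matrix(rnorm(n * n), nrow = n, ncol = n))

# Write a gzip-compressed CSV through a gzfile() connection
gz_path <- file.path(tempdir(), "benchmark.csv.gz")
write.csv(df, gzfile(gz_path), row.names = FALSE)

timings <- list()

# Base R detects the gzip compression when it opens the file for reading
timings$base <- system.time(base_df <- read.csv(gz_path))[["elapsed"]]

if (requireNamespace("data.table", quietly = TRUE) &&
    requireNamespace("R.utils", quietly = TRUE)) {
  timings$data.table <- system.time(data.table::fread(gz_path))[["elapsed"]]
}

if (requireNamespace("vroom", quietly = TRUE)) {
  # vroom reads lazily, so this timing may understate the full parsing cost
  timings$vroom <- system.time(
    suppressMessages(vroom::vroom(gz_path, delim = ","))
  )[["elapsed"]]
}

if (requireNamespace("readr", quietly = TRUE)) {
  timings$readr <- system.time(
    suppressMessages(readr::read_csv(gz_path))
  )[["elapsed"]]
}

# Elapsed seconds per reader, for whichever packages were available
print(unlist(timings))
```

A single timed run is noisy, so repeated runs would be needed before drawing conclusions about relative speed.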
The two posts aren’t directly comparable, because the second matrix is larger than the first. Even with that caveat in mind, the second post gives you a sense of how much extra processing it takes to gunzip the data before reading it.