CSV Import Speeds With H2O

Kevin Feasel

2017-06-26

R

WenSui Liu benchmarks three CSV loading methods in R:

The importFile() function in H2O is extremely efficient due to the parallel reading. The benchmark comparison below shows that it is comparable to the read.df() in SparkR and significantly faster than the generic read.csv().

I’d wonder if there are cases where this would vary significantly; regardless, for reading a large data file, parallel processing does tend to be faster.

Related Posts

Reasons For Using Docker With R

Jeroen Ooms gives us a few reasons why we might want to containerize our R-based products: The flagship of the OpenCPU system is the OpenCPU server: a mature and powerful Linux stack for embedding R in systems and applications. Because OpenCPU is completely open source we can build and ship on DockerHub. A ready-to-go linux server […]

Read More

Linear Discriminant Analysis

Jake Hoare explains Linear Discriminant Analysis: Linear Discriminant Analysis takes a data set of cases (also known as observations) as input. For each case, you need to have a categorical variable to define the class and several predictor variables (which are numeric). We often visualize this input data as a matrix, such as shown below, with each case being a row and each variable a column. In this […]

Read More

Categories

June 2017
MTWTFSS
« May Jul »
 1234
567891011
12131415161718
19202122232425
2627282930