CSV Import Speeds With H2O

Kevin Feasel



WenSui Liu benchmarks three CSV loading methods in R:

The importFile() function in H2O is extremely efficient due to the parallel reading. The benchmark comparison below shows that it is comparable to the read.df() in SparkR and significantly faster than the generic read.csv().

I wonder whether there are cases where these results would vary significantly; regardless, when reading a large data file, parallel processing does tend to be faster.
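
To make the parallel-reading idea concrete, here is a minimal sketch in Python using only the standard library. It is not how H2O or SparkR read files; it just illustrates the general approach of splitting a file into line-aligned byte ranges and parsing them concurrently. The function names (`chunk_offsets`, `parse_chunk`, `read_csv_parallel`) are hypothetical, and the sketch assumes a UTF-8 file with a header row and no quoted fields containing line breaks.

```python
import csv
import io
import os
from concurrent.futures import ThreadPoolExecutor


def chunk_offsets(path, n_chunks):
    """Return (start, end) byte ranges covering the data rows.

    Every range starts at the beginning of a line, so no row is split.
    Assumption: no quoted field contains an embedded newline.
    """
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        f.readline()  # skip the header row
        data_start = f.tell()
        step = max(1, (size - data_start) // n_chunks)
        bounds = [data_start]
        for i in range(1, n_chunks):
            f.seek(data_start + i * step)
            f.readline()  # move forward to the next line boundary
            pos = f.tell()
            if bounds[-1] < pos < size:
                bounds.append(pos)
        bounds.append(size)
    return list(zip(bounds[:-1], bounds[1:]))


def parse_chunk(path, start, end):
    """Parse one byte range of the file into a list of rows."""
    with open(path, "rb") as f:
        f.seek(start)
        text = f.read(end - start).decode("utf-8")
    return list(csv.reader(io.StringIO(text, newline="")))


def read_csv_parallel(path, n_chunks=4):
    """Read the header, then parse the data rows chunk by chunk in a pool.

    Threads keep the sketch simple. In CPython the parsing itself is
    limited by the GIL, so a real speedup would need processes or a
    native reader; that part is left out here.
    """
    with open(path, "r", newline="", encoding="utf-8") as f:
        header = next(csv.reader(f))
    ranges = chunk_offsets(path, n_chunks)
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        # pool.map preserves the order of the ranges, so rows stay in file order
        parts = pool.map(lambda r: parse_chunk(path, *r), ranges)
        rows = [row for part in parts for row in part]
    return header, rows
```

Calling `read_csv_parallel("data.csv", n_chunks=8)` (with a hypothetical file name) returns the header and the rows in their original order. Whether this beats a serial read depends on file size, storage, and how the workers are scheduled, which is likely where results could vary.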
