John Mount shows how to write R code like a native:
This sort of difference, scalar oriented C++ being so much faster than scalar oriented R, is often distorted into “R is slow.”
This is just not the case. If we adapt the algorithm to be vectorized we get an R algorithm with performance comparable to the C++ implementation!
Not all algorithms can be vectorized, but this one can, and in an incredibly simple way. The original algorithm itself (xlin_fits_R()) is a bit complicated, but the vectorized version (xlin_fits_V()) is literally derived from the earlier one by crossing out the indices. That is: in this case we can move from working over very many scalars (slow in R) to working over a small number of vectors (fast in R).
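To make "crossing out the indices" concrete, here is a minimal sketch of the idea. It is not Mount's xlin_fits_R() or xlin_fits_V() code; the functions axpy_scalar() and axpy_vector() are hypothetical names invented for this illustration, and the arithmetic is a deliberately trivial stand-in for a real algorithm.

```r
# Hypothetical illustration only; not the algorithm from the quoted post.

# Scalar-oriented version: the interpreter evaluates one
# multiply-and-add per loop iteration.
axpy_scalar <- function(a, x, y) {
  n <- length(x)
  out <- numeric(n)
  for (i in seq_len(n)) {
    out[i] <- a[i] * x[i] + y[i]
  }
  out
}

# Vectorized version: the same expression with every [i] crossed out.
# R hands whole vectors to its compiled arithmetic routines.
axpy_vector <- function(a, x, y) {
  a * x + y
}

# Check that the two versions agree on random input.
set.seed(2019)
n <- 1e5
a <- runif(n)
x <- runif(n)
y <- runif(n)
stopifnot(isTRUE(all.equal(axpy_scalar(a, x, y), axpy_vector(a, x, y))))
```

Wrapping each call in system.time() on your own machine is one way to see the gap; timings vary by machine and R version, so none are quoted here.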
This is akin to writing set-based SQL instead of cursor-based SQL: you’re thinking in terms which make it easier for the interpreter (or optimizer, in the case of a database engine) to operate quickly over your inputs. It’s also one of a few reasons why I think learning R makes a lot of sense when you have a SQL background.