John Mount runs into an issue when using dplyr backed by a database in R:
Notice the above gives an incorrect result: all of the x_i columns are identical, and all of the y_i columns are identical. I am not saying the above code is in any way desirable (though something like it does arise naturally in certain test designs). If this is truly “incorrect dplyr code,” we should have seen an error or exception. Unless you can be certain you have no code like that in a database-backed dplyr project, you cannot be certain you have not run into this problem, which produces silent data and result corruption.

The issue is:
dplyr on databases does not seem to have strong enough guarantees about the order in which assignment statements are executed. The running counter “delta” takes only one value for the entire lifetime of the dplyr::mutate() statement (which is clearly not what the user would want).
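To make the symptom concrete, here is a minimal sketch of one plausible mechanism, offered as an assumption rather than a description of dplyr's actual SQL translation. It uses Python's standard-library sqlite3 module (not R), a hypothetical one-row table d holding a running counter delta, and a hypothetical list of assignments loosely modeled on the x_i columns. When every assignment is packed into a single SELECT, each expression is evaluated against the input row, so no expression ever sees an earlier reassignment of delta. Wrapping each assignment in its own subquery restores the step-by-step semantics a user would expect.

```python
import sqlite3

# Hypothetical assignment sequence (not the post's original code): a running
# counter "delta" is bumped before each x_i is set, so we expect x_1, x_2, x_3 = 1, 2, 3.
ASSIGNMENTS = [
    ("delta", "delta + 1"),
    ("x_1", "delta"),
    ("delta", "delta + 1"),
    ("x_2", "delta"),
    ("delta", "delta + 1"),
    ("x_3", "delta"),
]


def single_select(assignments, columns, table):
    """Naive translation: all assignments become expressions in ONE SELECT.

    Every expression refers to the input columns, so later expressions never
    see earlier reassignments; the last assignment to a name simply wins.
    """
    final = {}
    for name, expr in assignments:
        final[name] = expr
    for c in columns:
        final.setdefault(c, c)  # pass through untouched input columns
    select_list = ", ".join(f"{expr} AS {name}" for name, expr in final.items())
    return f"SELECT {select_list} FROM {table}"


def nested_selects(assignments, columns, table):
    """Sequential translation: one subquery per assignment, so each step
    sees the columns produced by the step before it."""
    cols = list(columns)
    sql = f"SELECT {', '.join(cols)} FROM {table}"
    for i, (name, expr) in enumerate(assignments):
        if name not in cols:
            cols.append(name)
        select_list = ", ".join(f"{expr} AS {c}" if c == name else c for c in cols)
        sql = f"SELECT {select_list} FROM ({sql}) AS step_{i}"
    return sql


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE d (delta INTEGER)")
    conn.execute("INSERT INTO d (delta) VALUES (0)")
    naive = conn.execute(single_select(ASSIGNMENTS, ["delta"], "d")).fetchall()
    nested = conn.execute(nested_selects(ASSIGNMENTS, ["delta"], "d")).fetchall()
    conn.close()
    return naive, nested


if __name__ == "__main__":
    naive, nested = demo()
    print("single SELECT: ", naive)   # columns: delta, x_1, x_2, x_3
    print("nested SELECTs:", nested)
```

With the columns ordered delta, x_1, x_2, x_3, the single-SELECT version returns [(1, 0, 0, 0)]: every x_i is identical and the counter advances only once, mirroring the behavior described above. The nested version returns [(3, 1, 2, 3)]. Nesting one subquery per assignment is simple but verbose; a real translator could group assignments that do not depend on one another into the same level.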
Read on for a couple of suggested solutions.