Warning When Using dplyr Mutate

Kevin Feasel


R, Spark

John Mount has a warning if you are using dplyr’s mutate function and connecting to Spark or a database:

If you are using the R dplyr package with a database or with Apache Spark: I respectfully advise you inspect your code to ensure you are not using any values created inside a dplyr::mutate() statement inside the same dplyr::mutate() statement. This has been my coding advice for some time, and it is a simple and safe re-factoring to break up such statements into safer sequences (simply by introducing more dplyr::mutate()s).

I have since encountered a non-signaling (or silent) result corruption version of the issue. We are now advising code inspection as we now have confirmation that not seeing a thrown error is not a reliable indication of correct execution and correct results.

Thanks to John for reporting, and hopefully the dplyr team can fix it.

Related Posts

ggplot2 Geoms And Aesthetics

Tyler Rinker digs into ggplot2’s geoms and aesthetics: I thought it my be fun to use the geoms aesthetics to see if we could cluster aesthetically similar geoms closer together. The heatmap below uses cosine similarity and heirarchical clustering to reorder the matrix that will allow for like geoms to be found closer to one […]

Read More

Legible Function Chaining In R

John Mount shows a few techniques for legible function chaining with R: The dot intermediate convention is very succinct, and we can use it with base R transforms to get a correct (and performant) result. Like all conventions: it is just a matter of teaching, learning, and repetition to make this seem natural, familiar and legible. My […]

Read More


December 2017
« Nov Jan »