Warning When Using dplyr Mutate

John Mount has a warning for anyone using dplyr’s mutate function against Spark or a database:

If you are using the R dplyr package with a database or with Apache Spark: I respectfully advise you inspect your code to ensure you are not using any values created inside a dplyr::mutate() statement inside the same dplyr::mutate() statement. This has been my coding advice for some time, and it is a simple and safe re-factoring to break up such statements into safer sequences (simply by introducing more dplyr::mutate()s).

I have since encountered a non-signaling (or silent) result corruption version of the issue. We are now advising code inspection as we now have confirmation that not seeing a thrown error is not a reliable indication of correct execution and correct results.
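The refactoring described above can be sketched roughly as follows. This is only an illustration: the data frame `d` and the columns `x`, `y`, and `z` are made-up names, and the example uses a local in-memory data frame so that it runs without a database or Spark connection. On a local data frame both forms should agree; the quoted concern applies to database and Spark backends, where dplyr translates the statement into a remote query.

```r
library(dplyr)

# Hypothetical example data; with a database or Spark this would be a remote table.
d <- data.frame(x = c(1, 2, 3))

# The pattern the advice warns about: `y` is created and then
# used again inside the same mutate() call.
risky <- d %>%
  mutate(y = x + 1, z = y * 2)

# The suggested refactoring: each mutate() only uses columns
# that already existed before that mutate() was called.
safer <- d %>%
  mutate(y = x + 1) %>%
  mutate(z = y * 2)

# Locally the two forms should match; comparing them against a remote
# backend is one way to look for the silent corruption described above.
print(all.equal(risky, safer))
```

Splitting the statement costs an extra line per dependent column, and it means no step relies on how the backend orders assignments within a single mutate().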

Thanks to John for reporting this; hopefully the dplyr team can fix it.
