I’m an R programmer. To me, R has been great for data exploration, transformation, statistical modeling, and visualizations. However, there is a huge community of Data Scientists and Analysts who turn to Python for these tasks. Moreover, both R and Python experts exist in most analytics organizations, and it is important for both languages to coexist.
Many times, this means that R coders will develop a workflow in R but then must redesign and recode it in Python for their production systems. If the coder is lucky, this is easy, and the R model can be exported as a serialized object and read into Python. There are packages that do this, such as pmml. Unfortunately, many times, this is more challenging because the production system might demand that the entire end to end workflow is built exclusively in Python. That’s sometimes tough because there are aspects of statistical model building in R which are more intuitive than Python.
Python has many strengths, such as its robust data structures such as Dictionaries, compatibility with Deep Learning and Spark, and its ability to be a multipurpose language. However, many scenarios in enterprise analytics require people to go back to basic statistics and Machine Learning, which the classic Data Science packages in Python are not as intuitive as R for. The key difference is that many statistical methods are built into R natively. As a result, there is a gap for when R users must build workflows in Python. To try to bridge this gap, this post will discuss a relatively new package developed by Microsoft, revoscalepy.
Having worked with both, my loyalties tend to lie with R for a couple of reasons. But this might help some people bridge the gap.