Press "Enter" to skip to content

A Functional Approach To PySpark

Tristan Robinson shows us how we can implement a transform function which makes Python code look a little bit more functional:

After a small bit of research I discovered the concept of monkey patching (modifying a program to extend its local execution) the DataFrame object to include a transform function. This function is missing from PySpark but does exist as part of the Scala language already.

The following code can be used to achieve this, and can be stored in a generic wrapper functions notebook to separate it out from your main code. This can then be called to import the functions whenever you need them.

Things which make Python more of a functional language are fine by me. Even though I’d rather use Scala.