Using Python 3.4 With EMR And Spark

Bruno Faria shows how to use Python 3.4 with Spark on Amazon’s ElasticMapReduce:

An EMR 4.6 cluster running Spark 1.6.1 will still use Python 2.7 as the default interpreter. If you want to change this, you will need to set the environment variable: PYSPARK_PYTHON=python34. You can do this when you launch a cluster by using the configurations API and supplying the configuration shown in the snippet below:

I’m more of a SQL and Scala guy, but if you like Python and are on the Python 3 side of the divide, here’s a solution for you.

Related Posts

Overriding Spark Dependencies

Landon Robinson shows how to override a Spark dependency located on the classpath: This doesn’t draw the line exactly where the method changed from private to public, but generally speaking:– gson-2.2.4.jar: the method is private, and therefore too old for use here– gson-2.6.1: the method is public, and works fine.– Somewhere between the two, the […]

Read More

Kafka and MirrorMaker

Renu Tewari describes what MirrorMaker does for Kafka today and what is coming with version 2: Apache Kafka has become an essential component of enterprise data pipelines and is used for tracking clickstream event data, collecting logs, gathering metrics, and being the enterprise data bus in a microservices based architectures. Kafka is essentially a highly […]

Read More

Categories

May 2016
MTWTFSS
« Apr Jun »
 1
2345678
9101112131415
16171819202122
23242526272829
3031