Using Python 3.4 With EMR And Spark

Bruno Faria shows how to use Python 3.4 with Spark on Amazon’s ElasticMapReduce:

An EMR 4.6 cluster running Spark 1.6.1 will still use Python 2.7 as the default interpreter. If you want to change this, you will need to set the environment variable: PYSPARK_PYTHON=python34. You can do this when you launch a cluster by using the configurations API and supplying the configuration shown in the snippet below:

I’m more of a SQL and Scala guy, but if you like Python and are on the Python 3 side of the divide, here’s a solution for you.

Related Posts

Databricks Runtime 5.5

Bilal Aslam and Yifan Cao announce Databricks Runtime 5.5: Secrets API in R notebooksThe Databricks Secrets API [Azure|AWS] lets you inject secrets into notebooks without hardcoding them. As of Databricks Runtime 5.5, this API is available in R notebooks in addition to existing support for Python and Scala notebooks. You can use the dbutils.secrets.get function to obtain […]

Read More

Hooking SQL Server to Kafka

Niels Berglund has an interesting scenario for us: We see how the procedure in Code Snippet 2 takes relevant gameplay details and inserts them into the dbo.tb_GamePlay table. In our scenario, we want to stream the individual gameplay events, but we cannot alter the services which generate the gameplay. We instead decide to generate the event from the database […]

Read More

Categories

May 2016
MTWTFSS
« Apr Jun »
 1
2345678
9101112131415
16171819202122
23242526272829
3031