Flattening JSON Data With Databricks

Ivan Vazharov gives us a Databricks notebook to parse and flatten JSON using PySpark:

With Databricks you get:

  • An easy way to infer the JSON schema and avoid creating it manually
  • Subtle changes in the JSON schema won’t break things
  • The ability to explode nested lists into rows in a very easy way (see the Notebook below)
  • Speed!
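
As a quick illustration of the first claim (this is not Vazharov's code; the file path is hypothetical), a minimal PySpark sketch of schema inference might look like this. `spark.read.json` samples the input and builds the schema, nested structs and arrays included, so no manual `StructType` definition is needed:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("json-flatten").getOrCreate()

# Infer the schema directly from the file; multiLine=True handles a
# single JSON document that spans multiple lines
df = spark.read.json("/tmp/people.json", multiLine=True)

# Nested structs and arrays show up in the inferred schema
df.printSchema()
```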

Following is an example Databricks notebook (Python) demonstrating the claims above. The JSON sample is an imaginary result set containing a list of car models nested within a list of car vendors, nested in turn within a list of people; the goal is to flatten this structure into a DataFrame.
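
The exact shape of that sample lives in the notebook itself, but a hedged sketch of the flattening step, assuming a made-up people → vendors → models document, would chain one `explode` per level of nesting:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, explode

spark = SparkSession.builder.appName("json-flatten").getOrCreate()

# Hypothetical stand-in for the nested result set described above
sample = ('{"people": [{"name": "Ana", "vendors": '
          '[{"vendor": "Tesla", "models": ["Model 3", "Model Y"]}]}]}')

# Each element of the RDD is parsed as one JSON record
df = spark.read.json(spark.sparkContext.parallelize([sample]))

# One explode per nesting level: people -> vendors -> models
flat = (df
        .select(explode(col("people")).alias("person"))
        .select(col("person.name").alias("name"),
                explode(col("person.vendors")).alias("vendor"))
        .select("name",
                col("vendor.vendor").alias("vendor"),
                explode(col("vendor.models")).alias("model")))

# Yields one row per (name, vendor, model) combination
flat.show()
```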

Click through for the notebook.
