Flattening JSON Data With Databricks

Ivan Vazharov gives us a Databricks notebook to parse and flatten JSON using PySpark:

With Databricks you get:

  • An easy way to infer the JSON schema and avoid creating it manually
  • Subtle changes in the JSON schema won’t break things
  • The ability to explode nested lists into rows in a very easy way (see the Notebook below)
  • Speed!

Following is an example Databricks Notebook (Python) demonstrating the above claims. The JSON sample consists of an imaginary JSON result set, which contains a list of car models within a list of car vendors within a list of people. We want to flatten this result into a dataframe.

Click through for the notebook.

Related Posts

What’s New In Ambari 2.7

Paul Codding and Kat Petre share some of the new features in Ambari 2.7: With this release, we wanted to make Ambari more enjoyable to use every day, simplify finding and using our API, and unblock teams managing very large clusters.  Here is a preview of a few features we’re excited to share with you:Revamped […]

Read More

Tips On Running SQL Server In RDS

Matthew McGiffen shares some tips on running SQL Server in Amazon RDS: Or you can go with Amazon RDS (Relational Database Service).  This is more of a managed service where Amazon looks after some aspects of your database server for you. In return you give up some of the control you would have with your […]

Read More

Categories

June 2018
MTWTFSS
« May Jul »
 123
45678910
11121314151617
18192021222324
252627282930