Jovan Popovic rides to the rescue with JSON:
The array cells are pivoted and returned as simple scalar columns. Now you can simply use WHERE or GROUP BY clauses to filter or summarize information by array element values. Another very useful piece of information might be the index of every element (generated as the pos column).
Spark enables you to use the posexplode() function on every array cell. The posexplode() function will transform a single array cell into a set of rows, where each row holds one value from the array together with the index of that value. As a result, one row with an array containing three elements will be transformed into three rows containing scalar cells. This flattened/normalized representation is much easier to analyze.
Once the array is flattened and normalized, you can easily analyze the data and find how many people know SQL or Java.
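To make the flattening concrete, here is a minimal sketch in plain Python that mimics what posexplode() does. It is not Spark code and not the author's example: the sample rows, the `skills` column, and the `posexplode_rows` helper are all hypothetical. The output column names `pos` and `col` follow Spark's defaults for posexplode(), with positions starting at zero.

```python
from collections import Counter

# Hypothetical sample data: each row has a scalar column and an array column.
people = [
    {"name": "Ana", "skills": ["SQL", "Java", "Python"]},
    {"name": "Ben", "skills": ["Java"]},
    {"name": "Cy", "skills": ["SQL", "Scala"]},
]


def posexplode_rows(rows, array_column):
    """Yield one output row per array element, plus its zero-based position.

    Illustrative helper only; it imitates the row shape posexplode() produces.
    """
    for row in rows:
        # Keep every scalar column, drop the array column itself.
        scalars = {k: v for k, v in row.items() if k != array_column}
        for pos, value in enumerate(row[array_column]):
            yield {**scalars, "pos": pos, "col": value}


# One input row with three array elements becomes three scalar rows.
flat = list(posexplode_rows(people, "skills"))

# Rough equivalent of: WHERE col IN ('SQL', 'Java') ... GROUP BY col
counts = Counter(r["col"] for r in flat if r["col"] in ("SQL", "Java"))
```

Once the rows are flat, the filter and the count are one-liners, which is the point of the quoted passage: the array no longer has to be unpacked inside every query.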
Read on to see how you can implement the equivalent of POSEXPLODE() using OPENJSON() in the Azure Synapse Analytics serverless SQL pool.
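For a rough feel of why OPENJSON() can stand in for POSEXPLODE(), the sketch below imitates the idea in plain Python rather than T-SQL. It assumes the array cell arrives as JSON text and that, as I understand OPENJSON()'s default output for an array, each element comes back as a row whose key is the element's zero-based index as text. The `openjson_like` helper is hypothetical, and it leaves out the type column that OPENJSON() also returns.

```python
import json


def openjson_like(json_text):
    """Return (key, value) pairs for a JSON array.

    Illustrative helper only: key is the zero-based index as a string,
    playing the role of the pos column from posexplode().
    """
    parsed = json.loads(json_text)
    if not isinstance(parsed, list):
        raise ValueError("this sketch only handles JSON arrays")
    return [(str(index), value) for index, value in enumerate(parsed)]


# A single array cell, serialized as JSON text.
rows = openjson_like('["SQL", "Java", "Python"]')
```

The key plays the part of pos and the value plays the part of the exploded element, so the same WHERE and GROUP BY style of analysis applies afterwards.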