
Category: JSON

Visualizing JSON Files in Fabric Notebooks

Sandeep Pawar wants readability:

JSON is ubiquitous, particularly when working with APIs and logs. Its unstructured nature makes it highly flexible for handling anything from a simple array to a complex nested structure. However, this can also make it challenging for data analysis. When parsing JSON, it’s crucial to understand its structure so you can flatten it and convert it into a tabular format for analysis. Once the structure is identified, you can use pandas or PySpark to explode or normalize it into the desired shape. In this article, I will explain the method I use. While this approach is applicable to any notebook, there is a specific trick to make it work in a Fabric notebook.

Read on for that trick.
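As a rough illustration of the "understand the structure first" step, here is a minimal sketch in plain Python. It is not the method from Sandeep's post; the `describe` helper and the sample document are hypothetical, and the sketch assumes every element of an array shares the shape of the first one.

```python
import json


def describe(node, path="$", out=None):
    """Collect (path, type) pairs for the nodes of a parsed JSON document."""
    if out is None:
        out = []
    if isinstance(node, dict):
        out.append((path, "object"))
        for key, value in node.items():
            describe(value, f"{path}.{key}", out)
    elif isinstance(node, list):
        out.append((path, f"array[{len(node)}]"))
        # Assumption: array items share one shape, so only the first is inspected.
        if node:
            describe(node[0], f"{path}[0]", out)
    else:
        out.append((path, type(node).__name__))
    return out


# Hypothetical sample document, invented for illustration.
sample = json.loads(
    '{"device": "pump-1", "readings": [{"t": 1, "v": 20.5}, {"t": 2, "v": 21.0}]}'
)

for path, kind in describe(sample):
    print(f"{path}: {kind}")
```

The printed paths show which branches are arrays (candidates for exploding into rows) and which are scalars (candidates for columns).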


Private Preview of Native JSON Type for Azure SQL DB

Umachandar Jayachandran (or, as we all know him, UC) makes an announcement:

We are excited to announce the private preview of native JSON type and JSON_OBJECTAGG & JSON_ARRAYAGG aggregates in Azure SQL Database. The JSON type will allow you to store JSON documents in a native binary format that is optimized for storage and query performance. The ANSI SQL compatible JSON aggregates – JSON_OBJECTAGG & JSON_ARRAYAGG will allow you to aggregate relational data and transform the data into JSON documents in a query.

I do have to admit that the native JSON type was a bit curious, given that they had assiduously rejected the notion of introducing a native JSON type for years, yet here we are. But if there are significant enough performance gains—and there can be by moving from text to binary JSON—it can be worth it. The XML type also allowed you to create indexes, which is probably easier to do with a native type.


Converting JSON to a Relational Schema with KQL

Devang Shah does some flattening and moving:

In the world of IoT devices, industrial historians, infrastructure and application logs, and metrics, machine-generated or software-generated telemetry, there are often scenarios where the upstream data producer produces data in non-standard schemas, formats, and structures that often make it difficult to analyze the data contained in these at scale. Azure Data Explorer provides some useful features to run meaningful, fast, and interactive analytics on such heterogenous data structures and formats. 

In this blog, we’re taking an example of a complex JSON file as shown in the screenshot below. You can access the JSON file from this GitHub page to try the steps below.

Click through for the example, which is definitely non-trivial.


Invoke External REST Endpoints from Azure SQL DB

Rob Farley is impressed:

This internal procedure is new in Azure SQL DB in 2022. I think it presents a significant change to the way we do things in the world of SQL, and makes some other tools a whole lot more useful as well.

sp_invoke_external_rest_endpoint lets me send data to a REST API from within a stored procedure. Invoking an HTTP REST endpoint – as simple as that. And while I know you’re probably thinking, “But I can send data to a REST API from anywhere – why do I need to do it from within a stored procedure?”, I want to describe a few scenarios to you.

I like having the functionality, though I would want to control how frequently my teams use it. The reason is that this potentially makes your database a domain boundary (when thinking in domain-driven design concepts).


OPENJSON Performance and Schemas

Dave Mason has a new blog theme and a post on OPENJSON performance:

Support for JSON data has been around in SQL Server for a while now, starting with SQL 2016. The OPENJSON rowset function is the built-in function that allows you to natively convert JSON text into a set of rows and columns. There are two options for using OPENJSON: with the default schema or with an explicit schema. There are performance implications for each, which I’ll review with some examples.

Dave has some nice tips for people working with JSON data in SQL Server.
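To make the two options concrete, here is a small Python sketch that mimics the shape of the output in each mode. It is an emulation for illustration only, not Dave's code, and it says nothing about performance. The helper names `open_json_default` and `open_json_with` are hypothetical; the type codes follow the documented default-schema values (0 null, 1 string, 2 number, 3 boolean, 4 array, 5 object).

```python
import json

# Type codes used by OPENJSON's default schema.
TYPE_CODES = {type(None): 0, str: 1, int: 2, float: 2, bool: 3, list: 4, dict: 5}


def open_json_default(text):
    """Mimic the default schema: one (key, value, type) row per top-level item."""
    doc = json.loads(text)
    items = doc.items() if isinstance(doc, dict) else enumerate(doc)
    rows = []
    for key, value in items:
        if value is None:
            as_text = None
        elif isinstance(value, str):
            as_text = value
        else:
            # Numbers, booleans, arrays, and objects come back as JSON text.
            as_text = json.dumps(value)
        rows.append((str(key), as_text, TYPE_CODES[type(value)]))
    return rows


def open_json_with(text, schema):
    """Mimic an explicit schema: one typed tuple per array element.

    `schema` is a list of (property name, converter) pairs.
    """
    doc = json.loads(text)
    elements = doc if isinstance(doc, list) else [doc]
    rows = []
    for element in elements:
        row = []
        for key, convert in schema:
            value = element.get(key)
            row.append(None if value is None else convert(value))
        rows.append(tuple(row))
    return rows


# Hypothetical input, invented for illustration.
text = '[{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]'
print(open_json_default(text))
print(open_json_with(text, [("id", int), ("name", str)]))
```

With the default schema, each array element arrives as a generic key/value/type row that still needs further parsing. The explicit schema returns typed columns directly.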


Using JSON_PATH_EXISTS() in SQL Server

Hasan Savran shows how the JSON_PATH_EXISTS() function works in SQL Server:

Schemas can easily change if you save your data in JSON format. It is very easy to add or remove properties from JSON documents. When the data model changes quickly, you might need to worry about if the property you are looking for exists in the documents. If the path you are looking for does not exist in some documents, you need to handle the exception in some way. JSON_PATH_EXISTS comes to your help in situations like that. It tests whether a specified path exists in the input JSON.

Read on for the syntax and examples of use.
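The idea of testing whether a path exists before reading it can be sketched in Python. This is an illustration of the concept rather than the SQL Server implementation: the `path_exists` helper is hypothetical and supports only a small subset of path syntax (`$`, `.name`, and `[index]`).

```python
import json
import re

# One path step: either .propertyName or [arrayIndex].
_STEP = re.compile(r"\.([A-Za-z_][A-Za-z0-9_]*)|\[(\d+)\]")


def path_exists(text, path):
    """Return True if `path` can be followed through the JSON in `text`."""
    if not path.startswith("$"):
        raise ValueError("path must start with $")
    node = json.loads(text)
    pos = 1
    while pos < len(path):
        match = _STEP.match(path, pos)
        if match is None:
            raise ValueError(f"unsupported path syntax at position {pos}")
        key, index = match.group(1), match.group(2)
        if key is not None:
            # Property step: the current node must be an object holding the key.
            if not isinstance(node, dict) or key not in node:
                return False
            node = node[key]
        else:
            # Index step: the current node must be an array that is long enough.
            i = int(index)
            if not isinstance(node, list) or i >= len(node):
                return False
            node = node[i]
        pos = match.end()
    return True


# Hypothetical document, invented for illustration.
doc = '{"info": {"address": [{"town": "Paris"}, {"town": "London"}]}}'
print(path_exists(doc, "$.info.address[1].town"))
print(path_exists(doc, "$.info.addresses"))
```

Checking first lets a query skip or default the documents that lack the property instead of failing on them.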


Constructing JSON Objects in SQL Server

Hasan Savran checks out a couple of functions new to SQL Server 2022:

JSON functions were introduced to SQL Server in version 2016. Saving JSON documents and retrieving documents using JSON functions brings many possibilities to SQL Server. It is great to see that Microsoft continues to add different functions to the original set of JSON functions.

Today, I will explain two new JSON functions which are available in SQL Server 2022 and Azure SQL Database.

Read on to learn more about these functions.


JSON Enhancements in Azure SQL DB and SQL Server 2022

Umachandar Jayachandran has an announcement:

Today, we are announcing the public preview of JSON enhancements in Azure SQL Database and SQL Server 2022 CTP 2.0. This preview contains an enhancement to ISJSON function and three new JSON functions – JSON_PATH_EXISTS, JSON_OBJECT and JSON_ARRAY. Currently, the ISJSON function allows you to test if a string value contains a valid JSON object or array. The new optional json_type_constraint parameter in ISJSON function can now be used to test conformance of JSON documents to the IETF RFC 8259 specification. This capability allows you to test for strings that contain a JSON value, scalar, object, or array. This functionality is like the IS JSON predicate in the ANSI SQL standard. The new JSON_PATH_EXISTS function allows you to test for the existence of a specific SQL/JSON path expression in a JSON document. This functionality is like the JSON_EXISTS predicate in the ANSI SQL standard. The new ANSI SQL compatible JSON value constructors – JSON_OBJECT and JSON_ARRAY functions allow you to construct JSON object or array from SQL data.

Even if you don’t store data in JSON format, there are good reasons why you might need to accept data in JSON format (or emit data in JSON format), especially when working with languages like R and Python.


Building posexplode() in the Serverless SQL Pool

Jovan Popovic rides to the rescue with JSON:

The array cells are pivoted and returned as simple scalar columns. Now you can simply use WHERE or GROUP BY clauses to filter or summarize information by array element values. Another very useful piece of information might be the index of every element (generated as pos column).

Spark enables you to use the posexplode() function on every array cell. The posexplode() function will transform a single array element into a set of rows where each row represents one value in the array and the index of that array element. As a result, one row with the array containing three elements will be transformed into three rows containing scalar cells. This flattened/normalized representation is much easier for the analysis.

Once the array is flattened and normalized, you can easily analyze the data and find how many people know SQL or Java.

Read on to see how you can implement the equivalent of POSEXPLODE() using OPENJSON() in the Azure Synapse Analytics serverless SQL pool.
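The row-per-element behavior described in the excerpt can be sketched in plain Python. This is neither Spark nor the serverless SQL pool query from Jovan's post: the `posexplode_rows` helper and the sample rows are hypothetical, and the array cell is assumed to be stored as JSON text.

```python
import json


def posexplode_rows(rows, array_column):
    """Turn each row holding a JSON array into one row per array element.

    Every output row keeps the other columns and gains `pos` (zero-based
    index of the element) and `value` (the element itself).
    """
    out = []
    for row in rows:
        values = json.loads(row[array_column])
        rest = {k: v for k, v in row.items() if k != array_column}
        for pos, value in enumerate(values):
            out.append({**rest, "pos": pos, "value": value})
    return out


# Hypothetical input rows, invented for illustration.
people = [
    {"name": "Ann", "skills": '["SQL", "Java"]'},
    {"name": "Bob", "skills": '["SQL"]'},
]

flat = posexplode_rows(people, "skills")
for row in flat:
    print(row)

# Once flattened, counting people by skill is a simple filter.
sql_count = sum(1 for row in flat if row["value"] == "SQL")
print(sql_count)
```

The `pos` column matters when element order carries meaning, such as a first-listed skill being the primary one.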


Ordered String Splitting with OPENJSON

Aaron Bertrand splits and cares about sort order:

Last year, I wrote about replacing all your CLR or custom string splitting functions with native calls to STRING_SPLIT. As I work on a project migrating several Microsoft SQL Server instances to Linux, I am encountering one of the roadblocks I mentioned last time: the need to provide an element in the output to indicate the order of the elements in the input string. This means STRING_SPLIT in its current form is out, because 1) it offers no such column; and, 2) the results are not guaranteed to be returned in any specific order. Are there other ways to achieve this functionality at scale and without CLR?

As Koen mentions in the comments, you can now get STRING_SPLIT with a sort parameter, but Aaron’s response is also valid: not everybody will have access to that today, so it still makes sense to understand the options.
