Starting with SQL Server 2016, Microsoft provided a STRING_SPLIT function. It is a table-valued function that splits a string into rows of substrings, based on a specified separator character. It’s been a welcome addition that we waited a long time for. It has one shortcoming, though: the order of the output rows is not guaranteed to match the order of the substrings in the input string.
Microsoft also provided support for parsing JSON data starting with SQL Server 2016. I discovered the OPENJSON function can be used to split strings, and it can also return the ordinal position of each substring from the original input string.
There are some limitations which you’d expect, namely around requirements for valid JSON.
Microsoft added support for JSON data beginning with SQL Server 2016. JSON is an open-standard file format consisting of attribute–value pairs and array data types. It is commonly used to transmit data objects for asynchronous browser–server communication. But it is also used for storing unstructured data in files or NoSQL databases such as Microsoft Azure Cosmos DB. For most of us, SQL Server’s support for JSON probably means two things: we can convert relational data to JSON and vice versa. In this post, I’ll focus on converting JSON to relational data and share what I’ve learned from a recent experience.
I’ve been pleasantly surprised with the way JSON support works in SQL Server. It’s supported every complicated scenario I’ve had to deal with so far, including nesting, deciding with or without arrays for the outer element, quotes or no quotes around numbers, etc.
For the SQL Server developer, matrices are probably most valuable for solving more complex string-searching problems, using Dynamic Programming. Once you get into the mindset of this sort of technique, a number of seemingly-intractable problems become easier. Here are fifty common data structure problems that can be solved using Dynamic programming. Until SQL Server 2017, these were hard to do in SQL because of the lack of support for this style of programming. Memoization, one of the principles behind the technique is easy to do in SQL but it is very tricky to convert existing procedural algorithms to use table variables. It is usually easier and quicker to use strings as pseudo-variables as I did with Edit Distance and the Levenshtein algorithm, the longest common subsequence, and the Longest Common Substring. The problem with doing this is that the code to fetch the array values can be very difficult to decypher or debug. JSON can do it very easily with path array references.
The results aren’t fantastic but the code is easier at least.
Computed column indexes make querying JSON data fast and efficient, especially when the schema of the JSON data is the same throughout a table.
It’s also possible to break out a well-known complex JSON structure into multiple SQL Server tables.
However, what happens if you have different JSON structures being stored in each row of your database and you want to write efficient search queries against all of the rows of your complex JSON strings?
Bert’s solution is an example of a phenomenon I’ve noticed in relational databases: sometimes, the best solution is not the most straightforward. The most straightforward solution is to take the JSON as-is, but that hits a wall as Bert shows. Reshaping the data leads to much better performance…as long as you’re able to afford the time needed to reshape and don’t have JSON changing that frequently.
Sometime you would need to know how many Managed Instance you have created in Azure cloud. Although you can find all information about the Azure SQL Managed Instances in Azure portal or API (ARM, PowerShell, Azure CLI), sometime it is hard to list all instances and search them using some criteria. In this post you will see how easily you can load list of your Managed Instances and build inventory of your resources.
Imagine that you have a large number of Managed Instances and you need to know how many instances you have, in what regions, subnets, and virtual networks they are placed, how much compute and storage is allocated to each of them, etc. Analyzing inventory of Managed Instances might be hard if you just use PowerShell.
Click through for the solution.
For most of real-world applications, the JSON AUTO will not give you the control you could need over your file format, for having more control over it, you must use the JSON PATH option, along with the ROOT option as follows:
SELECT TOP 10 id, dataVarchar, dataNumeric, dataInt, dataDate FROM [dbo].[MyTestTable] FOR JSON PATH, ROOT('TestTable')
Eduardo has several examples along these lines.
In a previous post, I wrote about how to determine if a set of incoming values for a row are different to all the existing values in the row, using T-SQL in SQL Server.
If you are using SQL Server 2016 or later, I suspect this is a really good option.
Click through to see an example using the WideWorldImporters database.
With Databricks you get:
- An easy way to infer the JSON schema and avoid creating it manually
- Subtle changes in the JSON schema won’t break things
- The ability to explode nested lists into rows in a very easy way (see the Notebook below)
Following is an example Databricks Notebook (Python) demonstrating the above claims. The JSON sample consists of an imaginary JSON result set, which contains a list of car models within a list of car vendors within a list of people. We want to flatten this result into a dataframe.
Click through for the notebook.
What happened? The T-SQL produces the correct results in SQL Server Management Studio (SSMS). However, in SSIS, the same T-SQL statement in an OLE DB Source in a Data Flow Task produces two rows of data which adds a line feed into the flat file and renders the JSON unusable.
The problem is visible even before sending output to the flat file.
Click the link to see how Stacia solves this problem.
And then I had to write about it in my book Introducing Microsoft SQL Server 2016 (which is free to download) when JSON support was added to SQL Server 2016. But I still didn’t have clients using JSON. It was interesting to me that I could use SQL Server to work with JSON data, but it was still theoretical to me rather than practical.
Therefore, I never thought much about how I would handle it in SQL Server Integration Services (SSIS). I just didn’t have a reason.
Until now. This seems to be the year that I am bumping into JSON left and right. It’s everywhere!
Read on for those methods as well as Stacia’s recommendation.