Multi-Structured Data In U-SQL

Melissa Coates shows us how to use U-SQL to normalize JSON files in which different rows may have differing structures:

Handling the varying formats in U-SQL involves a few steps if it’s the first time you’ve done this:

  1. Upload custom JSON assemblies  [one time setup]

  2. Create a database   [one time setup]

  3. Register custom JSON assemblies   [one time setup]

  4. Upload JSON file to Azure Data Lake Store [manual step as an example–usually automated]

  5. Run U-SQL script to “standardize” the JSON file(s) into a consistent CSV column/row format

Melissa then shows us how to do this step-by-step.

Convert SSAS Tabular Processing Scripts Into Tables

Chris Koester shows how to take an Analysis Services Tabular processing script in TMSL format and turn it into a table using OPENJSON:

The previous post looked at how to process SSAS Tabular models with TMSL. Since SQL Server adds new JSON capabilities in 2016, let’s look at how to convert TMSL JSON to a Table with OPENJSON. OPENJSON is a new function in SQL Server 2016 that, per Microsoft:

OPENJSON is a table-valued function that parses JSON text and returns objects and properties from the JSON input as rows and columns.

In short, OPENJSON converts JSON text to a table. Since TMSL is JSON, this function can convert a SSAS Tabular processing script into a table. This could be useful if you wanted to document a SSAS processing schedule.

That’s an interesting use of OPENJSON.

JSON In Powershell

Adam Bertram shows an easy way of dealing with JSON data inside Powershell:

As the world continually becomes “eaten by software,” more and more services are being replaced by software. IT pros have most likely seen this in the form of software-defined everything. One of the premier components of this focus on software and with the continuing adoption of DevOps is application programming interfaces (APIs). All of these services needs to talk together and must provide a way for programs and users to interact with them. This is where APIs come in handy. But, what does this have to do with PowerShell and JSON, you ask?

APIs, more specifically REST APIs, return data when queried. This data is typically in the JSON format. JSON is a way of structuring data that makes it easy for software to consume. When working with PowerShell, Microsoft has provided some helpful tools to work with JSON called the ConvertTo-Json and ConvertFrom-Json commands. These commands allow you to quickly work with REST APIs or any other service that returns or accepts JSON as an input.

Read on for more details on how to use these commands.

JSON Data Sources In SSIS

Chris Koester shows how to read JSON data sources in SQL Server Integration Services:

Once the Script Component has been defined as a source, the output columns can be defined. For this post, the same USGS Earthquake data that was used in the “Download JSON data with PowerShell” post will serve as an example. Be careful to choose the correct data types here. This can be tedious because you have to choose the correct data types in the C# code as well, and ensure that they correspond with the SSIS types. It’s helpful to bookmark a SSIS data type translation table for reference.

It does involve creating a script component, but aside from the tedium that Chris mentions, it’s not too bad.

Using WITH With OPENJSON

Jovan Popovic points out the performance difference in using the WITH clause in an OPENJSON query:

 

Here are results of the queries:

SQL Server Execution Times: CPU time = 656 ms, elapsed time = 651 ms.
SQL Server Execution Times: CPU time = 204 ms, elapsed time = 197 ms.

As you can see, WITH clause specify that OPENJSON should immediately return properties from the JSON array without second parsing. Performance of the queries might be increased 3 times if you avoid double parsing.

That’s a pretty big difference when you specify the relevant data model elements.

Improving Performance For JSON In SQL Server

Bert Wagner shows how to use indexed, non-persisted columns to pre-parse JSON data in SQL Server:

This is basically a cheat code for indexing computed columns.

SQL will only compute the “Make” value on a row’s insert or update into the table (or during the initial index creation) — all future retrievals of our computed column will come from the pre-computed index page.

This is how SQL is able to parse indexed JSON properties so fast; instead of needing to do a table scan and parsing the JSON data for each row of our table, SQL Server can go look up the pre-parsed values in the index and return the correct data incredibly fast.

Personally, I think this makes JSON that much easier (and practical) to use in SQL Server 2016. Even though we are storing large JSON strings in our database, we can still index individual properties and return results incredibly fast.

It’s great that the database engine is smart enough to do this, but I’m not really a big fan of storing data in JSON and parsing it within SQL Server, as that violates first normal form.  If you know you’re going to use Make as an attribute and query it in SQL, make it a real attribute instead of holding multiple values in a single attribute.

JSON Dates In SQL Server

Bert Wagner explains how to handle JSON datetime strings in SQL Server:

In SQL Server, datetime2’s format is defined as follows:

YYYY-MM-DD hh:mm:ss[.fractional seconds]

JSON date time strings are defined like:

YYYY-MM-DDTHH:mm:ss.sssZ

Honestly, they look pretty similar. However, there are few key differences:

  • JSON separates the date and time portion of the string with the letter T

  • The Z is optional and indicates that the datetime is in UTC (if the Z is left off, JavaScript defaults to UTC). You can also specify a different timezone by replacing the Z with a + or  along with HH:mm (ie. -05:00 for Eastern Standard Time)

  • The precision of SQL’s datetime2 goes out to 7 decimal places, in JSON and JavaScript it only goes out to 3 places, so truncation may occur.

Read on for a few scripts handling datetime conversions between these types.

When To Store JSON In SQL Server

Kevin Feasel

2017-03-15

JSON

Bert Wagner thinks about which conditions should hold for it to make sense to store JSON in SQL Server:

Every once in a while I hear of some technologist say that relational databases are dead; instead, a non-table based NoSQL storage format is the way of the future. SQL Server 2016 introduced JSON functionality, making it possible for some “non-SQL” data storage to make its way into the traditionally tabled-based SQL Server.

Does this mean all data in SQL Server going forward should be stored in long JSON strings? No, that would be a terrible idea. There are instances when storing JSON in SQL Server is a good choice though. In this post I want to create recommendations for when data should be stored as JSON and when it shouldn’t.

Protip:  anyone who says relational databases are dead is already working with one strike against.  Bert has great use cases for JSON as well as a good understanding that there are plenty of anti-use cases, making his post well worth reading.

Parsing Gigantic JSON Text

Kevin Feasel

2017-02-20

JSON

Jovan Popvic has created a 4.35 GB JSON array:

SQL Server 2016 and Azure SQL Database enable you to parse JSON text and transform it into tabular format. In this post, you might see that JSON functions can handle very large JSON text – up to 4GB.

First, I would need very large JSON document. I’m using TPCH database so I will export the content of lineitem table in a file. JSON can be exported using the bcp.exe program:

My first draft read “Jovan Popovic has created a monster.”  I might go back to that one.  On the plus side, the operation took a lot less time than I had expected, though I’d have to imagine that his SQL Express instance had some decent specs.

High-Compression JSON With vNext

Jovan Popovic gives an example of 25x compression of JSON data using a clustered columnstore index in vNext:

CLUSTERED COLUMNSTORE INDEXES provide extreme data compression in SQL Server and Azure SQL Database. With NVARCHAR(MAX) support in CCI indexes you can use them on your JSON data stored is database and get high 25x compression. Therefore, CCI is a perfect solution if you need to store a large volume of JSON data in your SQL Database.

ContosoDW database is publicly available for download, so you can use this database and the script below to re-create this table and try this in your environment.

I’m curious whether this will also apply to non-JSON data.

Categories

September 2017
MTWTFSS
« Aug  
 123
45678910
11121314151617
18192021222324
252627282930