Press "Enter" to skip to content

Category: Syntax

Storing and Parsing JSON in SQL Server

Ed Pollack talks JSON:

Like XML, JSON is an open standard storage format for data, metadata, parameters, or other unstructured or semi-structured data. Because of its heavy usage in applications today, it inevitably will make its way into databases where it will need to be stored, compressed, modified, searched, and retrieved.

Even though a relational database is not the ideal place to store and manage less structured data, application requirements can oftentimes override an “optimal” database design. There is a convenience in having JSON data close to related relational data and architecting its storage effectively from the start can save significant time and resources in the future.

Read on for plenty of examples and tips. Ideologically, I have no problem parsing JSON to load data into SQL Server. I have no real problem storing data in JSON if the calling application takes that JSON as-is and does not expect the database to modify or shred that JSON. I have no problem taking relational data and creating JSON structures to send out to calling applications. My problem comes when you store the data as JSON but then expect the database to manage data. Treat the JSON blob as atomic and we’re fine; otherwise, I want to make that data relational, as befits a relational database.

2 Comments

Data Integrity and the LIKE Operator

Joe Celko provides some context:

The LIKE operator has been in SQL since the very beginning. It’s a very weak form of pattern matching, that has only two simple wildcards. The wildcards are ‘_‘ (underscore) and ‘%‘ (percent). They respectively stand for any single character and for zero or more characters in a substring. The syntax is pretty straightforward:

Read on for examples of LIKE in action, including in check constraints.

Comments closed

Deconstructing Bitwise Days of the Week

Aaron Bertrand says, four days of the week, she thinks I’m the enemy:

I recently had to help support synchronization and distribution of workloads between multiple servers. Some of this work involves Task Scheduler and, be honest, who knew Task Scheduler was this complicated? On different servers, we wanted jobs to trigger on different days of the week. The way that Task Scheduler handles this programmatically is through a numeric property called WeeklyTrigger.DaysOfWeek, where the following coefficients are packed together into a single value:

Read on for the kind of pain that you can find within Task Scheduler. Any time I see bitwise storage like that in SQL Server, it hurts my head and not in a fun way. Aaron does help bring some sanity to the process, at least.

Comments closed

The World is CASE Expressions

Chad Baldwin re-enacts the astronaut meme:

Well…not really, but a handful of functions in T-SQL are simply just syntactic sugar for plain ol’ CASE statements and I thought it would be fun to talk about them for a bit because I remember being completely surprised when I learned this. I’ve also run into a couple weird scenarios directly because of this.

For those who don’t know what the term “syntactic sugar” means…It’s just a nerdy way to say that the language feature you’re using is simply a shortcut for another typically longer and more complicated way of writing that same code and it’s not unique to SQL.

Here’s where I push up my no-longer-existent glasses and say, “Well, actually, it’s a CASE expression rather than a statement because it always returns a value–or a dreaded NULL non-value–of some explicit data type.” But Chad is absolutely right about several T-SQL functions and operators being adaptations of the CASE expression syntax under the covers and I’m just being annoyingly pedantic for the fun of it.

1 Comment

Calling a REST Endpoint from Azure SQL Database

Meagan Longoria makes a call:

External REST endpoint invocation in Azure SQL DB went GA in August 2023. Whereas before, we might have needed an intermediate technology to make a REST call to an Azure service, we can now use an Azure SQL Database to call a REST endpoint directly.

One use case for this would be to retrieve a file from blob storage. I explain how to set this up below.

Read on to learn more about the process, as well as a few ideas on when you might use it.

Comments closed

Row Pattern Recognition in PostgreSQL

Tatsuo Ishii makes me jealous:

Row pattern recognition (RPR) is a feature defined in the SQL standard. It allows to search for a sequence of rows by pattern.  Since I am working on this feature for PostgreSQL, I would like to give a brief introduction to RPR.

This is something I’ve wanted to see in SQL Server for the past few years. It’s not something people might use every day, but it solves a particular class of problem extremely well.

Comments closed

Execution Plans for Keyset Cursors

Hugo Kornelis talks about a cursor I’d never heard of before:

Welcome to plansplaining, part 32, where we once more look at cursors. We already discussed the basics, and looked at static cursors and dynamic cursors. It is now time to cast our gaze upon the keyset cursor. The keyset cursor is sort of in between the static cursor (which presents a snapshot of the data is at was when the cursor was opened and disregards future changes) and the dynamic cursor (that always shows the current data). To be precise, a keyset cursor sees changes made to already existing rows, but does not see new rows that were added after the cursor was opened.

Read on to learn more about it.

Comments closed

Enumerations and Ordering in Postgres

Christoph Schiessl sorts things out:

Custom ENUM types in PostgreSQL are an excellent tool for enforcing certain database constraints, but you must be careful if you use SELECT queries and want to ORDER BY these columns. Recently, I had to fix a bug whose root cause was a misunderstanding of this behavior. It’s just a contrived example, but imagine a table of people with their marital status, which is implemented as a custom ENUM type.

Read on to learn more about the misunderstanding and some of the unexpected trickiness involved in getting a good query plan.

Comments closed

SELECT FOR UPDATE in Postgres

Semab Tariq notes some syntax to assist with performing updates in a MVCC world:

In critical environments like banking, healthcare, and online retail, ensuring safe data modifications is crucial to prevent data corruption and maintain system integrity. PostgreSQL offers a robust solution for this with its row-level locking mechanism, which ensures that the data being modified is protected from concurrent changes. One key feature of PostgreSQL is the SELECT FOR UPDATE clause, which locks the selected rows against concurrent updates. In this blog, we will explore how to implement the SELECT FOR UPDATE clause in PostgreSQL and discuss its real-world use cases.

Read on to learn more about how it works.

Comments closed

Approximate Percentiles in SQL Server 2022

Chad Callihan tries out a big improvement:

How do you go about finding the median percentile of a data set? What if you need the top x percentile? Both the APPROX_PERCENTILE_CONT and APPROX_PERCENTILE_DISC functions can be used to solve these questions.

Let’s look at how we can use each and what makes them unique.

The approximate percentiles are guaranteed to be accurate to within a certain percentage, something like 3-5%, if I remember correctly–it’s higher than HyperLogLog’s ~2.5% but not so large as to be of low value. If you’ve ever tried to calculate a median or other percentile like the 75th or 95th percentile, you might have used PERCENTILE_CONT() in the past. At least until you get a few million rows in the table, at which point you stopped using it. My joke is, once you reach a certain table size, PERCENTILE_CONT() becomes so slow that it’s faster to install and configure SQL Server ML Services, learn R or Python, and send in the data to calculate a percentile than to wait for PERCENTILE_CONT() to complete.

The APPROX_PERCENTILE_* series is way, way faster. On reasonable-sized test cases of a couple million rows or so, my recollection is two orders of magnitude better performance, so long as you can deal with being off by a few percentage points. One of the best scenarios for something like this is calculating 95th percentile response times. Does it really matter that the actual response time was 187.5ms and SQL Server said 192.6 or 181.4? Probably not—you get a good idea of the magnitude, and that’s the important part here.

Comments closed