
Month: July 2022

Using JSON_PATH_EXISTS() in SQL Server

Hasan Savran shows how the JSON_PATH_EXISTS() function works in SQL Server:

Schemas can easily change if you save your data in JSON format. It is very easy to add or remove properties from JSON documents. When the data model changes quickly, you might need to worry about whether the property you are looking for exists in the documents. If the path you are looking for does not exist in some documents, you need to handle that case in some way. JSON_PATH_EXISTS comes to your aid in situations like that. It tests whether a specified path exists in the input JSON.

Read on for the syntax and examples of use.
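
To make the idea of "does this path exist?" concrete outside of T-SQL, here is a minimal Python sketch. The helper `json_path_exists` is hypothetical and only handles dotted object keys such as `$.specs.color`; it does not cover array indexes or the lax and strict modes of a real SQL/JSON path, and it is not how SQL Server implements the function. It does mirror the documented return values of 1, 0, or NULL for NULL inputs.

```python
import json


def json_path_exists(document, path):
    """Return 1 if a simple '$.a.b' path exists in the JSON text, else 0.

    Hypothetical helper for illustration: dotted object keys only.
    """
    if document is None or path is None:
        return None  # NULL in, NULL out
    if not path.startswith("$"):
        raise ValueError("path must start with '$'")

    node = json.loads(document)
    # '$.specs.color' -> ['specs', 'color']; a bare '$' yields no keys.
    keys = [k for k in path[1:].split(".") if k]
    for key in keys:
        if not isinstance(node, dict) or key not in node:
            return 0
        node = node[key]
    return 1


# Two documents whose "schema" has drifted: only the first has specs.color.
old_doc = '{"name": "Widget", "specs": {"color": "red"}}'
new_doc = '{"name": "Gadget", "specs": {"finish": "matte"}}'

found_in_old = json_path_exists(old_doc, "$.specs.color")
found_in_new = json_path_exists(new_doc, "$.specs.color")
```

Checking for the path first lets the caller decide what to do with documents that lack the property, rather than discovering the gap when a later lookup fails.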


Comparing Column-Level Encryption to Always Encrypted

Tom Collins compares and contrasts:

A common question from developers and data owners is: what benefits does Always Encrypted offer over column-level encryption (also known as cell-level encryption)? The first thing to understand is the basic differences between the two methods: column-level encryption vs. Always Encrypted.

For as much as I appreciate Always Encrypted, it seems I use column-level encryption about an order of magnitude more often.


Creating Goal Post Tables

Aaron Bertrand solves a problem of unchecked growth:

Many of us deal with logging tables that grow unchecked for years, while reporting queries against them are expected to continue running quickly regardless of the size of the table. A common issue when querying by a date range is that the clustered index is on something else (say, an IDENTITY column). This will often result in a full clustered index scan, since SQL Server doesn’t have an efficient way to find the first or last row within the specified range. This means the same query will get slower and slower as the table grows.

I like this solution but only in cases where you expect no after-the-fact updates to dates, such as late-arriving date information or “fixing” the date later. With Aaron’s log example, where we expect log entries to be immutable, this can work really well in a “pseudo-materialized view” sort of way.
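
As a rough illustration of why this helps, here is a small Python sketch of one way to read the idea: keep a tiny lookup from each date to the first ID logged on that date, then turn a date range into an ID range that can be located by a seek on the clustered key. Everything here (`log_rows`, `build_goal_posts`, `id_range_for`, `rows_in_range`) is a hypothetical stand-in, not the table design from Aaron's post, and it assumes dates never decrease as IDs grow and are never updated afterward.

```python
from bisect import bisect_left
from datetime import date, timedelta

# Hypothetical log table, ordered by an ever-increasing id (the clustered key).
# Assumption: log dates never decrease as ids grow and are never changed later.
start = date(2022, 1, 1)
log_rows = [(i, start + timedelta(days=i // 1000)) for i in range(1, 10001)]


def build_goal_posts(rows):
    """Map each date to the first id logged on that date."""
    posts = {}
    for row_id, log_date in rows:
        posts.setdefault(log_date, row_id)
    return posts


def id_range_for(posts, first_day, last_day):
    """Translate an inclusive date range into a half-open id range [lo, hi)."""
    days = sorted(posts)
    lo_idx = bisect_left(days, first_day)
    hi_idx = bisect_left(days, last_day + timedelta(days=1))
    lo = posts[days[lo_idx]] if lo_idx < len(days) else None
    hi = posts[days[hi_idx]] if hi_idx < len(days) else None
    return lo, hi


def rows_in_range(rows, posts, first_day, last_day):
    """Fetch rows by id range; bisect stands in for a clustered index seek."""
    lo, hi = id_range_for(posts, first_day, last_day)
    if lo is None:
        return []  # the range starts after the last logged date
    begin = bisect_left(rows, (lo,))
    end = len(rows) if hi is None else bisect_left(rows, (hi,))
    return rows[begin:end]


goal_posts = build_goal_posts(log_rows)
matched = rows_in_range(log_rows, goal_posts, date(2022, 1, 3), date(2022, 1, 4))
```

The lookup has one entry per day no matter how large the log grows, which is also why late-arriving or corrected dates are a problem: the date-to-ID mapping would no longer describe a contiguous range of IDs.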


Expanding Column Width in PowerShell Results

Kenneth Fisher supersizes the screen:

Notice the ellipsis (the three dots). That’s showing us that the name was too long and ended up being truncated. Given that I’ve been doing this for a little while now, I’m almost completely certain that if I send this as it is, the users are going to want to know the full names. And with my luck, I’ll end up having to give them each truncated string individually. On the theory that if I have time to do it twice, I probably have time to do it right the first time, let’s figure out how to expand the columns. Fortunately, as with most things PowerShell, there’s a cmdlet for that.

Read on to see what the process looks like.


Buffer Pool Parallel Scans in SQL Server 2022

David Pless talks about an internal optimization in SQL Server 2022:

Operations such as database startup/shutdown, creating a new database, file drop operations, backup/restore operations, Always On failover events, DBCC CHECKDB and DBCC CHECKTABLE, log restore operations, and other internal operations (e.g., checkpoint) will all benefit from Buffer Pool Parallel Scan.

In SQL Server 2019 and previous releases, operations that require scanning the buffer pool can be slow, especially on large memory machines such as the M-series Azure SQL virtual machine and large on-premises SQL Server environments. Even log restore operations and availability group failover operations can be impacted. Currently, there’s no way to eliminate this issue prior to SQL Server 2022, and dropping buffers using DBCC DROPCLEANBUFFERS would likely result in some degree of performance degradation as any subsequent query executions will have to reread the data from the database files increasing I/O.

Read on to understand why these operations can be slow on high-memory boxes and how much of a benefit you might get on certain administrative activities.


Visualizing Kafka Stream Lineage

David Araujo and Julia Peng show off stream lineage in Confluent Cloud:

Stream Lineage is a tool Confluent built to address the lack of data visibility in Kafka and event-driven architectures. Confluent’s Stream Lineage provides an interactive map of all your data flows that enables users to:

1. Understand what data flows are running, both now and at any point in the past

2. Trace where each data flow originated from

3. Track how data is transformed along its journey

4. Observe where each data flow ends up

Read on to see how it works.


The Power of Hash Match Joins

Jared Poche looks at hash matching:

When I began working at Microsoft, I was very much a novice at performance troubleshooting. There was a lot to learn, and hash match joins were pointed out to me multiple times as the potential cause for a given issue. So, for a while I had it in my head, “hash match == bad”. But this really isn’t the case.

Hash matches aren’t inefficient; they are the best way to join large result sets together. The caveat is that you have a large result set, and that itself may not be optimal. Should it be returning this many rows? Have you included all the filters you can? Are you returning columns you don’t need?

I might throw in one caveat about hash match joins and being the best performers for two really large datasets joining together: merge join can be more efficient so long as both sets are guaranteed to be ordered in the same way without an explicit sort operator. That last clause is usually the kicker.
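
To show the difference between the two strategies, here is a minimal Python sketch of each. These are textbook versions of the algorithms, not how SQL Server's operators are built: `hash_join` builds an in-memory hash table on one input and probes it with the other, while `merge_join` walks two inputs in step and is only correct if both are already sorted on the join key. The sample `customers` and `orders` rows are made up for illustration.

```python
from collections import defaultdict


def hash_join(build_rows, probe_rows, key=lambda row: row[0]):
    """Build a hash table on one input, then probe it with the other."""
    table = defaultdict(list)
    for row in build_rows:
        table[key(row)].append(row)
    return [(b, p) for p in probe_rows for b in table.get(key(p), [])]


def merge_join(left, right, key=lambda row: row[0]):
    """Join two inputs that are ALREADY sorted by key, in a single pass."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        left_key, right_key = key(left[i]), key(right[j])
        if left_key < right_key:
            i += 1
        elif left_key > right_key:
            j += 1
        else:
            # Find the run of matching keys on the right side.
            j_end = j
            while j_end < len(right) and key(right[j_end]) == left_key:
                j_end += 1
            # Pair every left row in the matching run with that right run.
            while i < len(left) and key(left[i]) == left_key:
                out.extend((left[i], r) for r in right[j:j_end])
                i += 1
            j = j_end
    return out


# Made-up sample data; both lists happen to be sorted by id.
customers = [(1, "Ann"), (2, "Bob"), (3, "Cy")]
orders = [(1, "order-1"), (1, "order-2"), (3, "order-3"), (4, "order-4")]

hashed = hash_join(customers, orders)
merged = merge_join(customers, orders)
```

The hash join does not care about input order but has to hold the build side in memory. The merge join needs no extra structure, but if either input were unsorted, something would have to sort it first, and that sort is the cost that usually tips the balance.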


Ordered Columnstore Indexes in SQL Server 2022

Brent Ozar appreciates order:

So essentially, every column has a whole bunch of indexes on it.

But there’s no order whatsoever as to which rows end up in which index.

This isn’t a problem for relatively small tables, but as you get to billion-row data warehouse fact tables where columnstore should really shine, performance gradually degrades. In data warehouses, fact tables often have a commonly filtered column, like SaleDate. However, until SQL Server 2022, even if you wanted a small SaleDate range, your query would likely check hundreds or thousands of row groups, each of which had a huge range of data.

But do read the whole thing, as it seems the feature is not working correctly in CTP 2.0 of SQL Server 2022. I can at least confirm that it is quite useful in Azure Synapse Analytics dedicated SQL pools.
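
Here is a small Python sketch of why ordering matters for row group elimination. It is a toy model under stated assumptions: `build_row_groups` and `groups_to_read` are hypothetical helpers, the row groups hold 1,000 rows rather than the roughly one million a real columnstore row group can hold, and the only metadata tracked is a minimum and maximum value per group.

```python
import random


def build_row_groups(values, group_size):
    """Cut the column into fixed-size groups and record min/max per group."""
    groups = []
    for begin in range(0, len(values), group_size):
        chunk = values[begin:begin + group_size]
        groups.append({"min": min(chunk), "max": max(chunk), "rows": chunk})
    return groups


def groups_to_read(groups, low, high):
    """Keep only groups whose [min, max] range overlaps the filter range."""
    return [g for g in groups if g["max"] >= low and g["min"] <= high]


# Hypothetical SaleDate column stored as day numbers: 1,000 days, 10 rows each.
sale_days = [day for day in range(1000) for _ in range(10)]

shuffled = sale_days[:]
random.Random(42).shuffle(shuffled)  # rows land in groups in arbitrary order

unordered_groups = build_row_groups(shuffled, group_size=1000)
ordered_groups = build_row_groups(sorted(sale_days), group_size=1000)

# Filter on a narrow ten-day range.
unordered_hits = groups_to_read(unordered_groups, 100, 109)
ordered_hits = groups_to_read(ordered_groups, 100, 109)
```

With the sorted load, each group covers a distinct 100-day slice, so the ten-day filter overlaps a single group. With the shuffled load, every group's minimum and maximum span nearly the whole date range, so the metadata cannot rule any of them out.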
