
Day: February 26, 2025

Vertical Partitioning Rarely Works

Brent Ozar lays out an argument:

You’re looking at a wide table with 100-200 columns.

Years ago, it started as a “normal” table with maybe 10-20, but over the years, people kept gradually adding one column after another. Now, this behemoth is causing you problems because:

  • The table’s size on disk is huge
  • Queries are slow, especially when they do table scans
  • People are still asking for more columns, and you feel guilty saying yes

You’ve started to think about vertical partitioning: splitting the table up into one table with the commonly used columns, and another table with the rarely used columns. You figure you’ll only join to the rarely-used table when you need data from it.

Read on to understand why this is rarely a good idea and what you can do instead.

I will say that I’ve had success with vertical partitioning in very specific circumstances:

  1. There are large columns, like blobs of JSON, binary data, or very large text strings.
  2. There exists a subset of columns the application (or caller) rarely needs.
  3. Those large columns fall within the subset the caller rarely needs, or the caller can retrieve them with a quick point lookup when it does need them.

For a concrete example, my team at a prior company worked on a product that performed demand forecasting on approximately 10 million products across the customer base. For each product, we had the choice between using a common model (if the sales fit a common pattern) and generating a unique model. Because we were using SQL Server Machine Learning Services, we needed to store those custom models in the database, and each model, even when compressed, could run from kilobytes to megabytes in size. We only needed to retrieve the model binary during training or inference, not for reporting, though we did have reports that tracked what kind of model each product used: standard or custom (and, if custom, which algorithm). Thus, we kept the model binary in its own table, separate from the remaining model data.
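
As a rough sketch of that layout (the table and column names here are hypothetical, not our actual schema), the frequently-reported model metadata lives in one table and the large serialized model lives in a second table that training and inference reach via a point lookup on the shared key:

    -- Reporting queries touch only dbo.ProductModel; the wide varbinary(max)
    -- column lives off in dbo.ProductModelBinary.
    CREATE TABLE dbo.ProductModel
    (
        ProductID int NOT NULL PRIMARY KEY,
        ModelType varchar(20) NOT NULL,       -- e.g., 'Standard' or 'Custom'
        Algorithm varchar(50) NULL,           -- populated only for custom models
        LastTrainedDate datetime2(0) NOT NULL
    );

    CREATE TABLE dbo.ProductModelBinary
    (
        ProductID int NOT NULL PRIMARY KEY
            REFERENCES dbo.ProductModel (ProductID),
        ModelObject varbinary(max) NOT NULL   -- compressed, serialized model
    );

    -- Training and inference retrieve a single model via point lookup.
    DECLARE @ProductID int = 42;

    SELECT pmb.ModelObject
    FROM dbo.ProductModelBinary AS pmb
    WHERE pmb.ProductID = @ProductID;

The one-to-one foreign key keeps the two tables in lockstep, and because reporting queries never reference the binary table, the large column stays out of their scans.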


Sales KPIs in T-SQL

I have a new video:

In this video, we will take a look at several useful KPIs for measuring sales. We will also demonstrate the utility of the APPLY operator, the WINDOW clause, and the APPROX_PERCENTILE_CONT() function.

I enjoyed putting this one together; I'm always happy for a chance to talk about the APPLY operator.
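
As a small taste of the material, here is a minimal sketch (the dbo.Sales table and its columns are hypothetical) that combines all three features on SQL Server 2022 or later: a per-salesperson running total defined once in a WINDOW clause, plus an approximate median computed with APPROX_PERCENTILE_CONT() inside a CROSS APPLY:

    SELECT
        s.SalesPersonID,
        s.SaleDate,
        s.SaleAmount,
        -- Running total per salesperson, reusing the named window below.
        SUM(s.SaleAmount) OVER w AS RunningTotal,
        med.MedianSaleAmount
    FROM dbo.Sales AS s
        CROSS APPLY
        (
            -- Approximate median of this salesperson's sales.
            SELECT APPROX_PERCENTILE_CONT(0.5)
                       WITHIN GROUP (ORDER BY s2.SaleAmount) AS MedianSaleAmount
            FROM dbo.Sales AS s2
            WHERE s2.SalesPersonID = s.SalesPersonID
        ) AS med
    WINDOW w AS (PARTITION BY s.SalesPersonID
                 ORDER BY s.SaleDate
                 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW);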


Capturing Stored Procedure Executions via Extended Events

Haripriya Naidu creates an Extended Events session:

If you’re a DBA, how many times have developers asked you to check whether a particular stored procedure is hitting SQL Server?

Yes, we’ve all been there. Typically, you either run Profiler or set up an Extended Events session to trace it. But the question is, have you ever been confused about which event action to use?

Read on for the list of candidates and when you should use each.

I do mildly disagree with Haripriya’s last note around running Extended Events in production. There’s a small performance hit for most events. There are a few events that do have a much stronger impact, but for the large majority of events, you can (and probably should) have them running to provide the relevant diagnostic information. This isn’t like Profiler or even a server-side trace, where there’s a fairly significant overhead cost.
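
For reference, here is a minimal sketch of one such session built on the module_end event; the session name, procedure name, and file target are hypothetical placeholders:

    CREATE EVENT SESSION TrackProcExecution ON SERVER
    ADD EVENT sqlserver.module_end
    (
        ACTION (sqlserver.sql_text, sqlserver.database_name, sqlserver.username)
        -- Fire only for the procedure the developer asked about.
        WHERE (object_name = N'usp_MyProcedure')
    )
    ADD TARGET package0.event_file
    (
        SET filename = N'TrackProcExecution.xel'
    );
    GO

    ALTER EVENT SESSION TrackProcExecution ON SERVER STATE = START;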


Microsoft Fabric Quotas

Mihir Wagle puts the kibosh on things:

On February 24, 2025, we launched Microsoft Fabric Quotas, a new feature designed to control resource governance for the acquisition of your Microsoft Fabric capacities. Fabric quotas aim to help customers ensure that Fabric resources are used efficiently and to help manage the overall performance and reliability of the Azure platform while preventing misuse.

Note that these are not quotas you set on your users, but rather quotas that Microsoft sets on you.


Maxing Out on Stored Procedure Parameters

Louis Davidson tries it out:

So I replied: “Challenge Accepted”. Well, actually I replied with a gif of Neo saying it, but the effect was the same. So, I decided to just see what that would look like. Coincidentally, I am testing the new template for the Simple Talk site, and a function with 2100 parameters seemed like some code that screams out: “BIG!” Testing should always be about pressing the limits of your code, so why not.

It turns out that you cannot have 2101 parameters in a single stored procedure definition. But this is definitely an example of Swart’s 10% Rule. It also nets Louis my most coveted category: Wacky Ideas.
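
If you want to repeat the experiment without typing out a couple thousand parameters by hand, here is a rough sketch (the procedure and parameter names are made up) that builds the definition with dynamic SQL; at 2100 parameters the CREATE succeeds, and at 2101 it fails, 2100 being the documented maximum:

    DECLARE @n int = 2101;  -- 2100 works; 2101 is one too many

    -- Build a comma-separated parameter list: @p1 int, @p2 int, ..., @pN int
    DECLARE @params nvarchar(max);
    SELECT @params = STRING_AGG(CONVERT(nvarchar(max), CONCAT(N'@p', v.n, N' int')), N', ')
    FROM
    (
        SELECT TOP (@n) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
        FROM sys.all_columns AS a
            CROSS JOIN sys.all_columns AS b
    ) AS v;

    DECLARE @sql nvarchar(max) = CONCAT(
        N'CREATE OR ALTER PROCEDURE dbo.WayTooManyParameters ',
        @params,
        N' AS BEGIN SELECT 1 AS Result; END;');

    EXEC sys.sp_executesql @sql;  -- errors out once the count passes 2100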


Building a RegEx Emulator in SQL Server

Sebastiao Pereira offers a fourth-best solution:

Regular expressions (REGEX) let you flexibly search, manipulate, and validate text data. This makes it easy to search for simple or complex string patterns. There is no direct way to do this in SQL Server, but in this article we look at some SQL functions you can create to emulate regex-like functionality.

Regular expressions are coming to SQL Server 2025 and are already in Azure SQL Database, so that's the best option once it becomes available to you. The second-best option is to use CLR and offload your regular expression work to .NET, especially if you're using a library like SQLSharp to do so. The third-best option would be to do this in Python or R with ML Services, though that involves some setup effort and will probably be somewhat limiting. And if all else fails, this is an admirable fallback.
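
To show what the first-choice option looks like, here is a minimal sketch (the table, column, and pattern are hypothetical) of the native REGEXP_LIKE() predicate available in Azure SQL Database and coming in SQL Server 2025:

    -- REGEXP_LIKE() returns true when the input matches the pattern,
    -- so it slots directly into a WHERE clause.
    SELECT c.CustomerID, c.EmailAddress
    FROM dbo.Customers AS c
    WHERE REGEXP_LIKE(c.EmailAddress, N'^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$');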
