Press "Enter" to skip to content

Author: Kevin Feasel

Vertical Partitioning Rarely Works

Brent Ozar lays out an argument:

You’re looking at a wide table with 100-200 columns.

Years ago, it started as a “normal” table with maybe 10-20, but over the years, people kept gradually adding one column after another. Now, this behemoth is causing you problems because:

  • The table’s size on disk is huge
  • Queries are slow, especially when they do table scans
  • People are still asking for more columns, and you feel guilty saying yes

You’ve started to think about vertical partitioning: splitting the table up into one table with the commonly used columns, and another table with the rarely used columns. You figure you’ll only join to the rarely-used table when you need data from it.

Read on to understand why this is rarely a good idea and what you can do instead.

I will say that I’ve had success with vertical partitioning in very specific circumstances:

  1. There are large columns, like blobs of JSON, binary data, or very large text strings.
  2. There exists a subset of columns the application (or caller) rarely needs.
  3. Those large columns are in the subset of columns the caller rarely needs, or the caller can access them via point lookup when needed.

For a concrete example, my team at a prior company worked on a product that performed demand forecasting on approximately 10 million products across the customer base. For each product, we had the choice between using a common model (if the sales fit a common pattern) or generating a unique model. Because we were using SQL Server Machine Learning Services, we needed to store those custom models in the database, but each model, even when compressed, could range from kilobytes to megabytes in size. We only needed to retrieve the model during training or inference, never for reporting, though we did have reports that tracked what kind of model we were using: standard or custom (and if custom, which algorithm). Thus, we kept the model binary in its own table, separate from the remaining model data.
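As a rough sketch of that design (the table and column names below are hypothetical, not the actual product schema), the large binary gets its own table keyed to the main one, reporting queries never touch it, and training or inference retrieves a single model via a point lookup:

-- Hypothetical sketch: reporting queries only hit dbo.ProductModel, while
-- training and inference join to dbo.ProductModelBinary for the blob.
CREATE TABLE dbo.ProductModel
(
    ProductID int NOT NULL PRIMARY KEY,
    ModelType varchar(20) NOT NULL,       -- 'Standard' or 'Custom'
    Algorithm varchar(50) NULL,           -- populated for custom models
    LastTrainedDate datetime2(0) NOT NULL
);

CREATE TABLE dbo.ProductModelBinary
(
    ProductID int NOT NULL PRIMARY KEY
        REFERENCES dbo.ProductModel (ProductID),
    ModelObject varbinary(max) NOT NULL   -- compressed, serialized model
);

-- Reporting on model usage never reads the binary column.
SELECT ModelType, Algorithm, COUNT(*) AS NumberOfProducts
FROM dbo.ProductModel
GROUP BY ModelType, Algorithm;

-- Inference pulls one model via a point lookup on the clustered key.
DECLARE @ProductID int = 12345;
SELECT mb.ModelObject
FROM dbo.ProductModelBinary AS mb
WHERE mb.ProductID = @ProductID;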


Sales KPIs in T-SQL

I have a new video:

In this video, we will take a look at several useful KPIs for measuring sales. We will also demonstrate the utility of the APPLY operator, the WINDOW clause, and the APPROX_PERCENTILE_CONT() function.

I enjoyed putting this one together, as I enjoy any chance I get to talk about the APPLY operator.
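As a small taste of what the video covers, here is a hedged sketch of the three features working together; dbo.Sales and its columns are invented for illustration, and the WINDOW clause and APPROX_PERCENTILE_CONT() both require SQL Server 2022 (compatibility level 160):

-- Running total per salesperson via APPLY, plus a monthly total via a named window.
SELECT
    s.SalesPersonID,
    s.OrderDate,
    s.Amount,
    rt.RunningTotal,
    SUM(s.Amount) OVER w AS MonthlyTotal
FROM dbo.Sales AS s
CROSS APPLY
(
    SELECT SUM(s2.Amount) AS RunningTotal
    FROM dbo.Sales AS s2
    WHERE s2.SalesPersonID = s.SalesPersonID
      AND s2.OrderDate <= s.OrderDate
) AS rt
WINDOW w AS (PARTITION BY s.SalesPersonID, YEAR(s.OrderDate), MONTH(s.OrderDate));

-- Approximate median sale amount per salesperson.
SELECT
    s.SalesPersonID,
    APPROX_PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY s.Amount) AS MedianSaleAmount
FROM dbo.Sales AS s
GROUP BY s.SalesPersonID;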


Capturing Stored Procedure Executions via Extended Events

Haripriya Naidu creates an Extended Events session:

If you’re a DBA, how many times have developers asked you to check whether a particular stored procedure is hitting SQL Server?

Yes, we’ve all been there. Typically, you either run Profiler or set up an Extended Events session to trace it. But the question is, have you ever been confused about which event action to use?

Read on for the list of candidates and when you should use each.

I do mildly disagree with Haripriya’s last note about running Extended Events in production. Most events carry only a small performance hit. A few do have a much stronger impact, but for the large majority of events, you can (and probably should) have them running to provide the relevant diagnostic information. This isn’t like Profiler or even a server-side trace, where there’s a fairly significant overhead cost.
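For reference, here is one common shape for such a session; the choice of sqlserver.module_end and the procedure name are illustrative, and Haripriya’s post walks through the other candidate events:

-- Capture completions of a single stored procedure and write them to a file target.
CREATE EVENT SESSION [TraceMyProcedure] ON SERVER
ADD EVENT sqlserver.module_end
(
    ACTION (sqlserver.database_name, sqlserver.username, sqlserver.sql_text)
    WHERE ([object_name] = N'MyProcedure')
)
ADD TARGET package0.event_file
(
    SET filename = N'TraceMyProcedure.xel', max_file_size = (50)
)
WITH (MAX_DISPATCH_LATENCY = 5 SECONDS, STARTUP_STATE = OFF);
GO

ALTER EVENT SESSION [TraceMyProcedure] ON SERVER STATE = START;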


Microsoft Fabric Quotas

Mihir Wagle puts the kibosh on things:

On February 24, 2025, we launched Microsoft Fabric Quotas, a new feature designed to control resource governance for the acquisition of your Microsoft Fabric capacities. Fabric quotas are aimed at helping customers ensure that Fabric resources are used efficiently and help manage the overall performance and reliability of the Azure platform while preventing misuse.

Note that these are not quotas you set on your users, but rather quotas that Microsoft sets on you.


Maxing Out on Stored Procedure Parameters

Louis Davidson tries it out:

So I replied: “Challenge Accepted”, well, actually I replied with a gif of Neo saying it, but the effect was the same. So, I decided to just see what that would look like. Coincidentally, I am testing the new template for the Simple Talk site, and a function with 2100 parameters seemed like some code that screams out: “BIG!” Testing should always be about pressing the limits of your code, so why not?

It turns out that you cannot have 2101 parameters in a single stored procedure definition. But this is definitely an example of Swart’s 10% Rule. It also nets Louis my most coveted category: Wacky Ideas.
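If you want to recreate the experiment without typing 2,100 parameter names by hand, a bit of dynamic SQL does the trick; this sketch assumes SQL Server 2022 for GENERATE_SERIES (any numbers table works on older versions):

-- Build a procedure with @n parameters; at 2101, CREATE PROCEDURE is rejected.
DECLARE @n int = 2100;
DECLARE @params nvarchar(max);

SELECT @params = STRING_AGG(CONVERT(nvarchar(max), CONCAT(N'@p', value, N' int')), N', ')
FROM GENERATE_SERIES(1, @n);

DECLARE @sql nvarchar(max) =
    N'CREATE OR ALTER PROCEDURE dbo.ManyParameters ' + @params +
    N' AS BEGIN SELECT @p1 + @p' + CONVERT(nvarchar(10), @n) + N' AS Result; END;';

EXEC sys.sp_executesql @sql;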


Building a RegEx Emulator in SQL Server

Sebastiao Pereira offers a fourth-best solution:

Regular expressions (REGEX) let you adaptively investigate, employ, and authenticate text data. This makes it easy to search for simple or complex string patterns. There is no direct way to do this in SQL Server, but in this article we look at some SQL functions you can create to emulate regex-like functionality.

Regular expressions are coming to SQL Server 2025 and are already in Azure SQL Database, so that’s the best option when it becomes available. The second-best option is to use CLR and offload your regular expression work to .NET, especially if you’re using a library like SQLSharp to do so. The third-best option would be to do this in Python or R with ML Services, though that requires a bit of setup effort and will probably be somewhat limiting. And if all else fails, this is an admirable fallback.
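As a sketch of that third option, here is what offloading a regex match to Python through Machine Learning Services might look like; it assumes ML Services is installed and external scripts are enabled, and the table and pattern are made up for illustration:

-- Flag rows whose Code column matches a pattern like 'ABC-1234'.
EXEC sp_execute_external_script
    @language = N'Python',
    @script = N'
import re
pattern = re.compile(r"^[A-Z]{3}-\d{4}$")
InputDataSet["IsMatch"] = InputDataSet["Code"].apply(lambda c: bool(pattern.match(c)))
OutputDataSet = InputDataSet
',
    @input_data_1 = N'SELECT Code FROM dbo.Products'
WITH RESULT SETS ((Code nvarchar(50), IsMatch bit));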


Working with JSON_OBJECTAGG() and JSON_ARRAYAGG() in Azure SQL

Koen Verbeeck tries out a couple of fairly new functions:

I need to construct JSON from data in our database, but I find the existing FOR JSON PATH limited when the data is not located in one single row but rather scattered over multiple rows. Is there another method to handle JSON data in SQL Server? Learn how to use the new SQL Server JSON functions JSON_OBJECTAGG and JSON_ARRAYAGG in this article.

These aren’t available on-premises yet, though given that there’s a new version of SQL Server coming out in 2025, there’s a good chance we’ll have them by then.
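As a quick taste of the syntax (Azure SQL Database only for now; dbo.Orders and its columns are invented for illustration):

-- One JSON object of order ID/amount pairs and one ordered JSON array per customer.
SELECT
    o.CustomerID,
    JSON_OBJECTAGG(CAST(o.OrderID AS varchar(10)) : o.TotalDue) AS OrdersByID,
    JSON_ARRAYAGG(o.TotalDue ORDER BY o.OrderDate) AS TotalsInOrder
FROM dbo.Orders AS o
GROUP BY o.CustomerID;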


Comparing Oracle and PostgreSQL Transaction Systems

Laurenz Albe performs a comparison:

The transaction system is a core component of a relational database. It provides services that facilitate developing applications that guarantee data integrity. The SQL standard regulates some of the features of database transactions, but leaves many details unspecified. As a consequence, the transaction systems of relational databases can differ substantially. These days, many people are trying to get away from Oracle database and move to PostgreSQL. To migrate an application from Oracle to PostgreSQL, it is crucial to understand the differences between the respective transaction systems. Otherwise, you can end up with nasty surprises that may jeopardize performance and data integrity. So I thought it would be useful to compile a comparison of the features of transactions in Oracle and PostgreSQL.

Read on for that dive into how the two products compare.


Power BI Semantic Model Monthly Refresh via Fabric Data Pipelines

Chris Webb has another way for scheduling refreshes:

I’m sure you already know how to configure scheduled refresh for your semantic models in Power BI. While the options you have for controlling when refresh takes place are generally good enough – you can configure daily or weekly refreshes and set up to eight times a day for refreshes to take place – there are some scenarios it doesn’t work for, such as monthly refreshes. Up to now the workaround has been to use Power Automate to trigger refreshes (see here for an example) or to call the refresh API from another application. Now, with Fabric, you have a much better option for scheduling refreshes: Data Pipelines.

Click through for the demonstration.


Aging Accounts Receivable Data in Power BI

Marco Russo and Alberto Ferrari inform us that we owe them money:

The Accounts Receivable (AR) Aging report helps companies track overdue receivables and better manage their cash flow. It is a common requirement from the finance department that often represents a challenge for Power BI reports, especially when you want to show the trend over time. Here is a typical visualization for the Open Amount at the end of each month, colored by the age range of the Accounts Receivable.

Read on for a dive into the report concept and some of the most common issues you may run into.
