Press "Enter" to skip to content

Curated SQL Posts

FGCB_ADD_REMOVE Latch

Paul Randal looks at a particular latch:

Most latch class names are tied directly to the data structure that they protect. The FGCB_ADD_REMOVE latch protects a data structure called an FGCB, or File Group Control Block, and there will be one of these latches for each online filegroup of each online database in a SQL Server instance. Whenever a file in a filegroup is added, dropped, grown, or shrunk, the latch must be acquired in EX mode, and when figuring out the next file to allocate from, the latch must be acquired in SH mode to prevent any filegroup changes. (Remember that extent allocations for a filegroup are performed on a round-robin basis through the files in the filegroup, and also take into account proportional fill, which I explain here.)

Read on to understand what can cause this particular latch to become a bottleneck in your system.
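
If you want a quick look at whether this latch is even registering on your own instance, a minimal sketch is to query sys.dm_os_latch_stats, which aggregates latch waits since the last service restart (or since the latch stats were last cleared):

-- Cumulative waits on the FGCB_ADD_REMOVE latch class for this instance.
SELECT
    latch_class,
    waiting_requests_count,
    wait_time_ms,
    max_wait_time_ms
FROM sys.dm_os_latch_stats
WHERE latch_class = N'FGCB_ADD_REMOVE';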

Returning 0 Instead of BLANK in DAX

Marco Russo and Alberto Ferrari want to see zeroes in specific circumstances:

What makes this specific product interesting is that the product had sales in 2007, no sales in 2008, and it started selling again in 2009. Its behavior is different from that of the other products. Indeed, for most of these products one can argue that they started to produce sales when they were introduced to the market. Their behavior is quite intuitive: no sales up to a given point in time, then they start selling. We want to highlight this specific product because it shows a gap in sales when it was already present on the market. For other products, we are happy to blank them until their first sale. By doing this, we show gaps when they are real, and we avoid showing non-relevant information, that is, products that could not produce sales because they were not even available to sell.

Read on to see how they do this.

Planning a Restore Strategy

Jonathan Kehayias inverts the paradigm:

Do you know how long it takes to RESTORE each database from a FULL backup? Does your run book include critical configuration options like Instant File Initialization or backup compression that can impact how long it takes to restore a database? Does the database have an excessively large transaction log file that will require zero initialization upon creation if the existing database files are not there? Does the database use transparent data encryption (TDE), which will prevent it from being able to instant-initialize the data files for the database?

Until you test your RESTORE time for a FULL backup, you don’t know if you can meet your RTO utilizing backups or not, and you may need one of the previously mentioned HA/DR features in SQL Server. However, in an absolute worst case scenario where recovery can only occur from backups, it is still important to know how long each step of the process is going to take so that you can set expectations appropriately. Your minimum recovery time is going to at least be the time required to restore the most recent FULL backup.

Click through for a lot of great advice in this vein.
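
As a starting point for that kind of testing, one simple approach is to time a full restore to a scratch location. The sketch below uses placeholder database names, logical file names, and paths, and assumes you've pulled the actual logical names from RESTORE FILELISTONLY first:

-- Find the logical file names inside the backup (placeholder path).
RESTORE FILELISTONLY FROM DISK = N'X:\Backups\YourDatabase_FULL.bak';

-- Time a full restore to a test location (all names and paths are placeholders).
RESTORE DATABASE [YourDatabase_RestoreTest]
FROM DISK = N'X:\Backups\YourDatabase_FULL.bak'
WITH
    MOVE N'YourDatabase'     TO N'X:\RestoreTest\YourDatabase.mdf',
    MOVE N'YourDatabase_log' TO N'X:\RestoreTest\YourDatabase_log.ldf',
    STATS = 10,   -- progress messages every 10 percent
    RECOVERY;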

Exporting Large Tables from Power BI Desktop

Nick Edwards shows us the way:

Have you ever wanted to export a table from Power BI Desktop into Excel just to make sure the DAX you’ve written is performing as expected but ran into this error message: “Data exceeds the limit – Your data is too large. Some data sampling may occur”?

Most probably this has occurred because you’ve got more than 30,000 rows of data in your table that you’re trying to export. In the example shown below I’ve actually got 30,001 rows of data – a row containing column headers plus 30,000 rows of actual data.

Click through to see how you can get around this limitation.

Declarative MLOps with Ludwig

Jacqueline Cardoso announces a new version of Ludwig:

Ludwig abstracts away the complexity of combining all these disparate systems together through its declarative approach to structuring machine learning pipelines. Instead of writing code for your model, training loop, preprocessing, postprocessing, evaluation, and hyperparameter optimization, you only need to declare the schema of your data as a simple YAML configuration:

I’ve long been a fan of declarative approaches to problem-solving, so I’m going to need to dig into this a bit.

Decile Analysis and Logistic Regression

Ridhima Kumar (re-)introduces us to decile analysis:

Decile analysis was once a popular technique; however, the convention of bucketing machine learning problems into either 'classification' or 'regression' types has led people to forget about it. I am pretty sure most freshly minted data scientists would not even have heard of decile analysis. So, coming back to what decile analysis is.

Decile analysis is used to categorize a dataset from highest to lowest values, or vice versa, based on predicted probabilities.

As is obvious from the name, the analysis involves dividing the dataset into ten equal groups. Each group should have the same number of observations/customers.

It ranks customers in order from most likely to respond to least likely to respond.

Read on to learn the steps and how this ties in with the fact that logistic regression is regression.
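
To make the bucketing step concrete, here is a hypothetical sketch in T-SQL; the dbo.Scores table and its CustomerID, PredictedProbability, and Responded columns are made-up names for illustration. NTILE(10) over the predicted probabilities, sorted descending, assigns each customer a decile, and the response rate per decile shows how well the model separates responders from non-responders:

-- Hypothetical scoring table: one row per customer with a predicted
-- probability and a 0/1 response flag.
WITH ranked AS
(
    SELECT
        CustomerID,
        PredictedProbability,
        Responded,
        NTILE(10) OVER (ORDER BY PredictedProbability DESC) AS Decile
    FROM dbo.Scores
)
SELECT
    Decile,
    COUNT(*)             AS Customers,
    SUM(Responded)       AS Responders,
    AVG(1.0 * Responded) AS ResponseRate
FROM ranked
GROUP BY Decile
ORDER BY Decile;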

When the Version Store Fills tempdb

David Fowler takes us through a mental exercise:

Well, there is something else that I’ve seen have a habit of filling TempDB. If you’re using Read Committed Snapshot or RCSI, then you’ll also have a version store in your database. I’m not going to go into the details of exactly how the version store works; there is plenty of documentation on it out there if you’re interested (perhaps I’ll write a post on it sometime).

The key bit that you need to know is that although SQL will keep the version store trimmed down and only keep the rows that are needed, it can only clear rows that are older than the oldest transaction. This is because SQL has no way of knowing what rows are going to be needed by the transaction. Do you see a potential issue here?

Read on for enlightenment.
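
If you want to check this on your own system, a rough sketch is to see how much of tempdb the version store is currently holding and which snapshot transactions might be holding up cleanup:

-- Space currently reserved by the version store in tempdb (in MB).
SELECT
    SUM(version_store_reserved_page_count) * 8 / 1024.0 AS version_store_mb
FROM tempdb.sys.dm_db_file_space_usage;

-- Long-running snapshot transactions that can prevent version store cleanup.
SELECT
    session_id,
    transaction_id,
    elapsed_time_seconds
FROM sys.dm_tran_active_snapshot_database_transactions
ORDER BY elapsed_time_seconds DESC;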

TDE and Backup Compression

Andy Levy learns the truth:

For years, I thought that native backups of databases using Transparent Data Encryption (TDE) couldn’t be compressed. Between TDE being limited to Enterprise Edition until SQL Server 2019 and my own lack of experience with TDE in prior positions, I hadn’t really experimented with this myself. Some people have even gone so far as to skip compression in their backup jobs for TDE-enabled databases because there’s no need to burn those CPU cycles if you won’t get any compression, right?

But a curious thing happened after I upgraded a portion of my environment to SQL Server 2019 in late 2020. I observed that scheduled backups were compressing for some of my TDE-enabled databases, most notably the newer instances. And when I took ad hoc backups in any environment, they were compressed. So why wasn’t it working everywhere?

Read on for the explanation, though one correction: MAXTRANSFERSIZE is 1MB by default only when the database is not encrypted using TDE (and you aren’t backing up to a tape drive). If the database is encrypted using TDE, the default max transfer size is 64KB, and I think that’s what got Andy.
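
In other words, on versions prior to SQL Server 2019 you can still get a compressed backup of a TDE-enabled database by bumping the transfer size above 64KB yourself. A sketch, with a placeholder database name and path:

-- MAXTRANSFERSIZE is specified in bytes and must be a multiple of 64KB;
-- anything above 65536 lets compression kick in for TDE-enabled databases.
BACKUP DATABASE [YourTdeDatabase]
TO DISK = N'X:\Backups\YourTdeDatabase_FULL.bak'
WITH
    COMPRESSION,
    MAXTRANSFERSIZE = 1048576;  -- 1 MB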

Bug with Filtered Index on Computed Column

Erik Darling points out a weird bug:

At some point in the past, I blogged about a silent bug with computed columns and clustered columnstore indexes.

In this post, I’m going to take a quick look at a very loud bug.

Normally, you can’t add a filtered index to a computed column. I’ve always hated that limitation. How nice would that be for so many currently difficult tasks?

Click through to see how you can create a filtered index against a computed column, as well as all of the pain it provides.
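
For context, here is a small hypothetical repro of the normal limitation Erik mentions, using made-up table and column names; referencing the computed column in the filtered index's WHERE clause is what SQL Server usually rejects:

-- Made-up table with a computed column.
CREATE TABLE dbo.Orders
(
    OrderID  int IDENTITY PRIMARY KEY,
    Quantity int   NOT NULL,
    Price    money NOT NULL,
    Total    AS (Quantity * Price)
);
GO

-- This normally fails, because the filter expression references
-- a computed column.
CREATE INDEX IX_Orders_BigTotals
    ON dbo.Orders (OrderID)
    WHERE Total > 1000;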
