Press "Enter" to skip to content

Author: Kevin Feasel

Comparing Varieties of Statistics in SQL Server

Kendra Little gets the smorgasbord:

Statistics in SQL Server are simple in theory: they help the optimizer estimate how many rows a query might return.

In practice? Things get weird fast. Especially when you start filtering on multiple columns, or wondering why the optimizer thinks millions of rows are coming back when you know it’s more like a few hundred thousand.

In this post, I’ll walk through examples using single-column, multi-column, and filtered statistics—and show where estimates go off the rails, when they get back on track, and why that doesn’t always mean you need to update everything with FULLSCAN.

Read on for a review of the three types of statistics. Admittedly, I’ve never had much luck with filtered statistics improving the performance of queries. If I were to speculate, I’d say that they’re good for a very specific type of problem that maybe I just don’t run into that often.
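
As a quick reference, a minimal sketch of creating each of the three varieties might look like the following, against a hypothetical Sales.Orders table (the object names are illustrative, not from Kendra’s post):

```sql
-- Single-column statistics: one histogram over one column
CREATE STATISTICS st_Orders_OrderDate
    ON Sales.Orders (OrderDate);

-- Multi-column statistics: the density vector covers the column
-- combination, but the histogram is built on the leading column only
CREATE STATISTICS st_Orders_Customer_Date
    ON Sales.Orders (CustomerID, OrderDate);

-- Filtered statistics: a histogram over just the rows matching the predicate
CREATE STATISTICS st_Orders_OpenOrders
    ON Sales.Orders (OrderDate)
    WHERE OrderStatus = N'Open';

-- Inspect the density vector and histogram for any of these
DBCC SHOW_STATISTICS ('Sales.Orders', st_Orders_Customer_Date);
```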


Purview DLP Updates

Yael Biss has an announcement:

Microsoft Purview’s Data Loss Prevention (DLP) policies for Fabric now support Fabric KQL databases and mirrored databases!

Purview DLP policies help organizations improve their data security posture and comply with governmental and industry regulations. Security teams use DLP policies to automatically detect the upload of sensitive information to Microsoft 365 applications like SharePoint and Exchange, and to Fabric’s semantic models and lakehouses.

And another one:

In today’s fast-paced data-driven world, enterprises are building more sophisticated data platforms to gain insights and drive innovation. Microsoft Fabric Lakehouses combine the scale of a data lake with the management finesse of a data warehouse – delivering unified analytics in an ever-evolving business landscape. But with great data comes great responsibility. Protecting sensitive information and ensuring regulatory compliance is paramount. That’s where Data Loss Prevention (DLP) policies with restricted access come into play.

Click through to see what this preview currently offers.


Swap-and-Drop for Partition Management

Rich Benner deals with a troublesome partition:

What are stubborn partitions in SQL Server and how do you delete them? This was an issue I recently had to deal with on a client site that I thought our readers might find interesting.

The tables in use here are partitioned. The partitioning key is a date field and we have one partition per month. A monthly maintenance job creates our new partitions and should also delete the oldest ones. This job has been failing to delete an old partition because the data file behind it is not empty. It’s stubborn!

If we try to remove this file, we get the error “The file cannot be removed because it is not empty.”

Read on for some diagnosis of the problem, as well as the solution Rich developed.
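
Rich’s specific fix is worth the click, but the general swap-and-drop shape, with hypothetical object and file names, goes something like this:

```sql
-- Switch the oldest partition out to an aligned staging table
-- (same schema, same filegroup), then discard the staged rows
ALTER TABLE dbo.Events SWITCH PARTITION 1 TO dbo.Events_Staging;
TRUNCATE TABLE dbo.Events_Staging;

-- Remove the now-unused boundary point from the partition function
ALTER PARTITION FUNCTION pf_EventsMonthly() MERGE RANGE ('2024-01-01');

-- Push any lingering allocations off the data file, then remove it
DBCC SHRINKFILE (Events_2024_01, EMPTYFILE);
ALTER DATABASE [EventsDB] REMOVE FILE Events_2024_01;
```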


A Required Privilege Is Not Held by the Client

Rebecca Lewis runs into a permissions error:

I received an email from a customer yesterday regarding their Replication, which began failing with this error after Windows updates were applied:

Message Replication-Replication Transaction-Log Reader Subsystem: agent servername-xxx2 failed. Executed as user: domainname\svcaccount. A required privilege is not held by the client. The step failed.

Slightly dummied, but the important content is the privilege message. What does that mean? ‘A required privilege is not held by the client’… he didn’t change anything, I didn’t change anything – why is Replication suddenly failing with permissions problems?

Click through for the answer.


Using fabric-cicd with GitHub Actions

Kevin Chant doesn’t limit us to Azure DevOps:

In this post I want to show how you can operationalize fabric-cicd to work with Microsoft Fabric and GitHub Actions, since I got asked if this post was available whilst I was helping at the Ask the Experts panel during the Microsoft Fabric Community Conference.

Just so that everybody is aware, fabric-cicd is a Python library that allows you to perform CI/CD of various Microsoft Fabric items into Microsoft Fabric workspaces. At the moment, the number of supported item types is limited. However, that list is increasing.

Click through for a high-level diagram and the process, including the code Kevin used in the GitHub Actions workflow.


Data Conversion via Generative AI

Grant Fritchey rearranges some data:

The DM-32 is a Digital Mobile Radio (DMR) as well as an analog radio. You can follow the link to understand all that DMR represents when talking radios. I want to focus on the fact that you have to program the behaviors into a DMR radio. While the end result is identical for every DMR radio, how you get there, the programming software, is radically different for every single radio (unless you get a radio that supports open source OpenGD77, yeah, playing radio involves open source as well). Which means, if I have more than one DMR radio (I’m currently at 7, and no, I don’t have a problem, shut up) I have more than one Customer Programming Software (CPS) that is completely different from other CPS formats. Now, I like to set up my radios similarly. After all, the local repeaters, my hotspot, and the Talkgroups I want to use are all common. Since every CPS is different, you can’t just export from one and import to the next. However, I had the idea of using AI for data conversion. Let’s see how that works.

Click through for the scenario as well as Grant’s results, which were pretty successful for a data mapping operation. That said, the choice of model and the simplicity of the input and output examples matter quite a bit in generating working Python code.


Data Splitting and Cross-Validation in R

Nick Han has a pair of articles. First up is on data splitting and pre-processing:

Data preprocessing is a crucial step in any machine learning workflow. It ensures that your data is clean, consistent, and ready for modeling. In this blog post, we’ll walk through the process of splitting and preprocessing data in R, using the rsample package for data splitting and saving the results for future use.

H/T R-Bloggers for that one.

The second involves using cross-validation via the caret package in R:

Cross-validation is a resampling technique used to assess the performance and generalizability of machine learning models. It helps address issues like overfitting and ensures that the model’s performance is consistent across different subsets of the data. By splitting the data into multiple folds and repeating the process, cross-validation provides a robust estimate of model performance.

H/T R-Bloggers for that as well.


An Introduction to Temporal Tables

Stephen Planck covers a feature in SQL Server:

Have you ever wanted to see exactly how a row looked in your SQL Server database a week or a month ago—without writing complicated auditing code or triggers? With temporal tables (also known as system-versioned tables), you can do precisely that. Temporal tables automatically track historical data changes by maintaining multiple versions of each row in a separate history table. Let’s explore how they work, why you might use them, how to set them up, and what best practices to follow.

Click through for a good overview of the feature. I have mixed feelings on temporal tables because I think the feature is halfway-finished and has been since 2016. But there are two things that I think it really needs in order to shine.

The first is user-defined versioning. SQL Server only offers system time for version tracking, meaning it keeps track of when you inserted the row into the table. That’s fine for certain historical operations, but terrible if you want to use temporal tables for type-2 slowly changing dimensions, where you often care more about when a record became effective than about when you inserted the row into the dimension.

The second is efficient historical slicing, in the same way that you can do AS OF operations. AS OF lets you ask what the data looked like at a specific point in time. For warehousing scenarios, we also want to look at the history of changes for a particular keyed record, so you might see all of the changes to a customer or an employee. You can do this with a UNION ALL operation, but that query logic can get complex.
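
To make that concrete, here is a minimal sketch of the feature as it exists today, with made-up table and column names:

```sql
-- A system-versioned temporal table: SQL Server maintains the period
-- columns and the history table automatically
CREATE TABLE dbo.Customer
(
    CustomerID int NOT NULL PRIMARY KEY,
    CustomerName nvarchar(100) NOT NULL,
    ValidFrom datetime2 GENERATED ALWAYS AS ROW START NOT NULL,
    ValidTo datetime2 GENERATED ALWAYS AS ROW END NOT NULL,
    PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)
)
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.CustomerHistory));

-- Point-in-time slicing: the table as it looked at a single instant
SELECT CustomerID, CustomerName, ValidFrom, ValidTo
FROM dbo.Customer FOR SYSTEM_TIME AS OF '2025-01-01T00:00:00'
WHERE CustomerID = 42;

-- The full change history for one keyed record; FOR SYSTEM_TIME ALL
-- unions the current and history tables for you, though shaping that
-- output for warehousing queries still takes work
SELECT CustomerID, CustomerName, ValidFrom, ValidTo
FROM dbo.Customer FOR SYSTEM_TIME ALL
WHERE CustomerID = 42
ORDER BY ValidFrom;
```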


Estimating SQL Server Backup Sizes

Rebecca Lewis gives us a rule of thumb:

How big will my backup file be?  Very good question.  And of course, it depends… especially if you’re using compression.

I was working on something today that required a backup beforehand for safety measures, and the server had limited storage.  So, I needed to estimate the backup file size to confirm whether I had enough space for the .bak file.

Click through for the script. It does, of course, include some simplifications and your actual numbers can turn out a bit different, but for a quick and dirty estimate of disk space needed, this isn’t bad at all.
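
This isn’t Rebecca’s script, but the core idea, that a full backup copies allocated pages, fits in a few lines:

```sql
-- Rough ceiling for an uncompressed full backup: allocated pages * 8 KB
-- (run this in the context of the database you plan to back up)
SELECT SUM(allocated_extent_page_count) * 8 / 1024.0 AS estimated_backup_mb
FROM sys.dm_db_file_space_usage;

-- With backup compression, recent history in msdb suggests a ratio
SELECT TOP (5) database_name,
       backup_size / NULLIF(compressed_backup_size, 0) AS compression_ratio
FROM msdb.dbo.backupset
WHERE type = 'D'  -- full backups only
ORDER BY backup_finish_date DESC;
```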


Creating a Microsoft Fabric Capacity

Boniface Muchendu builds out some capacity:

To begin, we need to head over to the Azure portal. You might wonder why we are starting here. Well, Microsoft Fabric is now an Azure resource, which means all initial setups must be done in the Azure environment.

Click through for step-by-step instructions. Microsoft has also been really good about letting people create (and re-create and re-create) trial capacities, so if you’re just futzing about with the product to get an idea of what it can do, see if you can use that rather than shelling out the cash.
