2025-01-09 – Curated SQL

Prevalence Adjustment in Binary Classifiers

Published 2025-01-09 by Kevin Feasel

David Lindelöf deals with an issue in analyzing classification models:

When you run a binary classifier over a population you get an estimate of the proportion of true positives in that population. This is known as the prevalence.

But that estimate is biased, because no classifier is perfect.

Read on to learn what this means for precision, as well as one technique for tracking prevalence changes over time.
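To make the bias concrete, here is a minimal sketch in Python. It uses the standard Rogan-Gladen correction, which may or may not be the technique the linked post describes. The function name and every number below are hypothetical, and the sensitivity and specificity are assumed to be known from a labeled validation set.

```python
def adjusted_prevalence(observed_rate, sensitivity, specificity):
    """Correct an observed positive rate for classifier error (Rogan-Gladen)."""
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("sensitivity + specificity must exceed 1")
    estimate = (observed_rate + specificity - 1.0) / denom
    # Sampling noise can push the estimate outside [0, 1], so clamp it.
    return min(1.0, max(0.0, estimate))


# Hypothetical population: 10% true prevalence, imperfect classifier.
true_prevalence = 0.10
sensitivity = 0.90  # assumed: P(predicted positive | actually positive)
specificity = 0.95  # assumed: P(predicted negative | actually negative)

# The rate the classifier reports mixes true positives with false positives.
observed = (true_prevalence * sensitivity
            + (1 - true_prevalence) * (1 - specificity))

print(round(observed, 3))  # 0.135
print(round(adjusted_prevalence(observed, sensitivity, specificity), 3))  # 0.1
```

In this made-up case the raw classifier output overstates prevalence (13.5% versus 10%), and the correction recovers the true figure only because the assumed sensitivity and specificity are exact.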

Entity Framework and Default Data Lengths

Published 2025-01-09 by Kevin Feasel

Brent Ozar points out one issue you might run into when using Entity Framework:

Most of the time, I love Entity Framework, and ORMs in general. These tools make it easier for companies to ship applications. Are the apps perfect? Of course not – but they’re good enough to get to market, bring in revenue to pay salaries, and move a company forwards.

However, just like any tool, if you don’t know how to use it, you’re gonna get hurt.

One classic example popped up again last month with a client who’d used EF Core to design their database for them. The developers just had to say which columns were numbers, dates, or strings, and EF Core handled the rest.

Read on for the scenario.

Methods to Copy On-Premises SQL Server Data into Microsoft Fabric

Published 2025-01-09 by Kevin Feasel

Gilbert Quevauvilliers runs a test:

In this blog post I am going to determine which item workload uses the least amount of Capacity Units when copying the same data from an On-Premises SQL Server.

The item workloads that I can use to copy data are Dataflow Gen1, Dataflow Gen2 and Pipelines.

Read on for the results, as well as one caveat about them.

Window Function Ranges: UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING

Published 2025-01-09 by Kevin Feasel

Chad Callihan engages the limit breaker:

I’m familiar with using the OVER clause and don’t think it’s too uncommon to see it used for including row numbers by using ROW_NUMBER() and aggregating data. But even though they’ve been around since SQL Server 2012, I’m not too familiar with using the OVER clause with the UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING to affect the window being queried.

Let’s take a look at a couple of examples using UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING.

Click through for those examples. The default ranges for window functions usually make a lot of sense, but it’s good to understand your options for frames: ROWS vs RANGE, as well as the frame values (UNBOUNDED PRECEDING, {N} PRECEDING, CURRENT ROW, {N} FOLLOWING, and UNBOUNDED FOLLOWING).
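As a quick illustration of those frame options, here is a small sketch that is not taken from the linked post. It uses Python's built-in sqlite3 module as a stand-in for SQL Server, and it assumes the bundled SQLite is version 3.25 or later, since earlier versions lack window functions. The sales table and its values are made up. The ROWS BETWEEN syntax shown is the standard form that T-SQL also accepts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (day INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales (day, amount) VALUES (?, ?)",
    [(1, 10), (2, 20), (3, 30)],
)

query = """
SELECT
    day,
    amount,
    -- Everything from the first row up to this one: a running total.
    SUM(amount) OVER (
        ORDER BY day
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS running_total,
    -- Everything from this row to the last one: what is still to come.
    SUM(amount) OVER (
        ORDER BY day
        ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
    ) AS remaining_total,
    -- The whole partition, regardless of the current row.
    SUM(amount) OVER (
        ORDER BY day
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS grand_total
FROM sales
ORDER BY day
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
# (1, 10, 10, 60, 60)
# (2, 20, 30, 50, 60)
# (3, 30, 60, 30, 60)
```

The three columns differ only in their frame, which is why the same SUM produces a running total, a remaining total, and a grand total.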

Reading a SQL Server XML Deadlock Report

Published 2025-01-09 by Kevin Feasel

Stephen Planck reads a report:

SQL Server includes an Extended Events session called system_health, which runs by default and, among other things, captures information about deadlocks as they occur. When two or more sessions block each other in such a way that no progress can be made (a deadlock), SQL Server chooses one session as the “victim,” rolls back its transaction, and frees resources so other sessions can continue. By reviewing the deadlock report in the system_health session’s XML output, you can see precisely why the deadlock happened and identify which queries or procedures were involved.

Below is a walkthrough of how to interpret a sample XML deadlock report, followed by a brief note on how to access this output.

Read on for that walkthrough.
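For a rough sense of what reading such a report involves, here is a sketch in Python that pulls the victim, the participating statements, and the contested resources out of a deadlock graph. The XML below is a hand-written, heavily simplified sample rather than real system_health output, and the object names, session IDs, and the summarize_deadlock helper are all hypothetical. Real reports carry many more attributes and resource types.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down deadlock graph for illustration only.
SAMPLE = """
<deadlock>
  <victim-list>
    <victimProcess id="process1" />
  </victim-list>
  <process-list>
    <process id="process1" spid="55">
      <inputbuf>UPDATE dbo.Orders SET Status = 1 WHERE OrderId = 42</inputbuf>
    </process>
    <process id="process2" spid="61">
      <inputbuf>UPDATE dbo.Customers SET Name = 'x' WHERE CustomerId = 7</inputbuf>
    </process>
  </process-list>
  <resource-list>
    <keylock objectname="Demo.dbo.Orders" mode="X">
      <owner-list><owner id="process2" mode="X" /></owner-list>
      <waiter-list><waiter id="process1" mode="U" /></waiter-list>
    </keylock>
    <keylock objectname="Demo.dbo.Customers" mode="X">
      <owner-list><owner id="process1" mode="X" /></owner-list>
      <waiter-list><waiter id="process2" mode="U" /></waiter-list>
    </keylock>
  </resource-list>
</deadlock>
"""


def summarize_deadlock(xml_text):
    """Return victims, processes, and resources from a deadlock graph."""
    root = ET.fromstring(xml_text)
    victims = {v.get("id") for v in root.findall("./victim-list/victimProcess")}

    processes = []
    for proc in root.findall("./process-list/process"):
        processes.append({
            "id": proc.get("id"),
            "spid": proc.get("spid"),
            "is_victim": proc.get("id") in victims,
            "statement": proc.findtext("inputbuf", default="").strip(),
        })

    resources = []
    for res in root.findall("./resource-list/*"):
        resources.append({
            "type": res.tag,
            "object": res.get("objectname"),
            "owners": [o.get("id") for o in res.findall("./owner-list/owner")],
            "waiters": [w.get("id") for w in res.findall("./waiter-list/waiter")],
        })

    return {"victims": victims, "processes": processes, "resources": resources}


summary = summarize_deadlock(SAMPLE)
for proc in summary["processes"]:
    role = "victim" if proc["is_victim"] else "survivor"
    print(f'{proc["id"]} (spid {proc["spid"]}, {role}): {proc["statement"]}')
for res in summary["resources"]:
    print(f'{res["object"]}: held by {res["owners"]}, wanted by {res["waiters"]}')
```

Each process owns one lock and waits on the other, which is the cycle that forces SQL Server to pick a victim.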
