Press "Enter" to skip to content

Day: January 7, 2025

Calculating Inter-Quartile Range and Z Score in T-SQL

Sebastiao Pereira hunts for outliers:

Outliers can significantly distort statistical analysis and lead to incorrect conclusions when interpreting data. In this article, we will look at how to find outliers in SQL Server using various T-SQL queries. Understanding how to find outliers in SQL is crucial for accurate data analysis.

Sebastiao uses PERCENTILE_CONT() in this demonstration. That works fine for relatively small tables, though it does not scale well at all. Once you’re in the millions of records, it gets slow. From there, my joke is that, if you have 100 million or more records, you can start a query with PERCENTILE_CONT() on one instance. Meanwhile, on a separate instance, as soon as you kick off that query, go install SQL Server ML Services, configure it, check out a tutorial on R or Python, figure out how you can calculate the inter-quartile range in that language, learn how ML Services works, and you’ll still get the answer before your first query finishes.

If you’re using SQL Server 2022, there is a new APPROX_PERCENTILE_CONT() that is orders of magnitude faster as you get increasingly large datasets. It’s also accurate to within 1.33% (on each side of the correct answer) within a 99% confidence. The way the query works is a bit different, though, because the approximation is a nested set function using a WITHIN GROUP() clause, whereas PERCENTILE_CONT() is a window function that uses an OVER() clause. That means it’s not quite as easy as slapping “APPROX_” to the start of the query, but because Sebastiao uses WITHIN GROUP in the T-SQL, it’s pretty close: PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY [ObsValue]) OVER() AS Q1 becomes APPROX_PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY [ObsValue]) AS Q1 or something like that–I’m compiling in the browser here.

Leave a Comment

Reading Data from Deleted Columns

Michael J. Swart checks the typewriter ribbon:

It’s hard to destroy data. Even when a column is dropped, the data is still physically there on the data page. We can use the undocumented/unsupported command DBCC PAGE to look at it.

This is tied in with how we can drop a column in SQL Server and have it not take a very long time: because when we drop the column, we’re just modifying the metadata associated with the table and telling SQL Server to ignore this bit here. Do read the whole thing, and also check out a fun comment from Paul White.

Leave a Comment

Setting RCSI on a Database in an Availability Group

Haripriya Naidu makes a change:

I was working on modifying isolation level of database from Read Committed to Read Committed Snapshot(RCSI) and had to get exclusive access to the database. After letting the application team know about it and having stopped their processes, I tried to set the database to SINGLE_USER but it errored out.

It turns out that you cannot set a database to single user mode if it is in an availability group or part of database mirroring. Nonetheless, there is still a way to make this change. Read on to learn more.

Leave a Comment

Sending E-mails in SSRS over Ports other than 25

Eitan Blumin fixes an error:

After configuring an Email subscription, the subscription result shows: “Failure sending mail: One or more errors occurred.“. In this blog post I will share how I investigated and resolved one such failure.

My first step in troubleshooting this error was to query from the ExecutionLog3 view inside the ReportServer database. I normally do this to check if and why a report subscription has failed its run.

However, in this case all the log records showed success (rsSuccess status):

Read on to see what this indicates, how Eitan was able to troubleshoot the problem, and the ultimate fix.

Leave a Comment