Press "Enter" to skip to content

Day: February 22, 2021

Tidying the Confusion Matrix in R

Gary Hutson has a new package for us:

The package aim is to make it easier to convert the outputs of the lists from caret and collapse these down into row-by-row entries, specifically designed for storing the outputs in a database or row by row data frame.

This is something that the CARET library does not have as a default and I have designed this to allow the confusion matrix outputs to be stored in a data frame or database, as many a time we want to track the ML outputs and fits over time to monitor feature slippage and changes in the underlying patterns of the data.

I like the way caret shows the confusion matrix when I’m reviewing result on my own, but I definitely appreciate efforts to make it easier to handle within code—similar to how broom reads linear regression outputs. H/T R-bloggers

Comments closed

Fixing Those Pesky Wait Stats

David Alcock keeps us from having to think:

CXPACKET

Attempt to pronounce parallellellellellism correctly then set MAXDOP to 1.

PAGELATCH

Mention TempDB and contention in the same sentence. Delete TempDB

BACKUP_

Delete any long running backup jobs. If wait persists then delete all backup jobs.

Click through for plenty of excellent nuggets of advice which definitely won’t land you on the unemployment line.

Comments closed

The Joy of Parameter Sniffing

Erik Darling points out that parameter sniffing is generally a good thing:

To some degree, I get it. You’re afraid of incurring some new performance problem.

You’ve had the same mediocre performance for years, and you don’t wanna make something worse.

The thing is, you could be making things a lot better most of the time.

We should specifically talk about parameter sniffing problems rather than parameter sniffing as a problem. These sorts of problems are closer to the exception than the rule.

Comments closed

Disorderly Queries

Chad Callihan wants you to think about that ORDER BY clause:

I recently came across a scenario where an application process was not performing correctly on one database but was working fine on others. The process should have been completing in seconds but was taking minutes with no indication of activity. After some investigation, I found that the process was stuck waiting on a SELECT statement to complete. Even worse, it was holding an exclusive lock on a table which was then blocking new information from processing.

One part of the SELECT query that stood out was that it was ordering by a date field. Considering what the process was doing, there was no need to have the data ordered. Generally, it’s better to have the data sorted on the application side instead of SQL Server but in this case not even that was necessary.

There are definitely good cases where you need to use ORDER BY in a database—such as paging scenarios. But if you don’t need things in a particular order, Chad shows that you can potentially save a good deal on performance without an explicit ordering.

Comments closed

Fun with Multi-Column Unique Constraints

Aaron Bertrand has an interesting use case:

A problem that comes up occasionally with constraints in SQL Server is when a unique constraint applies to multiple columns, but the values in those columns can be populated in any order. For example, if a table holds interactions between two users, and has columns User1 and User2, there should only be one row allowed for users 100 and 200, regardless of whether the data is entered as 100, 200 or 200, 100.

Click through for one solution. Another solution would be to normalize this down further with a dbo.ConversationParticipants table.

Comments closed