Press "Enter" to skip to content

Author: Kevin Feasel

Idle PostgreSQL Transactions and Table Bloat

Umair Shahid notes that some tables are feeling a bit bloated:

Yup, you read it right. Idle transactions can cause massive table bloat that the vacuum process may not be able to address. Bloat causes degradation in performance and can keep encroaching disk space with dead tuples. 

This blog delves into how idle transactions cause table bloat, why this is problematic, and practical strategies to avoid it.

Read on to understand how this can be and what you can do about it. And do check out the comments for a quick explanation of why connection pooling doesn’t exhibit this same problem.

Comments closed

Prevalence Adjustment in Binary Classifiers

David Lindelöf deal with an issue in analyzing classification models:

When you run a binary classifier over a population you get an estimate of the proportion of true positives in that population. This is known as the prevalence.

But that estimate is biased, because no classifier is perfect. 

Read on to learn what this means for precision, as well as one technique for tracking prevalence changes over itme.

Comments closed

Entity Framework and Default Data Lengths

Brent Ozar points out one issue you might run into when using Entity Framework:

Most of the time, I love Entity Framework, and ORMs in general. These tools make it easier for companies to ship applications. Are the apps perfect? Of course not – but they’re good enough to get to market, bring in revenue to pay salaries, and move a company forwards.

However, just like any tool, if you don’t know how to use it, you’re gonna get hurt.

One classic example popped up again last month with a client who’d used EF Core to design their database for them. The developers just had to say which columns were numbers, dates, or strings, and EF Core handled the rest.

Read on for the scenario.

Comments closed

Window Function Ranges: UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING

Chad Callihan engages the limit breaker:

I’m familiar with using the OVER clause and don’t think it’s too uncommon to see it used for including row numbers by using ROW_NUMBER() and aggregating data. But even though they’ve been around since SQL Server 2012, I’m not too familiar with using the OVER clause with the UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING to affect the window being queried.

Let’s take a look at a couple of examples using UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING.

Click through for those examples. The default ranges for window functions usually make a lot of sense, but it’s good to understand your options for frames: ROWS vs RANGE, as well as the frame values (UNBOUNDED PRECEDING, {N} PRECEDING, CURRENT ROW, {N} FOLLOWING, and UNBOUNDED FOLLOWING).

Comments closed

Reading a SQL Server XML Deadlock Report

Stephen Planck reads a report:

SQL Server includes an Extended Events session called system_health, which runs by default and, among other things, captures information about deadlocks as they occur. When two or more sessions block each other in such a way that no progress can be made (a deadlock), SQL Server chooses one session as the “victim,” rolls back its transaction, and frees resources so other sessions can continue. By reviewing the deadlock report in the system_health session’s XML output, you can see precisely why the deadlock happened and identify which queries or procedures were involved.

Below is a walkthrough of how to interpret a sample XML deadlock report, followed by a brief note on how to access this output.

Read on for that walkthrough.

Comments closed

Building a QR Code Clock

Tomaz Kastrun checks what time it is:

Ever wanted to have a clock on the wall or in the office, that is not binary. But it is QR-Code clock. Well, now you can have it.

This useless R function generates new QR Code for every given period and tells the time.

Click through for the code. I could see this being useful in scenarios where you want to avoid people copying the QR code, so you embed the time in there. Then, your reader service can check to see if the time is within some valid boundary, returning an error if not.

Comments closed

The Power of One Data Point

I have a new video:

In this video, I demonstrate how much information we can gain from one sample of a distribution.

Some aspect of this is “that’s a neat parlor trick” but it does speak to the marginal information gain of a small amount of data.

Comments closed

The Use Case for Memory-Optimized tempdb Metadata

Ben Johnston takes us through a scenario:

I recently had an interesting production SQL Server issue that turned out to be very easy to fix. The fix doesn’t fit every workload or server, but for the limited use cases described below, it’s a simple configuration change. The general best practice for server level configurations is to leave things at default, unless there is a specific reason for changing the settings. This will outline a use case for using memory-optimized tempdb metadata.

This covers a very specific scenario and workload. You won’t use this on a typical server, which is why it isn’t a default setting. You will see this for very specific server workloads, those with many transactions and high temp table usage. Most systems can better use the memory for the regular workload instead of optimizing tempdb metadata, so don’t use this as a default setting for your servers.

Click through for the scenario.

Comments closed