2024-11-25 – Curated SQL

Generating Effect Plots in Python and R

Published 2024-11-25 by Kevin Feasel

The plots show different types of feature effects relevant in modeling:

Average observed: Descriptive effect (also interesting without model).

Average predicted: Combined effect of all features. Also called “M Plot” (Apley 2020).

Partial dependence: Effect of one feature, keeping other feature values constant (Friedman 2001).

Number of observations or sum of case weights: Feature value distribution.

R only: Accumulated local effects, an alternative to partial dependence (Apley 2020).

Click through to see how they both work.
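To make the differences between these statistics concrete, here is a minimal sketch in plain Python. It is not the API of the packages covered in the linked post: the function name `effect_stats`, the equal-count binning, and the toy data are all assumptions made for illustration.

```python
import random
from statistics import fmean


def effect_stats(predict, rows, targets, feature, n_bins=4):
    """Per-bin effect statistics for one feature (hypothetical helper).

    rows: list of feature lists; predict: maps a single row to a number.
    """
    # Equal-count bins over the sorted feature values (an assumed binning rule).
    order = sorted(range(len(rows)), key=lambda i: rows[i][feature])
    size = -(-len(rows) // n_bins)  # ceiling division: rows per bin
    out = []
    for start in range(0, len(order), size):
        idx = order[start:start + size]
        center = fmean(rows[i][feature] for i in idx)
        # Partial dependence: set the feature to `center` in every row,
        # keep the other feature values as observed, then average predictions.
        pd_value = fmean(
            predict(r[:feature] + [center] + r[feature + 1:]) for r in rows
        )
        out.append({
            "bin_mean": center,
            "n": len(idx),  # number of observations in the bin
            "avg_observed": fmean(targets[i] for i in idx),
            "avg_predicted": fmean(predict(rows[i]) for i in idx),
            "partial_dependence": pd_value,
        })
    return out


# Toy data: two correlated features and a known "model".
rng = random.Random(0)
rows = []
for _ in range(200):
    x0 = rng.gauss(0, 1)
    x1 = 0.8 * x0 + rng.gauss(0, 0.5)  # x1 is correlated with x0
    rows.append([x0, x1])


def predict(row):
    return row[0] + 2 * row[1]


targets = [predict(r) + rng.gauss(0, 0.1) for r in rows]
stats = effect_stats(predict, rows, targets, feature=0)
for s in stats:
    print({k: round(v, 3) for k, v in s.items()})
```

Because the two features are correlated, the average predicted curve should rise faster than the partial dependence curve: the former picks up the effect of the second feature as well, while the latter holds the other feature values fixed.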

Finding Free Space in Azure Database for PostgreSQL

Published 2024-11-25 by Kevin Feasel

Josephine Bush needs some space:

I wanted to figure out how big (or approximately how big) my dump file would be. In QA, I have (relatively) large dbs (compared to prod). Enter the pgstattuple extension to help me determine how much free space is in my tables.

Read on for installation notes and how to use the extension.

New Syntax in SQL Server 2022

Published 2024-11-25 by Kevin Feasel

Steve Jones talks syntax:

At the recent 2024 PASS Data Community Summit, I saw a lightning talk from Mikey Bronowski on the New T-SQL Functions in SQL Server 2022. Before the talk, I made a joke with him that none of these were new because it’s 2024. They’d been out for 2 years.

Mikey did a nice job, given that he only had 10 minutes, but it was enough to give attendees an idea of some new things they might investigate to use in their own code. If you missed the talk, or you don’t have access to the recordings, we have a series on SQL Server Central that covers these (Part 1, Part 2, Part 3) and helps you understand the new options. The list of language changes is also in the MS Docs under What’s New in the Language.

Steve asks how widely these are used. I think the problem is that so few organizations have adopted SQL Server 2022 that awareness of these features is quite low.

By the way, I do have a script covering many of these new bits of syntax as part of a talk on getting up to speed with core T-SQL changes since SQL Server 2012.

The Difficulty of Deletion in PostgreSQL

Published 2024-11-25 by Kevin Feasel

Radim Marek takes us through the process of deleting rows:

Your database is ticking along nicely – until a simple DELETE brings it to its knees. What went wrong? While we tend to focus on optimizing SELECT and INSERT operations, we often overlook the hidden complexities of DELETE. Yet, removing unnecessary data is just as critical. Outdated or irrelevant data can bloat your database, degrade performance, and make maintenance a nightmare. Worse, retaining some types of data without valid justification might even lead to compliance issues.

Read on to learn about the process, some challenges, and a common pattern for resolving these challenges. The solution is pretty similar in SQL Server as well: batching delete operations, ideally with a supporting index.
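As a rough illustration of the batching pattern mentioned above, here is a small sketch using Python's built-in `sqlite3` module. SQLite is only a stand-in so the example runs anywhere. The `events` table, the `ix_events_created_at` index, and the `delete_in_batches` helper are hypothetical names, and the exact SQL would differ in PostgreSQL or SQL Server.

```python
import sqlite3


def delete_in_batches(conn, cutoff, batch_size=1000):
    """Delete rows older than `cutoff` in small committed batches."""
    total = 0
    while True:
        # The subquery picks at most `batch_size` keys; the index on
        # created_at lets it find candidates without scanning the table.
        cur = conn.execute(
            "DELETE FROM events WHERE id IN ("
            " SELECT id FROM events WHERE created_at < ? LIMIT ?)",
            (cutoff, batch_size),
        )
        conn.commit()  # keep each transaction short
        if cur.rowcount == 0:
            break
        total += cur.rowcount
    return total


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, created_at INTEGER NOT NULL)"
)
conn.execute("CREATE INDEX ix_events_created_at ON events (created_at)")
conn.executemany(
    "INSERT INTO events (created_at) VALUES (?)",
    [(i,) for i in range(10_000)],
)
conn.commit()

deleted = delete_in_batches(conn, cutoff=7_500, batch_size=1_000)
remaining = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(deleted, remaining)
```

Committing after each batch keeps individual transactions short, which is the main point of the pattern. The batch size here is arbitrary and would need tuning against a real workload.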

AWS DMS and a LOB Bug

Published 2024-11-25 by Kevin Feasel

Richard O’Riordan fixes an issue:

The table over in our Postgres cluster is similar except for the data type “text” being used instead of “varchar”. All kind of boring so far, but what we noticed was that on some very rare occasions the “largevalue” column was empty over in the PostgreSQL database even though for that row it was populated in SQL Server.

This seemed odd to me, like you would expect if there was some error inserting the row on the PostgreSQL side then since it is all done within a transaction that it would either all succeed or all fail, how would the row be partially inserted, i.e. missing this text value.

Read on for the story and several tests of how things work.

Query Hash Value Changes

Published 2024-11-25 by Kevin Feasel

Kendra Little digs into a problem:

This is the worst bug I’ve found in SQL Server to date. Previously, my top find was SQL Server Online Index Rebuild sometimes happens offline without warning. This one has taken top slot because it makes my life more difficult on a daily basis.

Background: SQL Server generates a query_hash for each query. This is stored in sys.query_store_query and it’s one of the primary ways you can identify what a query is across different Query Stores, or even the same Query Store over time, as surrogate query_id values get reset if Query Store is cleared or data ages out. The query_hash is a “Zobrist hash over the shape of the individual query, based on the bound (input) logical query tree. Query hints aren’t included as part of the hash.” (Source)

Except that’s wrong.

Read the whole thing.
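For anyone unfamiliar with the term in the quoted definition, a Zobrist hash XORs together one random key per (position, symbol) pair. The toy sketch below shows only that general idea and makes no claim about how SQL Server computes query_hash. The `ZobristHasher` class, the `shape` helper, and the flat token list standing in for a logical query tree are all assumptions for illustration.

```python
import random


class ZobristHasher:
    """Toy Zobrist hash: XOR of one random key per (position, symbol) pair."""

    def __init__(self, seed=0, bits=64):
        self._rng = random.Random(seed)
        self._bits = bits
        self._keys = {}

    def key(self, position, symbol):
        # Lazily assign a random bit string to each (position, symbol) pair.
        pair = (position, symbol)
        if pair not in self._keys:
            self._keys[pair] = self._rng.getrandbits(self._bits)
        return self._keys[pair]

    def hash(self, symbols):
        h = 0
        for position, symbol in enumerate(symbols):
            h ^= self.key(position, symbol)
        return h


def shape(tokens):
    """Replace numeric literals with a placeholder so only the shape remains."""
    return ["?" if isinstance(t, (int, float)) else t for t in tokens]


hasher = ZobristHasher(seed=42)
q1 = ["SELECT", "name", "FROM", "users", "WHERE", "id", "=", 1]
q2 = ["SELECT", "name", "FROM", "users", "WHERE", "id", "=", 2]
q3 = ["SELECT", "name", "FROM", "users", "WHERE", "age", "=", 2]

h1 = hasher.hash(shape(q1))
h2 = hasher.hash(shape(q2))
h3 = hasher.hash(shape(q3))
print(h1 == h2, h1 == h3)

# XOR makes single-symbol updates cheap: remove the old key, add the new one.
h3_incremental = h1 ^ hasher.key(5, "id") ^ hasher.key(5, "age")
print(h3_incremental == h3)
```

In this sketch, two queries that differ only in a literal share a hash, while a change to the shape produces a different one.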
