Curated SQL – Page 271 – A Fine Slice Of SQL Server

Thoughts on Natural Keys

Published 2024-06-10 by Kevin Feasel

Although I live in Copenhagen and mostly walk or ride my bicycle in order to get around town, I do own an old car for getting around the rest of the country. In Denmark, cars go through mandatory official inspection every other year, and I’ve been through a few of these in my life. A few years ago, the mechanic doing the inspection informed me that my car’s chassis number was incorrect.

This did make me a bit nervous, because I’d bought the car used, and I was suddenly concerned that things weren’t really as I thought. Had I unwittingly bought a stolen car?

But the mechanic just walked over to his computer in order to correct the error. That’s when a different kind of unease hit me. When you’ve programmed for some decades, you learn to foresee various typical failure modes. Since a chassis number is an obvious candidate for a natural key, I already predicted that changing the number would prove to be either impossible, or have all sorts of cascading effects, ultimately terminating in official records no longer recognizing that the car is mine.

Mark uses this as a jumping-off point on a discussion about whether to use natural keys as primary keys or whether to include surrogate keys instead. I am generally in favor of using surrogate keys in the physical data model and creating unique indexes for natural keys. But you have to use natural keys in the logical data model because surrogate keys don’t exist at the level of the logical data model. Do read the comments, though, because there’s a great debate in there.

Comments closed

Tweedie Distributions and Generalized Linear Modeling

Published 2024-06-07 by Kevin Feasel

Christian Lorentzen talks about Tweedie distributions:

Tweedie distributions and Generalised Linear Models (GLM) have an intertwined relationship. While GLMs are, in my view, one of the best reference models for estimating expectations, Tweedie distributions lie at the heart of expectation estimation. In fact, basically all applied GLMs in practice use Tweedie distributions with three notable exceptions: the binomial, the multinomial and the negative binomial distribution.

Read on for a bit more about its history and how it ties in with several other distributions.

Comments closed

Emulating the FILTER Clause in Oracle

Published 2024-06-07 by Kevin Feasel

Lukas Eder notes a bit of ANSI SQL:

The following aggregate function computes the number of rows per group which satifsy the FILTER clause:

SELECT COUNT(*) FILTER (WHEREBOOK.TITLE LIKE'A%'), COUNT(*) FILTER (WHEREBOOK.TITLE LIKE'B%'), ...FROMBOOK

This is useful for pivot style queries, where multiple aggregate values are computed in one go. For most basic types of aggregate function, it can be emulated simply by using CASE expressions, because standard aggregate functions ignore NULL values when aggregating.

Lukas shows how you can also implement this logic using JSON_TRANSFORM() though I think I’d just as soon stick with COUNT(CASE WHEN BOOK.TITLE LIKE 'A%' THEN 1 END) and try hard not to think about shredding JSON.

Comments closed

Exporting Extended Event Results to CSVs and Tables

Published 2024-06-07 by Kevin Feasel

David Fowler comes in with a quick tip:

This is something that I’ve seen pop up a couple of times on various forums recently, how can we export the results of an XEvent session. So I thought I’d answer the question here.

It’s actually a very simple thing to do in SSMS, although the option is not always easy to spot.

Click through to see how you can do it in T-SQL, as well as within SSMS.

Comments closed

Leaving the SQL Server Public Role Be

Published 2024-06-07 by Kevin Feasel

David Seis has some advice:

Roles play a crucial part in managing permissions and access control. One such role that often gets overlooked is the public role. This role, by default, has minimal permissions and every database user is a member. Sometimes, however, there are permissions granted to the public role which puts your data at risk.

David’s focus is not adding new permissions to the public role. I also recommend not taking permissions away. Some security tools will gripe that the public role has CONNECT access. I was foolish enough to follow those tools’ recommendations once and removed CONNECT from public. Then I got a bunch of calls about people not being able to log in… I suppose you can spin a yarn about how there are logins that a person should not be able to use for connections, but how often is that really the case?

Comments closed

Always Encrypted and Secure Enclaves

Published 2024-06-07 by Kevin Feasel

Pieter Vanhove explains what secure enclaves are and why they’re useful:

Encryption is a vital technique for protecting sensitive data from unauthorized access or modification. SQL Server and Azure SQL Database offer two encryption technologies that allow you to encrypt data in use: Always Encrypted and Always Encrypted with secure enclaves. In this blog post, we will compare these two technologies and highlight their benefits and limitations.

Read on to learn more.

Comments closed

Using Query Store to Fix a Cardinality Estimation Problem

Published 2024-06-07 by Kevin Feasel

Michael Bourgon solves an issue:

This morning I had a performance issue on a piece of code that worked long ago on a different server, and they were trying to put it in place today. It was SLOW. Like, 10 minutes slow. With the added bonus that it’s done through a web app, so it never finishes, it just always times out. After dealing with various approaches, I finally tried using the old Cardinality Estimator, and it went from 10 minutes to 3 seconds. But the query is inside the application, it doesn’t call a stored procedure. Which means the devs changing it is Non-Trivial. So I went to an updated version of an old trick – query store hints (which used to be Plan Guides)

Click through for a list of actions Michael took.

Comments closed

Book Review of Bernoulli’s Fallacy

Published 2024-06-06 by Kevin Feasel

John Mount reviews a book:

First the conclusion: this is a well researched and important book. My rating is a strong buy, and Bernoulli’s Fallacy is already influencing how I approach my work.

My initial “judge the book by its back cover” impression of Bernoulli’s Fallacy was negative. The back cover writes some very large checks that I was initially (and wrongly) doubtful that “its fists could cash.” The thesis is that frequentist statistics (the dominant statistical practice) is far worse than is publicly admitted, and that Bayesian methods are the fix. However, other reviews and the snippets by people I respect (such as Andrew Gelman and Persi Diaconis) convinced me to buy and read the book. And I am glad that I read it. The back cover was, in my revised opinion, fully justified.

Read on for John’s full review of a book that is quite critical of frequentist statistics in favor of Bayesian statistics—so that already makes the book a winner for me.

Comments closed

Distribution Parameter Wrangling in TidyDensity

Published 2024-06-06 by Kevin Feasel

Steven Sanderson introduces a new set of functions:

Greetings, fellow data enthusiasts! Today, we’re thrilled to unveil a fresh wave of functionalities in the ever-evolving TidyDensity package. Buckle up, as we delve into the realm of distribution statistics!

This update brings a bounty of new functions that streamline the process of extracting key parameters from various probability distributions. These functions adhere to the familiar naming convention util_distribution_name_stats_tbl(), making them easily discoverable within your R workflow.

Read on for the list and an example of how to use them.

Comments closed

Cleaning Up Large System Databases

Published 2024-06-06 by Kevin Feasel

Josephine Bush doesn’t need enormous system databases:

Always set this on your SQL Servers so you don’t have this problem in the first place. This is in the SQL Server Agent settings. I remember having some agent jobs that used to serve this function that ran on a schedule, which may have been required in older versions of SQL Server.

Josephine focuses on SQL Agent history and database backup history, both of which are good ones. If you have an older version of SQL Server or are using the package deployment model, there may be an explosion of information in msdb regarding SSIS that you’d want to manage. Also, check if any of the databases are in Full recovery mode; if so, ensure that the backup script you’re using for transaction log backups actually backs up system databases.

Comments closed

M	T	W	T	F	S	S
						1
2	3	4	5	6	7	8
9	10	11	12	13	14	15
16	17	18	19	20	21	22
23	24	25	26	27	28	29
30	31

Curated SQL Posts