2025-03-04 – Curated SQL

Bass Product Diffusion and Data Science

Published 2025-03-04 by Kevin Feasel

This is a graph of the percentage of Stack Overflow questions tagged with data science terms such as R, Pandas, and so on. It seems to show exploding interest in R and Pandas, and maybe even Tensorflow. Pandas was likely chosen as a proxy for interest in Python for data science (versus a general interest in Python). I’d prefer view counts over question percentages as a proxy of interest, but it is what it is.

Then I thought, let’s see if they have newer data. They do, and it is horrifying (though not unexpected to those of us in the industry).

Click through for the analysis, as well as an important note in the comments.

Comments closed

Non-Deterministic Functions and Data Factory Logging

Published 2025-03-04 by Kevin Feasel

Richard Swinbank runs into a problem:

TL;DR:

Data Factory implementations in Fabric, Azure Synapse Analytics or Azure Data Factory evaluate pipeline expressions separately for logging and execution.

Log information reported from activities using non-deterministic functions may be unreliable.

Richard does give us a nice tl;dr, but still read the whole thing.
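To make the TL;DR concrete, here is a toy model in Python rather than anything from Data Factory itself. It assumes only what the summary states: an expression is evaluated once for the log entry and once more for execution. The names `evaluate` and `run_activity` are hypothetical and exist purely for illustration.

```python
import uuid


def evaluate(expression):
    """Evaluate a pipeline-style expression, modelled here as a zero-argument callable."""
    return expression()


def run_activity(expression):
    # Assumption for illustration: the expression is evaluated twice,
    # once for the value written to the log and once for the value actually used.
    logged_value = evaluate(expression)
    executed_value = evaluate(expression)
    return logged_value, executed_value


# A deterministic expression returns the same value on every evaluation.
deterministic = lambda: "run-" + "2025-03-04"

# A non-deterministic expression (here, a fresh GUID) differs on every evaluation.
non_deterministic = lambda: "run-" + str(uuid.uuid4())

det_logged, det_executed = run_activity(deterministic)
nd_logged, nd_executed = run_activity(non_deterministic)

print(det_logged == det_executed)  # the log matches what ran
print(nd_logged == nd_executed)    # the log does not match what ran
```

In this sketch the deterministic expression logs exactly what was executed, while the GUID-based one logs a value that was never used.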


Self-Hosted Integration Runtime Reconnecting to Cloud Service

Published 2025-03-04 by Kevin Feasel

Nivritti Suste handles an error:

In our organization, most data is stored on-premises, with a limited set of less critical data in the cloud. We use Azure to benefit from the cloud environment and Azure Data Factory (ADF) to move data.

With ADF, there are many components that need to integrate within the environment. The data on our on-premises servers needs to be shifted to the cloud periodically and we use Self-hosted Integration Runtime.

Our developers complain an ADF pipeline is failing with error: ‘The Self-hosted Integration Runtime is offline…’ What does this mean?

Click through for the answer.


Error Handling in SQL Server Stored Procedures

Published 2025-03-04 by Kevin Feasel

Erik Darling makes a mistake.

Haha, just kidding. Erik’s code never has mistakes, but he does have to deal with other people who have foolishly erred. This video is a good one. It covers a broad base of error handling in SQL Server, including improper parameter inputs, try-catch blocks, automatic retries, handling lock timeouts, and a lot more.


A Mistake of “Normalization”

Published 2025-03-04 by Kevin Feasel

Hans-Jürgen Schönig makes an argument:

The concept of “normalization” is often the first thing people who are new to databases are going to learn. We are talking about one of the fundamental principles in the realm of databases. But what is the use of normalization in the first place? Well, we want to avoid redundancies in the data and make sure that information is stored in a way that helps reduce mistakes and inconsistencies. Ultimately, that is all there is to it: No redundancies, no mistakes, no inconsistencies.

There’s an example in this of “too much normalization” but I’m going to push back because this is a common misunderstanding of the idea of normalization.

The example covers removing price from an invoice table and having people look up the price from the product table, as having each price in an invoice is duplication, and we’re trying to eliminate duplication.

This argument is wrong because it conflates two concepts. The listing price of an item is its current price, and that is what you will see on a products table. The sale price of an item on the invoice table is a historical artifact and is not the same as the listing price, even if the dollar amounts match. Hans-Jürgen points out the consequence of making this mistake, and is correct in doing so. But this is not “too much normalization”: the design misunderstands the domain model, and eliminating sale price from the invoice table would remove information. Properly following the rules of normalization means you cannot lose information; that’s what each one of the normal forms guarantees. In this case, we remove an attribute based on a faulty assumption that there is a functional dependency between product ID and sale price (that is, every time you see a specific product ID, you will always see a specific sale price). That faulty assumed functional dependency is the crux of the issue in this example, but the concept of normalization takes strays as a result.
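A minimal sketch of that point, using Python's built-in `sqlite3` module. The table and column names (`product`, `invoice_line`, `listing_price_cents`, `sale_price_cents`) and the sample prices are assumptions made up for illustration and are not taken from Hans-Jürgen's post. The first query tests whether product ID determines sale price; the second shows what an invoice would report if the price were looked up from the product table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (
        product_id          INTEGER PRIMARY KEY,
        listing_price_cents INTEGER NOT NULL   -- current price only
    );
    CREATE TABLE invoice_line (
        invoice_id       INTEGER NOT NULL,
        product_id       INTEGER NOT NULL REFERENCES product(product_id),
        sale_price_cents INTEGER NOT NULL,     -- price at the time of sale
        PRIMARY KEY (invoice_id, product_id)
    );
    INSERT INTO product VALUES (1, 1000);
    INSERT INTO invoice_line VALUES (100, 1, 1000);   -- sold at the old price
    UPDATE product SET listing_price_cents = 1200 WHERE product_id = 1;
    INSERT INTO invoice_line VALUES (101, 1, 1200);   -- sold at the new price
""")

# If product_id -> sale_price held, every product would have one distinct sale price.
violations = conn.execute("""
    SELECT product_id, COUNT(DISTINCT sale_price_cents)
    FROM invoice_line
    GROUP BY product_id
    HAVING COUNT(DISTINCT sale_price_cents) > 1
""").fetchall()
print(violations)  # [(1, 2)]: the dependency does not hold

# Looking the price up from product would misstate invoice 100.
comparison = conn.execute("""
    SELECT i.invoice_id, i.sale_price_cents, p.listing_price_cents
    FROM invoice_line AS i
    JOIN product AS p ON p.product_id = i.product_id
    ORDER BY i.invoice_id
""").fetchall()
print(comparison)  # [(100, 1000, 1200), (101, 1200, 1200)]
```

Because one product ID maps to two sale prices, dropping the sale price column would lose the fact that invoice 100 was billed at 1000 cents.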


Dealing with Optional Carriage Returns in SSIS

Published 2025-03-04 by Kevin Feasel

Andy Brownsword has fun with file formats:

When ingesting files in SSIS via Flat File Connections, a consistent format is key. Sometimes that isn’t the case. Here we’ll look at an example where the carriage return (CR, \r) may or may not be included in the file.

Pepperidge Farm remembers back in the day when Windows, classic Mac OS, and Linux (or any flavor of UNIX for that matter) each had a different way of ending a line: carriage return plus line feed, carriage return alone, or line feed alone, respectively. And of course most tools weren’t smart enough to figure out which convention your particular text file followed and display it correctly.
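For a sense of what handling an optional carriage return involves, here is a small Python sketch. It is not SSIS and not the approach from Andy's post; the function name `normalize_newlines` and the sample strings are hypothetical. It rewrites all three line-ending conventions to a single line feed before splitting rows.

```python
def normalize_newlines(text: str) -> str:
    """Convert CRLF and lone CR line endings to LF."""
    # Replace CRLF first so the CR of each pair is not converted on its own,
    # which would produce a spurious empty line.
    return text.replace("\r\n", "\n").replace("\r", "\n")


samples = {
    "crlf": "a,b\r\nc,d\r\n",   # Windows style
    "cr": "a,b\rc,d\r",         # classic Mac OS style
    "lf": "a,b\nc,d\n",         # UNIX style
    "mixed": "a,b\r\nc,d\n",    # carriage return present on some lines only
}

normalized = {name: normalize_newlines(raw) for name, raw in samples.items()}
rows = {name: text.splitlines() for name, text in normalized.items()}

for name, parsed in rows.items():
    print(name, parsed)  # every sample yields ['a,b', 'c,d']
```

The order of the two replacements matters: handling the lone carriage return first would turn each CRLF pair into two line feeds.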


