Press "Enter" to skip to content

Month: March 2025

Bass Product Diffusion and Data Science

John Mount does a fun analysis:

This is a graph of the percentage of Stack Overflow questions tagged with data science terms such as R, Pandas, and so on. It seems to show exploding interest in R and Pandas, and maybe even Tensorflow. Pandas was likely chosen as a proxy for interest in Python for data science (versus a general interest in Python). I’d prefer view counts over question percentages as a proxy of interest, but it is what it is.

Then I thought, let’s see if they have newer data. They do, and it is horrifying (though not unexpected to those of us in the industry).

Click through for the analysis, as well as an important note in the comments.

Leave a Comment

Self-Hosted Integration Runtime Reconnecting to Cloud Service

Nivritti Suste handles an error:

In our organization, most data is stored on-premises with a limited set of less critical data is in the cloud. We use Azure to benefit from the cloud environment and Azure Data Factory (ADF) to move data.

With ADF, there are many components that need to integrate within the environment. The data on our on-premises servers needs to be shifted to the cloud periodically and we use Self-hosted Integration Runtime.

Our developers complain an ADF pipeline is failing with error: ‘The Self-hosted Integration Runtime is offline…’ What does this mean?

Click through for the answer.

Leave a Comment

A Mistake of “Normalization”

Hans-Jürgen Schönig makes an argument:

The concept of “normalization” is often the first thing people who are new to databases are going to learn. We are talking about one of the fundamental principles in the realm of databases. But what is the use of normalization in the first place? Well, we want to avoid redundancies in the data and make sure that information is stored in a way that helps reduce mistakes and inconsistencies. Ultimately, that is all there is to it: No redundancies, no mistakes, no inconsistencies.

There’s an example in this of “too much normalization” but I’m going to push back because this is a common misunderstanding of the idea of normalization.

The example covers removing price from an invoice table and having people look up the price from the product table, as having each price in an invoice is duplication, and we’re trying to eliminate duplication.

This argument is wrong, because it conflates two concepts. The listing price of an item is its current price. This is the thing you will see on a products table. The sale price of an item on the invoice table is a historical artifact and is not the same as the listing price, even if the dollar amounts match. Hans-Jürgen points out the consequence of making this mistake, and is correct in pointing this out. But it’s not “too much normalization” because it misunderstands the domain model and eliminating sale price from a table would remove information. Properly following the rules of normalization means you cannot lose information–that’s what each one of the normal forms does. In this case, we remove an attribute based on a faulty assumption that there is a functional dependency between product ID and sale price (that is, every time you see a specific product ID, you will always see a specific sale price). That’s the crux of the issue in this example, but the concept of normalization takes strays as a result of the faulty assumed functional dependency.

Leave a Comment

Dealing with Optional Carriage Returns in SSIS

Andy Brownsword has fun with file formats:

When ingesting files in SSIS via Flat File Connections, a consistent format is key. Sometimes that isn’t the case. Here we’ll look at an example where the carriage return (CR\r) may or may not be included in the file.

Pepperidge Farms remembers back in the day when Windows, MacOS, and Linux (or any flavor of UNIX for that matter) each had a different way of ending a line: line feed, carriage return, or both. And of course most tools weren’t smart enough to figure out which your particular text file followed and display it correctly.

Leave a Comment

Keeping Documentation Narrow

Lukas Eder advises against expansive feature cross-pollination in documentation:

Every product manager knows this situation:

  • A user works with feature X1.
  • They find a limitation / bug / quirk and want to work around it.
  • The perfect workaround or alternative is feature X2, but without knowing that X2 exists, the user doesn’t find it and spends a lot of time looking for it.
  • The user requests X2 be documented on X1, because that would have saved them a ton of time.

This is such a common pattern, and while it’s perfectly understandable for such a user to request this, it is so terribly wrong to give in to this user’s request. Why is it wrong?

Read on for the answer.

Leave a Comment

Default Tenant Settings Changes in Microsoft Fabric

Nicky van Vroenhoven notices a change:

In case you have access to the M365 Admin Center, or more specific the M365 Message Center, you might have seen this message. I reckon not many people did.. That’s why I’m blogging about it here

I’m specifically talking about this message in the Message Center, being a major update and with admin impact

Communications on default checkbox changes on tenant settings and billing start for SQL database in Fabric.

Read on for more information about what’s changing.

Leave a Comment

Common Reasons for Emergency SQL Calls

Kevin Hill compiles a list:

If you are a production DBA (or Accidental prod DBA) you’ve gotten that frantic call in the middle of the night. Or maybe during little Suzy’s soccer game? Something broke, alerts are firing, nobody can order your widgets and the help desk tech has no idea where to start.

I’ve been on call since 2006. These are the most common things I’ve been called for at the worst times:

Read on for Kevin’s list. It’s a good one. I also get my share of “replication broke” but fortunately for most people, replication is pretty uncommon in environments.

Leave a Comment