Curated SQL Posts

Thoughts on Primary and Foreign Key Constraints

Rob Farley lays out an argument:

I am NOT suggesting that data integrity is irrelevant. Not at all. But how often do we need an enforced primary key or foreign key?

Be warned – I’m not actually going to come to a conclusion in this post. I’m going to make cases for both sides, and let you choose where you stand. The T-SQL Tuesday topic this month is Integrity, so some other people might have written on a similar topic, and produced even more compelling arguments one way or another. I’m the host this time, so you’ll be able to find the round-up (once it’s there) here on the LobsterPot Solutions site too.

I will come to a conclusion, and it is that OLTP systems need primary and foreign key constraints to work properly. In the post, Rob asks when I last saw a key violation error in my application. The good(?) news is that I have plenty of them in the last application I built on SQL Server, because I need to rely on a source system that dumps data without checking whether the records already exist. That means I can’t simply perform an inner join from my table to the source table, because I could get multiple records back. No, instead, I need to use a common table expression or the APPLY operator, retrieve the max values from the flotsam and jetsam that exist, and make my code harder to follow and slower as a result.
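To make that workaround concrete, here is a minimal sketch of the "deduplicate in a CTE, then join" pattern. The table and column names (my_table, source_dump, loaded_at) are hypothetical, and SQLite (via Python's sqlite3 module) stands in for SQL Server only so the sketch runs without a server; the CTE and ROW_NUMBER() syntax is the same in T-SQL.

```python
import sqlite3

# Hypothetical tables: "source_dump" stands in for a feed that re-sends rows
# without checking whether the key already exists.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE my_table (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE source_dump (id INTEGER, amount INTEGER, loaded_at TEXT);
INSERT INTO my_table VALUES (1, 'alpha'), (2, 'beta');
INSERT INTO source_dump VALUES
    (1, 10, '2025-01-01'),
    (1, 15, '2025-01-02'),
    (2, 20, '2025-01-01');
""")

# A plain inner join fans out: id 1 comes back twice.
naive = conn.execute("""
    SELECT t.id, s.amount
    FROM my_table AS t
    JOIN source_dump AS s ON s.id = t.id
    ORDER BY t.id, s.loaded_at
""").fetchall()

# Rank the rows per id in a CTE and keep only the most recent one before joining.
# (Window functions need SQLite 3.25 or later.)
deduped = conn.execute("""
    WITH latest AS (
        SELECT s.id, s.amount,
               ROW_NUMBER() OVER (PARTITION BY s.id ORDER BY s.loaded_at DESC) AS rn
        FROM source_dump AS s
    )
    SELECT t.id, l.amount
    FROM my_table AS t
    JOIN latest AS l ON l.id = t.id AND l.rn = 1
    ORDER BY t.id
""").fetchall()

print(naive)    # three rows, because id 1 matches twice
print(deduped)  # one row per id
```

With an enforced key on the source table, the first query would already return one row per id and the CTE would be unnecessary.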

Distributed warehousing systems don’t have enforceable keys because of the technical challenge of enforcing keys without having different nodes talk to each other. But these things also assume either that you’ve pre-validated all of the data (like in a Kimball model), that you don’t care about duplicate records or messiness, or that you’ll fix the problem somewhere downstream. In the case of Microsoft Fabric, that fix is typically necessary by the time you put the data into a semantic model, as semantic models really don’t like duplicate records, which tend to mess up relationships between tables.
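When the engine will not enforce a key for you, one cheap downstream check is to look for key values that appear more than once before the data reaches the model. The following is a sketch only: dim_customer and customer_id are made-up names, and SQLite is used purely so the example is runnable. The GROUP BY ... HAVING query itself is plain SQL.

```python
import sqlite3

def duplicate_keys(conn, table, key):
    """Return (key_value, row_count) for every key that appears more than once."""
    # table and key are interpolated as identifiers, so only pass trusted names.
    sql = (
        f"SELECT {key}, COUNT(*) FROM {table} "
        f"GROUP BY {key} HAVING COUNT(*) > 1 ORDER BY {key}"
    )
    return conn.execute(sql).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dim_customer (customer_id INTEGER, name TEXT)")  # hypothetical table
conn.executemany(
    "INSERT INTO dim_customer VALUES (?, ?)",
    [(1, "Ann"), (2, "Bob"), (2, "Bob B."), (3, "Cy")],
)

dupes = duplicate_keys(conn, "dim_customer", "customer_id")
print(dupes)  # customer_id 2 appears twice
```

An empty result means the column is safe to use as the "one" side of a relationship; anything else needs cleanup first.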

Comments closed

Changing the Source Lakehouse in a Power BI Deployment Pipeline

Chris Webb makes a switch:

If you’re using deployment pipelines with Direct Lake semantic models in Power BI you’ll have found that when you deploy your model from one stage to another by default the model still points to the Lakehouse it was originally bound to. So, for example, if you deploy your model from your Development stage to your Test stage, the model in the Test stage still points to the Lakehouse in the Development stage. The good news is that you can use the deployment rules feature of deployment pipelines to make sure the model in the Test stage points to a Lakehouse in the Test stage and in this post I’ll show you how.

Click through for the process.

Thoughts on Dogma and Power BI

Eugene Meidinger wants you to leave Plato’s cave:

I continue to be really frustrated about the dogmatic approach to Power BI. Best practices become religion, not to be questioned or elaborated on. Only to be followed. And you start to end up with these 10 Power BI modeling commandments:

Click through for those ten, followed by a valuable rant. This one’s a challenge for me because I understand where Eugene is going and agree. But there exists a fairly large subset of the population for whom Power BI (or whatever) isn’t a core part of the job and these people simply want an answer in order to complete a task and move on, not to gain a deeper understanding of the product. And it can be a challenge to differentiate these people from the people who would benefit from the more detailed explanation.

In short, you can drag a man out of Plato’s cave, but you can’t make him think.

Parquet File Customization and SQL Server

Ed Pollack writes some files:

Previously, we introduced and discussed the Parquet file format and SQL Server and why it is an ideal format for storing analytic data when it does not already reside in a native analytic data store, such as a data lake, data warehouse, or an Azure managed service.

Both Python and the Parquet file format are quite flexible, allowing for significant customization to ensure that file-related tasks are as optimal as possible. Compatibility with other processes, as well as keeping file sizes and properties under control, will also be covered here.

Click through for some examples.

Table-Level Locks in PostgreSQL

Gulcin Yildirim Jelinek takes us through locking in PostgreSQL:

All locks, whatever their type, will reduce the throughput and potentially increase the latency, which means a loss of performance, as nothing is ever free. If my intention is to make sure my data does not have corruption and everyone is getting a correct result at their time of query, I have to agree that I’d have to lock access when multiple transactions are targeting the same table or same row to make sure we take some time to keep the order of things instead of showing wrong results, fast.

Read on for a quick primer on multi-version concurrency control, locking, and lock mechanisms in PostgreSQL.

Exploring SQL Databases in Microsoft Fabric

Jared Westover looks at the bright side of life:

Over the past few months, I’ve toyed with Microsoft Fabric, focusing on the Data Factory and Power BI experiences. Everything I’ve developed so far is in the proof-of-concept (POC) phase. Naturally, I’m skeptical about new game-changing features, and Fabric is no exception. Any new flashy tech brings bugs along in the early stages. We’ve all been there, working for weeks on a project to have random bugs throw a wrench in everything.

When Microsoft announced SQL databases in Fabric, I was intrigued. After watching the Ignite session, Power AI apps with insights from SQL database in Fabric, a few features instantly stood out, and I want to share my first impressions.

Read on to learn more.

Set Operations in T-SQL

Erik Darling has a pair of videos covering 3 1/2 set operations. First up is UNION and UNION ALL. These are the set operations that most people know about and use fairly regularly, and Erik explains the difference between the two, including how their performance differs.

Then, Erik hits upon the two lesser-known set operations: INTERSECT and EXCEPT. These are extremely useful in certain circumstances, and tend to perform much better than other alternatives. For example, to figure out if two datasets are exactly the same, it’s really hard to go wrong with the following pair of queries:

SELECT a.* FROM a EXCEPT SELECT b.* FROM b;
SELECT b.* FROM b EXCEPT SELECT a.* FROM a;

You need both queries because the former tells you if there are any records in A that do not exist in B, either because the record simply is not there or because there is a difference in one or more values in B’s matching record. But then, you also have to check the opposite side, because there might be a record in B that does not exist in A and the first query will not expose it.
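A small runnable sketch shows why one direction alone is not enough. It assumes two toy tables named a and b, as in the queries above, and uses SQLite through Python's sqlite3 module only so it can run anywhere; EXCEPT is written the same way in T-SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (id INTEGER, val TEXT);
CREATE TABLE b (id INTEGER, val TEXT);
INSERT INTO a VALUES (1, 'x'), (2, 'y');
INSERT INTO b VALUES (1, 'x'), (2, 'y'), (3, 'z');
""")

# Rows in a with no exact match in b: nothing here, which looks like a pass...
only_in_a = conn.execute("SELECT a.* FROM a EXCEPT SELECT b.* FROM b").fetchall()

# ...but the reverse check exposes the extra row that exists only in b.
only_in_b = conn.execute("SELECT b.* FROM b EXCEPT SELECT a.* FROM a").fetchall()

print(only_in_a)
print(only_in_b)
```

A test would assert that both lists are empty; here the second one is not, so the datasets differ.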

This is typically the way I’d write test cases, ensuring that both queries return 0 results. Granted, you could always just check that the count of the intersection equals the count of records:

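DECLARE @c1 INT, @c2 INT;
-- Count the rows that appear in both a and b; the derived table needs an alias.
SELECT @c1 = COUNT(*) FROM (SELECT a.* FROM a INTERSECT SELECT b.* FROM b) AS i;
-- Count the rows in a for comparison.
SELECT @c2 = COUNT(*) FROM a;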

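In this case, @c1 and @c2 should be the same number. Note that INTERSECT returns distinct rows, so this check assumes a has no duplicate rows, and the count of rows in b needs to match as well to rule out extra rows in b.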
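The count-based check has one wrinkle worth seeing in action: INTERSECT returns distinct rows, so a duplicate row in either table throws the counts off even though every row has a match. This sketch again assumes toy tables a and b and uses SQLite so that it runs as-is; intersect_counts is a helper name invented for the example.

```python
import sqlite3

def intersect_counts(conn):
    """Return (rows in both a and b, rows in a, rows in b)."""
    both = conn.execute(
        "SELECT COUNT(*) FROM (SELECT a.* FROM a INTERSECT SELECT b.* FROM b) AS i"
    ).fetchone()[0]
    in_a = conn.execute("SELECT COUNT(*) FROM a").fetchone()[0]
    in_b = conn.execute("SELECT COUNT(*) FROM b").fetchone()[0]
    return both, in_a, in_b

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (id INTEGER, val TEXT);
CREATE TABLE b (id INTEGER, val TEXT);
INSERT INTO a VALUES (1, 'x'), (2, 'y');
INSERT INTO b VALUES (1, 'x'), (2, 'y');
""")
before = intersect_counts(conn)  # identical tables: all three counts agree

conn.execute("INSERT INTO a VALUES (2, 'y')")  # a now holds a duplicate row
after = intersect_counts(conn)   # the intersection count stays put while a grows

print(before, after)
```

Comparing all three numbers, rather than two, also catches the case where b carries rows that a lacks.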

The Importance of Virtual Environments in Python

Jack Wallen proselytizes for virtual environments:

When developing with Python, chances are pretty good that you’ll need to install various libraries, dependencies and apps to get your project started. The good news is that (in most cases) those installations are pretty straightforward (thanks to pip and other tools).

Problems can arise, however, if you simply install all of those project requirements on your system. It’s like installing any given application, hoping it won’t cause problems with other applications, your OS or your data. In most cases, it’s safe, but there’s always that one instance where things can quickly go awry.

Read on to see how virtual environments can alleviate many of these pains. It took a while for me to understand exactly why virtual environments are so important, but this is definitely something I recommend doing if you work with Python in any capacity.
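For a sense of how little ceremony is involved, here is a minimal sketch using the standard library's venv module. The temporary directory is an assumption made so the example cleans up easily; in a real project the environment usually lives in a folder such as .venv next to the code.

```python
import os
import tempfile
import venv

# Throwaway project directory for the sketch (hypothetical location).
project_dir = tempfile.mkdtemp()
env_dir = os.path.join(project_dir, ".venv")

# with_pip=False keeps the sketch fast and offline; pass with_pip=True to bootstrap pip.
venv.create(env_dir, with_pip=False)

# Every virtual environment carries a pyvenv.cfg file and its own interpreter directory.
bin_dir = os.path.join(env_dir, "Scripts" if os.name == "nt" else "bin")
has_cfg = os.path.isfile(os.path.join(env_dir, "pyvenv.cfg"))
has_bin = os.path.isdir(bin_dir)

print(has_cfg, has_bin)
```

The command-line equivalent is python -m venv .venv, followed by running the activate script in that interpreter directory. Packages installed afterward land inside the environment instead of the system-wide Python.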

Always On Lease Timeout Monitoring

Yvonne Vanslageren gives us one more thing to check:

SQL Server Always On Availability Groups are a robust solution for achieving high availability and disaster recovery for SQL Server databases. However, simply configuring them is not enough—you also need a solid monitoring strategy to ensure data integrity and system reliability. One key aspect of this monitoring process is keeping an eye on lease timeouts, which can signal larger issues and help prevent potentially catastrophic split-brain scenarios.

In this post, we’ll walk through the various health checks available for Always On Availability Groups, discuss how lease timeouts work, and explore practical methods for monitoring and troubleshooting.

Read on to learn more about the lease timeout concept, as well as where you can get this information and further recommendations around how to deal with the information.

Azure VM Constrained CPU and SQL Server Licensing

Rod Edwards works to save some money:

Our SQL servers have specific loads, patterns and resource requirements. We’ve monitored them, we know what they are inside and out. You know that x number of CPUs, and y GB of RAM means that your DBs happily sweat the hardware, but are sized to perfection to make sure it has just enough resource to complete what it needs to do. The negotiations with your Infra team in command of the VM resources were tough, but you’re not greedy… you appreciate that other VMs may require resource as well…

Now you have to migrate to Azure, and SQL on Azure VM is your only option. Great, loads of different types/flavours of VMs out there, one of them must be in the Goldilocks perfect porridge zone, right? Right?

Licensing around this gets a little weird. Basically, Rod is absolutely right that an E64-16, for example, would require 16 cores of SQL Server, and that can save a good bit of scratch. For the OS, however, you’re paying for 64 cores, so keep that in mind while you’re rolling in a bed of the cash you’ve saved on SQL Server licensing.
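The arithmetic is simple enough to sketch. The helper below is purely illustrative: it assumes a size label shaped like the E64-16 in the example above (base vCPU count, a hyphen, then the active vCPU count) and deliberately leaves prices out, since those vary by region and agreement.

```python
import re

def constrained_cores(size):
    """Split a constrained-size label such as 'E64-16' into (base_vcpus, active_vcpus)."""
    match = re.search(r"[A-Za-z]+(\d+)-(\d+)", size)
    if match is None:
        raise ValueError(f"not a constrained vCPU size label: {size!r}")
    return int(match.group(1)), int(match.group(2))

base, active = constrained_cores("E64-16")

sql_cores_to_license = active    # SQL Server is licensed on the active cores
os_cores_paid_for = base         # the OS side is still priced on the full-size VM
sql_cores_saved = base - active  # cores you no longer license for SQL Server

print(sql_cores_to_license, os_cores_paid_for, sql_cores_saved)
```

For the E64-16 example that works out to 16 SQL Server cores instead of 64, with the OS cost unchanged.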
