Press "Enter" to skip to content

Curated SQL Posts

Migrating DATETIME Data to DATETIMEOFFSET

William Assaf adds some time zones:

I recently reviewed, worked on, and added a similar example to the DATETIMEOFFSET Microsoft Learn Docs article at the recommendation of my colleague Randolph West, who guessed (accurately) I would enjoy such a task. It was a nice pre-Build diversion. 

This topic is one that I have co-presented on in the past and hounded project capstone review presentations about. If you’re not storing time zone offset in your date/time data, you’re setting yourself up for future pain. That future pain is not what this blog post is about.

My preference is not to store time zone offset but instead store everything in UTC and perform any time zone switcharoos in the UI. But if you are storing local dates and times, I completely agree that you should keep track of the time zone. I worked for an east coast US company that bought a west coast US company, and both stored local dates and times in their SQL Server databases, making data consolidation a real challenge.

Comments closed

Listen and Notify in Postgres

Brandur Leach shows how to use PostgreSQL’s listen/notify capabilities:

Listen/notify in Postgres is an incredible feature that makes itself useful in all kinds of situations. I’ve been using it a long time, started taking it for granted long ago, and was somewhat shocked recently looking into MySQL and SQLite to learn that even in 2024, no equivalent exists.

In a basic sense, listen/notify is such a simple concept that it needs little explanation. Clients subscribe on topics and other clients can send on topics, passing a message to each subscribed client. The idea takes only three seconds to demonstrate using nothing more than a psql shell:

Read on to learn more about the notifier pattern. What’s interesting is that the notifier patter, which adds a fair bit of structure to this very simple process, makes it work a good bit like SQL Server’s Service Broker.

Comments closed

Number of Fabric Workspaces and the Medallion Architecture

Kevin Chant opens a can of worms:

Since I got asked about it this week during the Learn Together session I did alongside Shabnam Watson (l/X). Plus, it is a highly debated topic in our community, and I wanted to share my thoughts about it.

Due to the fact that my personal opinion is that it depends. However, the number you choose depends on a variety of reasons which I intend to cover in this post.

By the end of this post, you will know my personal opinions as to why. Plus, plenty of things to consider when deciding on the number of workspaces to implement.

Read on for Kevin’s thoughts. My quick opinion is, one workspace per layer. Just from a logistical standpoint, keeping the several layers separated in one workspace is an immense challenge and typically requires exposing data engineering details (like what “gold”/”silver” or “curated”/”refined” actually means) with end users.

Comments closed

Max Offline Semantic Model Size Error

Chris Webb continues a series on Power BI semantic model memory errors:

In the Power BI Service, Import mode models are stored offline and paged into memory only when they are needed – for example because someone runs a report that uses the model as its source. As discussed in my last post, though, there’s a limit on the amount of memory that a model can use which varies by the size and type of capacity you’re using using. There’s also an extra complication in that Import mode models are stored offline in a compressed format that means the Power BI Service doesn’t know exactly how much memory will be needed if the entire model needs to be held in memory. As a result there is an additional limit enforced on the size of the offline copy of Import mode models to ensure they don’t use too much memory when they are eventually paged in.

Read on to learn more about this setting and what it looks like if you trigger the error.

Comments closed

SQL Server FAST N Query Hint

Chad Callihan hits the Turbo button:

How familiar are you with the OPTION (FAST N) query hint? It’s not one that I had ever used, so I decided to experiment with it a bit. Let’s look at what it does and how to apply it to a query.

My recollection is that this query hint highly prioritizes “streaming” operators and de-prioritizes operators that work on the whole dataset as a block, such as sorts. This means, for example, that you’ll get more nested loops joins and natural merge joins, but few sort + merge or hash match joins. It may also make some queries take considerably longer as a result.

Comments closed

Checking for Duplicate Rows with TidyDensity

Steven Sanderson looks for dupes:

Today, we’re diving into a useful new function from the TidyDensity R package: check_duplicate_rows(). This function is designed to efficiently identify duplicate rows within a data frame, providing a logical vector that flags each row as either a duplicate or unique. Let’s explore how this function works and see it in action with some illustrative examples.

Read on to see how it works. Though I am curious about whether there’s an option to ignore certain columns, such as row IDs or other “non-essential” columns you don’t want to include for comparison. Also, checking how it handles NA or NULL would be interesting.

Comments closed

Comparing SQL Server to Databricks

Paul Andrew makes a comparison:

Microsoft SQL Server and Azure Databricks over the many years I’ve been working in the data/IT industry have easily become my two favourite data processing tools. When Databricks became a first-class resource in Microsoft Azure, it was a big moment for the evolution of the data platform architectures I’ve designed and built (but architecture isn’t the focus for this blog). That said, rather than considering the tooling and technology as an evolution, I find a lot of people drawing comparisons between the products. This often leads to confusion and friction, as they are ultimately offering a lot of different capabilities, with only some common areas where comparisons could be made.

Read on for Paul’s thoughts. Spoilers: I agree with pretty much all of it.

Comments closed

Getting Row Counts of All Tables in a Microsoft Fabric Warehouse

Koen Verbeeck busts out the tally counter:

It says the data is 352MB in size, but after loading the data I was curious about how many rows were actually in that sample data set. Unfortunately, it’s not as straight forward as with a “normal” SQL Server database to get the row counts. First of all, when you connect with SSMS to the database there’s sadly no option to get the row counts report:

The post is a little depressing, really. But still worth the read.

Comments closed

Building an Elastic Job with Bicep

Josephine Bush flexes some muscles:

Bicep is an open-source Domain-Specific Language (DSL) that simplifies the process of deploying Azure resources. It is an abstraction layer on top of Azure Resource Manager (ARM) templates, making it easier to write and understand infrastructure code. Bicep lets you describe your Azure infrastructure using a cleaner and more concise syntax than traditional ARM templates.

It’s definitely easier to read and work with Bicep than directly with ARM template JSON. Larger Bicep scripts can still be pretty confusing, but it’s definitely easier to write and maintain.

Comments closed

Third-Party Applications and Poor Database Design

Kevin Hill talks about a bugbear of mine:

As a SQL DBA, what do you do when a vendor application has performance problems that are code related?

Server settings don’t generally seem to be an issue.

Queries and vendor code…total hands off. I just point at code and say “There’s a great choice for optimizing in your next update!”

Indexes are the “Sticky Bits” in between client data and vendor code.

To an extent, I do feel for software vendors, who have to write software that works in a variety of environments with a variety of workloads. That can be a real challenge, especially because the people developing those databases don’t get to monitor them in real time and observe what’s going on until someone reports an issue.

That’s the empathy part. The other side of the coin is, there are a bunch of vendors who have garbage-tier database designs and awful queries. And as a user of this third-party application or a consultant trying to help users of the application, that can be frustrating. The reason is, like Kevin mentions, you really don’t want to go mucking with the queries or database design, because with my luck, the change I make to improve a query’s performance will affect a trigger nested three levels deep in some totally unrelated process that somehow manages the data integrity of the entire application.

Comments closed