Press "Enter" to skip to content

Curated SQL Posts

Tracking Column Sizes on DAX Queries

Chris Webb busts out the calculator:

I had meant to follow up my recent post on how to find the columns touched by a DAX query by writing one on how to use this technique to find the size of these columns in memory, so you can find the total size of the columns that need to be paged into memory when a DAX query runs on a Direct Lake semantic model. Before I could do that, though, my colleague Michael Kovalsky messaged me to say that not only had he taken the query from that first post and incorporated it in Semantic Link Labs, he’d done the work to get column sizes too. All that’s left for me to do, then, is give you some simple examples of how to use it.

Click through for those simple examples, though note that this requires Microsoft Fabric.

Leave a Comment

Concurrency Control in Oracle vs PostgreSQL

Umair Shahid continues a series on migrating from Oracle to PostgreSQL:

Transitioning from Oracle to PostgreSQL can be a transformative experience for database administrators because of the subtle differences between the two technologies. Understanding how the two handle concurrency differently is critical to managing highly concurrent workloads. 

Concurrency control is essential for maintaining data consistency when multiple users access the database simultaneously. Oracle and PostgreSQL take different approaches to concurrency control: Oracle primarily relies on locking and consistent snapshots, while PostgreSQL utilizes a Multi-Version Concurrency Control (MVCC) system.

This article provides an in-depth look at concurrency control in PostgreSQL from an Oracle perspective.

Read on for that comparison.

Leave a Comment

Missing Data in Microsoft Fabric Real-Time Intelligence Workloads

Greg Low covers a common scenario:

But another area that I see very few handling well, is the data that is missing, rather than just the data that is present. There’s a huge difference between data that arrived, and is odd, and data that just didn’t arrive at all.

One tool that’s great at working with streams of data is the Real Time Intelligence workload for Microsoft Fabric. And it’s also great at working with data that is missing from those streams.

Greg covers some of the scenarios around missing data, though not a lot on the process to fix them.

Leave a Comment

Using the subset() Function in R

Steven Sanderson plays duck-duck-goose with the data:

Data manipulation is a cornerstone of R programming, and selecting specific columns from data frames is one of the most common tasks analysts face. While modern tidyverse packages offer elegant solutions, Base R’s subset() function remains a powerful and efficient tool that every R programmer should master.

This comprehensive guide will walk you through everything you need to know about using subset() to manage columns in your data frames, from basic operations to advanced techniques.

Click through for a description of the function and examples of it in action.

Leave a Comment

Tenant Switching in Microsoft Fabric

Koen Verbeeck has good news:

Praise whatever deity you believe in, because it’s finally here, a tenant switcher for Microsoft Fabric (which includes Power BI). A what? Let me explain. When you have a organization with multiple tenants in Azure (also called directories in some products like Azure Devops), or you’re a consultant like me who works with multiple clients (with each their own tenants), it’s possible that you can log into multiple tenants using the same email address. This can happen if your user account was added as an external user to another tenant.

This has been a real pain, and unfortunately, that pain still exists for Power BI Desktop.

Leave a Comment

Hyperthreading and Warehouses

Joey D’Antoni covers the impact of hyperthreading on warehouse performance:

Database performance, especially in the cloud, is always a big concern. Traditionally, cloud storage has not performed as well as high-end on-premises storage. The introduction of faster cloud storage (in Azure Premium V2 and Ultra Disk, in AWS IO1-3) and VMs with more available storage bandwidth has improved overall storage performance. However, storage is only sometimes the complete picture.

Read on for Joey’s tests and findings.

Leave a Comment

The IS Operator in T-SQL

Eric Blinn wants to know of those NULLs:

Many data professionals search for help writing T-SQL queries containing columns with NULL values in a Microsoft SQL Server table. Some of the most common NULL functions include IS NULL, IS NOT NULL, NOT NULL, NULL, what is NULL, NULL vs NOT NULL, etc. In this SQL tutorial, the goal is to help you better understand how to work with NULL values in a SQL database.

Even the best of us have accidentally used = NULL in a script when we meant IS NULL. Which is totally different from ISNULL(), of course. And NULLIF(), naturally.

Leave a Comment

Learn from Developer Edition rather than Express Edition

Vlad Drumea learns the right way:

It’s free for non-production use

Just like SQL Server Express Edition, Developer Edition is free, but, unlike Express, it’s only free for non-production purposes (developing, testing, demoing, learning).

If you want to learn more about this, Bob Ward has a blog post that covers this in-depth and answers some questions that people tend to have about Developer Edition.

Back in the day, Developer Edition used to cost money—something like $50, so not much, but just enough to make people prefer Express Edition. Those days are long gone, however.

Leave a Comment

Cross-Workspace KQL Queries in Microsoft Fabric

Sandeep Pawar drinks your milkshake:

In Fabric, if you want to query a delta table from a lakehouse in another workspace, you create a shortcut to that table. Similarly, in Eventhouse, you can also create shortcuts to Eventhouses in other workspaces, but the option might not be immediately obvious in the GUI. If you click on New > OneLake shortcut, it creates a shortcut to a delta table, not an Eventhouse.

Click through to see how to do this via UI and programmatically.

Leave a Comment

SCD Types in Microsoft Fabric

Kenneth Omorodion reminds us that the Kimball model is still quite valuable:

In modern data warehousing, how we handle updates to dimension tables is crucial. There are several approaches; but the decision often comes down to two primary strategies: Slowly Changing Dimensions (SCD) Type 2 and overwriting tables. Each has its own benefits, use cases, and trade-offs. This tip will explore the two methods and why SCD Type 2 is often a better option in many data warehouse scenarios.

Read on for this overview of the benefits of type-2 slowly changing dimensions, as well as a little bit of coverage of several other types of slowly changing dimensions.

Leave a Comment