Press "Enter" to skip to content

Curated SQL Posts

Shortcut Caching in Microsoft Fabric now GA

Trevor Olson announces a feature has become generally available:

Shortcuts in OneLake allow you to quickly and easily source data from external cloud providers and use it across all Fabric workloads such as Power BI reports, SQL, Spark and Kusto.  However, each time these workloads read data from cross-cloud sources, the source provider (AWS, GCP) charges additional egress fees on the data. Thankfully, shortcut caching allows the data to only be sourced once and then used across all Fabric workloads without additional egress fees.

This is useful for data that hardly ever changes, and Trevor also shows you who can control the cache length and reset the cache. In addition, the on-premises gateway for shortcuts is now generally available, so you can take shortcuts of certain on-prem file systems.

Comments closed

Set-Based Comparisons for Data Validation

Jeffry Schwartz looks for exceptions:

Given the complexity, I realized that validating all intermediate and final result sets was essential to ensure that tuning changes did not alter any report results.  To support this validation, I saved interim and final result sets into tables for direct comparison.

For these comparisons, the EXCEPT and INTERSECT operators proved invaluable. 

Click through for the full story. I’ve always liked using these set operations in ETL jobs because they automatically know how to handle NULL, so this approach is more robust than rigging your own comparisons.

Comments closed

sudo in Windows

Patrick Gruenauer elevates our access:

Sudo for Windows is a new way for users to execute commands with elevated privileges (as an administrator) directly from a non-relevant console session on Windows.

The following requirements apply to the use of sudo in Windows:

  1. Windows 11 24H2
  2. Sudo needs to be enabled

Click through to see how to activate sudo. The English-language header reads “System > For Developers” and the exact setting is at the bottom of the first section and has the name “Enable sudo” with a toggle switch. The number of times I’ve run a command just to see it error out because I needed to be in an administrative command prompt or PowerShell terminal is high enough that I immediately turned it on.

But importantly, this is different from Linux, in that it opens up a new command prompt or PowerShell terminal rather than executing the command with elevated permissions in the same prompt. This is important because that new prompt goes away after the command finishes, so you lose the output. In other words, if you run sudo ipconfig in a command prompt, it will hit you with a UAC request (depending on how you’ve configured your PC) and then run ipconfig in a new command prompt, which disappears as soon as the command finishes. You don’t get to keep what was in stdout. I think this limits some of the capability of the option, unfortunately.

Comments closed

Parameterization and Mocking in Python Tests

Aida Gjoka and Russ Hyde show off some capabilities in the pytest library:

Writing tests is one of the best ways to keep your code reliable and reproducible. This post builds on our previous blog about Python testing with pytest Part 1, and explores some of the more advanced features it offers. From parametrised fixtures to mocking and other useful pytest plugins, we will show how to make your tests more reproducible, easier to manage and demonstrate how writing simple tests can save you time in the long run.

Click through to learn more. I’m a huge fan of parameterization in pytest—it’s really easy to do. Mocks are a bit harder to pull off in practice, though quite useful.

Comments closed

Real-Time Data Streaming in Snowflake

Anil Kumar Moka streams some data:

Real-time data ingestion has become essential for modern analytics and operational intelligence. Organizations across industries need to process data streams from IoT sensors, financial transactions, and application events with minimal latency. Snowflake offers two robust approaches to meet these real-time data needs: Snowpipe for near-real-time file-based streaming and Direct Streaming via Snowpark API for true real-time data integration.

This guide explores both options in depth, providing detailed implementations with explanation of code parameters, performance comparisons, and practical recommendations to help you choose the right approach for your specific use case.

Click through to see how it works. I’ll only make one semi-snarky comment that ‘real-time’ doesn’t mean “takes several seconds” but I realize I’m the one tilting at windmills here.

Comments closed

Checking Key Vault Access in Microsoft Fabric Spark Notebooks

Marc Lelijveld has clearance:

Working with sensitive data in Microsoft Fabric requires careful handling of secrets, especially when collaborating externally. In a recent customer engagement, I needed to validate access to Azure Key Vault from within a Fabric Notebook, without ever exposing the actual secret values. With only read access granted and no need to manage or update secrets, I focused on confirming that the connection was working as expected.

In this blog, I’ll walk you through the approach, including the setup, code snippets, and logic behind this quick but crucial verification step.

Click through for the full story.

Comments closed

Checking SQL Server for tempdb Performance Issues

Jeff Iannucci announces a new stored procedure:

Most database folks might know the tempdb database in SQL Server is used for temporary tables, but many folks don’t realize all the other things that use this critical system database.

It’s also used by table variables, cursors, aggregations, joins, and sorts. And by memory spills when you don’t have enough memory. And by integrity checks that you run regularly to check for corruption. And more.

Optimal performance of tempdb is vital, as it is involved constantly with all your database queries. So…is your tempdb configured and running optimally? And if it isn’t, can you tell why?

Click through to see how it works and where you can get a free copy.

Comments closed

Mirroring vs Shortcuts in Microsoft Fabric

Nikola Ilic compares and contrasts:

A few days ago, I shared a short LinkedIn post in which I condensed key differences and use cases for both mirroring and shortcuts in Microsoft Fabric. Since the post sparked some nice conversation and opened a few additional question, I decided to cover the same topic here, so that it doesn’t get lost in LinkedIn’s “jungle”:)

But, before we move to the final showdown, let’s take one step back and explain what mirroring and shortcuts are.

Click through for the overview and explanation of how the two differ, as well as where each fits.

Comments closed

Restoring Multiple Differential Backup Files

Tim Radney violates Betteridge’s Law of Headlines:

I was recently asked if you can restore multiple differential backups in preparation for a migration. I responded that yes, technically you can restore multiple differential backups, however it will not speed up your cutover.

As soon as I read the first sentence, the answer in my head was “Yes, but why?” Tim explains the person’s reasoning and then demonstrates that this reasoning doesn’t quite work.

Comments closed