Press "Enter" to skip to content

Curated SQL Posts

Git Branching for Small Teams

Adron Hall takes us through a branching strategy:

Git. It’s the tool that makes some of us developers wonder why they didn’t become a carpenter. But let’s face it: Git is here to stay. And for a small team—like, say, 3-4 developers working on the same codebase—getting your branching strategy right can be the difference between smooth sailing and a storm of merge conflicts that will make you question every decision you’ve ever made in life.

So let’s dive into a “simple” strategy for keeping Git under control. No complex workflows, no corporate jargon—just a few solid, time-tested practices to keep you from drowning in source control hell. Because seriously, git is actually super easy and a thousand times better than all the garbage attempts at source control that came before.

Click through for Adron’s advice. Feature branches start making since once you have more than 2 or maybe 3 developers working in the same repo.

Comments closed

Partitioning in PostgreSQL

Umair Shahid takes us into partitioning strategies in PostgreSQL:

My recommended methodology for performance improvement of PostgreSQL starts with query optimization. The second step is architectural improvements, part of which is the partitioning of large tables.

Partitioning in PostgreSQL is one of those advanced features that can be a powerful performance booster. If your PostgreSQL tables are becoming very large and sluggish, partitioning might be the cure. 

It’s interesting to compare this against SQL Server, where partitioning is not a strategy for query performance improvements.

Comments closed

Shortcut Caching in Microsoft Fabric now GA

Trevor Olson announces a feature has become generally available:

Shortcuts in OneLake allow you to quickly and easily source data from external cloud providers and use it across all Fabric workloads such as Power BI reports, SQL, Spark and Kusto.  However, each time these workloads read data from cross-cloud sources, the source provider (AWS, GCP) charges additional egress fees on the data. Thankfully, shortcut caching allows the data to only be sourced once and then used across all Fabric workloads without additional egress fees.

This is useful for data that hardly ever changes, and Trevor also shows you who can control the cache length and reset the cache. In addition, the on-premises gateway for shortcuts is now generally available, so you can take shortcuts of certain on-prem file systems.

Comments closed

Set-Based Comparisons for Data Validation

Jeffry Schwartz looks for exceptions:

Given the complexity, I realized that validating all intermediate and final result sets was essential to ensure that tuning changes did not alter any report results.  To support this validation, I saved interim and final result sets into tables for direct comparison.

For these comparisons, the EXCEPT and INTERSECT operators proved invaluable. 

Click through for the full story. I’ve always liked using these set operations in ETL jobs because they automatically know how to handle NULL, so this approach is more robust than rigging your own comparisons.

Comments closed

sudo in Windows

Patrick Gruenauer elevates our access:

Sudo for Windows is a new way for users to execute commands with elevated privileges (as an administrator) directly from a non-relevant console session on Windows.

The following requirements apply to the use of sudo in Windows:

  1. Windows 11 24H2
  2. Sudo needs to be enabled

Click through to see how to activate sudo. The English-language header reads “System > For Developers” and the exact setting is at the bottom of the first section and has the name “Enable sudo” with a toggle switch. The number of times I’ve run a command just to see it error out because I needed to be in an administrative command prompt or PowerShell terminal is high enough that I immediately turned it on.

But importantly, this is different from Linux, in that it opens up a new command prompt or PowerShell terminal rather than executing the command with elevated permissions in the same prompt. This is important because that new prompt goes away after the command finishes, so you lose the output. In other words, if you run sudo ipconfig in a command prompt, it will hit you with a UAC request (depending on how you’ve configured your PC) and then run ipconfig in a new command prompt, which disappears as soon as the command finishes. You don’t get to keep what was in stdout. I think this limits some of the capability of the option, unfortunately.

Comments closed

Parameterization and Mocking in Python Tests

Aida Gjoka and Russ Hyde show off some capabilities in the pytest library:

Writing tests is one of the best ways to keep your code reliable and reproducible. This post builds on our previous blog about Python testing with pytest Part 1, and explores some of the more advanced features it offers. From parametrised fixtures to mocking and other useful pytest plugins, we will show how to make your tests more reproducible, easier to manage and demonstrate how writing simple tests can save you time in the long run.

Click through to learn more. I’m a huge fan of parameterization in pytest—it’s really easy to do. Mocks are a bit harder to pull off in practice, though quite useful.

Comments closed

Real-Time Data Streaming in Snowflake

Anil Kumar Moka streams some data:

Real-time data ingestion has become essential for modern analytics and operational intelligence. Organizations across industries need to process data streams from IoT sensors, financial transactions, and application events with minimal latency. Snowflake offers two robust approaches to meet these real-time data needs: Snowpipe for near-real-time file-based streaming and Direct Streaming via Snowpark API for true real-time data integration.

This guide explores both options in depth, providing detailed implementations with explanation of code parameters, performance comparisons, and practical recommendations to help you choose the right approach for your specific use case.

Click through to see how it works. I’ll only make one semi-snarky comment that ‘real-time’ doesn’t mean “takes several seconds” but I realize I’m the one tilting at windmills here.

Comments closed

Checking Key Vault Access in Microsoft Fabric Spark Notebooks

Marc Lelijveld has clearance:

Working with sensitive data in Microsoft Fabric requires careful handling of secrets, especially when collaborating externally. In a recent customer engagement, I needed to validate access to Azure Key Vault from within a Fabric Notebook, without ever exposing the actual secret values. With only read access granted and no need to manage or update secrets, I focused on confirming that the connection was working as expected.

In this blog, I’ll walk you through the approach, including the setup, code snippets, and logic behind this quick but crucial verification step.

Click through for the full story.

Comments closed

Checking SQL Server for tempdb Performance Issues

Jeff Iannucci announces a new stored procedure:

Most database folks might know the tempdb database in SQL Server is used for temporary tables, but many folks don’t realize all the other things that use this critical system database.

It’s also used by table variables, cursors, aggregations, joins, and sorts. And by memory spills when you don’t have enough memory. And by integrity checks that you run regularly to check for corruption. And more.

Optimal performance of tempdb is vital, as it is involved constantly with all your database queries. So…is your tempdb configured and running optimally? And if it isn’t, can you tell why?

Click through to see how it works and where you can get a free copy.

Comments closed