Curated SQL – Page 129 – A Fine Slice Of SQL Server

Data Recovery in SQL Server without a Backup

Published 2025-05-13 by Kevin Feasel

In more than 10 years of experience, I have dealt with cases where someone has performed incorrect operations on a table, such as updating or deleting wrong rows, in a SQL Server database that does not have full backups available. There are multiple reasons for no full backup: corrupted backups, taking too much time to restore, etc.

In this post, I want to show an alternative for these cases, an ace up one’s sleeve, that you can use to recover data. This method also provides a deep understanding of the internal workings of how your SQL Server stores data. So, in addition to learning how to recover data, you will gain more insights into the internals of SQL.

Read on to see how. Rodrigo also points out some limitations or things that would need to change if you have index compression. I consider this a very neat thing you might need to know but never want to use.

Comments closed

Comparing Microsoft Fabric Engines

Published 2025-05-13 by Kevin Feasel

Nikola Ilic performs a comparison:

Before we proceed, an important disclaimer: the guidance I’m providing here is based on both my experience with implementing Microsoft Fabric in real-world scenarios, and the recommended practices provided by Microsoft.

Please keep in mind that the guidance relies on general recommended practices (I intentionally avoid using the phrase best practices, because the best is very hard to determine and agree on). The word general means that the practice I recommend should be used in most of the respective scenarios, but there will always be edge cases when the recommended practice is simply not the best solution. Therefore, you should always evaluate whether the general recommended practice makes sense in your specific use case.

Click through for a comparison between three engines: the lakehouse, the warehouse, and the eventhouse. It would really simplify things if the lakehouse and warehouse combined into one coherent whole.

Comments closed

Hash Indexes in MySQL

Published 2025-05-13 by Kevin Feasel

Lukas Vileikis explains how hash indexes work in MySQL:

In MySQL, hash indexes are indexes that are used in queries that use the equality operators like = or <=> (which is the MySQL NULL safe equality operator, equivalent to the SQL Standard IS NOT DISTINCT FROM). Hash indexes are not used in other situations, so they can be useful for things like random or generated PRIMARY KEY values where you are looking up single rows and not needing to order rows or searching for a range of values.

Just like in SQL Server hash indexes only work for in-memory tables, and some of the mechanisms look very similar.

Comments closed

Using Multiple Scales with ggplot2 and ggnewscale

Published 2025-05-12 by Kevin Feasel

Zhenguo Zhang resets the scale:

In one ggplot figure, normally you can only use one scale for each aesthetic mapping. For example, if you use scale_color_manual() to set the color scale for a layer, you cannot use another scale_color_manual() for another layer, or set the color scale more then once in the function aes(). However, you can use the new_scale_color() function from the ggnewscale package to add a new scale for the same aesthetic mapping in different layers.

In this post, I will showcase how to use the new_scale_color() function to add two different color scales in a ggplot figure. The first scale will be for a discrete variable (e.g., number of cylinders), and the second scale will be for a continuous variable (e.g., density level).

Click through for the code and a demonstration.

Comments closed

Git Branching for Small Teams

Published 2025-05-12 by Kevin Feasel

Adron Hall takes us through a branching strategy:

Git. It’s the tool that makes some of us developers wonder why they didn’t become a carpenter. But let’s face it: Git is here to stay. And for a small team—like, say, 3-4 developers working on the same codebase—getting your branching strategy right can be the difference between smooth sailing and a storm of merge conflicts that will make you question every decision you’ve ever made in life.

So let’s dive into a “simple” strategy for keeping Git under control. No complex workflows, no corporate jargon—just a few solid, time-tested practices to keep you from drowning in source control hell. Because seriously, git is actually super easy and a thousand times better than all the garbage attempts at source control that came before.

Click through for Adron’s advice. Feature branches start making since once you have more than 2 or maybe 3 developers working in the same repo.

Comments closed

Partitioning in PostgreSQL

Published 2025-05-12 by Kevin Feasel

Umair Shahid takes us into partitioning strategies in PostgreSQL:

My recommended methodology for performance improvement of PostgreSQL starts with query optimization. The second step is architectural improvements, part of which is the partitioning of large tables.

Partitioning in PostgreSQL is one of those advanced features that can be a powerful performance booster. If your PostgreSQL tables are becoming very large and sluggish, partitioning might be the cure.

It’s interesting to compare this against SQL Server, where partitioning is not a strategy for query performance improvements.

Comments closed

Shortcut Caching in Microsoft Fabric now GA

Published 2025-05-12 by Kevin Feasel

Trevor Olson announces a feature has become generally available:

Shortcuts in OneLake allow you to quickly and easily source data from external cloud providers and use it across all Fabric workloads such as Power BI reports, SQL, Spark and Kusto. However, each time these workloads read data from cross-cloud sources, the source provider (AWS, GCP) charges additional egress fees on the data. Thankfully, shortcut caching allows the data to only be sourced once and then used across all Fabric workloads without additional egress fees.

This is useful for data that hardly ever changes, and Trevor also shows you who can control the cache length and reset the cache. In addition, the on-premises gateway for shortcuts is now generally available, so you can take shortcuts of certain on-prem file systems.

Comments closed

Set-Based Comparisons for Data Validation

Published 2025-05-12 by Kevin Feasel

Jeffry Schwartz looks for exceptions:

Given the complexity, I realized that validating all intermediate and final result sets was essential to ensure that tuning changes did not alter any report results. To support this validation, I saved interim and final result sets into tables for direct comparison.

For these comparisons, the EXCEPT and INTERSECT operators proved invaluable.

Click through for the full story. I’ve always liked using these set operations in ETL jobs because they automatically know how to handle NULL, so this approach is more robust than rigging your own comparisons.

Comments closed

sudo in Windows

Published 2025-05-12 by Kevin Feasel

Patrick Gruenauer elevates our access:

Sudo for Windows is a new way for users to execute commands with elevated privileges (as an administrator) directly from a non-relevant console session on Windows.

The following requirements apply to the use of sudo in Windows:

Windows 11 24H2

Sudo needs to be enabled

Click through to see how to activate sudo. The English-language header reads “System > For Developers” and the exact setting is at the bottom of the first section and has the name “Enable sudo” with a toggle switch. The number of times I’ve run a command just to see it error out because I needed to be in an administrative command prompt or PowerShell terminal is high enough that I immediately turned it on.

But importantly, this is different from Linux, in that it opens up a new command prompt or PowerShell terminal rather than executing the command with elevated permissions in the same prompt. This is important because that new prompt goes away after the command finishes, so you lose the output. In other words, if you run sudo ipconfig in a command prompt, it will hit you with a UAC request (depending on how you’ve configured your PC) and then run ipconfig in a new command prompt, which disappears as soon as the command finishes. You don’t get to keep what was in stdout. I think this limits some of the capability of the option, unfortunately.

Comments closed

Parameterization and Mocking in Python Tests

Published 2025-05-09 by Kevin Feasel

Aida Gjoka and Russ Hyde show off some capabilities in the pytest library:

Writing tests is one of the best ways to keep your code reliable and reproducible. This post builds on our previous blog about Python testing with pytest Part 1, and explores some of the more advanced features it offers. From parametrised fixtures to mocking and other useful pytest plugins, we will show how to make your tests more reproducible, easier to manage and demonstrate how writing simple tests can save you time in the long run.

Click through to learn more. I’m a huge fan of parameterization in pytest—it’s really easy to do. Mocks are a bit harder to pull off in practice, though quite useful.

Comments closed

M	T	W	T	F	S	S
						1
2	3	4	5	6	7	8
9	10	11	12	13	14	15
16	17	18	19	20	21	22
23	24	25	26	27	28	29
30	31

Curated SQL Posts