Day: May 12, 2025

Using Multiple Scales with ggplot2 and ggnewscale

Zhenguo Zhang resets the scale:

In one ggplot figure, normally you can only use one scale for each aesthetic mapping. For example, if you use scale_color_manual() to set the color scale for a layer, you cannot use another scale_color_manual() for another layer, or set the color scale more than once in the function aes(). However, you can use the new_scale_color() function from the ggnewscale package to add a new scale for the same aesthetic mapping in different layers.

In this post, I will showcase how to use the new_scale_color() function to add two different color scales in a ggplot figure. The first scale will be for a discrete variable (e.g., number of cylinders), and the second scale will be for a continuous variable (e.g., density level).

Click through for the code and a demonstration.

Git Branching for Small Teams

Adron Hall takes us through a branching strategy:

Git. It’s the tool that makes some of us developers wonder why they didn’t become a carpenter. But let’s face it: Git is here to stay. And for a small team—like, say, 3-4 developers working on the same codebase—getting your branching strategy right can be the difference between smooth sailing and a storm of merge conflicts that will make you question every decision you’ve ever made in life.

So let’s dive into a “simple” strategy for keeping Git under control. No complex workflows, no corporate jargon—just a few solid, time-tested practices to keep you from drowning in source control hell. Because seriously, git is actually super easy and a thousand times better than all the garbage attempts at source control that came before.

Click through for Adron’s advice. Feature branches start making sense once you have more than two or maybe three developers working in the same repo.

Partitioning in PostgreSQL

Umair Shahid takes us into partitioning strategies in PostgreSQL:

My recommended methodology for performance improvement of PostgreSQL starts with query optimization. The second step is architectural improvements, part of which is the partitioning of large tables.

Partitioning in PostgreSQL is one of those advanced features that can be a powerful performance booster. If your PostgreSQL tables are becoming very large and sluggish, partitioning might be the cure. 

It’s interesting to compare this against SQL Server, where partitioning is not a strategy for query performance improvements.

Shortcut Caching in Microsoft Fabric now GA

Trevor Olson announces a feature has become generally available:

Shortcuts in OneLake allow you to quickly and easily source data from external cloud providers and use it across all Fabric workloads such as Power BI reports, SQL, Spark and Kusto.  However, each time these workloads read data from cross-cloud sources, the source provider (AWS, GCP) charges additional egress fees on the data. Thankfully, shortcut caching allows the data to only be sourced once and then used across all Fabric workloads without additional egress fees.

This is useful for data that hardly ever changes, and Trevor also shows how you can control the cache length and reset the cache. In addition, the on-premises gateway for shortcuts is now generally available, so you can take shortcuts of certain on-prem file systems.

Set-Based Comparisons for Data Validation

Jeffry Schwartz looks for exceptions:

Given the complexity, I realized that validating all intermediate and final result sets was essential to ensure that tuning changes did not alter any report results.  To support this validation, I saved interim and final result sets into tables for direct comparison.

For these comparisons, the EXCEPT and INTERSECT operators proved invaluable. 

Click through for the full story. I’ve always liked using these set operations in ETL jobs because they automatically know how to handle NULL, so this approach is more robust than rigging your own comparisons.
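
To make the NULL point concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for SQL Server (an assumption; Jeffry's post works in T-SQL, though EXCEPT treats NULLs as matching in both engines). The table names before_tuning and after_tuning and their contents are hypothetical. It compares EXCEPT against a hand-rolled equality check on the same data.

```python
import sqlite3

# Hypothetical "saved result sets" from before and after a tuning change.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE before_tuning (id INTEGER, region TEXT, total REAL);
    CREATE TABLE after_tuning  (id INTEGER, region TEXT, total REAL);
    INSERT INTO before_tuning VALUES
        (1, 'East', 10.0), (2, NULL, 20.0), (3, 'West', 30.0);
    INSERT INTO after_tuning VALUES
        (1, 'East', 10.0), (2, NULL, 20.0), (3, 'West', 31.0);
""")

# EXCEPT: rows in the baseline with no identical row after tuning.
# Row 2 has a NULL region on both sides and is treated as a match.
only_before = conn.execute("""
    SELECT id, region, total FROM before_tuning
    EXCEPT
    SELECT id, region, total FROM after_tuning
""").fetchall()

# Hand-rolled comparison: "=" never evaluates to true for NULL vs. NULL,
# so row 2 is reported as a mismatch even though nothing changed.
naive_mismatch = conn.execute("""
    SELECT b.id
    FROM before_tuning AS b
    WHERE NOT EXISTS (
        SELECT 1
        FROM after_tuning AS a
        WHERE a.id = b.id AND a.region = b.region AND a.total = b.total
    )
    ORDER BY b.id
""").fetchall()

print(only_before)     # the one row that really changed
print(naive_mismatch)  # includes a false positive for the NULL row
```

Running the EXCEPT in both directions (before minus after, then after minus before) covers rows that were added as well as rows that were lost or changed.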

sudo in Windows

Patrick Gruenauer elevates our access:

Sudo for Windows is a new way for users to execute commands with elevated privileges (as an administrator) directly from an unelevated console session on Windows.

The following requirements apply to the use of sudo in Windows:

  1. Windows 11 24H2
  2. Sudo needs to be enabled

Click through to see how to activate sudo. The English-language header reads “System > For Developers” and the exact setting is at the bottom of the first section and has the name “Enable sudo” with a toggle switch. The number of times I’ve run a command just to see it error out because I needed to be in an administrative command prompt or PowerShell terminal is high enough that I immediately turned it on.

Importantly, the default behavior differs from Linux: it opens a new command prompt or PowerShell terminal rather than executing the command with elevated permissions in the same prompt. That matters because the new prompt goes away after the command finishes, so you lose the output. In other words, if you run sudo ipconfig in a command prompt, it will hit you with a UAC request (depending on how you’ve configured your PC) and then run ipconfig in a new command prompt, which disappears as soon as the command finishes. You don’t get to keep what was in stdout. I think this limits some of the capability of the option, unfortunately.
