Press "Enter" to skip to content

Curated SQL Posts

Creating a Custom Shape Map in Power BI

Elena Drakulevska builds a map:

Are you aiming for that WOW effect when your client opens your report? I’ve found that people often experience an immediate sense of awe when they see a map visual, especially when it’s conditionally formatted to highlight, for example, which country has the highest sales. So, I thought I’d share how you can achieve this and go beyond the built-in shape maps in Power BI. Let’s transform your global data into compelling visual stories!

Click through to learn more about the technique.

Comments closed

Defining the Default Lakehouse for a Fabric Notebook

Sandeep Pawar sets up a default lakehouse:

I wrote a blog post a while ago on mounting a lakehouse (or generally speaking a storage location) to all nodes in a Fabric spark notebook. This allows you to use the File API file path from the mounted lakehouse.

Mounting a lakehouse using mssparkutils.fs.mount() doesn’t define the default lakehouse of a notebook. To do so, you can use the configure magic as below:

Read on for that command, as well as some notes around using it.

Comments closed

Contoso Data Generator v2

Marco Russo announces an updated product:

I am proud to announce the second version of the Contoso Data Generator!

In January 2022, we released the first version of an open-source project to create a sample relational database for semantic models in Power BI and Analysis Services. That version focused on creating a SQL Server database as a starting point for the semantic model.

We invested in a new version to support more scenarios and products! Yes, Power BI is our primary focus, but 90% of our work could have been helpful for other platforms and architectures, so… why not?

Read on to see how you can use this and generate as much data as you want.

Comments closed

Working with Managed Entities in Azure SQL DB

Josephine Bush creates and uses a managed identity:

Benefits of Using Managed Identities and Entra Groups

  • Enhanced Security: Using managed identities eliminates the need to manage credentials, reducing the risk of credential theft.
  • Simplified Management: Entra Groups streamline the management of permissions for multiple users or managed identities, making it easier to apply consistent access policies.
  • Scalability: As your organization grows, you can easily manage access by adding new users or managed identities to Entra Groups without needing to update database permissions individually.

Read on to see how you can create one and what you can do with it.

Comments closed

Changing Distributions and Simpson’s Paradox

Jerry Tuttle describes a paradox:

So you spent hours, or maybe days, cranking out thousands of numbers, you submit it to your boss just at the deadline, your boss quickly peruses your exhibit of numbers, points to a single number and says, “This number doesn’t look right.” Bosses have an uncanny ability to do this.

      Your boss is pointing to something like this: Your company sells property insurance on both personal and commercial properties. The average personal property premium increased 10% in 2024. The average commercial property premium increased 10% in 2024. But you say the combined average property premium decreased 3% in 2024. You realize that negative 3% does not look right.

Although the blog post doesn’t explicitly mention Simpson’s paradox, I’d argue that this is a good example of the idea. H/T R-Bloggers.

Comments closed

Plotting Individual Values and Means of Multiple Groups in R

Ali Oghabian builds a graph:

In this post I show how groupScatterPlot(), function of the rnatoolbox R package can be used for plotting the individual values in several groups together with their mean (or other statistics). I think this is a useful function for plotting grouped data when some groups (or all groups) have few data points ! You may be wondering why to include such function in the rnatoolbox package ?! Well ! I happen to use it quit a bit for plotting expression values of different groups of genes/transcripts in a sample or expression levels of a specific gene/transcript in several sample groups.

Click through for the sample code and output. H/T R-Bloggers.

Comments closed

Comparing Configuration of Two SQL Server Instances

Jana Sattainathan checks the labels on these bottles:

A lot of times, you have nearly identical database servers for an application running in Production, Test and Development but you may notice performance differences between them for the same data/queries that you could not attribute to any reason since CPU, Memory, Disk etc., may all be identical.

This is, strictly speaking, a comparison of configurations rather than data differences, indexing, and the like. Nonetheless, it’s useful to make this sort of comparison just to ensure that your instances have your desired state configuration.

Comments closed

Shared Library Preloading in PostgreSQL

David Wheeler talks pre-loading:

Recently I’ve been trying to figure out when a Postgres extension shared libraries should be preloaded. By “shared libraries” I mean libraries provided or used by Postgres extensions, whether LOADable libraries or CREATE EXTENSION libraries written in C or pgrx. By “preloaded” I mean under what conditions should they be added to one of the Shared Library Preloading variables, especially shared_preload_libraries.

The answer, it turns out, comes very much down to the extension type. Read on for details.

Read on for an interesting discussion of what pre-loading means and the circumstances you should consider along the way.

Comments closed

Microsoft Purview Classifications and Sensitivity Labels

James Serra labels the data:

I see a lot of confusion on how classifications and sensitivity labels work in Microsoft Purview. This blog will help to clear that up, but I first must address the confusion with Purview now that multiple products have been renamed to Microsoft Purview. I decided to use a question-and-answer format that will hopefully clear up the confusion (I was very confused too!):

Purview is a fantastic product. I just wish it cost about 10% as much as it does; then I could heartily recommend it to people.

Comments closed

Finding Multiple Substrings in R

Steven Sanderson is looking for two things:

Hello, fellow R programmers! Today, we’re looking at a practical topic that often comes up when dealing with text data: how to check if a string contains multiple substrings. We’ll cover how to do this in base R, as well as using the stringr and stringi packages. Each approach has its own advantages, so let’s explore them together.

Read on for three separate examples.

Comments closed