Press "Enter" to skip to content

Curated SQL Posts

From Anaconda to Standard Python

Rob Zelt switches to standard Python:

While Anaconda provides a comprehensive package management system, particularly useful for data science, many developers prefer the flexibility and lightweight nature of standard Python environments. This guide will help you make the switch without losing your carefully curated package setup.

Read on for Rob’s solution. I used to be a huge proponent of the Anaconda distribution of Python, but have found myself being less of one, especially with the licensing changes a few years back. If you were already using pip for most package installation, and if you’re fairly consistent about using virtual environments, this transition is even easier than in Rob’s scenario.

Comments closed

Displaying Dates without Data in Power BI

Alon Ohayon looks for the missing month:

When you’re using Power BI, you probably create line charts that show data by month pretty often. It usually works great, but what happens if some months don’t have any data?

By default, Power BI just skips those months in the chart. That might seem okay, but it can actually be misleading—especially if you’re looking for trends over time, including the months with no activity.

Alon shows us an example of how to do this in DAX and that works. But if you can solve this at the data layer, such as when querying from a SQL Server, that’s even better. This happens to be one of the good uses of a calendar table: giving you a complete set of months (or whatever time period you want) that you can then use to left join to your data, returning either a data point with a value, or a NULL that you can coalesce with 0 to ensure that you have a result for each month.

Comments closed

The Importance of Filtering Columns instead of Tables in DAX

Marco Russo and Alberto Ferrari explain it to us:

You have probably heard multiple times the same answer to many questions about DAX and, to be honest, about nearly anything IT-related: “it depends”. Is it fine to create a calculated column, or is it better to avoid it? Yes, no… it depends. Is it better to create one, two, or three fact tables? It depends. Is a composite model the right choice? It depends. However, a few questions have a strong, clear answer, and in this article, we focus on one of those. Is it better to filter a table or a column with CALCULATE? Here, the answer is simple and definite: filter columns, not tables. The same principle is applied to CALCULATETABLE, even though the example in this article only shows CALCULATE.

With that answer in mind, read on for the wherefore.

Comments closed

Creating Test Classes and Unit Tests with tSQLt

Olivier Van Steenlandt continues a series on database testing:

We have set up our tSQLt Database Project in the previous data recipe, Create a SSDT Project Template based on your Database Project. Now it’s time to dive into the wonderful world of tSQLt Unit Testing. In the meantime, I have added my data warehouse to my SSDT Solution and added this project as a Database Reference to my Unit Testing Database Project. If you are unsure how to do this, you can find all the information you need in my previous data cookbook which you can access via the following link: Getting Started With Database Projects & Azure DevOps.

Read on for a walkthrough of how to do this.

Comments closed

Parallelism in Powershell Workflows

Chad Callihan shows one method for parallel execution in Powershell:

I’ve mentioned before that running Copy-DbaDatabase worked well in my experience, but the downside that I ran into was utilizing it for larger counts of databases. It should be no surprise that the more databases to copy, the longer the process takes to complete. I did some more research and came across using workflows along with the parallel keyword to speed up the copy process.

Let’s take a look at what a workflow is and how we can apply it to add parallelism to the database copy process.

I’ve previously covered options for parallelism, though apparently I was wrong about workflows being deprecated. Rod Edwards also has a list of options for doing things in parallel and does not include workflows, so you can get an idea of several of the options available to you.

Comments closed

Tips for Bringing a Streamlit App into Production

I have wrapped up another series:

In this video, I discuss some of the things you should consider as you transition a Streamlit application from development to production. We will cover four methods of bringing a Streamlit app to production and some thoughts on performance optimization.

This one doesn’t have much in the way of demos, but I do spend a lot of time at the virtual whiteboard, so it’s got that going for it.

Comments closed

Enumerating Causes of Dirty and Incomplete Data

Joe Celko builds a list and checks it twice:

Many years ago, my wife and I wrote an article for Datamation, a major trade publication at the time, under the title, “Don’t Warehouse Dirty Data!” It’s been referenced quite a few times over the decades but is nowhere to be found using Google these days. The point is, if you have written a report using data, you have no doubt felt the pain of dirty data and it is nothing new.

However, what we never got around to defining was exactly how data gets dirty. Let’s look at some of the ways data get messed up.

I am very slowly working up the nerve to build a longer talk (and YouTube series) on data engineering, and part of that involves understanding why our data tends to be such a mess. Joe has several examples and stories, and I’m sure we could come up with dozens of other reasons.

Comments closed

Eliminating Unexpected Whitespace or Null Characters in Columns

Kevin Wilkie has fun with an unexpected character:

The query ran just fine so I happily completed my work for the day. I sent it to my QA team and had them check it out before pushing it to Production. They sent it back with a note saying “We expect a lot more “Insert Garbage Data Here” to show.

Being a good data developer – or someone who just wants to show QA up (take your pick) – decided to dig deep into what FieldName really shows. I found a lot of the following:

In Kevin’s case, he tried to use TRIM() and it didn’t fix anything. That’s because TRIM(), by default, only removes the space character (up to the first non-space character), not all whitespace and not the null character CHAR(0) that Kevin found.

An alternative version of TRIM() that would have worked in this case, plus adding in tabstops as well, would be:

SELECT
    TRIM(' '+char(0)+char(9) FROM FieldName) AS TrimmedFieldName
FROM dbo.SomeTable;
GO

And you could also extend that to include newlines, line feeds, vertical tabs, the line separator character, and whatever else you need.

Comments closed

Loading Scanner API JSON Data from Fabric into Power BI Desktop

Gilbert Quevauvilliers hunts for some output files:

How to get data from a Fabric Lakehouse File into Power BI Desktop – Using Scanner API JSON

In this blog post I am going to show you how I connected to my Scanner API JSON file which is stored in the files section of my Microsoft Fabric Lakehouse.

Full credit on how to complete this comes from Marc’s blog post https://data-marc.com/2023/08/25/access-onelake-files-from-power-bi-desktop/

Click through for the instructions.

Comments closed