Press "Enter" to skip to content

Curated SQL Posts

Imputing Future Values with Power BI

Joseph Yeates makes a forecast:

I recently had a requirement for a Power BI report to generate a forecast. The report data model consisted of daily amount information, which was aggregated and summarized in the report for monthly totals.

The data was current year to date, but for the remaining months in the year rather than displaying no data, the users wanted to impute the average for all of the previous months that did have data.

Granted, this doesn’t do anything fancy like factoring in seasonality or periodicity, but it is easy to put together.

Comments closed

Normalizing Data in R

Steven Sanderson says, act normal:

Data normalization is a crucial preprocessing step in data analysis and machine learning workflows. It helps in standardizing the scale of numeric features, ensuring fair treatment to all variables regardless of their magnitude. In this tutorial, we’ll explore how to normalize data in R using practical examples and step-by-step explanations.

Read on for a definition of what this means and how you can do it.

Comments closed

Finding Last Access Dates for SQL Server

David Fowler checks the calendar:

Your boss walks up to you one morning and says, “Hey, I wanna list of all of our databases and when they were last accessed”.

If you’ve got some sort of auditing switched on or a trace or xevent catching this sort of info you might be ok, but I’m betting you don’t have any of that. That’s cool, it’s not something that I tend to monitor as standard either.

But if you’re not monitoring it, is there any way that you can get at that info?

Read on for one way to estimate it. Though I believe automated jobs would skew that result if the underlying question is, “When did a human last view that database?”

Comments closed

Finding Row Counts in SQL Server

Kevin Wilkie breaks out the abacus:

Today, I was working with SQL Server to get row counts from several tables so I thought I’d be smart and work with some functions in SQL Server to make it smarter / easier.

Now, if I am truly only getting “straight” row counts from these tables, I would be able to create a query like the below that would provide the answers with no problem:

Read on for the normal approach, as well as a more complicated approach made necessary due to some business logic requirements.

Comments closed

Date Calculation Bug in Power Query ODBC Code

Meagan Longoria files a report:

I was working on an imported Power BI semantic model, adding some fiscal year calculations to my date table. The date table was sourced from a view in Databricks Unity Catalog. I didn’t have access to add more fields to the view, so I was adding the fields in Power Query first, with plans to request they be added to the view in the future. I got some unexpected results, which turned into a bug being logged for the ODBC code for Power Query.

If you are only analyzing data in the last 20 years, you won’t see this bug. But if you are doing long-term analysis including years before 2000, you might just run into it.

Read on to see the bug, how you can replicate it, and three workarounds you can use to avoid it.

Comments closed

The Value of Mirroring in Microsoft Fabric

Nikola Ilic talks mirroring:

First things first. Before I show you how to leverage this feature in Microsoft Fabric, let’s first explain the feature itself.

But, before we explain the feature itself, we need to go one step back and examine the key logic behind the Microsoft Fabric workloads, so that you understand the full context of the Mirroring importance.

Take that context and then you get an idea of how mirroring becomes so important for the Microsoft Fabric experience.

Comments closed

Dealing with Parameter Sniffing using Multiple Execution Plans

Andy Brownsword deals with statistical skew in the data:

Dynamic SQL has many uses and one of these can help us fix Parameter Sniffing issues. Here we’ll look at how it can be used to generate multiple execution plans for the same query.

Parameter sniffing is a common issue. Even for simple queries we can run into suboptimal plans being produced. There are multiple ways we can use Dynamic SQL to solve this challenge. Here we’ll demonstrate one technique: Comment Injection.

My one note about a good post (other than, you should read it) is that parameter sniffing is not itself a bad thing. 95%+ of the time, it’s a great thing. It’s that last 5% or so that give it a bad name.

Comments closed

Comparing pgvector and Postgres ARRAY

Ernst-Georg Schmid makes a comp based on a mass spectrometry database:

As said in the introduction, mass spectrometry is one, if not the tool to identify unknown compounds, to quantify known compounds, and to determine the structure of molecules. But it is a lot of work, and you need reference spectra to compare against.

So, there are curated databases of validated spectra available, like MassBank JapanMassBank Europe and the NIST mass spectral libraries. Laboratories might also want to store their own libraries for future use.

However, such databases often come in their own formats and with their own retrieval software. If you need to efficiently connect spectra to other data, e.g. chemical structures or genomic data, this calls for central management and a common API.

Read on to see the comparison of the pgvector extension versus built-in functionality with ARRAY.

Comments closed

Workspace Folders in Microsoft Fabric

Koen Verbeeck double-checks the calendar:

That’s right, this is not an April Fool’s Joke! The most anticipated feature of Microsoft Fabric has arrived! I’m not talking about decent CI/CD support, or OneSecurity. Nope, this is all about the ability to create folders in your workspaces! Very important, since Fabric is a centralized SaaS data platform that allows you to create a gazillion different objects, but until now you had now way of actually organizing them.

To give you an idea about how many objects, this is what the filter currently shows (and some items are missing, like Eventhouse):

This is big. Even on a small proof of concept that I worked on, the lack of folders was annoying. On a full project, the pain becomes worse. Granted, it’s in public preview, so it might not be available to everybody right off the bat, but it’s certainly a step in the direction of usefulness.

Comments closed