Press "Enter" to skip to content

Month: September 2024

Self-Joins and Halloween Protection

Paul White has an explanation:

I was asked recently why Halloween Protection was needed for data modification statements that include a self-join of the target table. This gives me a chance to explain, while also covering some interesting product bug history from the SQL Server 7 and 2000 days.

Read on for that explanation.

Paul’s explanation of the bug reminded me of the “quirky update” approach to building a running total, except that, instead of fixing a bug that eliminated it, the process always remained on a knife’s edge of “unsupported but works…at least until we change something and it doesn’t work anymore.”

Comments closed

A Quick Reference Guide for Power BI

Hristo Hristov tabulates:

I need a structured reference guide to help me get started or expand on my Power BI knowledge. I want to be able to bookmark a resource and use it daily when needed as I build my data sets, reports, and dashboards. Can you please enumerate some common and helpful resources as a Power BI Quick Reference guide?

Click through for plenty of links to prior MSSQLTips articles.

Comments closed

Natural Language Pre-Processing with Python

Harris Amjad does some text cleanup:

Natural Language Processing (NLP) is currently all the rage in the current machine learning landscape. With technologies like ChatGPT, Gemini, Llama, and so many other state-of-the-art text generators getting popular with the mainstream public, many newcomers are pouring into the field of NLP. Unfortunately, before we delve into how these fancy chatbots work, we must understand how we are engineering and treating our data before we feed it to our model. In this tip, we will introduce and implement some basic text preprocessing and cleaning techniques with Python.

Click through for some common operations. Some of these are very important for certain tasks but likely unhelpful for others. That could include things like lower-casing all words or removing stopwords. There are also some operations like spell checking and jargon expansion (or replacement) that you will likely want to include in a real-life project with actual people entering the data, versus a tidy sample dataset.

Comments closed

Creating Profiles in Visual Studio Code and Azure Data Studio

I have a new video:

In this video, I show off a not-so-well-known capability in Visual Studio Code and Azure Data Studio: creating profiles.

Profiles are very useful in Visual Studio Code, though probably less useful for Azure Data Studio. I think the primary benefit to that would be handling things like zoom levels and menu layouts when you switch from a laptop on the go to something plugged into a larger monitor.

Comments closed

Data Analysis with Window Functions in Postgres

Elizabeth Christensen dives into window functions:

SQL makes sense when it’s working on a single row, or even when it’s aggregating across multiple rows. But what happens when you want to compare between rows of something you’ve already calculated? Or make groups of data and query those? Enter window functions.

Window functions tend to confuse people – but they’re a pretty awesome tool in SQL for data analytics. The best part is that you don’t need charts, fancy BI tools or AI to get some actionable and useful data for your stakeholders.

Read on for several demonstrations. Most of this you can also do with SQL Server 2012 or later, though the DATE_TRUNC() example will only work in SQL Server 2022 or Azure SQL DB / Managed Instance. Prior to that, you’d need to use a different mechanism, such as CAST(o.order_date AS DATE), to get it working.

Comments closed

Automated Patching in SQL Server on Azure

Abdullah Mamun closes a door and opens a window:

The automated patching feature for SQL Server on Azure VM will be retired on September 17th, 2027.

While Automated Patching is being retired, we’re excited to replace it with Azure Update Manager. Azure Update Manager is an enterprise class powerful tool that provides:

  1. Centralize Update Management: Azure Update Manager provides a unified dashboard where you can view and manage updates across your entire environment, including virtual machines, on-premises servers, and even hybrid scenarios.
  2. Customize Schedules: With Azure Update Manager, you can create custom update schedules based on your organization’s needs. Whether you prefer weekly, monthly, or specific date-based updates, Azure Update Manager has you covered.
  3. Patch Compliance Reports: Azure Update Manager generates detailed reports on patch compliance, helping you stay informed about the status of updates across your infrastructure.

Click through to learn more about the transition and next steps.

Comments closed

Syncing Slicers in Power BI without Bi-Directional Relationships

Marc Lelijveld avoids the dreaded bi-directional relationship:

Have you ever wanted to sync two slicers on a report page? Even when both slicers are coming from different dimensions? A lot of users end up setting the relationships to bi-directional (both) which has huge side effects! You may up with a ambiguous data model, over filtering fact tables and wrong results. Also, there is a very likely performance impact to this solution.

But actually, to make the slicers sync, you don’t have to change the relationships! In this blog I will show you how you can sync two (or more) slicers on a report page without changing the relationships or the semantic model!

Read on to learn how.

Comments closed

Simple Data Cleanup with Pandas

Ivan Palomares Carrascosa builds a process:

Few data science projects are exempt from the necessity of cleaning data. Data cleaning encompasses the initial steps of preparing data. Its specific purpose is that only the relevant and useful information underlying the data is retained, be it for its posterior analysis, to use as inputs to an AI or machine learning model, and so on. Unifying or converting data types, dealing with missing values, eliminating noisy values stemming from erroneous measurements, and removing duplicates are some examples of typical processes within the data cleaning stage.

As you might think, the more complex the data, the more intricate, tedious, and time-consuming the data cleaning can become, especially when implementing it manually.

Ivan handles some of the most common types of data clean work and shows a simple way of implementing these.

Comments closed