Press "Enter" to skip to content

Month: March 2025

Vacuuming Delta Tables in Microsoft Fabric

Kenneth Omorodion explains why you sometimes need to bust out the VACUUM:

Efficient data management in Microsoft Fabric is a necessity in maintaining large-scale partitioned Delta tables. In dynamic datasets with frequently generated new files, the need to ensure the removal of stale files becomes very important to prevent storage bloating. In settings with partitioned tables, where data is in a hierarchical structure (e.g., by year, month, day), this can be particularly challenging, and files must be cleaned without disrupting active data. Learn how the VACUUM operation can help optimize delta tables.

Read on to learn more.

Leave a Comment

The Order of Data Conversion and Aggregation Functions

I have a new video:

In this video, I demonstrate how order of operations matters when it comes to casting or converting a data type and performing an aggregation on that result. I’ll use the specific example of converting binary data to a number and show where the fast version might lead you astray.

This is something pretty easy to miss, especially when the code returns fast enough. But over a large enough number of calls, these sorts of things add up, as I note in the video.

Leave a Comment

Working with Where-Object in Powershell

Mike Robbins performs some filtering:

PowerShell 3.0 introduced several notable improvements to its cmdlet library, with one of the most useful features in this release being the simplified syntax for the Where-Object cmdlet. This enhancement made filtering objects more efficient and user-friendly by introducing Property and Value parameters and a switch parameter for every comparison operator. This article explores how these changes work, their usefulness, and how to leverage them in your scripts.

Read on for a bit of history, as well as the options available to you now.

Leave a Comment

Getting Started with GitHub Actions

Kathi Kellenberger takes action:

Shortly before Microsoft acquired GitHub in late 2018, GitHub Actions was released. GitHub Actions is a powerful CI/CD platform that can be used to automate code integration and deployment.

This article series will teach you what you need to know to take advantage of GitHub Actions, especially for deploying database code.

Read on for the first article in the series, which acts as a primer on GitHub Actions.

Leave a Comment

Deploying Assets via Azure DevOps and fabric-cicd into Microsoft Fabric

Kevin Chant pushes some code:

In this post I want to show how you can operationalize fabric-cicd to work with Microsoft Fabric and Azure DevOps. Which I exclusively revealed at Power BI Gebruikersdag over the weekend.

Just so that everybody is aware, fabric-cicd is a Python library that allows you to perform CI/CD of various Microsoft Fabric items into Microsoft Fabric workspaces. At this moment in time there is a limited number of supported item types. However, that list is increasing.

Click through to see what Kevin did and how it worked out.

Leave a Comment

Hash versus Range Partitioning in PostgreSQL

Umair Shahid explains when hash and range partitioning work best:

I have always been a fan of RANGE partitioning using a date/time value in PostgreSQL. This isn’t always possible, however, and I recently came across a scenario where a table had grown large enough that it had to be partitioned, and the only reasonable key to use was a UUID styled identifier.

The goal of this post is to highlight when and why hashing your data across partitions in PostgreSQL might be a better approach.

Click through to learn more about each style of partitioning, as well as when hash partitioning may actually be the better fit.

Leave a Comment

Writing Data into a Microsoft Fabric Lakehouse via Notebook

Stepan Resl writes some code:

Since Lakehouse is one of the key items within Microsoft Fabric, it is important to know how to write data into it in various formats and using different tools. One of the most common tools is notebooks, as they provide great flexibility and speed for development and testing with graphical outputs. In this article, I want to focus primarily on the following types of notebooks:

  • PySpark
  • Python

Click through to see how it works in both notebook types.

Leave a Comment

Solving Linear Equations in SQL Server

Sebastiao Pereira implements a function:

Solving linear equations is essential for solving real-world problems in Science, Engineering, Data Analysis, Machine Learning, Economics, Finance, and other areas. Is it possible to have a tool to solve linear equations directly in SQL Server? We will look at how to create a Gauss-Seidel method function for SQL Server.

This is one way to solve a series of linear equations, and it’s a pretty neat implementation.

Leave a Comment

Thoughts on a Decline in SQL Server Quality

Kendra Little shares her opinion:

“Is it just me, or is SQL Server quality slipping?”

I asked myself that question for couple/few years until I faced up to it: SQL Server is well into a period where Microsoft investment is waning, and Microsoft regularly isn’t able to deliver the features they promise.

Read on for Kendra’s take. I’m not in total agreement with this, though part of that is because my stance is closer to (if you’ll allow me the misappropriation of a famous quotation), there is a great deal of ruin in a data platform product.

But if I had a single line of demarcation to Kendra’s point, I’d probably make it when SQL Server QA shifted to “We’ll find all of the bugs in Azure SQL DB first, so we don’t need this redundancy” several years back. That doesn’t explain everything, but it does provide a relevant timeframe for all of this.

1 Comment

Retrieving Microsoft Fabric Items using a Python-Only Notebook

Gilbert Quevauvilliers doesn’t need Spark for this:

This blog below explains how to use a Python only notebook to get all the Fabric items using the Fabric REST API.

NOTE: At the time of this blog post Feb 2025, Dataflow Gen2 is not included in the Fabric items, I am sure it will be there in the future.

NOTE II: This only gets the Fabric Items, which does not include the Power BI Items.

Despite the notes, Gilbert leads off with the main reason why you might want to use this: it takes up approximately 5% of the capacity units that a Spark-based notebook does to perform the same operation.

Leave a Comment