Press "Enter" to skip to content

Author: Kevin Feasel

Microsoft Fabric Copy Job Updates

Ye Xu has an update:

Copy job is the go-to solution in Microsoft Fabric Data Factory for simplified data movement. With native support for multiple delivery styles, including bulk copy, incremental copy, and change data capture (CDC) replication, Copy job offers the flexibility to handle a wide range of scenarios—all through an intuitive, easy-to-use experience.

This update introduces several enhancements, including connection parameterization, expanded CDC capabilities, new connectors, and a streamlined Copy Assistant powered by Copy job.

Read on to see what’s new. Some of the items in this list are preview features, and it looks like others are currently GA.

Leave a Comment

Linking Fabric Warehouse SQL Query Results to the Capacity Metrics App

Chris Webb follows up on a previous post:

Following on from my post two weeks ago about how to get the details of Power BI operations seen in the Capacity Metrics App using the OperationId column on the Timepoint Detail page, I thought it was important to point out that you can do the same thing with TSQL queries against a Fabric Warehouse/SQL Endpoint and with Spark jobs. These two areas of Fabric are outside my area of expertise so please excuse any mistakes or simplifications, but I know a lot of you are Fabric capacity admins so I hope you’ll find this useful.

Read on to learn more.

Leave a Comment

CONTINUE and BREAK in T-SQL WHILE Loops

Erik Darling tries to count and gets kind of far along. The key part of the video is the use of CONTINUE and BREAK as control logic in WHILE loops. CONTINUE is something I’ve very rarely used, though it’s pretty useful. BREAK, however, is something I’m quite familiar with, especially in conjunction with record counts. This way, you can perform small updates (e.g., update 1000 records) with a filter that tells you which records still need updated, and when @@ROWCOUNT is below 1000, you know you’re done with the updates.

Leave a Comment

Fabric Mirroring for Azure SQL MI Now GA

Ajay Jagannathan announces a feature has gone to general availability:

Mirroring in Fabric is a powerful feature that allows you to replicate data from various data sources such as your Azure SQL Managed Instance to Fabric’s OneLake. This ensures that your data is always up-to-date and readily available for advanced analytics, AI, and data science without the need for complex ETL processes.

Jokes about Azure SQL Managed Instance aside, it’s good that these features are becoming generally available.

Leave a Comment

Splitting GUIDs into Multiple BIGINTs for Columnstore

Forrest McDaniel performs an experiment:

You may have run into issues with GUIDs as clustering keys, but another major problem is in columnstore. Smart people at Microsoft wrote columnstore in a way to take advantage of modern CPU features, but those CPU features don’t play well with datatypes larger than 8 bytes. Which includes GUIDs.

Read on for the demonstration of this, a clever workaround, and the ramifications of splitting GUIDs into two BIGINTs. Full points for cleverness, though like Forrest, I wouldn’t want to use this in production.

Leave a Comment

Content Discovery in Microsoft Fabric

Jon Vöge wants to find data:

So you built a nice Data Platform on Microsoft Fabric. Users are happily using a few Models and Reports, but you face two problems:

  1. Users are not aware of all the other awesome models, reports and even lakehouses that they already have access to, which they should be using.
  2. Users also don’t know anything about the models, report and lakehouses that they don’t have access to, but which could also be useful for them, if they requested access.

For my take on how best to solve this natively in Fabric, read on below.

Read on to see how you can enable content discovery.

Leave a Comment

S3-Compatible Object Storage in SQL Server 2025

Anthony Nocentino updates a guide for SQL Server 2025:

In this blog post, I’ve implemented two example environments for using SQL Server’s S3 object integration. One for backup and restore to S3-compatible object storage and the other for data virtualization using PolyBase connectivity to S3-compatible object storage. This work aims to get you up and running as quickly as possible to work with these new features. I implemented this in Docker Compose since that handles all the implementation and configuration steps for you. The complete code for this is available on my GitHub repo. I’m walking you through the implementation here in this post.

Click through to see the updates Anthony has to his scripts.

Leave a Comment

Inspecting the Postgres Write-Ahead Log

Henrietta Dombrovskaya digs into the write-ahead log:

First, when the users fixed one of the primary suspects jobs, the situation with WAL growth didn’t change. Second, the rate of the growth couldn’t be explained by these suboptimal jobs: the data volumes they were removing and reloading were still magnitudes smaller than the WAL size we were dealing with. Finally, I decided to do what I should have done from the start – to take a look at what exactly was in these super-fast growing WALs.

Read on to learn what Henrietta found. Also check out the comments for some additional context.

Leave a Comment

Scaling On-Prem Vector Search with Ollama and Nginx

Anthony Nocentino solves a problem:

When you call out to an external embedding service from T-SQL via REST over HTTPS, you’re limited by the throughput of that backend. If you’re running a single Ollama instance, you’ll quickly hit a ceiling on how fast you can generate embeddings, especially for large datasets. I recently attended an event and discussed this topic. My first attempt at generating embeddings was for a three-million-row table. I had access to some world-class hardware to generate the embeddings. When I arrived at the lab and initiated the embedding generation process for this dataset, I quickly realized it would take approximately 9 days to complete. Upon closer examination, I found that I was not utilizing the GPUs to their full potential; in fact, I was only using about 15% of one GPU’s capacity. So I started to cook up this concept in my head, and here we are, load balancing embedding generation across multiple instances of ollama to more fully utilize the resources.

Click through for the solution.

Leave a Comment

Generating and Recognizing Hash Collisions in SQL Server

Hugo Kornelis continues a deep dive into hash tables in SQL Server:

Welcome back! In the previous parts, I first showed how a Hash Match (Left Outer Join) can give insight in the order of data in a hash table, and then used that trick to obtain and verify some interesting insights into the internal structure of such a table. It’s now time to see if this same trick can also be used to find hash collisions.

Hugo finds some interesting results along the way, so check it out.

Leave a Comment