Press "Enter" to skip to content

Curated SQL Posts

Data Extraction from Unstructured Data with Fabric AI Functions

Sandeep Pawar demonstrates functionality:

Most enterprise data lives in free text – tickets, contracts, feedback, clinical notes, and more. It holds critical information but doesn’t fit into the structured tables that pipelines expect. Traditionally, extracting structure meant rule-based parsers that break with every format to change, or custom NLP models that take weeks to build. LLMs opened new possibilities, but on their own they bring inconsistent outputs, no type of enforcement, and results that vary between runs. What production workflows need is LLM intelligence with structured-output guarantees, delivered inside the data platform teams already use.

Microsoft Fabric AI Functions deliver exactly that. Functions like ai.summarize, ai.classify, ai.translate, and ai.extract let you transform and enrich unstructured data at scale with a single line of code – no model deployment or ML infrastructure needed. For the full list, see Transform and enrich data with AI functions.

Click through for an example. The tricky part of this is, because answers won’t be deterministic, you have to do a lot of testing and verification to ensure things are working reasonably well.

Leave a Comment

NTILE and Uneven Row Distribution

Jared Westover clarifies:

One of the simplest yet least-popular ranking functions in T-SQL is NTILE. It’s useful for dividing data into buckets or tiles. However, when your data isn’t evenly distributed across buckets, the results are confusing. Also, NTILE sometimes returns rows in a seemingly random order. What’s happening here?

There’s absolutely a pattern to how NTILE() works, as Jared describes.

Leave a Comment

UDFs to Support the Like-for-Like Pattern in DAX

Marco Russo and Alberto Ferrari support a pattern:

DAX user-defined functions (UDFs) are a powerful tool for improving the quality of your semantic models. DAX authors with an IT background are accustomed to creating generic code using functions. However, many DAX creators came from different backgrounds of expertise, such as statistics, business, and marketing. They may not recognize the immense power that functions have brought to the Power BI community.

In this article, we want to practically show, through an example, how to wisely use functions to improve the generalization of code and to reduce the complexity of your semantic models, with the goal of raising curiosity towards user-defined functions and – in general – the world of code development.

Read on for an example, as well as a link to the like-for-like pattern and what it means.

Leave a Comment

Logging and Error Handling with MicrosoftFabricMgmt

Rob Sewell continues a series on the MicrosoftFabricMgmt Powershell module. First up is structured logging:

If you have ever come back to a script the next morning and thought “what on earth happened last night?”, you understand why logging matters. Write-Host and Write-Verbose are fine for interactive use, but in automation — scheduled tasks, CI/CD pipelines, long-running jobs — you need something more structured. Something you can query, filter, and persist across sessions, something that you can provide to your team or support or auditors.

MicrosoftFabricMgmt uses PSFramework for all its internal logging, and that capability is available directly to you.

Then comes retry logic and dealing with long-running operations:

REST APIs fail. Networks are unreliable. Cloud services have rate limits. If your automation script does not account for this, it will eventually break at the worst possible moment. This is not pessimism — it is production experience.

MicrosoftFabricMgmt has a lot of error handling built in, so you do not have to write it all yourself. Today I want to show you what the module handles automatically, and how to add your own handling on top for the scenarios you care about.

Leave a Comment

An Introduction to pg_duckpipe

Yuwei Xiao needs a way to move data:

When we released pg_ducklake, it brought a columnar lakehouse storage layer to PostgreSQL: DuckDB-powered analytical tables backed by Parquet, with metadata living in PostgreSQL’s own catalog. One question kept coming up: how do I keep these analytical tables in sync with my transactional tables automatically?

This is a real problem. If you manage DuckLake tables by hand, running periodic ETL jobs or batch inserts, you end up with stale data, extra scripts to maintain, and an operational surface area that grows with every table. For teams that want fresh analytical views of their OLTP data, this quickly becomes painful.

pg_duckpipe addresses this. It is a PostgreSQL extension (and optionally a standalone daemon) that streams changes from regular heap tables into DuckLake columnar tables in real time. No Kafka, no Debezium, no external orchestrator. Just PostgreSQL.

Click through to learn more about how it works.

Leave a Comment

The Transaction Log as a Circle

Paul Randal explains to the officer that the transaction log is a flat circle:

In the second part of this series, I described the structural hierarchy of the transaction log. As this post is chiefly concerned with the Virtual Log Files (VLFs) I described, I recommend you read the second part before continuing.

When all is well, the transaction log will endlessly loop, reusing the existing VLFs. This behavior is what I call the circular nature of the log. Sometimes, however, something will happen to prevent this, and the transaction log grows and grows, adding more and more VLFs. In this post, I’ll explain how all this works, or sometimes doesn’t.

Read on for a depiction of the transaction log and what can cause it to foul up.

Leave a Comment

What’s New for tempdb in SQL Server 2025

Johan Deardurff lists the updates:

TempDB has consistently been regarded as one of the most essential and, historically, overlooked components within SQL Server. When TempDB goes wrong, it rarely fails quietly. A single runaway query, poorly designed report, or unexpected workload spike can consume TempDB space and bring an entire instance to its knees.

With SQL Server 2025, Microsoft has made meaningful investments to change that story. TempDB is no longer just something you monitor and hope behaves; it’s now something you can govern, recover quickly from, and observe with far greater clarity.

Click through for a breakdown of how Microsoft has tackled some classic tempdb problems in SQL Server 2025.

Leave a Comment

Deploy Microsoft Fabric Items with fabric-cicd in Azure DevOps

Kevin Chant announces a new Azure DevOps extension:

This post covers how you can simplify Microsoft Fabric deployments with “Deploy Microsoft Fabric items with fabric-cicd”. Which is an Azure DevOps extension that I recently published.

To manage expectations, this post shows how to start working with the extension and its associated task within the GUI-based classic release pipelines in Azure DevOps. Like in the below screenshot.

Read on to see how the extension works.

Leave a Comment

PostgreSQL Business Continuity as Layers

Umair Shahid explains that it’s like an onion:

High availability for PostgreSQL is often treated as a single, big, dramatic decision: “Are we doing HA or not?”

That framing pushes teams into two extremes:

  • a “hero architecture” that costs a lot and still feels tense to operate, or
  • a minimalistic architecture that everyone hopes will just keep running.

A calmer way to design this is to treat HA and DR as layers. You start with a baseline, then add specific capabilities only when your RPO/RTO and budget justify them.

My thing I would point out is that the first few layers are actually disaster recovery, and that high availability first comes into the picture with Layer 3. But if you think of it in terms of Business Continuity (High Availability + Disaster Recovery), then the approach is a good one.

Leave a Comment

Data Type Precedence in SQL Server

Louis Davidson has a type:

There is one topic in query and equation writing that is constantly a minor issue for SQL programmers: implicit data type conversions. Whenever you don’t specifically state the datatype of an expression, like when you write SELECT 1;, it can feel a bit of a mystery what the datatypes of your literal values are. Like in this case, what is 1 ? You probably know from experience that this is an integer, but then what happens when you compare CAST(1 as bit) to the literal 1. Is that literal 1 now a bit? Or is it still an integer?

Perhaps even more importantly, why does this query succeed?

Click through to learn more.

Leave a Comment