Author: Kevin Feasel

MicrosoftFabricMgmt PowerShell Pipeline Operations

Rob Sewell chains together some operations:

Last week I showed you how to work with workspaces — creating, updating, removing, assigning capacities. But we were doing each operation in isolation. Today I want to show you what happens when you connect those operations together using the PowerShell pipeline.

This is one of my favourite aspects of PowerShell and therefore it was imperative that Jess Pomfret and I revamped the module to fully support pipeline operations. Every cmdlet that makes sense in a pipeline is built to work in one.

Click through for some examples of what this means.

Diagnosing a Driver Error

Sean Gallardy troubleshoots an error:

The symptoms of this issue were interesting: every so often the instance would just kind of get “stuck” – at least that is how it was described to me. Nothing would work, cancelling queries, attempting to kill queries, submitting anything new, nothing seemed to really do anything except restarting the service. Once the service was restarted, the instance (and AG) would hum along nicely… until some random time later when submitting different queries would just grind to a halt. Fun.

The answer is just as dumb as you’d think. But I won’t spoil the punch line here.

A Primer on Apache Iceberg

Brendan Tierney provides an introduction to Apache Iceberg:

Modern data platforms increasingly separate compute from storage, using object stores as durable data lakes while scaling processing engines. Traditional “data lakes” built on Parquet files and Hive-style partitioning have limitations around atomicity, schema evolution, metadata scalability, and multi-engine interoperability. Apache Iceberg addresses these challenges by defining a high-performance table format with transactional guarantees, scalable metadata structures, and engine-agnostic semantics.

Apache Iceberg is an open-source table format that has become the industry standard for data sharing in modern data architectures. Let’s have a look at some of the key features, some of its limitations and a brief look at some of the alternatives.

Brendan explains where Iceberg fits in relation to data formats (e.g., Parquet, ORC, and Avro), as well as competitors like Delta Lake and Hudi.

JSONB Data in Postgres and Performance Due to TOAST

Paul Ramsey lays out the facts and the data:

Working with APIs and arrays in the jsonb type has become increasingly popular recently, and storing pieces of application data using jsonb has become a common design pattern.

But why shred a JSON object into rows and columns and then rehydrate it later to send it back to the client?

The answer is efficiency. Postgres is most efficient when working with rows and columns, and hiding data structure inside JSON makes it difficult for the engine to go as fast as it might.

Read on to learn how Postgres manages to store arbitrary-sized JSONB data within the limitations of 8KB pages, and the performance implications of doing so.

Pain Points around Direct Lake

Teo Lachev describes a pair of problems:

I’m helping an enterprise client modernize their data analytics estate. As a part of this exercise, a SSAS Multidimensional financial cube must be converted to a Power BI semantic model. The challenge is that business users ask for almost real-time BI during the forecasting period, where a change in the source forecasting system must be quickly propagated to the reporting layer, so the users don’t sit around waiting to analyze the impact. An important part of this architecture is the Fabric Direct Lake storage to eliminate the refresh latency, but it came up with a couple of gotchas.

Click through for those two problems.

An Overview of the Fabric Native Execution Engine

Ankita Victor-Levi introduces a new processing model:

In today’s data landscape, as organizations scale their analytical workloads, the demand for faster, more cost-efficient computation continues to rise. Apache Spark has long been the backbone of large-scale data processing with its in‑memory processing and powerful APIs, but today’s workloads demand even better performance.

Microsoft Fabric addresses this challenge with the Native Execution Engine—a vectorized, C++ powered execution layer that accelerates Spark jobs with no code changes, reduced runtime, and at no additional compute cost. This blog post will take you behind the scenes to give an overview of how the engine works and how it delivers performance gains while preserving the familiar Spark developer experience users already know and love.

Read on to learn more about its capabilities and current limitations.

Building Power BI Reports from the Desktop or Fabric

James Serra clears up some confusion:

If you’re a Power BI report author who’s just getting into Microsoft Fabric, you’ve probably asked the same question I hear over and over: am I supposed to stop using Power BI Desktop now?

It’s a fair question. Power BI Desktop is a Windows app that has traditionally been the place where report authors do everything: get data, transform it, model it, and build the report. Microsoft even describes that “connect, shape/transform, then load” experience as part of how Power BI Desktop works with Power Query.

Fabric changes the feel of that workflow because Power BI is now also a first-class experience in the browser inside the Fabric portal. And that browser experience isn’t just “view and share” anymore. You can edit semantic models in the service, including using Power Query for import models and building reports directly from that same environment.

Read on to see, for a brand new report, which of the two models can make the most sense.

Combining UNION and UNION ALL

Greg Low crosses the streams:

Until the other day though, I’d never stopped to think about what happens when you mix the two operations. I certainly wouldn’t write code like that myself but for example, without running the code (or reading further ahead yet), what would you expect the output of the following command to be? (Note: The real code read rows from a table but I’ve mocked it up with a VALUES clause to make it easier to see the outcome).

Read on to see what happens.
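
If you want to experiment with the idea before clicking through, here is a minimal sketch. It is not Greg's example: the single-value queries are made up for illustration, and it runs against SQLite through Python's built-in sqlite3 module rather than SQL Server. SQLite groups compound SELECTs from left to right, so the order in which the two operators appear changes the row count. Check the linked post for how SQL Server behaves.

```python
import sqlite3
from contextlib import closing

# Hypothetical queries: three branches that all return the same value.
ALL_THEN_UNION = """
    SELECT 1 AS n
    UNION ALL
    SELECT 1
    UNION
    SELECT 1
"""

UNION_THEN_ALL = """
    SELECT 1 AS n
    UNION
    SELECT 1
    UNION ALL
    SELECT 1
"""


def run(query):
    """Run a query against a throwaway in-memory SQLite database."""
    with closing(sqlite3.connect(":memory:")) as conn:
        return [row[0] for row in conn.execute(query)]


# The trailing UNION de-duplicates everything to its left.
all_then_union = run(ALL_THEN_UNION)

# The trailing UNION ALL appends its row after the earlier de-duplication.
union_then_all = run(UNION_THEN_ALL)

print(len(all_then_union), len(union_then_all))
```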

Connection Pooling in PostgreSQL vs SQL Server

Haripriya Naidu compares two systems:

If you speak SQL Server as your first language, then you might be aware that connections are thread-based by design. That means each session/connection in SQL Server gets a worker thread. That thread is tied to that session from start to finish of execution.
If there are no available threads, new connections wait in queue until threads become available. This is called a thread-based model.

Postgres is different: it uses a process-based model. Every single connection spawns a separate backend OS process, and each one consumes RAM (>5MB per connection).

It’s interesting that the RDBMS that really “needs” connection pooling doesn’t have it built in, whereas the one that doesn’t “need” connection pooling (but can still benefit greatly from it) does.
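
For anyone new to the concept, here is a minimal sketch of what a connection pool does, assuming a fixed pool size and a stand-in connection class. The names `FakeConnection` and `SimplePool` are hypothetical and do not correspond to any real driver or pooler. The point is only that a small set of connections is opened once and then reused across many requests.

```python
import queue
from contextlib import contextmanager


class FakeConnection:
    """Hypothetical stand-in for an expensive database connection."""

    opened = 0  # counts how many connections were ever created

    def __init__(self):
        FakeConnection.opened += 1
        self.id = FakeConnection.opened


class SimplePool:
    """Hypothetical fixed-size pool: open connections up front, then reuse."""

    def __init__(self, size):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(FakeConnection())

    @contextmanager
    def connection(self, timeout=1.0):
        # Blocks until a connection is free; raises queue.Empty on timeout.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            # Always hand the connection back, even if the caller failed.
            self._idle.put(conn)


pool = SimplePool(size=2)
seen_ids = set()
for _ in range(10):  # ten "requests" share the same two connections
    with pool.connection() as conn:
        seen_ids.add(conn.id)

print(FakeConnection.opened, sorted(seen_ids))
```

Real poolers also handle health checks, timeouts, and transaction state, all of which this sketch leaves out.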

Tracking Typing Speed with R

Tomaz Kastrun is pushing aside Mavis Beacon:

Did you ever wonder how fast and accurate your typing is?

For this instance, we will introduce some random pangrams, code samples and random strings sorted by level of difficulty.

This was kind of fun. I could hit about 80 or so WPM on the easy code examples and about 120 on the pangrams (with consistency between difficulties). Also, “Sphinx of black quartz judge my vow” is a pretty awesome thing to shout at the most opportune time.
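
As a rough illustration of the arithmetic behind a WPM figure, here is a small sketch in Python rather than R. It assumes the common convention of counting five characters as one word and a simple position-by-position accuracy check. The `typing_stats` function is hypothetical and not necessarily how Tomaz scores a run.

```python
def typing_stats(target, typed, seconds):
    """Return (gross words per minute, accuracy between 0 and 1).

    Assumes one "word" is five typed characters, a common convention.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")

    # Gross WPM: typed characters / 5, scaled from seconds to minutes.
    gross_wpm = (len(typed) / 5) * 60 / seconds

    # Accuracy: share of positions where the typed character matches.
    matches = sum(1 for want, got in zip(target, typed) if want == got)
    accuracy = matches / max(len(target), len(typed), 1)

    return gross_wpm, accuracy


pangram = "Sphinx of black quartz judge my vow"

# A hypothetical perfect run typed in six seconds.
wpm, accuracy = typing_stats(pangram, pangram, seconds=6)
print(f"{wpm:.0f} WPM at {accuracy:.0%} accuracy")
```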
