Press "Enter" to skip to content

Author: Kevin Feasel

Purview Data Loss Prevention in Microsoft Fabric

Yael Biss doesn’t want people walking off with the data:

As data volume and complexity soar, protecting sensitive information has become non-negotiable. With the latest enhancements to Purview Data Loss Prevention (DLP) Policies in Microsoft Fabric, organizations now have the power to proactively secure their data in OneLake.

Whether you’re just getting started or looking to take your data governance to the next level, following proven best practices will maximize your security, compliance, and productivity.

Click through for several tips on how to use Microsoft Purview DLP in Fabric. One of those tips ought to be “Get a side hustle so you can afford both Purview and Fabric.”

Setting the Optimal logical_decoding_work_mem in PostgreSQL

Ashutosh Bapat shares a tip with us:

Logical replication is a versatile feature offered in PostgreSQL. I have discussed the theoretical background of this feature in detail in my POSETTE talk. At the end of the talk, I emphasize the need for monitoring a logical replication setup. If you are using logical replication and have set up monitoring, you will be familiar with pg_stat_replication_slots. In some cases this view shows a high amount of spill_txns, spill_count, and spill_bytes, which indicates that the WAL sender corresponding to that replication slot is using a high amount of disk space. This increases load on the IO subsystem, affecting performance. It also means that there is less disk available for user data and regular transactions to operate. This is an indication that logical_decoding_work_mem has been configured too low. That’s the subject of this blog: how to decide the right configuration value for logical_decoding_work_mem. Let’s first discuss the purpose of this GUC. The blog might serve as a good background before reading further.

Read on to learn a bit more about how this value works and what you can do to set it correctly.
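
For a quick sense of whether this is happening on your own system, a query like the one below against pg_stat_replication_slots (available in PostgreSQL 14 and later) shows per-slot spill activity. This is a generic sketch rather than anything from Ashutosh’s post, and the value in the ALTER SYSTEM line is purely illustrative.

```sql
-- Non-zero spill_txns / spill_bytes suggest the WAL sender is spilling
-- decoded transactions to disk, i.e. logical_decoding_work_mem may be too low.
SELECT slot_name,
       spill_txns,
       spill_count,
       pg_size_pretty(spill_bytes) AS spilled,
       total_txns,
       pg_size_pretty(total_bytes) AS total_decoded
FROM pg_stat_replication_slots
ORDER BY spill_bytes DESC;

-- Current setting; the default is 64MB.
SHOW logical_decoding_work_mem;

-- Illustrative only: raise the limit and reload the configuration.
-- Choose a value based on your workload and the guidance in the post.
ALTER SYSTEM SET logical_decoding_work_mem = '256MB';
SELECT pg_reload_conf();
```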

Pain Points around tempdb

Kevin Hill has a list:

TempDB is the SQL Server equivalent of a junk drawer – everyone uses it, nobody monitors it, and eventually it becomes a bottleneck you can’t ignore.

Whether it’s poorly configured from the start or getting hammered by bad execution plans, TempDB often becomes the silent killer of performance. The good news? A few targeted changes can make a big impact.

Read on for some of tempdb’s greatest hits. Kevin also has a few quick tips for tempdb.
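
For the "poorly configured from the start" side of that list, a quick look at tempdb’s file layout is easy to script. This is a generic check against sys.master_files, not something from Kevin’s post; uneven data file sizes or percentage growth settings are the usual smells.

```sql
-- Inspect tempdb's data and log files: counts, sizes, and growth settings.
SELECT name,
       type_desc,
       size * 8 / 1024 AS size_mb,
       CASE WHEN is_percent_growth = 1
            THEN CAST(growth AS varchar(10)) + '%'
            ELSE CAST(growth * 8 / 1024 AS varchar(10)) + ' MB'
       END AS growth_setting,
       physical_name
FROM sys.master_files
WHERE database_id = DB_ID('tempdb')
ORDER BY type_desc, name;
```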

Estimating Query Percentiles in PostgreSQL

Michael Christofides makes an assertion:

I recently saw a feature request for pg_stat_statements to be able to track percentile performance of queries, for example the p95 (95th percentile) or p99 (99th percentile).

That would be fantastic, but isn’t yet possible. In the meantime, there is a statistically-dodgy-but-practically-useful (my speciality) way to approximate them using the mean and standard deviation columns in pg_stat_statements.

Click through for the code. Michael even covers the immediate objection I have (that the data isn’t normally distributed, so you shouldn’t use the same Z-score estimators that exist for the normal distribution). That said, if you’re interested in "p99…ish" then this is a clever approach to take.
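
To give a flavor of the idea (with the same "statistically dodgy" caveat), an approximation along these lines can be built from the mean and standard deviation columns pg_stat_statements already exposes. The exact query in Michael’s post may differ; the column names below are from PostgreSQL 13 and later.

```sql
-- Rough p95 / p99 estimates per statement, assuming normally distributed
-- execution times: mean + z * stddev, with z = 1.645 for p95 and 2.326 for p99.
SELECT queryid,
       calls,
       round(mean_exec_time::numeric, 2)                              AS mean_ms,
       round((mean_exec_time + 1.645 * stddev_exec_time)::numeric, 2) AS approx_p95_ms,
       round((mean_exec_time + 2.326 * stddev_exec_time)::numeric, 2) AS approx_p99_ms
FROM pg_stat_statements
WHERE calls > 100
ORDER BY approx_p99_ms DESC
LIMIT 20;
```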

Securing Mirrored Databricks Data in Fabric

Aaron Merrill has a catalog:

With this update, Azure Mirrored Databricks Catalog items can now be enabled with OneLake security. Security at the table, column, or row level can be defined directly in each item, allowing access to be controlled at a granular level. This allows security to be defined directly over the data mirrored into OneLake so it can be securely used by downstream sources such as lakehouses, notebooks, or semantic models.  

Read on to see what to do in order to make use of this. It is, of course, currently in preview.

Creating Fabric Databases via Powershell

Rob Sewell has a trio of posts. First up, Rob creates a warehouse:

We are going to use FabricTools, a PowerShell module that simplifies the management of Microsoft Fabric resources. You can find it on GitHub. FabricTools provides functions to create, update, and delete Fabric warehouses and other resources. It has been developed and is maintained by members of the community.

Next up is a lakehouse:

Having created a Microsoft Fabric workspace using PowerShell, assigned users to it using PowerShell, and created a Microsoft Fabric Warehouse using PowerShell, we now turn our attention to creating a Microsoft Fabric Lakehouse using PowerShell. These posts all look remarkably similar 🙂 and that’s because the functionality they are showing is exactly the same.

Just like dbatools and dbachecks, creating functions that follow the same pattern makes them easier to learn. This is why FabricTools has chosen to use the same pattern for creating Fabric resources using PowerShell.

And from there is a Fabric SQL Database:

Fabric SQL Database is a SQL database that is hosted in Microsoft Fabric. It provides a fully managed SQL database service that allows you to store and query your data using SQL, using the same SQL engine as Azure SQL Database.

Click through for examples of all three.

What’s Forthcoming for DBAs in SQL Server 2025

Stephen Planck has a list:

There is a lot to be excited about in SQL Server 2025! When thinking about features that may not get as much attention as others, yet will make a real difference in the lives of DBAs, I have selected my top 5 enhancements for SQL Server 2025. These improvements may not make the headlines, but they address pain points we’ve all experienced as DBAs. Please let me know if I’ve left any of your favorites off the list.

Read on for the list. It’s not a set of fancy new features, though there are some quality of life improvements here.

Consumer Group Rebalancing in Kafka and KIP-848

Jonathan Lacefield gives us a heads-up:

Historically, Kafka has relied on what we now call the “classic” rebalance protocol. This protocol evolved, as it was initially dominated by the “eager” assignment strategy. Eager rebalancing worked on a stop-the-world principle: Any change in group membership (consumer joining/leaving) or topic metadata triggered a complete halt. All consumers revoked their partitions, a leader computed a new assignment, and partitions were redistributed before processing could resume. This caused significant downtime, especially in dynamic environments.

To mitigate this, the cooperative assignment strategy was introduced within the classic protocol. Cooperative rebalancing reduced downtime by allowing consumers to keep partitions unaffected by the rebalance, revoking only those needing reassignment.

Read on to learn about some of the challenges that exist with rebalancing, and what KIP-848 promises to do.

Simplifying Calculations with the APPLY Operator

I have a new video:

In this video, I show how we can use the APPLY operator to remove redundancy in the SELECT clause and simplify complex calculations, all with zero performance impact.

This is, as I’ve said in the past, my favorite use case for the APPLY operator. As I’ve become older and (even) more crotchety, I’ve sided more and more with "easy to read" versus "runs faster" for code. And when you get "easy to read" with no impact on "runs faster," I’m all in.

The accounting scenario I show may be a fairly extreme case, but I’d bet that queries similar to this abound in almost any company. A big part of why complex reporting queries are so complex comes from repetition of calculations.
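
The video walks through the full accounting example; as a generic sketch of the pattern (with made-up table and column names), CROSS APPLY lets you define an intermediate expression once and reuse it throughout the SELECT:

```sql
-- Without APPLY, Quantity * UnitPrice would be repeated in every expression.
-- The CROSS APPLY derived table names it once; the optimizer inlines it,
-- so the query plan is unchanged.
SELECT
    o.OrderID,
    calc.GrossAmount,
    calc.GrossAmount * 0.06             AS SalesTax,
    calc.GrossAmount - o.DiscountAmount AS NetAmount
FROM dbo.OrderLines AS o
    CROSS APPLY (SELECT o.Quantity * o.UnitPrice AS GrossAmount) AS calc;
```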

Populating Microsoft Fabric Data Agents with Semantic Model Synonyms

Marc Lelijveld explains some terms:

It was only yesterday that I wrote a blog post on Semantic Models as a source for Fabric Data Agents. Not much time has passed, and I have since learned that Fabric Data Agents do not (always) respect the Synonyms that have been added to a Semantic Model. As a result, the Data Agent may start creating implicit measures, not respecting the definitions and logic in the explicit measures that are part of the Semantic Model.

Long story short, I think we should be able to do better! Therefore, I created a Notebook that helps you set up Data Agents, collect additional information from your Semantic Model, and populate that information automatically as AI notes to the Data Agent.

Read on for the notebook and some additional explanation.
