Press "Enter" to skip to content

Category: Microsoft Fabric

Resolving Write Conflicts in Microsoft Fabric Data Warehouse

Twinkle Cyril has a conflict:

Fabric Data Warehouse (DW) supports ACID-compliant transactions using standard T-SQL (BEGIN TRANSACTION, COMMIT, ROLLBACK) and uses Snapshot Isolation (SI) as its exclusive concurrency control model. All operations within a transaction are treated atomically—either all succeed or all fail. This ensures that each transaction operates on a consistent snapshot of the data as it existed at the start of the transaction, which means…

Read on to see what this means, as well as what happens when multiple writers interfere with one another and how to avoid these sorts of issues. My Kimball-coded brain says that, if you have a data warehouse, you should have one data loading process. In that case, it’s not easy for the single data loading process to get tripped up on its own.
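
For context, here is a minimal T-SQL sketch of the kind of atomic, multi-statement write the quote describes; the staging and fact table names are hypothetical:

```sql
-- Move rows from a staging table into a fact table as one atomic unit.
-- Either both statements commit together or neither does, and concurrent
-- readers see the data as it existed when their own transaction started.
BEGIN TRANSACTION;

    INSERT INTO dbo.FactSales (SaleDate, CustomerKey, Amount)
    SELECT SaleDate, CustomerKey, Amount
    FROM stg.Sales;

    DELETE FROM stg.Sales;

COMMIT TRANSACTION;

-- If anything fails mid-way, ROLLBACK TRANSACTION undoes every statement in the batch.
```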


Locks in Microsoft Fabric Data Warehouse

Twinkle Cyril enumerates the lock types in Fabric Data Warehouse:

Fabric DW supports ACID-compliant transactions using standard T-SQL (BEGIN TRANSACTION, COMMIT, ROLLBACK) and enforces snapshot isolation across all operations. Locks in Fabric Data Warehouse are used to manage concurrent access to metadata and data, especially during DDL operations. Here’s how locking works:

Click through for a chart. The locking policy is a lot simpler than what we see in SQL Server, and the post describes the pros and cons of that simpler approach.
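
As a rough illustration of inspecting those locks, here is a hedged T-SQL sketch; it assumes the sys.dm_tran_locks DMV is exposed in Fabric Warehouse the way it is in SQL Server, and the table name is hypothetical:

```sql
-- Hold a transaction open and look at the locks the current session has taken.
BEGIN TRANSACTION;

    UPDATE dbo.DimCustomer
    SET    LoyaltyTier = 'Gold'
    WHERE  CustomerKey = 42;

    -- Assumption: sys.dm_tran_locks is available, as in SQL Server.
    SELECT resource_type, request_mode, request_status
    FROM   sys.dm_tran_locks
    WHERE  request_session_id = @@SPID;

COMMIT TRANSACTION;
```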


JSONL Support in Fabric Data Warehouse and Lakehouse now GA

Jovan Popovic announces that JSONL format support is now generally available in Microsoft Fabric:

The OPENROWSET function that can read JSONL format empowers you to easily read and ingest JSONL files – for example log files, social media streams, machine learning datasets, configuration files, and other semi-structured sources. With the versatile OPENROWSET T-SQL function, you can reference and query JSONL files as if they were tables, eliminating the need for manual parsing or complex transformation steps.

Read on to see examples of how to ingest and use data in the JSON Lines format.
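
As a hedged example of what that looks like, here is a minimal query sketch; the storage URL and column list are hypothetical, and the explicit FORMAT = 'jsonl' option is shown on the assumption that it mirrors the CSV and Parquet variants:

```sql
-- Query a JSON Lines log file as if it were a table.
SELECT severity, COUNT(*) AS event_count
FROM OPENROWSET(
         BULK 'https://contosostorage.blob.core.windows.net/logs/app-events.jsonl',
         FORMAT = 'jsonl'
     )
     WITH (
         event_time DATETIME2,
         severity   VARCHAR(20),
         message    VARCHAR(4000)
     ) AS events
GROUP BY severity;
```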


Using Workspace Folders in Microsoft Fabric

Jon Vöge walks through a few folder strategies in Microsoft Fabric:

Every time I see a new Fabric Data Platform, I see a new way of using folders. Almost.

Ranging from no folders at all, to using folders to segregate item types, to folders for bronze/silver/gold layers, to even seeing setups where they are used for DEV/TEST/PROD.

I won’t claim all of these to be equally good. But the point is that there are many different approaches you may take.

Read on for a few strategies, including ones Jon would recommend avoiding.


Optimized Compaction in Microsoft Fabric Spark

Miles Cole crunches things down:

Compaction is one of the most necessary but also challenging aspects of managing a Lakehouse architecture. Similar to file systems and even relational databases, unless closely managed, data will get fragmented over time, and can lead to excessive compute costs. The OPTIMIZE command exists to solve for this challenge: small files are grouped into bins targeting a specific ideal file size and then rewritten to blob storage. The result is the same data, but contained in fewer, larger files.

However, imagine this scenario: you have a nightly OPTIMIZE job which runs to keep your tables, all under 1GB, nicely compacted. Upon inspection of the Delta table transaction log, you find that most of your data is being rewritten after every ELT cycle, leading to expensive OPTIMIZE jobs, even though you are only changing a small portion of the overall data every night. Meanwhile, as business requirements lead to more frequent Delta table updates, in between ELT cycles, it appears that jobs get slower and slower until the next scheduled OPTIMIZE job is run. Sound familiar?

Read on to see what’s new and how you can enable it in your Fabric workspace.
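
For reference, the baseline behavior the quote describes looks roughly like this in Spark SQL from a Fabric notebook; the lakehouse, table, and partition column names are hypothetical:

```sql
-- Compact small files in a Delta table into larger ones
-- (bins target an ideal file size, commonly around 1 GB).
OPTIMIZE sales_lakehouse.fact_sales;

-- Restrict compaction with a partition-column predicate so a nightly job
-- does not rewrite partitions that have not changed.
OPTIMIZE sales_lakehouse.fact_sales WHERE load_date >= '2025-06-01';
```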


Microsoft Fabric Spark Connector for SQL Databases

Arshad Ali makes an announcement:

Fabric Spark connector for SQL databases (Azure SQL databases, Azure SQL Managed Instances, Fabric SQL databases and SQL Server in Azure VM) in the Fabric Spark runtime is now available. This connector enables Spark developers and data scientists to access and work with data from SQL database engines using a simplified Spark API. The connector will be included as a default library within the Fabric Runtime, eliminating the need for separate installation.

This is a preview feature and works with Scala and Python code against SQL Server-ish databases in Azure (Azure SQL DB, Azure SQL Managed Instance, and virtual machines running SQL Server in Azure).


Lessons Learned from Replicating BigQuery to Microsoft Fabric

Teo Lachev shares some knowledge:

A recent engagement required replicating some DW tables from Google BigQuery to a Fabric Lakehouse. We considered the Fabric mirroring feature (back then in private preview, now publicly available) and learned some lessons along the way:

1. 400 Error during replication configuration – Caused by attempting to use a read-only GBQ dataset that is linked to another GBQ dataset, where the link was broken.

Read on for additional tips, including a major one around permissions.


Performance of User-Defined Functions in Fabric Warehouses

Jared Westover shares some findings:

In Part One, we saw that simple scalar user-defined functions (UDFs) perform as well as inline code in a Fabric warehouse. But with a more complex UDF, does performance change? If it drops, is the code-reuse convenience worth the price?

I’m surprised that the performance profile was so good. I had assumed it would perform like T-SQL user-defined functions—namely, worse in general.
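
For reference, here is a minimal sketch of the kind of simple scalar UDF being compared, using standard T-SQL scalar function syntax; the function and table names are hypothetical:

```sql
-- A simple scalar UDF: compute a net amount from a gross amount and tax rate.
CREATE FUNCTION dbo.fn_NetAmount (@GrossAmount DECIMAL(18, 2), @TaxRate DECIMAL(5, 4))
RETURNS DECIMAL(18, 2)
AS
BEGIN
    RETURN @GrossAmount * (1 - @TaxRate);
END;
GO

-- Called inline in a query, versus writing the expression directly.
SELECT OrderKey,
       dbo.fn_NetAmount(GrossAmount, TaxRate) AS NetAmount
FROM   dbo.FactOrders;
```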


Tracking Memory Consumption in Fabric SQL Database

Lance Wright tracks memory utilization:

SQL Database in Fabric continues its commitment to providing you with robust tools for database management, performance monitoring, and optimization. Earlier this year, we released a performance dashboard to help you monitor and improve the performance of your SQL Database in Fabric. We’ve improved upon those performance monitoring capabilities with the ability to track memory consumption. This new capability delivers real-time, actionable data regarding the memory utilization of all database queries to help you make more informed decisions and manage SQL Database resources more efficiently.

Read on to see what you can do with this.


Microsoft Fabric Copy Job Updates

Ye Xu has an update:

Copy job is the go-to solution in Microsoft Fabric Data Factory for simplified data movement. With native support for multiple delivery styles, including bulk copy, incremental copy, and change data capture (CDC) replication, Copy job offers the flexibility to handle a wide range of scenarios—all through an intuitive, easy-to-use experience.

This update introduces several enhancements, including connection parameterization, expanded CDC capabilities, new connectors, and a streamlined Copy Assistant powered by Copy job.

Read on to see what’s new. Some of the items in this list are preview features, and it looks like others are currently GA.
