Category: Microsoft Fabric

Using Workspace Folders in Microsoft Fabric

Jon Vöge walks through a few folder strategies in Microsoft Fabric:

Every time I see a new Fabric Data Platform, I see a new way of using folders. Almost.

These range from no folders at all, to folders that segregate item types, to folders for bronze/silver/gold layers, and even to setups where folders are used for DEV/TEST/PROD.

I won’t claim that all of these are equally good. But the point is that there are many different approaches you may take.

Read on for a few strategies, including ones Jon would recommend avoiding.

Optimized Compaction in Microsoft Fabric Spark

Miles Cole crunches things down:

Compaction is one of the most necessary but also challenging aspects of managing a Lakehouse architecture. Similar to file systems and even relational databases, unless closely managed, data will become fragmented over time, which can lead to excessive compute costs. The OPTIMIZE command exists to solve for this challenge: small files are grouped into bins targeting a specific ideal file size and then rewritten to blob storage. The result is the same data, but contained in fewer files that are larger.

However, imagine this scenario: you have a nightly OPTIMIZE job which runs to keep your tables, all under 1GB, nicely compacted. Upon inspection of the Delta table transaction log, you find that most of your data is being rewritten after every ELT cycle, leading to expensive OPTIMIZE jobs, even though you are only changing a small portion of the overall data every night. Meanwhile, as business requirements lead to more frequent Delta table updates, in between ELT cycles, it appears that jobs get slower and slower until the next scheduled OPTIMIZE job is run. Sound familiar?
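The bin-packing step described above can be sketched in a few lines of Python. This is a minimal illustration, not the Delta Lake or Fabric implementation: the 1 GiB target, the rule that only files smaller than a minimum size are candidates, and the function name plan_compaction are all assumptions made for the example, and file sizes are plain byte counts.

```python
from typing import List

TARGET_BYTES = 1024 ** 3            # hypothetical target file size: 1 GiB
MIN_CANDIDATE_BYTES = TARGET_BYTES  # assumption: files at or above this are left alone


def plan_compaction(file_sizes: List[int],
                    target: int = TARGET_BYTES,
                    min_candidate: int = MIN_CANDIDATE_BYTES) -> List[List[int]]:
    """Group small files into bins whose total size stays at or under `target`.

    Each returned bin is a list of file sizes that would be rewritten together
    as one larger file. Single-file bins are dropped, because rewriting one
    file on its own does not reduce the file count.
    """
    # Only files below the candidate threshold are considered for rewriting.
    candidates = sorted((s for s in file_sizes if s < min_candidate), reverse=True)

    bins: List[List[int]] = []
    current: List[int] = []
    current_size = 0
    for size in candidates:
        # Greedy next-fit pass: close the bin when this file would overshoot the target.
        if current and current_size + size > target:
            bins.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        bins.append(current)

    return [b for b in bins if len(b) > 1]
```

In this sketch, a 900 MB file left behind by a previous run is still below the threshold, so plan_compaction([900 * MB, 50 * MB, 50 * MB]) (with MB = 1024 ** 2) puts it in the same bin as two new 50 MB files and rewrites roughly 1 GB to absorb 100 MB of new data. That is the kind of repeated rewriting the scenario above describes.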

Read on to see what’s new and how you can enable it in your Fabric workspace.

Microsoft Fabric Spark Connector for SQL Databases

Arshad Ali makes an announcement:

Fabric Spark connector for SQL databases (Azure SQL databases, Azure SQL Managed Instances, Fabric SQL databases and SQL Server in Azure VM) in the Fabric Spark runtime is now available. This connector enables Spark developers and data scientists to access and work with data from SQL database engines using a simplified Spark API. The connector will be included as a default library within the Fabric Runtime, eliminating the need for separate installation.

This is a preview feature and works with Scala and Python code against SQL Server-ish databases in Azure (Azure SQL DB, Azure SQL Managed Instance, and virtual machines running SQL Server in Azure).

Lessons Learned from Replicating BigQuery to Microsoft Fabric

Teo Lachev shares some knowledge:

A recent engagement required replicating some DW tables from Google BigQuery to a Fabric Lakehouse. We considered the Fabric mirroring feature (back then in private preview, now publicly available) and learned some lessons along the way:

1. 400 Error during replication configuration – Caused by attempting to use a read-only GBQ dataset whose link to another GBQ dataset was broken.

Read on for additional tips, including a major one around permissions.

Performance of User-Defined Functions in Fabric Warehouses

Jared Westover shares some findings:

In Part One, we saw that simple scalar user-defined functions (UDFs) perform as well as inline code in a Fabric warehouse. But with a more complex UDF, does performance change? If it drops, is the code-reuse convenience worth the price?

I’m surprised that the performance profile was so good. I had assumed it would perform like T-SQL user-defined functions, which is to say worse in general.

Tracking Memory Consumption in Fabric SQL Database

Lance Wright tracks memory utilization:

SQL Database in Fabric continues its commitment to providing you with robust tools for database management, performance monitoring, and optimization. Earlier this year, we released a performance dashboard to help you monitor and improve the performance of your SQL Database in Fabric. We’ve improved upon those performance monitoring capabilities with the ability to track memory consumption. This new capability delivers real-time, actionable data regarding the memory utilization of all database queries to help you make more informed decisions and manage SQL Database resources more efficiently.

Read on to see what you can do with this.

Microsoft Fabric Copy Job Updates

Ye Xu has an update:

Copy job is the go-to solution in Microsoft Fabric Data Factory for simplified data movement. With native support for multiple delivery styles, including bulk copy, incremental copy, and change data capture (CDC) replication, Copy job offers the flexibility to handle a wide range of scenarios—all through an intuitive, easy-to-use experience.

This update introduces several enhancements, including connection parameterization, expanded CDC capabilities, new connectors, and a streamlined Copy Assistant powered by Copy job.
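Incremental copy is commonly built on a watermark: remember the highest value of a monotonically increasing column seen so far, and on the next run fetch only rows above it. The Python sketch below illustrates that general pattern with in-memory lists. It is not how Copy job works internally; the Row type, the modified_at column, and the incremental_copy function are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Row:
    id: int
    value: str
    modified_at: int  # hypothetical monotonically increasing change marker


def incremental_copy(source: List[Row],
                     destination: List[Row],
                     watermark: Optional[int]) -> Tuple[int, Optional[int]]:
    """Append rows changed since `watermark` to `destination`.

    Returns (rows_copied, new_watermark). A watermark of None means there was
    no previous run, so every row is copied (the initial bulk load).
    """
    new_rows = [r for r in source if watermark is None or r.modified_at > watermark]
    destination.extend(new_rows)
    # Keep the old watermark when nothing new was found.
    new_watermark = max((r.modified_at for r in new_rows), default=watermark)
    return len(new_rows), new_watermark
```

Because this sketch only appends, an updated row arrives as an additional copy in the destination; applying updates and deletes in place is where a CDC-style approach differs.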

Read on to see what’s new. Some of the items in this list are preview features, and it looks like others are currently GA.

Linking Fabric Warehouse SQL Query Results to the Capacity Metrics App

Chris Webb follows up on a previous post:

Following on from my post two weeks ago about how to get the details of Power BI operations seen in the Capacity Metrics App using the OperationId column on the Timepoint Detail page, I thought it was important to point out that you can do the same thing with TSQL queries against a Fabric Warehouse/SQL Endpoint and with Spark jobs. These two areas of Fabric are outside my area of expertise so please excuse any mistakes or simplifications, but I know a lot of you are Fabric capacity admins so I hope you’ll find this useful.

Read on to learn more.

Fabric Mirroring for Azure SQL MI Now GA

Ajay Jagannathan announces a feature has gone to general availability:

Mirroring in Fabric is a powerful feature that allows you to replicate data from various data sources such as your Azure SQL Managed Instance to Fabric’s OneLake. This ensures that your data is always up-to-date and readily available for advanced analytics, AI, and data science without the need for complex ETL processes.

Jokes about Azure SQL Managed Instance aside, it’s good that these features are becoming generally available.
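As a rough illustration of what keeping a replica up to date means, here is a hypothetical Python sketch that replays insert, update, and delete events against a replica keyed by primary key. The event shape and the apply_changes function are assumptions for the example; the post does not describe Fabric mirroring’s actual change format or internals.

```python
from typing import Any, Dict, Iterable, Tuple

# Hypothetical change event: (operation, primary_key, row_values)
Change = Tuple[str, int, Dict[str, Any]]


def apply_changes(replica: Dict[int, Dict[str, Any]], changes: Iterable[Change]) -> None:
    """Replay changes in order so `replica` matches the source after the last one."""
    for op, key, row in changes:
        if op in ("insert", "update"):
            # Upsert: add new keys and overwrite existing ones.
            replica[key] = dict(row)
        elif op == "delete":
            # Ignore deletes for keys the replica never saw.
            replica.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op!r}")
```

Ordering matters here: the events must be applied in the sequence the source produced them, or an update could be overwritten by an older value.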

Content Discovery in Microsoft Fabric

Jon Vöge wants to find data:

So you built a nice Data Platform on Microsoft Fabric. Users are happily using a few Models and Reports, but you face two problems:

  1. Users are not aware of all the other awesome models, reports and even lakehouses that they already have access to, which they should be using.
  2. Users also don’t know anything about the models, reports and lakehouses that they don’t have access to, but which could also be useful for them, if they requested access.

For my take on how best to solve this natively in Fabric, read on below.

Read on to see how you can enable content discovery.