Press "Enter" to skip to content

Category: ETL / ELT

Updates to Fabric Data Factory

Abhishek Narain has a list of updates:

Workspace Private Link Support for Data Factory (Preview): Microsoft Fabric enables secure data integration through Private Link support in Dataflows Gen2, Pipelines, and Copy jobs. This ensures that inbound data access remains isolated and compliant within protected workspaces. By leveraging VNet data gateways, organizations can securely connect to data sources across Private Link-enabled environments—eliminating exposure to public networks and reinforcing enterprise-grade security for sensitive data operations.

Most of these are security-related updates, with a mixture of things now GA, things currently in preview, and a pair of items coming soon.

Leave a Comment

Microsoft Fabric Copy Job Updates

Ye Xu has an update:

Copy job is the go-to solution in Microsoft Fabric Data Factory for simplified data movement. With native support for multiple delivery styles, including bulk copy, incremental copy, and change data capture (CDC) replication, Copy job offers the flexibility to handle a wide range of scenarios—all through an intuitive, easy-to-use experience.

This update introduces several enhancements, including connection parameterization, expanded CDC capabilities, new connectors, and a streamlined Copy Assistant powered by Copy job.

Read on to see what’s new. Some of the items in this list are preview features, and it looks like others are currently GA.

Leave a Comment

Troubleshooting Bulk Insertion in SQL Server

Rick Dobson lays out some common issues:

Most SQL bulk insert and SQL Server openrowset tutorials skip file access issues that can stop imports cold. Both the bulk insert statement and openrowset function rely on the SQL Server service account to read a source file. The SQL Server service account must have read permission on the file or its folder. It is also convenient to have read & execute as well as list folder content permissions. Also, non-standard source file locations (e.g., C:\Users\Public\Downloads) may not grant default read access to the SQL Server service account – always verify before use.

Click through for several recommendations, links to additional resources, and a few scripts along the way.

Leave a Comment

Comparing Microsoft Fabric Consumption for Notebooks and Warehouse SQL Queries

Gilbert Quevauvilliers performs a comparison:

I saw that there was an update where it is now possible to use the Microsoft Fabric Warehouse to copy data directly from OneLake into the Warehouse.

This got me thinking, which would consume more capacity to get the data into the Warehouse table. As well as which one would be faster.

To do this I am going to be running a SQL query in the Warehouse.

Next, I will use a Notebook to copy the data from the OneLake files section to a Warehouse table.

Gilbert’s specific query involves loading data from a variety of CSV files into a lakehouse via notebook, and then into a warehouse table. Read on for the results.

Comments closed

Dataflows Gen2 Tips and Tricks

Jon Vöge provides advice on the least beloved ELT process:

Dataflows Gen2 are frequently (and often rightfully so) bashed for their performance inefficiencies. Especially in comparison with other ingestion and transformation tools in Fabric (Notebooks, Pipelines, Copy Jobs, SPROCs).

The fact remains however, that in the hands of a self-service developer, they are an incredibly powerful tool – if you can spare the compute on your capacity.

In this article, I will highlight tips and tricks to make the most of working with Dataflow Gen2 in Fabric. The list is by no means exhaustive, but simply consists of a bunch of tips which I found useful in the past year, including new and overlooked features, as well as old best practices:

Read on for some things that are new to Dataflows Gen2, working with SharePoint, and making data loads not quite as slow.

Comments closed

Materializing Lake Views in Microsoft Fabric

Sairam Yeturi reduces ETL and ELT requirements:

Organizations often face challenges when trying to scale analytics across large volumes of data stored in centralized SQL databases. As business teams demand faster, more tailored insights, traditional reporting pipelines can become bottlenecks. By adopting Lakehouse architecture with Microsoft Fabric, business groups can mirror their SQL data into OneLake and organize it using the Medallion architecture—Bronze, Silver, and Gold layers. Materialized lake views play a crucial role in this setup, enabling automated, declarative transformations that clean and enrich data in the Silver layer. This empowers teams to build reliable dashboards and AI-driven insights on top of curated data, all while maintaining performance, governance, and security on a scale.

In this post, we will cover how enterprises can use materialized lake views to streamline data orchestration and enhance data quality, monitoring across silver and gold layers, while mirroring their SQL DB tables to Fabric in the Bronze layer.

The best use case for this is a scenario in which your underlying data is already essentially in a star schema or at least easily transformable into one, and you have no interest in modifying the data in the view directly. Do read the limitations before digging in, though, as there are some big ones.

Comments closed

Microsoft Fabric Pipeline Copy Job Activity in Preview

Connie Xu makes an announcement:

We’re thrilled to announce that the Copy job Activity is now in Preview! 

This new orchestration activity brings the simplicity of the Copy job item directly into your Microsoft Fabric Data Factory pipelines, enabling you to manage data movement alongside transformations, notifications, and more; all in one place. 

Read on to learn more about it, including how it differs from the Copy activity and the Copy job item.

Comments closed

Scheduling Copy Jobs in Microsoft Fabric

Ye Xu can run more than once:

Copy Job is the go-to solution in Microsoft Fabric Data Factory for simplified data movement. With native support for multiple delivery styles, including bulk copy, incremental copy, and change data capture (CDC) replication, Copy job offers the flexibility to handle a wide range of scenarios—all through an intuitive, easy-to-use experience.

In this update, we’re excited to announce a powerful new enhancement: multiple scheduler support. This gives you even greater control over when your data moves.

Click through for a screenshot showing how you can set up multiple schedules for a specific copy job. Based on the screenshot, it seems that there is a limit to the number of schedules you can create, though that number (20) is large enough that I couldn’t imagine it being a major impediment for most people.

Comments closed

Loading Data from Network-Protected Storage Accounts into OneLake

Matt Basile grabs some data:

AzCopy is a powerful and performant tool for copying data between Azure Storage and Microsoft OneLake, and is the preferred tool for large-scale data movement due to its ease of use and built-in performance optimizations. AzCopy now supports copying data from firewall-enabled Azure Storage accounts into OneLake using trusted workspace access. Now you can use AzCopy to load data from even network-protected storage accounts, letting you effortlessly load data into OneLake without compromising on security or performance.

Click through for an explanation of trusted workspace access, followed by the steps to try it out for yourself.

Comments closed

Tips for Solving SSIS Package Bottlenecks

Andy Brownsword has some advice:

Last time out we started to look at optimising SSIS packages by showing how to identify bottlenecks with a handy script. This time we’re turning insights into action to solve those pain points.

The solutions are grouped into 3 areas: Data Flows, as they do a lot of heavy lifting; the Execute SQL task, which can also be used for transformation and calculations; and finally everything else (because the first two are usually the issue).

Andy has some good advice and plenty of links to prior content around optimizing SSIS performance. One small thing I’d add is architectural: think about whether you can solve the slow part inside SQL Server. If you’re grabbing a huge amount of data from a SQL Server instance and then narrowing it down with filters, it might be a lot faster to transform that into a SQL query with a stronger WHERE clause. But let’s say there’s some small file you’re using to filter, so you need to pull all of the data out of SQL Server to compare against the small file so that you know what you need. Instead of pulling all of the data out of SQL Server or setting up a Lookup component to hit the SQL Server instance for each row in the file, how about loading that file into SQL Server and then writing a query to do the work?

In short, the database engine is typically going to be a much better at performance than an integration layer would be.

Comments closed