Press "Enter" to skip to content

Author: Kevin Feasel

Three Incremental Load Patterns with Azure Data Factory

Temidayo Omoniyi likes a good pattern:

This article is divided into three major sections—each showing the different abilities and use cases of performing incremental load with Azure Data Factory. This process can also be done in an Azure Synapse Pipeline and Fabric Pipeline.

The document contains the following:

Section 1: Copy Data Based on Last Modified Date or Latest File

Section 2: Incremental Copy Using Dataflow

Section 3: Incremental Copy Using Lookup and Stored Procedure Activities

Click through for each of these three patterns, with plenty of screenshots and step-by-step instructions.

Leave a Comment

Handling Excel Files in OneLake via File Explorer

Kristyna Ferris needs to get a file:

Hey data friends! This blog is to discuss an edge case I’ve run into in Microsoft Fabric. I won’t go into all the context, but the goal was to have an Excel file accessible to Microsoft Fabric without OneDrive, SharePoint, nor an on-premises data gateway. We also didn’t want a csv because we wanted to have multiple tabs and structured tables with formulas which won’t save properly in csv files.

So how did we do it? OneLake!

Read on to learn how.

Leave a Comment

Using the OUTPUT Clause

Erik Darling has a new video. Erik mentions the best use case of this being for archival tables, but I’ll add one more: if you’re using a queue table (ignoring how good or bad of an idea this is), you have multiple processes operating on this queue table, and you want to reduce the likelihood of two processes picking up the same value, you can perform the equivalent of popping off of a queue: delete the first element(s) from the queue table and output it into a temp table. From there, you can operate on that data at your leisure, and the next process will grab some other batch of data. And if everything goes mildly wrong, re-insert that data back into the queue and let some other sucker try it. I’ve used this a few times for data warehousing processes and it works out pretty well.

The only thing I’m unsure about is how he figured out that I’m CommonTableExpressionLover11357.

1 Comment

SSIS Deprecating Microsoft Connector for Oracle

Debbi Lyons has an announcement:

In July 2025, Microsoft will discontinue support for the Microsoft Connector for Oracle in SQL Server Integration Services (SSIS). This blog provides essential details to help customers prepare for this change in advance.

The Microsoft Connector for Oracle enables data export from and import into Oracle databases within an SSIS package. This feature, available in Enterprise editions of SQL Server 2019 and 2022, will remain functional for the lifecycle of the SQL Server product. However, support for this feature will officially end on July 4, 2025. With the deprecation, future product releases will provide no further bug fixes. Additionally, it will not be supported from SQL Server 2025 and onwards.

See, and people have told us there hasn’t been anything happening in SSIS since 2016!

The alternative of using ADO.NET reminds me of when Microsoft tried to take OLEDB out of Integration Services and got the pushback that no, we really don’t want to move from a fast component to a slow component. I would expect much less pushback on this one, simply because I doubt many people are using SSIS to ferry around data in Oracle.

Leave a Comment

Streaming Data to Azure Event Hub via Mockaroo and Kafka API

Jasleen Kaur Wahi generates some data:

In a recent project, I faced the need to generate randomized data for transmission to the Azure Event Hub. This hub is a key component of Microsoft Azure, used for real-time data ingestion and processing.

First, let’s take look at how I created this random data. I wanted to come up with a way to make data that looks like what we see in the real world, but without using any real information from users. This made-up data was really important for a bunch of things, like checking if our software works well.

Read on to see how Mockaroo works and the end result. Creating tests for streaming services like Event Hubs is a challenge, so this is an interesting approach to the task.

Leave a Comment

Transitioning from Elasticsearch to OpenSearch

Nileh Jain has a guide for us:

Elasticsearch and OpenSearch are powerful tools for handling search and analytics workloads, offering scalability, real-time capabilities, and a rich ecosystem of plugins and integrations. Elasticsearch is widely used for full-text search, log monitoring, and data visualization across industries due to its mature ecosystem. OpenSearch, a community-driven fork of Elasticsearch, provides a fully open-source alternative with many of the same capabilities, making it an excellent choice for organizations prioritizing open-source principles and cost efficiency. 

Migration to OpenSearch should be considered if you are using Elasticsearch versions up to 7.10 and want to avoid licensing restrictions introduced with Elasticsearch’s SSPL license. It is also ideal for those seeking continued access to an open-source ecosystem while maintaining compatibility with existing Elasticsearch APIs and tools. Organizations with a focus on community-driven innovation, transparent governance, or cost control will find OpenSearch a compelling option.

Click through for the prep work and the guide.

Leave a Comment

A Quick Primer on KQL

Reitse Eskens takes us through a language:

This post can come as a shock if you’re used to writing T-SQL. Because not only is there more than one useful language to process data, realtime data in this case, but it also has enough similarities to SQL to look familiar and is different enough to leave you flustered.

Now, to get a complete introduction into KQL or the Kusto Query Language, one blogpost (or video) would never be enough. There are so many operators that can fill an entire series on their own.

In this blog, the focus will be on the basic structure of KQL and a number of common operators. They will be compared with the counterparts in SQL for reference.

I fully agree with Reitse. I’ve put together a full-length talk on KQL and still feel like I’m covering the basics. It’s not that KQL is some monstrously complicated language, but it is different enough from other languages like T-SQL that you cannot easily apply knowledge from one to the other.

Leave a Comment

Trusted Servers for Power BI TLS Connections

Andy Brownsword works around an issue:

I recently had an issue when sourcing data in Power BI from a server which was accessed by a DNS alias. Here I’ll demonstrate the issue and how to resolve it.

After entering the server details, we could be greeted with the message below:

The server name provided does not match the server name on the SQL Server SSL Certificate. Please contact your administrator or try changing your Connection encryption settings

Click through for a solution if you cannot re-issue the certificate with the relevant DNS alias.

Leave a Comment

Multi-Column Statistics in PostgreSQL

Hans-Jürgen Schönig creates new statistics:

If you are using PostgreSQL for analytics or large-scale aggregations, you might occasionally notice the planner making false assumptions regarding the number of rows. While this isn’t a problem for small aggregates, it is indeed an issue for large-scale aggregations featuring many different dimensions.

In short: The more columns your GROUP BY statement contains, the more likely it is that optimizer overestimates the row count.

This blog explains how this can be handled in PostgreSQL.

Maybe it’s just me, but I don’t recall many instances in which adding multi-column statistics without any sort of index change significantly improved a query’s performance. I can understand how it could improve things like memory grants, so perhaps that’s how I’m selling it short. But I struggle to recall a specific case in which a query got measurably faster as a result.

Leave a Comment