Press "Enter" to skip to content

Category: ETL / ELT

Azure Synapse Link for SQL Server 2022 and File Analysis

Kevin Chant digs into Azure Synapse Link for SQL Server 2022:

In this post I want to cover some file tests for Azure Synapse Link for SQL Server 2022 that I performed.

That's because, a while back, I spotted something interesting whilst doing some initial tests for Azure Synapse Link for SQL Server 2022.

Namely, when you add new data after the initial load, a new folder called ‘ChangeData’ appears in the storage account container. I noticed that the new file containing the insert was a comma-separated values (CSV) file, whereas the table used for the initial load was a Parquet file.

Is there a method to this madness? Click through to see Kevin’s tell-all story.
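If you want to poke at those files yourself, here is a minimal sketch of reading both formats from the storage account with pandas. Everything in it is illustrative: the container name, folder layout, and file names are stand-ins, not the exact structure Synapse Link produces.

```python
# A rough sketch of inspecting the files Azure Synapse Link writes: Parquet
# for the initial load, CSV under the ChangeData folder for later inserts.
# Container name, paths, and credentials are all placeholders.
# Requires the adlfs package for abfs:// paths.
import pandas as pd

opts = {"account_name": "mystorageacct", "account_key": "<storage-key>"}

# The initial-load file (Parquet)
initial = pd.read_parquet(
    "abfs://synapse-link/dbo.MyTable/initial.parquet",
    storage_options=opts,
)

# A change file (CSV) written after an insert
changes = pd.read_csv(
    "abfs://synapse-link/dbo.MyTable/ChangeData/change.csv",
    storage_options=opts,
)

# Comparing the two frames makes the format difference easy to see
print(initial.dtypes)
print(changes.head())
```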


Adding an Existing Data Factory to GitHub

Andy Leonard has a three-parter for us. Part 1 shows you how to create a GitHub account and repo:

The unabridged topic of source control with GitHub is beyond the scope of this post. There are a number of ways to accomplish the tasks described in this post and series. I welcome your suggestions in the comments.

This post is written to help Azure Data Factory developers get started using GitHub.

Part 2 connects a Data Factory to the repository:

For the purposes of this demo, accept the defaults for “Publish branch” and “Root folder.” Check the “Import existing resources to repository” checkbox under the “Import existing resource” property, select the main branch in the “Import resource into this branch” property, and then click the “Apply” button:

Part 3 handles changes:

Applying what we’ve configured and learned thus far, let’s put this to work in a code-management workflow.

When it’s time to make a change, first create a new branch. I can hear some of you thinking, “Why, Andy? Why create a new branch?” That’s an excellent question. I am so glad you asked! Think of the new branch as a temporary copy of the current state of my Azure Data Factory. 

This series works from the assumption that you don’t have any real experience with Git (or GitHub) for source control, and maybe not much source control experience at all.
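If you'd rather script the Part 2 step than click through the blade, the azure-mgmt-datafactory package exposes the same Git configuration. A hedged sketch, with every name below a placeholder:

```python
# A sketch of wiring an existing Data Factory to a GitHub repo via the
# azure-mgmt-datafactory SDK -- the programmatic equivalent of the blade
# Andy configures in Part 2. All names below are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    FactoryGitHubConfiguration,
    FactoryRepoUpdate,
)

client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

repo_config = FactoryGitHubConfiguration(
    account_name="my-github-account",   # GitHub account or organization
    repository_name="my-adf-repo",      # the repo created in Part 1
    collaboration_branch="main",        # "Import resource into this branch"
    root_folder="/",                    # the "Root folder" default
)

client.factories.configure_factory_repo(
    "eastus",  # the factory's region
    FactoryRepoUpdate(
        factory_resource_id=(
            "/subscriptions/<subscription-id>/resourceGroups/<rg>"
            "/providers/Microsoft.DataFactory/factories/<factory-name>"
        ),
        repo_configuration=repo_config,
    ),
)
```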


Database-Driven Parameterization for Synapse Pipelines

Paul Hernandez does some configuring:

In Synapse, particularly, there aren't even global parameters like there are in Azure Data Factory.

When you want to move your development to another environment, CI/CD pipelines are typically used. These pipelines consume an ARM template together with its parameter file to create a workspace in a target environment. The parameters can be overridden in the CD pipeline, as explained here: https://techcommunity.microsoft.com/t5/data-architecture-blog/ci-cd-in-azure-synapse-analytics-part-4-the-release-pipeline/ba-p/2034434

Even so, I have not found a proper way to change the values of a pipeline parameter (the same goes for data flow and dataset parameters). I have seen some custom parameter manipulation to set the default value of a parameter and then deploy it without any value, or even JSON manipulation with PowerShell (the dark side for me).

Read on for an alternative solution which does the job well.
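The general shape of a database-driven approach, not necessarily Paul's exact implementation, is a parameters table keyed by environment that gets read at runtime (in a pipeline, typically via a Lookup activity). A rough Python sketch of that lookup, with hypothetical server, table, and column names:

```python
# A rough sketch of the database-driven pattern: parameter values live in a
# table keyed by environment, and the pipeline (or here, a script) looks them
# up at run time instead of baking them into the ARM template.
# Server, database, table, and column names are hypothetical.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=myserver.database.windows.net;DATABASE=config_db;"
    "Authentication=ActiveDirectoryInteractive"
)

environment = "dev"
params = {
    key: value
    for key, value in conn.execute(
        "SELECT ParameterName, ParameterValue "
        "FROM dbo.PipelineParameters WHERE Environment = ?",
        environment,
    ).fetchall()
}

print(params.get("SourceContainer"))
```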


Feeding Synapse Spark Info to On-Prem Kafka Clusters

Bhadreshkumar Shiyal finds a solution:

Microsoft’s official documentation for Azure Data Factory contains a tutorial which explains how to access an On-Premises SQL Server from Azure Data Factory which is inside a Managed Vnet. You can go through that article here: Access on-premises SQL Server from Data Factory Managed Vnet using Private Endpoint – Azure Data Fac….

Although our approach is based upon the article's solution, to meet our requirements we needed to substitute an on-premises Apache Kafka cluster for the on-premises SQL Server, and instead of an Azure Data Factory inside a Managed VNet we used a Synapse workspace inside a Managed VNet. The “Forwarding VNet” concept explained in the above tutorial remains as-is in our approach.

As soon as you turn on Data Exfiltration Protection (DEP), the lockdown is real. Click through to see what the process of exfiltrating data through an approved mechanism looks like.
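Once the network path exists, the Spark side is ordinary Kafka output. A minimal PySpark sketch, assuming a hypothetical broker address that resolves through the forwarding VNet:

```python
# A minimal sketch of writing from a Synapse Spark notebook to Kafka once the
# private endpoint + forwarding VNet plumbing is in place. The storage path,
# broker address, and topic name are placeholders, and the Spark-Kafka
# connector library must be available on the pool.
from pyspark.sql import SparkSession
from pyspark.sql.functions import to_json, struct

spark = SparkSession.builder.getOrCreate()

df = spark.read.parquet("abfss://data@mystorageacct.dfs.core.windows.net/events/")

# The Kafka sink expects a "value" column, so serialize each row to JSON
(df.select(to_json(struct("*")).alias("value"))
   .write
   .format("kafka")
   .option("kafka.bootstrap.servers", "onprem-kafka.contoso.local:9092")
   .option("topic", "synapse-events")
   .save())
```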


Argument {0} is Null or Empty in Azure Data Factory

Richard Swinbank diagnoses a problem:

I encountered this error recently while using Azure Data Factory’s (ADF) new(ish) Script activity to run a SQL query. It took me a while to find a fix, and when it happened again two weeks later I’d completely forgotten it. With a bit of luck, writing it down will help me remember next time – and if it helps you too, great!

Read on to see what the problem was and how Richard solved it.


Working with Synapse Link for SQL

Steve Howard gives us an overview of a preview:

Azure Synapse Link for SQL greatly simplifies analytics pipelines as Microsoft manages the orchestration process for you. Since being announced at Microsoft Build, many of you have had the opportunity to try it out in a POC so now seems like a good time to take a deep dive on some implementation aspects that may save you time later.

This blog post will assume you have experience with Synapse dedicated SQL pools and that you have some basic working knowledge of Synapse Link for SQL from following the quick start or from doing an initial POC.

Looking at this, I am a bit concerned about what it means to sync actively changing tables, especially ones large enough to benefit from being in a dedicated SQL pool. “Just reload all the data” may be the right answer but it doesn’t sound like a convenient one.


Extracting Data from DAX Measures into CSV

Gilbert Quevauvilliers builds a process:

In this blog post I am going to demonstrate how to use the new Power Automate Flow to extract data from a DAX measure into a SharePoint CSV file.

I got this idea after reading the blog post from the Microsoft Power BI Team: Unlocking new self-service BI scenarios with ExecuteQueries support in Power Automate | Microsoft Power BI Blog | Microsoft Power BI

The great news is that this works on Power BI Pro, Premium Per User and Premium.

Read on to see how.
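Gilbert drives this from Power Automate, but the connector wraps the Power BI REST API's ExecuteQueries endpoint, which you can also call directly. A rough Python sketch of the same idea, with the dataset ID, DAX query, and token handling all illustrative:

```python
# A rough sketch of the ExecuteQueries REST endpoint that the Power Automate
# connector wraps: run a DAX query against a dataset, write the rows to CSV.
# The dataset ID, DAX query, and token are placeholders; a real app would
# acquire the access token via MSAL.
import csv
import requests

token = "<aad-access-token>"
dataset_id = "<dataset-guid>"

resp = requests.post(
    f"https://api.powerbi.com/v1.0/myorg/datasets/{dataset_id}/executeQueries",
    headers={"Authorization": f"Bearer {token}"},
    json={"queries": [{"query": "EVALUATE VALUES('Date'[Year])"}]},
)
resp.raise_for_status()

# The response nests rows under results -> tables -> rows
rows = resp.json()["results"][0]["tables"][0]["rows"]
with open("measure_output.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=rows[0].keys())
    writer.writeheader()
    writer.writerows(rows)
```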


Testing Azure Synapse Link for SQL Server 2022

Kevin Chant gives Synapse Link for SQL Server a try:

Azure Synapse Link for SQL Server 2022 allows you to replicate your data from a SQL Server 2022 database to an Azure Synapse Analytics dedicated SQL Pool.

It is one of the options for the new Azure Synapse Link for SQL feature that was announced during Microsoft Build. You can read more about this in the Microsoft post which also announced the Public Preview of Azure Synapse Link for SQL.

Click through to see what Kevin has found so far. I think by the time this reaches GA, it should be pretty good.


Processing Security Logs in Databricks with Delta Live Tables

Silvio Fiorito ingests some data:

Databricks recently introduced Workflows to enable data engineers, data scientists, and analysts to build reliable data, analytics, and ML workflows on any cloud without needing to manage complex infrastructure. Workflows allows users to build ETL pipelines that are automatically managed, including ingestion and lineage, using Delta Live Tables. The benefits of Workflows and Delta Live Tables easily apply to security data sources, allowing us to scale to any volume or latency required for our operational needs.

In this article we’ll demonstrate some of the key benefits of Delta Live Tables for ingesting and processing security logs, with a few examples of common data sources we’ve seen our customers load into their cyber Lakehouse.

Click through to learn more.
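For a flavor of what the article covers, here is a minimal Delta Live Tables sketch of a bronze-to-silver flow for log data. The source path, columns, and expectation are illustrative rather than taken from the article:

```python
# A minimal Delta Live Tables sketch of the bronze/silver pattern the article
# describes for security logs. Paths, schema, and table names are
# illustrative. The `spark` session is provided by the DLT runtime.
import dlt
from pyspark.sql.functions import col, to_timestamp

@dlt.table(comment="Raw security events ingested with Auto Loader")
def security_events_raw():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/mnt/logs/firewall/")
    )

@dlt.table(comment="Parsed events with a proper timestamp")
@dlt.expect_or_drop("valid_timestamp", "event_time IS NOT NULL")
def security_events_parsed():
    return (
        dlt.read_stream("security_events_raw")
        .withColumn("event_time", to_timestamp(col("timestamp")))
    )
```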
