ETL / ELT – Page 13 – Curated SQL

Data Pipelines and Data Mesh

Published 2023-03-16 by Kevin Feasel

Jean-Georges Perrin answers a burning question:

I keep having questions about data pipelines. Data pipelines in Data Mesh is a topic I should tackle. So… Is the data pipeline the root of all evil?

Jean-Georges’s answer is quite in line with one of my favorite phrases: “Short answer: no, with an ‘if’; long answer: yes, with a ‘but.'” Read on for some thoughts on data pipelines and what the data mesh concept does to minimize harm.

Comments closed

Trying out Azure Synapse Link for SQL Server 2022

Published 2023-02-14 by Kevin Feasel

Kevin Chant looks at Azure Synapse Link for SQL Server 2022:

My first topic is about a new feature that covers both SQL Server 2022 and Azure. Which is Azure Synapse Link, or to be more precise Azure Synapse Link for SQL Server 2022.

I have been doing various tests with this feature recently. Which has led to some interesting blog posts about Azure Synapse Link for SQL Server 2022.

Read on for a few more thoughts, as well as deployment scripts via Azure DevOps and GitHub Actions.

Comments closed

Orchestrating Synapse Notebooks and Spark Jobs from ADF

Published 2023-01-30 by Kevin Feasel

Abhishek Narain has an announcement:

Today, we are introducing support for orchestrating Synapse notebooks and Synapse spark job definitions (SJD) natively from Azure Data Factory pipelines. It immensely helps customers who have invested in ADF and Synapse Spark without requiring to switch to Synapse Pipelines for orchestrating Synapse Notebooks and SJD.

NOTE: Synapse notebook and SJD activities were only available in Synapse Pipelines previously.

If you’re familiar with Synapse Pipelines, the equivalent ADF operations are extremely similar, as you’d probably expect.

Comments closed

Limiting Data Factory Users to Trigger Pipelines

Published 2023-01-27 by Kevin Feasel

Koen Verbeeck doesn’t want people running amok:

Typically you have a bunch of pipelines that are started by one or more triggers. Sometimes, a pipeline needs to be manually triggered. For example, when the finance department is closing the fiscal year, they probably want to run the ETL pipeline a couple of times on-demand, to make sure their latest changes are reflected in the reports. Since you don’t want them to contact you every time to start a pipeline, it might be an idea to give them permission to start the pipeline themselves.

This can obviously be done by tools such as Azure Logic Apps or a Power App, but in my case the users also wanted to view the progress of the pipeline (did something crash? Why is it taking so long? etc.) and developing a Power App with all those features seemed a bit cumbersome to me. Instead, we gave them permission on ADF itself so they can start the pipelines. There’s one problem though, there’s only one role for ADF in Azure, and it’s the contributor role. A bit too much permission, as anyone with that role can change anything in ADF. You don’t want that.

So what can you do? Click through to find out.

Comments closed

Error Handling Patterns in ADF Pipelines

Published 2023-01-19 by Kevin Feasel

Chenye Charlie Zhu begins a new series:

Orchestration allows conditional logic and enables user to take different based upon outcomes of a previous activity. Building upon the concepts of conditional paths, ADF and Synapse pipeline allows users to build versatile and resilient work flows that can handle unexpected errors that work smoothly in auto-pilot mode.

This is an ongoing series that gradually level up and help you build even more complicated logic to handle more scenarios. We will walk through examples for some common use cases, and help you to build functional and useful work flows.

Read on for a few error-handling patterns.

Comments closed

Logic App Errors with Variables in Sharepoint Actions

Published 2023-01-19 by Kevin Feasel

Koen Verbeeck troubleshoots an issue:

I have a Logic App that reads out a SharePoint library and stores all the documents found into Azure Blob Storage (ADF only supports Lists). I was trying to make this Logic App “generic”, meaning I could change the source folder and the destination container by using variables. That way, I have one single Logic App which can read out any SharePoint library, instead of creating a new Logic App for each library.

So I adapted my HTTP trigger to accept a JSON payload, which contains the name of the folder on SharePoint and the name of the blob container.

Read on to see the error message, as well as how Koen resolved the problem.

Comments closed

Cosmos DB to Data Explorer Synapse Link

Published 2023-01-05 by Kevin Feasel

Vicent-Philippe Lauzon makes an announcement:

We recently made our new Kusto data connection available in public preview: Cosmos DB to Azure Data Explorer Synapse Link.

This does look like a marketing-heavy announcement but the short version is that you can ingest data from Cosmos DB into Data Explorer pools via Synapse Link rather than creating your own ETL process. The previous Cosmos DB connector for Synapse Link tied to a dedicated SQL pool.

Comments closed

Row-Level Security and Data Migration

Published 2022-12-27 by Kevin Feasel

Forrest McDaniel shares an interesting case of using row-level security:

This was the situation I found myself in earlier this year – our company had absorbed another, and it was time to slurp up their tables. There were a lot of decisions to make and tradeoffs to weigh, and we ended up choosing to trickle-insert their data, but make it invisible to normal use until the moment of cutover.

The way we implemented this was with Row Level Security. Using an appropriate predicate, we could make sure ETL processes only saw migrated data, apps saw unmigrated data, and admins saw everything. To give a spoiler: it worked, but there were issues.

I would not have thought of this scenario. And given the difficulties Forrest & crew ran into, it might be for the best…

Comments closed

Script Activity Outputs to ForEach Inputs with ADF

Published 2022-12-23 by Kevin Feasel

Meagan Longoria links in a script:

In early 2022, Microsoft released a new activity in Azure Data Factory (ADF) called the Script activity. The Script activity allows you to execute one or more SQL statements and receive zero, one, or multiple result sets as the output. This is an advantage over the stored procedure activity that was already available in ADF, as the stored procedure activity doesn’t support using the result set returned from a query in a downstream activity.

However, when I went to find examples of how to reference those result sets, the documentation was lacking.

Click through as Meagan corrects a gap in documentation.

Comments closed

Reading Serverless SQL Pool Data with Data Factory

Published 2022-11-28 by Kevin Feasel

Koen Verbeeck wants to read from the serverless SQL pool in Azure Synapse Analytics:

We have some data we can query using the serverless SQL pools in Azure Synapse Analytics. For this blog post, I’m querying data that is stored in Azure Cosmos DB. Read the blog post How to Store Normalized SQL Server Data into Azure Cosmos DB to learn more about how that data got there.

Suppose I now want to read the data using Azure Data Factory. You can read data from Cosmos DB directly, but let’s pretend I want to do some transformations first using my favorite language: SQL. How can we do this?

Read on to learn how.

Comments closed

Category: ETL / ELT