Press "Enter" to skip to content

Category: ETL / ELT

Streaming Data to Event Hubs via Kafka Connect and Debezium

Niels Berglund starts off a two-part sub-series within a series:

This post is the first of two looking at whether and how we can stream data to Event Hubs from Debezium. Initially I had planned only one post covering this, but it turned out that the post would be too long, so I split it in two.

It started with the post How to Use Kafka Client with Azure Event Hubs. In that post, I looked at how the Kafka client can publish messages not only to Apache Kafka but also to Azure Event Hubs. There, I said something like:

An interesting point here is that it is not only your Kafka applications that can publish to Event Hubs but any application that uses Kafka Client 1.0+, like Kafka Connect connectors!

Click through for the first part of this pairing.
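
To make that point concrete, here is a minimal sketch (not from Niels's post) of a stock Kafka client publishing to an Event Hubs namespace over its Kafka endpoint; the namespace, hub name, and connection string are placeholders.

```python
# Any Kafka client can publish to Event Hubs by pointing at the namespace's
# Kafka endpoint on port 9093. Namespace and connection string are placeholders.
from kafka import KafkaProducer  # pip install kafka-python

producer = KafkaProducer(
    bootstrap_servers="my-namespace.servicebus.windows.net:9093",  # hypothetical namespace
    security_protocol="SASL_SSL",
    sasl_mechanism="PLAIN",
    sasl_plain_username="$ConnectionString",  # literal value expected by Event Hubs
    sasl_plain_password="Endpoint=sb://my-namespace.servicebus.windows.net/;SharedAccessKeyName=...;SharedAccessKey=...",
)

# The event hub name takes the place of the Kafka topic.
producer.send("my-event-hub", b'{"id": 1, "op": "insert"}')
producer.flush()
```

The only Event Hubs-specific pieces are the SASL settings; everything else is plain Kafka client configuration, which is exactly why Kafka Connect connectors such as Debezium can ride along.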


Azure Data Factory Activity Queue Times

Meagan Longoria waits in line:

I’ve been working on a project to populate an Operational Data Store using Azure Data Factory (ADF). We have been seeking to tune our pipelines so we can import data every 15 minutes. After tuning the queries and adding useful indexes to target databases, we turned our attention to the ADF activity durations and queue times.

Data Factory places the pipeline activities into a queue, where they wait until they can be executed. If your queue time is long, it can mean that the Integration Runtime on which the activity is executing is waiting on resources (CPU, memory, networking, or otherwise), or that you need to increase the concurrent job limit.

Click through to see how you can calculate queue times across activities, pipelines, and data factories.
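
If you want to pull the raw numbers yourself, here is a rough sketch (not Meagan's approach) that queries activity runs for a pipeline run via the azure-mgmt-datafactory SDK. It assumes the Copy activity's output exposes a queuing duration under executionDetails/detailedDurations (other activity types may differ), and every resource name is a placeholder.

```python
# Sketch: pull activity runs for one pipeline run and report queue time.
from datetime import datetime, timedelta
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import RunFilterParameters

client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

runs = client.activity_runs.query_by_pipeline_run(
    resource_group_name="rg-etl",       # placeholder
    factory_name="adf-ods",             # placeholder
    run_id="<pipeline-run-id>",
    filter_parameters=RunFilterParameters(
        last_updated_after=datetime.utcnow() - timedelta(hours=1),
        last_updated_before=datetime.utcnow(),
    ),
)

for a in runs.value:
    # Assumption: Copy activity output carries queuingDuration (in seconds)
    # under executionDetails/detailedDurations.
    details = (a.output or {}).get("executionDetails", [])
    queue_sec = sum(d.get("detailedDurations", {}).get("queuingDuration", 0) for d in details)
    print(a.activity_name, a.status, "queued:", queue_sec, "s")
```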


Preventing Concurrent Pipeline Execution in Azure Data Factory

Dave Ruijter and Laura de Bruin want to prevent concurrent runs of a pipeline:

For scheduled triggers, there is nothing out of the box that can help you prevent concurrent pipeline runs. For tumbling window triggers, there is a maxConcurrency property, but keep in mind that this will create a queue/backlog of pipeline runs; it will not cancel any of them. Whether you really want that behavior depends on your use case.

Instead, the two look at a pair of designs, and this post is all about the first one.
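
As a flavor of what such a guard can look like (not necessarily the design the authors chose), here is a sketch that queries the factory for in-progress runs of the same pipeline and only triggers a new run when none are found. The resource names are placeholders and it assumes the azure-mgmt-datafactory SDK.

```python
# Sketch: skip triggering a pipeline if a run of it is already in progress.
from datetime import datetime, timedelta
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import RunFilterParameters, RunQueryFilter

client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")
rg, factory, pipeline = "rg-etl", "adf-ods", "PL_Load_ODS"   # placeholders

in_progress = client.pipeline_runs.query_by_factory(
    rg, factory,
    RunFilterParameters(
        last_updated_after=datetime.utcnow() - timedelta(days=1),
        last_updated_before=datetime.utcnow(),
        filters=[
            RunQueryFilter(operand="PipelineName", operator="Equals", values=[pipeline]),
            RunQueryFilter(operand="Status", operator="Equals", values=["InProgress"]),
        ],
    ),
).value

if in_progress:
    print(f"{pipeline} already has {len(in_progress)} run(s) in progress; skipping.")
else:
    run = client.pipelines.create_run(rg, factory, pipeline)
    print("Started run", run.run_id)
```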


Deploying dbt on Databricks

Dave Eyler, et al., have a great announcement:

At Databricks, nothing makes us happier than making our users more productive, which is why we are delighted to announce a native adapter for dbt. It’s now easier than ever to develop robust data pipelines on Databricks using SQL.

dbt is a popular open source tool that lets a new breed of ‘analytics engineer’ build data pipelines using simple SQL. Everything is organized within directories, as plain text, making version control, deployment, and testability simple.

Click through for more information on how this works and how you can get the native adapter.
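
The workflow itself is just the usual dbt commands pointed at a Databricks target. A minimal sketch, assuming dbt-core plus the dbt-databricks adapter are installed and profiles.yml already defines a Databricks target; the project path and target name here are made up.

```python
# Sketch: install packages, build, and test a dbt project against a
# Databricks target (assumes `pip install dbt-databricks` and a configured
# profiles.yml; project path and target name are placeholders).
import subprocess

project_dir = "./my_dbt_project"   # hypothetical dbt project

for args in (["deps"], ["run", "--target", "databricks"], ["test", "--target", "databricks"]):
    subprocess.run(["dbt", *args, "--project-dir", project_dir], check=True)
```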


Building a Pipeline for External Data Sharing

Hope Foley has data to share:

I worked with a customer recently who needed to share CSVs for an auditing situation. They had a lot of external customers they needed to collect CSVs from for the audit process. There were a lot of discussions happening on how best to do it: whether we'd pull data from the external customers' environments or have them push the files into ours. Folks weren't sure on that, so I tried to come up with something that would work for both.

Read on for Hope’s solution to the problem.


Using the Fail Activity in Azure Data Factory

Rayis Imayev thinks about failure:

Recently, Microsoft introduced a new Fail activity (https://docs.microsoft.com/en-us/azure/data-factory/control-flow-fail-activity) in Azure Data Factory (ADF), and I wondered about a reason to fail a pipeline in ADF when my internal being tries very hard to make the pipelines successful once and for all. Yes, I understand the documented explanation that this activity can help to "customize both its error message and error code," but why?

Click through for Rayis’s take. I’ll just be here cracking jokes about how Fail activities are banned in my code because I expect it to have a positive outlook on life.
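
For reference, the activity itself is tiny: per the linked documentation, it is just a "Fail" type with a message and an errorCode in typeProperties. Here it is expressed as a Python dict you could drop into a pipeline definition; the activity names and the expression are invented for illustration.

```python
# Example shape of a Fail activity in a pipeline definition. The upstream
# activity name, error code, and expression are hypothetical.
fail_activity = {
    "name": "Fail_MissingSourceFiles",
    "type": "Fail",
    "dependsOn": [
        {"activity": "Get_Source_File_List", "dependencyConditions": ["Succeeded"]}
    ],
    "typeProperties": {
        "message": "@concat('No source files found for slice ', pipeline().parameters.WindowStart)",
        "errorCode": "ERR_NO_SOURCE_FILES",
    },
}
```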


Building an ETL Pipeline with Airflow and Containers

Nikita Vasilev needs to move some data:

Obviously, we can use one of the many ready-made ETL systems that handle loading information into the corporate data warehouse. Informatica PowerCenter, Oracle Data Integrator, SAP Data Services, Oracle Warehouse Builder, Talend Open Studio, and Pentaho are just a sliver of the off-the-shelf solutions. However, when it comes to large volumes of data at high speed, with Big Data infrastructure already in place, boxed solutions fall short of satisfying your needs.

Therefore, Big Data pipelines require something like Apache Airflow. It's an open-source set of libraries for developing, scheduling, and monitoring workflows. Airflow is written in Python and allows you to create and configure task chains both visually, through a clear web GUI, and programmatically, in Python code.

Click through for an example using Airflow with AWS’s Elastic Container Service.
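
The shape of the solution is roughly this (a pared-down sketch, not the article's full pipeline): a DAG whose task hands an ETL container off to ECS Fargate. The cluster, task definition, and subnet are placeholders, and the operator's class name has shifted across Amazon provider releases (ECSOperator, EcsOperator, and later EcsRunTaskOperator), so match it to your installed version.

```python
# Sketch: a daily Airflow DAG that runs an ETL step as an ECS Fargate task.
# All AWS resource names are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.providers.amazon.aws.operators.ecs import EcsOperator

with DAG(
    dag_id="etl_containerized",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract_and_load = EcsOperator(
        task_id="extract_and_load",
        cluster="etl-cluster",              # placeholder ECS cluster
        task_definition="etl-task:1",       # placeholder task definition
        launch_type="FARGATE",
        overrides={
            "containerOverrides": [
                {"name": "etl", "command": ["python", "etl.py", "--date", "{{ ds }}"]}
            ]
        },
        network_configuration={
            "awsvpcConfiguration": {"subnets": ["subnet-0123456789abcdef0"]}
        },
    )
```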


ETL from an API

Bill Fellows unravels a bad practice:

The direction for software-as-a-service providers is to provide APIs to access their data instead of structured file exports. Which is a pity, as every SaaS system requires a bespoke data extraction solution. I inherited a solution that had an adverse pattern I'd like to talk about.

The next level of complexity, at least in the space Bill covers in the example, is what to do when the upstream provider changes its data, with some changes arriving as much as a week later.
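
One common mitigation for that kind of churn (a sketch, not Bill's solution) is to re-pull a trailing window on every run and upsert on the natural key, so rows revised days later still land. The endpoint, parameters, and key column below are hypothetical.

```python
# Sketch: re-extract a 7-day trailing window each run and upsert on the key,
# so late corrections from the API overwrite earlier versions of the same row.
from datetime import date, timedelta

import requests

LOOKBACK_DAYS = 7                                       # wide enough to cover late changes
API_URL = "https://api.example.com/v1/transactions"     # hypothetical endpoint

def extract(run_date: date) -> list[dict]:
    window_start = run_date - timedelta(days=LOOKBACK_DAYS)
    resp = requests.get(
        API_URL,
        params={"modified_since": window_start.isoformat()},  # hypothetical parameter
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()

def load(rows: list[dict], table: dict[str, dict]) -> None:
    # Keyed on the natural key, so a re-pulled row replaces the prior version
    # instead of duplicating it; in a real pipeline this would be a MERGE
    # into the target table.
    for row in rows:
        table[row["transaction_id"]] = row

staging: dict[str, dict] = {}
load(extract(date.today()), staging)
```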


Updates to Azure Synapse Link

Aria Jelinek outlines the value of Azure Synapse Link:

New as of Ignite 2021, customers can optimize queries by setting custom partitions for their Azure Cosmos DB analytical store using keys that are commonly used as query filters. This compacts and optimizes the analytical data written to the partitioned store, resulting in better query performance even when working with a high volume of update or delete operations.

Azure Synapse Link is also now available for Azure Cosmos DB serverless accounts, expanding the integration to cover data from workloads with bursts of traffic or uncertain traffic patterns.

This post mostly covers the Dataverse and Cosmos DB integrations rather than the integration with SQL Server 2022.

On the whole, I like Azure Synapse Link for Cosmos DB and will probably like it for SQL Server 2022, maybe even a bit more. It does simplify the ELT process by taking care of the E and handling the first half of the L (landing into a staging table). Though if data's going into a dedicated SQL pool, I do hope the people doing this understand that dedicated SQL pools are intended for Kimball-style data warehousing scenarios and there can be a considerable performance (and therefore price) hit if you simply replicate a bunch of stuff without subsequent transformation.
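
For anyone who hasn't touched it, this is what the consumption side looks like from a Synapse Spark notebook (where spark is pre-defined): a minimal sketch reading the Cosmos DB analytical store that Synapse Link maintains, with the linked service, container, and table names as placeholders. The custom partition keys mentioned above are configured through additional options on a separate partitioning job, which this sketch doesn't attempt to reproduce.

```python
# Sketch: read the Cosmos DB analytical store via Synapse Link from a
# Synapse Spark notebook. Linked service and container names are placeholders.
df = (
    spark.read.format("cosmos.olap")
    .option("spark.synapse.linkedService", "CosmosDbLinkedService")  # placeholder
    .option("spark.cosmos.container", "Orders")                      # placeholder
    .load()
)

# Typical use: land the result into a staging table, i.e. the first half of the L.
df.filter("_ts >= 1636000000").write.mode("overwrite").saveAsTable("staging.orders_raw")
```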
