ETL / ELT – Page 18 – Curated SQL

Replacing Common Table Expressions in ADF Dataflows

Published 2022-05-16 by Kevin Feasel

At the time of writing, it is not possible to write a query using a CTE in the source of a dataflow. However, there are a few options to deal with this limitation:
– re-write the query using subqueries instead of CTEs
– use a stored procedure that contains the query and reference the stored proc in the source of the dataflow
– write the query as a view and reference the view in the source of the dataflow (this is my preferred method and the one I will demo here)

Jeet focuses on the third alternative. I’d lean toward the second or the third alternative, myself. Probably the second one (stored procedures) but both allow me to create an interface between ADF and the database. That way, underlying table changes will be less likely to require me to make code changes in ADF.

Comments closed

A Workaround to Parameterize Synapse Execute Pipeline Activities

Published 2022-04-14 by Kevin Feasel

Nick Edwards hits us with bad news:

Unfortunately, as of April 2022 there is not an option to parameterise or add dynamic content to an “Execute Pipeline” activity to invoke a pipeline run.
However, with the use of a Microsoft API there is method which we can use to overcome this.

Click through to learn more about this API call and see an example of it in action.

Comments closed

Costs for Managed Virtual Networks in Azure Data Factory

Published 2022-04-13 by Kevin Feasel

Martin Schoombee brings up an interesting point:

We were running SSIS in an Azure VM, spinning the VM up and down as required to run the ETL processes. A third-party SSIS component was used to extract data out of Dynamics 365 CRM, and accounted for a significant part of the yearly costs. I blogged about the reasons why I think it’s worth moving from Azure AS to Power BI PPU before, and combined with the move to Azure Data Factory I estimated a cost reduction of almost 35%.
After deploying the solution I noticed that our daily ETL costs were significantly higher than I thought it would be, and that started a little rabbit-hole exercise to figure out why.

I’m used to thinking about managed virtual networks in the case of Azure Synapse Analytics, where I think it makes a lot of sense as a default (especially because you can’t switch after you’ve made a decision).

Comments closed

Data Ingestion and Exploration in Azure Synapse Analytics

Published 2022-04-13 by Kevin Feasel

Cathrine Wilhelmsen gets some Lego data:

In the April session of the Azure Synapse Analytics and Microsoft MVP series, I got to show one of my favorite demos: ingesting and exploring LEGO datasets! 🤓 In just a few minutes, I grab a compressed file from the web, decompress it, store it as a Parquet file in my data lake, and run SQL queries on it using Serverless SQL. It really is that simple:

Click through for a video and a step-by-step walkthrough.

Comments closed

Restarting Azure Data Factory Triggers

Published 2022-04-12 by Kevin Feasel

Andy Leonard provides an after-action report:

During delivery of the class, I popped over to a much older data factory and fired up a couple integration runtimes (IRs). You see, on this older data factory, I trigger a couple pipelines that check to see if I’ve left an IR running. If so, each pipeline will shut down its respective IR. The trigger fires each evening. I blogged about the pipeline design almost two years ago in a post titled Stop an Azure-SSIS Files Integration Runtime (Safely).

Read on for the full report, some takeaways on how to limit the risk, and possible next steps if you find yourself in a situation like Andy did.

Comments closed

Migrating SSIS On-Prem Workloads into Azure

Published 2022-03-30 by Kevin Feasel

Jitendra Yadeo has put together a how-to guide:

– There can be scenario where organization wants to migrate there existing SSIS ETL process on cloud so instead of rewriting SSIS package using Cloud specific ETL tools like Azure Data Factory we can directly migrate SSIS packages and call it through Azure Data Factory.
– Goal of this blog is to show how SSIS packages hosted on on-premise can be migrated to Azure Data Factory (ADF) using Azure-SSIS Integration Runtime (IR).

Read on for a step-by-step guide.

Comments closed

Running SQL Scripts on Snowflake from Azure Data Factory

Published 2022-03-22 by Kevin Feasel

Koen Verbeeck shows off the Script activity in Azure Data Factory:

Azure Data Factory has a new activity introduced this week (around the 10th of March 2022 for you future readers): the Script activity! This is not to be confused with the script task/component of SSIS, which allows you to execute .NET script (C# for most people, or VB if you’re Ben Weissman). No, this task executes SQL, so it’s more akin to the Execute SQL Task of SSIS.

Click through to see how it works while I lament the fact that SSIS never supported the best .NET language.

Comments closed

Stringing Azure Data Factory between VNets

Published 2022-03-08 by Kevin Feasel

Ahmed Mahmoud performs networking wizardry:

Customer wants to connect Azure Data Factory on one subscription to an Azure SQL Server on Virtual Machine (SQL VM) on another subscription. check out the architecture diagram below for more clarification.

Click through for that diagram as well as the process. And between VNet peering and Private Link, I believe (but could be wrong in saying) the traffic would never leave Azure-hosted machines even when it transits between subscriptions.

Comments closed

Executing SQL Statements in Azure Data Factory

Published 2022-03-07 by Kevin Feasel

Abhishek Narain announces a pretty nice improvement to Azure Data Factory and Synapse Pipelines:

We are introducing a Script activity in pipelines that provide the ability to execute single or multiple SQL statements.
Using the script activity, you can execute common operations with Data Manipulation Language (DML), and Data Definition Language (DDL). DML statements like SELECT, UPDATE, and INSERT let users retrieve, store, modify, delete, insert and update data in the database. DDL statements like CREATE, ALTER, and DROP allow a database manager to create, modify, and remove database objects such as tables, indexes, and users.

Be sure to read the limitations at the bottom, however.

Comments closed

Alternatives to the Execute SQL Task in Azure Data Factory

Published 2022-02-28 by Kevin Feasel

Jeet Kainth wants an Execute SQL Task:

If you’re new to ADF or Synapse pipelines and looking for a way to execute SQL commands in pipelines, unfortunately there is no equivalent to the Execute SQL Task in SSIS, but I have found some alternatives which work well for me.

Click through for those alternatives.

Comments closed

M	T	W	T	F	S	S
1	2	3	4	5	6	7
8	9	10	11	12	13	14
15	16	17	18	19	20	21
22	23	24	25	26	27	28
29	30	31

Category: ETL / ELT