Press "Enter" to skip to content

Category: ETL / ELT

Cost Savings with Azure Data Factory

Koen Verbeeck maximizes the savings:

As you might’ve noticed, pricing in ADF is not the same as it was in SSIS for example. In SSIS, you pay your SQL Server license and you’re done (well, and you buy a server to run it on). It doesn’t matter what you do with SSIS, the cost is the same. If you run 1 package or 1000 packages, there’s no difference except in your electricity bill. However, in ADF you pay more if you use it more. You pay for each action you do, you pay for each activity you use and for how long things are running. There are a couple of guidelines you can follow to try to minimize costs:

Read on for those guidelines and some specific helpful items.

Comments closed

Databricks Workflows

Stacy Kerkela, et al, make an announcement:

Today we are excited to introduce Databricks Workflows, the fully-managed orchestration service that is deeply integrated with the Databricks Lakehouse Platform. Workflows enables data engineers, data scientists and analysts to build reliable data, analytics, and ML workflows on any cloud without needing to manage complex infrastructure. Finally, every user is empowered to deliver timely, accurate, and actionable insights for their business initiatives.

This looks a bit like Synapse pipelines. It’ll be interesting to see how this evovles.

Comments closed

Replacing Common Table Expressions in ADF Dataflows

Jeet Kainth needs an alternative:

At the time of writing, it is not possible to write a query using a CTE in the source of a dataflow. However, there are a few options to deal with this limitation:

– re-write the query using subqueries instead of CTEs

– use a stored procedure that contains the query and reference the stored proc in the source of the dataflow

– write the query as a view and reference the view in the source of the dataflow (this is my preferred method and the one I will demo here)

Jeet focuses on the third alternative. I’d lean toward the second or the third alternative, myself. Probably the second one (stored procedures) but both allow me to create an interface between ADF and the database. That way, underlying table changes will be less likely to require me to make code changes in ADF.

Comments closed

Costs for Managed Virtual Networks in Azure Data Factory

Martin Schoombee brings up an interesting point:

We were running SSIS in an Azure VM, spinning the VM up and down as required to run the ETL processes. A third-party SSIS component was used to extract data out of Dynamics 365 CRM, and accounted for a significant part of the yearly costs. I blogged about the reasons why I think it’s worth moving from Azure AS to Power BI PPU before, and combined with the move to Azure Data Factory I estimated a cost reduction of almost 35%.

After deploying the solution I noticed that our daily ETL costs were significantly higher than I thought it would be, and that started a little rabbit-hole exercise to figure out why.

I’m used to thinking about managed virtual networks in the case of Azure Synapse Analytics, where I think it makes a lot of sense as a default (especially because you can’t switch after you’ve made a decision).

Comments closed

Data Ingestion and Exploration in Azure Synapse Analytics

Cathrine Wilhelmsen gets some Lego data:

In the April session of the Azure Synapse Analytics and Microsoft MVP series, I got to show one of my favorite demos: ingesting and exploring LEGO datasets! 🤓 In just a few minutes, I grab a compressed file from the web, decompress it, store it as a Parquet file in my data lake, and run SQL queries on it using Serverless SQL. It really is that simple:

Click through for a video and a step-by-step walkthrough.

Comments closed

Restarting Azure Data Factory Triggers

Andy Leonard provides an after-action report:

During delivery of the class, I popped over to a much older data factory and fired up a couple integration runtimes (IRs). You see, on this older data factory, I trigger a couple pipelines that check to see if I’ve left an IR running. If so, each pipeline will shut down its respective IR. The trigger fires each evening. I blogged about the pipeline design almost two years ago in a post titled  Stop an Azure-SSIS Files Integration Runtime (Safely).

Read on for the full report, some takeaways on how to limit the risk, and possible next steps if you find yourself in a situation like Andy did.

Comments closed

Migrating SSIS On-Prem Workloads into Azure

Jitendra Yadeo has put together a how-to guide:

– There can be scenario where organization wants to migrate there existing SSIS ETL process on cloud so instead of rewriting SSIS package using Cloud specific ETL tools like Azure Data Factory we can directly migrate SSIS packages and call it through Azure Data Factory.

– Goal of this blog is to show how SSIS packages hosted on on-premise can be migrated to Azure Data Factory (ADF) using Azure-SSIS Integration Runtime (IR).

Read on for a step-by-step guide.

Comments closed

Running SQL Scripts on Snowflake from Azure Data Factory

Koen Verbeeck shows off the Script activity in Azure Data Factory:

Azure Data Factory has a new activity introduced this week (around the 10th of March 2022 for you future readers): the Script activity! This is not to be confused with the script task/component of SSIS, which allows you to execute .NET script (C# for most people, or VB if you’re Ben Weissman). No, this task executes SQL, so it’s more akin to the Execute SQL Task of SSIS.

Click through to see how it works while I lament the fact that SSIS never supported the best .NET language.

Comments closed

Stringing Azure Data Factory between VNets

Ahmed Mahmoud performs networking wizardry:

Customer wants to connect Azure Data Factory on one subscription to an Azure SQL Server on Virtual Machine (SQL VM) on another subscription. check out the architecture diagram below for more clarification.

Click through for that diagram as well as the process. And between VNet peering and Private Link, I believe (but could be wrong in saying) the traffic would never leave Azure-hosted machines even when it transits between subscriptions.

Comments closed