ETL / ELT – Page 31 – Curated SQL

Breaking out of Azure Data Factory ForEach Activities

Published 2020-06-19 by Kevin Feasel

“What if something fails inside the ForEach activity’s inner activities, Andy?”
That is an excellent question! I’m glad you asked. The answer is: The ForEach activity continues iterating.
I can now hear some of you asking…
“What if I want the ForEach activity to fail when an inner activity fails, Andy?”
Another excellent question, and you’ve found a post discussing one way to “break out” of a ForEach activity’s iteration.

Read on for the process.

Comments closed

Securing the Data Prep Area

Published 2020-06-19 by Kevin Feasel

Tim Mitchell explains why you should limit access to your staging area:

First things first, let’s define what a data prep area is. Data preparation (prep) is a common phase of extract, transform, and load (ETL) operations in which data is temporarily written for cleansing, deduplication, reshaping, or other data modifications. Also sometimes referred to as a landing area or a staging area, this is a common design pattern when moving data from a data store optimized for online transaction processing (OLTP) to a data model more friendly to analytics or reporting.
The data prep area really is a lot like a restaurant kitchen: it’s sometimes chaotic, it’s not consumer friendly, and there is a legitimate risk of consuming half-prepared goods.

Tim lays out why that is, so check it out.

Comments closed

Azure Data Factory Support for Excel

Published 2020-06-15 by Kevin Feasel

Andy Leonard is excited:

I am convinced they got tired of me saying, “Well, you’ll need SSIS if you want to read Excel…” Probably not, though. Either way… score! :{>

Read on to see how easy it is to select Excel as a file format for your Blob Storage data.

Comments closed

Including Headers in Zero-Row ADF Data Flows

Published 2020-06-11 by Kevin Feasel

Mark Kromer meets a challenge:

Today, we don’t have an option in data flows in ADF to include headers when your transformations result in zero rows. But you can build the logic to handle this scenario. So, until we add a checkbox feature to include headers, you can use this technique below to achieve this.

Click through for the explanation, as well as a completed version you can take for your own.

Comments closed

E-Mail Alerting in ADF.procfwk

Published 2020-06-09 by Kevin Feasel

Paul Andrew has an update to the Azure Data Factory Procedural Framework:

The primary goal of this release was to implement email alerting within the existing processing framework and using existing metadata driven practices to deliver this in an easy to control, flexible and granular way. That said, the following statements have been met in terms of alerting capabilities and design.

Read on for the full change list.

Comments closed

Unit Testing Azure Data Factory Pipelines

Published 2020-06-05 by Kevin Feasel

Richard Swinbank walks us through what it takes to run a unit test against an Azure Data Factory pipeline:

In part three of this series I looked at functional tests for ADF pipelines: verifying, in isolation, that pipelines are “doing things right”. In this post I’ll be testing isolated pipelines to check that they’re “doing the right things” – this is one description of a unit test. In a general-purpose programming language, unit tests might be used to verify that an individual line of code is executed, or that it has a particular effect. In Azure Data Factory, the smallest unit of development – a “line of code” – is a pipeline activity. I will be writing tests to verify that specific activities are executed (or not) and to inspect their results.

There’s a fair bit involved in this sort of test.

Comments closed

Bulk Loading SQL Server from .NET

Published 2020-06-05 by Kevin Feasel

Adrian Hills walks us through the SqlBulkCopy class:

Ever been in a situation where rumblings of “process X is too slow” suddenly build into a super-high priority ball of urgency when that next step up in data volume hits? Yeah, that can be fun. No, really, it can be fun because we have strategies to sort this stuff out, right?
In this blog post, I’m going to talk about one particular piece of functionality—SqlBulkCopy—that can help you with bulk data loading. If I had to single out my favorite .NET class, SqlBulkCopy would be at the top of the list. My goal is to introduce you to this class so that maybe it can become a part of your tool belt, too.

Click through to see how it works. If you’re familiar with SSIS, you’re already familiar with the concept if not the specifics.

Comments closed

Running Databricks ETL Code from Azure Data Factory

Published 2020-06-05 by Kevin Feasel

Ed Elliott shares several approaches for executing Databricks code from Azure Data Factory:

.Net
No post I write about Spark is going to miss off .Net, if you are writing C# or F# then you can’t use a notebook anyway. If you do this then you will also include the .Net driver JAR file (weird statement to make) and call that.

Read on for all of the techniques.

Comments closed

Feeding Databricks Output to Azure SQL Database

Published 2020-05-29 by Kevin Feasel

Arun Sirpal takes us through the process of moving data from Databricks into Azure SQL Database:

Recently I got to a stage where I leveraged Databricks to the best of my ability to join couple of CSV files together, play around some aggregations and then output it back to a different mount point ( based on Azure Storage) as a parquet file, I decided that I actually wanted to move this data into Azure SQL DB, which you may want to do one day.

This isn’t just dropping files into Blob Storage and picking them up, but rather a direct integration.

Comments closed

When to Have Multiple Azure Data Factories

Published 2020-05-26 by Kevin Feasel

Paul Andrew explains how to become a factory mogul:

The obvious and easy reason for having multiple Data Factory’s could be that you simply want to separate your business processes. Maybe they all have separate data delivery requirements and it just makes management of data flows easier to handle. For example:
– Sales
– Finance
– HR
They could have different data delivery deadlines, they process on different schedules and don’t share any underlying connections.
You may also have multiple projects underway that mean you want to keep teams isolated.

But that’s not the only reason, so click through to learn several other reasons why you might have multiple Azure Data Factory instances running.

Comments closed

M	T	W	T	F	S	S
		1	2	3	4	5
6	7	8	9	10	11	12
13	14	15	16	17	18	19
20	21	22	23	24	25	26
27	28	29	30	31

Category: ETL / ELT