Press "Enter" to skip to content

Metadata-Driven ADF Pipelines

Paul Andrew wraps up a series on metadata-driven processing of Azure Data Factory pipelines. Part 3 covers the ADF wrapper necessary for our custom pipelines:

Firstly, to help guide this post, below is a mock-up of our Data Factory pipelines and activities to show the end goal. Hopefully this view shows how things are going to be connected using what I call a pipeline hierarchy system, and how they will work in the overall framework. For our metadata processing framework we can make the following category distinctions about the activities represented:

Grandparent – This is the top-level orchestration of our wider data platform solution. Here a scheduled trigger could be connected, or processing in our solution grouped into natural areas. Technically this level isn’t required for our processing framework, but I’ve included it as good practice.
Parent – Our parent pipeline’s primary purpose (try saying that fast 3 times 🙂 ) is to handle the stages of our processing framework. The stages will then be passed off sequentially to our child pipeline using another Execute Pipeline activity.
Child – At this level in the framework the child is hitting the Azure Function to call the lowest-level executors, or the pipelines that we want to actually do the work in our data platform solution (a rough sketch of that call follows this list). In my previous post I added some example metadata to call pipelines named ‘Stage X-X’. These in turn relate to a set of empty placeholder pipelines that contained only Wait activities.
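
To make that Child step a bit more concrete, here is a minimal, hypothetical sketch of what "call a worker pipeline" can look like from an Azure Function-style helper, using the Azure Data Factory Python SDK. This is not the framework's actual function code, and the subscription, resource group, factory, and pipeline names below are placeholders.

```python
# Hypothetical sketch only: starts one worker pipeline run via the ADF SDK.
# Subscription, resource group, factory, and pipeline names are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

SUBSCRIPTION_ID = "<subscription-id>"   # placeholder
RESOURCE_GROUP = "<resource-group>"     # placeholder
FACTORY_NAME = "<data-factory-name>"    # placeholder


def run_worker_pipeline(pipeline_name, parameters=None):
    """Start a single worker pipeline (e.g. one of the 'Stage X-X' placeholders)
    and return its run ID so the caller can track it."""
    credential = DefaultAzureCredential()
    adf_client = DataFactoryManagementClient(credential, SUBSCRIPTION_ID)

    run_response = adf_client.pipelines.create_run(
        RESOURCE_GROUP,
        FACTORY_NAME,
        pipeline_name,
        parameters=parameters or {},
    )
    return run_response.run_id


if __name__ == "__main__":
    # The real framework reads worker pipeline names from its metadata database;
    # here we simply hard-code one of the sample placeholder pipelines.
    run_id = run_worker_pipeline("Stage 1-1")
    print(f"Started pipeline run: {run_id}")
```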

Part 4 puts it all together:

For the end-to-end run of the framework we have a few options to see progress once it’s been triggered. Before that, it’s worth pointing out that in the below I’ve used the sample metadata provided with the database scripts in GitHub. Then for each execution pipeline I’ve added a single Wait activity with a random time delay of a few seconds. The point here is to test the framework execution, not the pipelines being called. To further clarify, the ‘Stage X-X’ pipeline names should be replaced with your actual pipeline names in your data platform solution.
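
One programmatic way to watch that progress (alongside the Monitor view in the Data Factory portal) is to poll each run's status until it finishes. The snippet below is a rough sketch under the same placeholder names as the earlier example, not part of the original framework.

```python
# Hypothetical sketch only: poll a pipeline run until it reaches a terminal state.
import time


def wait_for_run(adf_client, resource_group, factory_name, run_id, poll_seconds=15):
    """Return the final status of a pipeline run, printing progress as it polls."""
    while True:
        run = adf_client.pipeline_runs.get(resource_group, factory_name, run_id)
        print(f"{run.pipeline_name}: {run.status}")
        if run.status in ("Succeeded", "Failed", "Cancelled"):
            return run.status
        time.sleep(poll_seconds)
```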

Definitely worth the read.