Paul Andrew has a process for us:
Back in May 2020 I wrote a blog post about ‘When You Should Use Multiple Azure Data Factory’s’. Following on from this post, with a full year+ now passed and having implemented many more data platform solutions for some crazy massive (technical term) enterprise customers, I’ve been reflecting on these scenarios. Specifically, I’ve been considering:
– The use of multiple regional Data Factory instances and integration runtime services.
– The decoupling of wider orchestration processes from workers.
Furthermore, to supplement this understanding and for added context, in December 2020 I wrote about ‘Data Factory Activity Concurrency Limits – What Happens Next?’ and ‘Pipelines – Understanding Internal vs External Activities’, both of which now add to a much clearer picture of the ability to scale pipelines for large-scale extraction and transformation processes.
Read on for details about the scenario, as well as a design pattern to explain the process. This is a large solution for a large-scale problem.