Press "Enter" to skip to content

Azure DevOps and Data Factory

Helge Rege Gardsvoll has a three-part series for us on using Azure DevOps to deploy Data Factories. Part 1 is all about environment setup:

Shared Data Factory
The shared Data Factory exists for one purpose: self-hosted integration runtimes. This is the component you use to connect to on-premises sources, or other sources with access restrictions such as IP allow-lists or firewall rules. Migrating a self-hosted integration runtime is not supported, but you can share the same integration runtime across different Data Factories. You can find a description of how to do this in this article.
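To illustrate the sharing pattern, here is a minimal sketch (not from the series itself) that registers a linked self-hosted integration runtime pointing at the shared factory's runtime. It assumes the azure-mgmt-datafactory (track 2) Python SDK and that the shared runtime has already granted the target factory permission; all subscription IDs, resource groups, and names are placeholders.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    IntegrationRuntimeResource,
    SelfHostedIntegrationRuntime,
    LinkedIntegrationRuntimeRbacAuthorization,
)

# Placeholder identifiers -- replace with your own subscription, resource group,
# factory names, and the resource ID of the shared self-hosted IR.
SUBSCRIPTION_ID = "<subscription-id>"
RESOURCE_GROUP = "rg-datafactory"
TARGET_FACTORY = "df-dev"
SHARED_IR_RESOURCE_ID = (
    "/subscriptions/<subscription-id>/resourceGroups/rg-shared"
    "/providers/Microsoft.DataFactory/factories/df-shared"
    "/integrationRuntimes/SharedSelfHostedIR"
)

client = DataFactoryManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

# Create a linked self-hosted IR in the target factory that points at the
# shared factory's runtime (the shared IR must have granted RBAC access first).
linked_ir = IntegrationRuntimeResource(
    properties=SelfHostedIntegrationRuntime(
        description="Linked to the shared self-hosted IR",
        linked_info=LinkedIntegrationRuntimeRbacAuthorization(
            resource_id=SHARED_IR_RESOURCE_ID
        ),
    )
)

client.integration_runtimes.create_or_update(
    RESOURCE_GROUP, TARGET_FACTORY, "LinkedSelfHostedIR", linked_ir
)
```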

Part 2 covers Git branching, linked services, and development:

Create datasets and pipeline
For this demo I create two datasets, one for the source and one for the target, and a simple pipeline that copies the data between them. Dataset names point to the data lake, like ADLS_datahelgeadls2_Brreg_MainUnits, but do not include environment information.
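As a rough sketch of that naming convention (again assuming the azure-mgmt-datafactory Python SDK rather than the post's own Git-committed JSON), the example below creates two environment-neutral datasets and a one-activity copy pipeline. The target dataset name, container paths, linked service name, and pipeline name are hypothetical placeholders; only ADLS_datahelgeadls2_Brreg_MainUnits comes from the post.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    DatasetResource, DelimitedTextDataset, AzureBlobFSLocation,
    LinkedServiceReference, DatasetReference, PipelineResource,
    CopyActivity, DelimitedTextSource, DelimitedTextSink,
    DelimitedTextWriteSettings,
)

SUBSCRIPTION_ID = "<subscription-id>"
RESOURCE_GROUP = "rg-datafactory"
FACTORY = "df-dev"
LINKED_SERVICE = "ADLS_datahelgeadls2"  # environment-neutral linked service name

client = DataFactoryManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)


def delimited_text_dataset(container: str, folder: str) -> DatasetResource:
    """Build a CSV dataset on the data lake; no environment info in the name."""
    return DatasetResource(
        properties=DelimitedTextDataset(
            linked_service_name=LinkedServiceReference(
                type="LinkedServiceReference", reference_name=LINKED_SERVICE
            ),
            location=AzureBlobFSLocation(file_system=container, folder_path=folder),
            column_delimiter=",",
            first_row_as_header=True,
        )
    )


# Dataset names describe the data, not the environment (dev/test/prod).
client.datasets.create_or_update(
    RESOURCE_GROUP, FACTORY, "ADLS_datahelgeadls2_Brreg_MainUnits",
    delimited_text_dataset("brreg", "mainunits"),
)
client.datasets.create_or_update(
    RESOURCE_GROUP, FACTORY, "ADLS_datahelgeadls2_Brreg_MainUnits_Curated",
    delimited_text_dataset("curated", "brreg/mainunits"),
)

# A minimal pipeline with one Copy activity moving data from source to target.
copy_pipeline = PipelineResource(
    activities=[
        CopyActivity(
            name="CopyMainUnits",
            inputs=[DatasetReference(
                type="DatasetReference",
                reference_name="ADLS_datahelgeadls2_Brreg_MainUnits")],
            outputs=[DatasetReference(
                type="DatasetReference",
                reference_name="ADLS_datahelgeadls2_Brreg_MainUnits_Curated")],
            source=DelimitedTextSource(),
            sink=DelimitedTextSink(
                format_settings=DelimitedTextWriteSettings(file_extension=".csv")
            ),
        )
    ]
)
client.pipelines.create_or_update(
    RESOURCE_GROUP, FACTORY, "PL_Copy_Brreg_MainUnits", copy_pipeline
)
```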

Part 3 covers the release process:

The release process will have these steps (a sketch of the trigger handling follows the list):
1. Stop any active triggers. We do not want any pipelines to start while we are changing things (and you should wait for running pipelines to finish before publishing).
2. Release from development to the target environment.
3. Clean up the target environment by removing objects that are not present in dev, and then start the triggers again.
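As a rough illustration of the trigger handling in steps 1 and 3, the sketch below stops every started trigger in the target factory before deployment and restarts the stopped ones afterwards, again assuming the azure-mgmt-datafactory (track 2) Python SDK. In the series this runs from an Azure DevOps release pipeline (typically via PowerShell), so treat the resource names here as placeholders.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

SUBSCRIPTION_ID = "<subscription-id>"
RESOURCE_GROUP = "rg-datafactory"
TARGET_FACTORY = "df-prod"

client = DataFactoryManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)


def set_triggers(started: bool) -> None:
    """Stop (started=False) or start (started=True) triggers in the target factory.

    Note: this simple version restarts every stopped trigger; a real release
    pipeline might track and restart only the triggers it stopped itself.
    """
    for trigger in client.triggers.list_by_factory(RESOURCE_GROUP, TARGET_FACTORY):
        state = trigger.properties.runtime_state  # "Started", "Stopped" or "Disabled"
        if started and state == "Stopped":
            client.triggers.begin_start(RESOURCE_GROUP, TARGET_FACTORY, trigger.name).result()
        elif not started and state == "Started":
            client.triggers.begin_stop(RESOURCE_GROUP, TARGET_FACTORY, trigger.name).result()


# Step 1: stop triggers so no new pipeline runs begin during the release.
set_triggers(started=False)

# Step 2: the deployment of the factory resources (ARM template) happens here,
# driven by the Azure DevOps release pipeline.

# Step 3: after deployment and clean-up of removed objects, start triggers again.
set_triggers(started=True)
```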

This is a great series of posts and also includes a bonus tidbit if you’re using Databricks.