Press "Enter" to skip to content

Recommendations for Implementing Azure Data Factory

Paul Andrew has a nice set of recommendations you should follow when configuring Azure Data Factory:

Building on our understanding of generic datasets, a good Data Factory should include (where possible) generic pipelines; these are driven from metadata to simplify (as a minimum) data ingestion operations. Typically I use an Azure SQL DB to house my metadata, with stored procedures that get called via Lookup activities to return everything a pipeline needs to know.

This metadata-driven approach means deployments to Data Factory for new data sources are greatly reduced; only adding new values to a database table is required. The pipeline itself doesn’t need to be complicated. Copying CSV files from a local file server to Data Lake Storage could be done with just three activities, shown below.
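The screenshot Paul refers to isn’t reproduced in this excerpt, so here is a rough sketch of what a three-activity pipeline of that kind might look like: a Lookup that calls the metadata stored procedure, a ForEach that iterates over the rows it returns, and a Copy inside it that moves each source’s CSV files to the lake. This is my reading, not Paul’s actual pipeline; the activity, dataset and stored procedure names are placeholders, and a working definition would also need linked services, dataset parameters driven by @item(), and complete type properties. Expressed as the Python dict you would serialise into Data Factory’s pipeline JSON:

# Skeleton of a metadata-driven ingestion pipeline, expressed as the Python
# dict you would serialise into Data Factory's pipeline JSON. All names are
# placeholders; a real definition also needs linked services, parameterised
# datasets and complete typeProperties.
import json

pipeline = {
    "name": "IngestCsvFromFileServer",
    "properties": {
        "activities": [
            {
                # 1. Lookup: ask the metadata database which sources to ingest.
                "name": "Get Source Metadata",
                "type": "Lookup",
                "typeProperties": {
                    "source": {
                        "type": "AzureSqlSource",
                        "sqlReaderStoredProcedureName": "dbo.GetIngestionSources",
                    },
                    "dataset": {"referenceName": "MetadataDb", "type": "DatasetReference"},
                    "firstRowOnly": False,
                },
            },
            {
                # 2. ForEach: one iteration per metadata row returned by the Lookup.
                "name": "For Each Source",
                "type": "ForEach",
                "dependsOn": [
                    {"activity": "Get Source Metadata", "dependencyConditions": ["Succeeded"]}
                ],
                "typeProperties": {
                    "items": {
                        "value": "@activity('Get Source Metadata').output.value",
                        "type": "Expression",
                    },
                    "activities": [
                        {
                            # 3. Copy: move this source's CSV files to Data Lake Storage.
                            "name": "Copy CSV To Data Lake",
                            "type": "Copy",
                            "inputs": [{"referenceName": "FileServerCsv", "type": "DatasetReference"}],
                            "outputs": [{"referenceName": "DataLakeCsv", "type": "DatasetReference"}],
                            "typeProperties": {
                                "source": {"type": "DelimitedTextSource"},
                                "sink": {"type": "DelimitedTextSink"},
                            },
                        }
                    ],
                },
            },
        ]
    },
}

print(json.dumps(pipeline, indent=2))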

There are several good recommendations here; read the whole thing.
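
One detail worth dwelling on is the metadata contract itself. In Data Factory the Lookup activity calls the stored procedure directly, but the shape of what comes back is easier to see outside the pipeline. The sketch below (Python with pyodbc) shows one hypothetical version of that contract; the server, database, procedure and column names are invented for illustration and are not from Paul’s post.

# Hypothetical illustration of the metadata contract a generic pipeline
# relies on. In Data Factory the Lookup activity would call the stored
# procedure; pyodbc is used here only to make the returned shape visible.
import pyodbc

CONNECTION_STRING = (
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=tcp:my-metadata-server.database.windows.net,1433;"
    "Database=IngestionMetadata;"
    "Authentication=ActiveDirectoryMsi;"  # one option; use whatever auth you already have
)

def get_ingestion_sources():
    """Return one dict per data source the generic pipeline should ingest."""
    conn = pyodbc.connect(CONNECTION_STRING)
    try:
        cursor = conn.cursor()
        # Hypothetical procedure: one row per registered source, e.g.
        # SourceName, SourceFolder, TargetFolder, Enabled.
        cursor.execute("EXEC dbo.GetIngestionSources")
        columns = [col[0] for col in cursor.description]
        return [dict(zip(columns, row)) for row in cursor.fetchall()]
    finally:
        conn.close()

if __name__ == "__main__":
    for source in get_ingestion_sources():
        print(source["SourceName"], source["SourceFolder"], "->", source["TargetFolder"])

The payoff of the design is that onboarding another CSV source becomes an INSERT into the metadata table rather than a Data Factory deployment.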