Apache Airflow is a widely used task orchestration framework. It owes much of its popularity to its Python-based programmatic interface, since Python is the language of first choice for data engineers and data ops teams. The framework allows defining complex pipelines that move data between different systems, potentially implemented using different technologies.
The following article shows how to set up a managed instance of Apache Airflow and define a very simple DAG (directed acyclic graph) of tasks that does the following:
- Uses an Azure registered application to authenticate with the Azure Data Explorer (ADX) cluster.
- Schedules daily execution of a simple KQL query that calculates HTTP error statistics from web log records for the last day.
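
As a rough illustration of what such a DAG might look like, here is a minimal sketch, not the article's actual code. It assumes Airflow 2.4 or later with the `apache-airflow-providers-microsoft-azure` package installed. The table name `WebLogs`, the columns `Timestamp` and `StatusCode`, the database name `weblogs_db`, and the DAG and task ids are all hypothetical. The connection `azure_data_explorer_default` is assumed to be configured in Airflow with the registered application's credentials.

```python
from datetime import datetime


def build_error_stats_query(table: str = "WebLogs", days: int = 1) -> str:
    """Build a KQL query that counts HTTP error responses per status code.

    The table and column names are hypothetical; adjust them to the real schema.
    """
    return (
        f"{table}\n"
        f"| where Timestamp > ago({days}d)\n"
        "| where StatusCode >= 400\n"
        "| summarize ErrorCount = count() by StatusCode"
    )


def build_dag():
    """Create a DAG with one task that runs the query against ADX once a day."""
    # Imported lazily so the query builder can be used without Airflow installed.
    from airflow import DAG
    from airflow.providers.microsoft.azure.operators.adx import (
        AzureDataExplorerQueryOperator,
    )

    with DAG(
        dag_id="adx_http_error_stats",  # hypothetical DAG id
        start_date=datetime(2024, 1, 1),
        schedule="@daily",  # run once per day
        catchup=False,  # do not backfill past runs
    ) as dag:
        AzureDataExplorerQueryOperator(
            task_id="http_error_stats",  # hypothetical task id
            query=build_error_stats_query(),
            database="weblogs_db",  # hypothetical database name
            # Airflow connection assumed to hold the registered app's credentials.
            azure_data_explorer_conn_id="azure_data_explorer_default",
        )
    return dag
```

In a DAG file, assigning `dag = build_dag()` at module level lets the scheduler discover it. Keeping the query in its own function makes it easy to inspect or test without a running Airflow instance.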
Click through for the process.