Press "Enter" to skip to content

Category: ETL / ELT

Automating Single Table Refresh with Azure Data Factory and Azure Automation

Marc Lelijveld wants to refresh a single table:

Back in February, I wrote a blog on how you can trigger a single table to refresh in your Power BI data model. This blog described how you can achieve this goal using a PowerShell script and the ASCmd cmdlets for Analysis Services, which also works for Power BI Premium. In the wrap-up of that blog, I promised to follow-up with a blog on how to achieve the same goal with Azure Data Factory. It took a little bit longer than expected to finalize this post, but here it is!

In this blog, co-authored by my colleague Paulien van Eijk, we will describe how you can automate your single table refresh in the Power BI Service, including all dependencies with downstream dataflows using Azure Data Factory and Azure Automation. All this is based on real life scenarios and a solution build in collaboration between Dave Ruijter, Paulien and me.

Read on for Marc and Paulien’s solution.

Comments closed

From Kafka to Azure Data Explorer

Niels Berglund uses Kafka Connect to link an Apache Kafka topic to Azure Data Explroer:

If you follow my blog, you probably know that I am a huge fan of Apache Kafka and event streaming/stream processing. Recently Azure Data Explorer (ADX) has caught my eye. In fact, in the last few weeks, I did two conference sessions about ADX. A month ago, I published a blog post related to Kafka and ADX: Run Self-Managed Kusto Kafka Connector Serverless in Azure Container Instances.

As the title of that post implies, it looked at the ADX Kafka sink connector and how to run it in Azure. What the post did not look at was how to configure the connector and connect it to ADX. That is what we will do in this post (and maybe in a couple of more posts).

This post serves as a complete tutorial, though Niels does promise future posts on other ingestion methods, so stay tuned.

Comments closed

From Azure Data Factory to Synapse Pipelines

Kevin Chant copies and pastes:

In this post I want to share an alternative way to copy an Azure Data Factory pipeline to Synapse Studio. Because I think it can be useful.

For those who are not aware, Synapse Studio is the frontend that comes with Azure Synapse Analytics. You can find out more about it in another post I did, which was a five minute crash course about Synapse Studio.

By the end of this post, you will know one way to copy objects used for an Azure Data factory pipeline to Synapse Studio. Which works as long as both are configured to use Git.

Click through to see how.

Comments closed

ETL via Powershell

Greg Moore builds a simple ETL process using Powershell:

Recently a customer asked me to work on a pretty typical project to build a process to import several CSV files into new tables in SQL Server. Setting up a PowerShell script to import the tables is a fairly simple process. However, it can be tedious, especially if the files have different formats. In this article, I will show you how building an ETL with PowerShell can save some time.

It’s a simple process, but that’s a good reminder that simple processes can be good processes.

Comments closed

Monitoring Azure Data Factory, Integration Runtimes, and Pipelines

Sandeep Arora monitors all the things:

For effective monitoring of ADF pipelines, we are going to use Log Analytics, Azure Monitor and Azure Data Factory Analytics. The above illustration shows the architectural representation of the monitoring setup.

The details of setting up log analytics, alerts and Azure Data Factory Analytics are further discussed in this section.

If you manage Azure Data Factory in your environment, give this a read.

Comments closed

Interchangability between ADF and Synapse Integration Pipelines

Paul Andrew makes a discovery:

Inspired by an earlier blog where we looked at ‘How Interchangeable Delta Tables Are Between Databricks and Synapse‘ I decided to do a similar exercise, but this time with the integration pipeline components taking centre stage.

As I said in my previous blog post, the question in the heading of this blog should be incredibly pertinent to all solution/technical leads delivering an Azure based data platform solution so to answer it directly:

Read on to learn the answer.

Comments closed

Moving Data from Confluent Cloud to Cosmos DB

Nathan Ham announces the Azure Cosmos DB sink connector in Confluent Cloud:

Today, Confluent is announcing the general availability (GA) of the fully managed Azure Cosmos DB Sink Connector within Confluent Cloud. Now, with just a few simple clicks, you can link the power of Apache Kafka® together with Azure Cosmos DB to set your data in motion.

Click through for a marketing-heavy look at how this works.

Comments closed

Exporting a Hive Table to CSV

The Hadoop in Real World team shows how you can export data from a Hive table specifically into a file using comma-separated values:

It is a pretty common use case to export the contents of a Hive table into a CSV file. It’s pretty simple if you are using a recent version of Hive. In this post, we will see who to achieve this with both newer and older versions of Hive.

Read on to see both versions of the answer.

Comments closed

Scaling ADF and Synapse Analytics Pipelines

Paul Andrew has a process for us:

Back in May 2020 I wrote a blog post about ‘When You Should Use Multiple Azure Data Factory’s‘. Following on from this post with a full year+ now passed and having implemented many more data platform solutions for some crazy massive (technical term) enterprise customers I’ve been reflecting on these scenario’s. Specifically considering:

– The use of having multiple regional Data Factory instances and integration runtime services.

– The decoupling of wider orchestration processes from workers.

Furthermore, to supplement this understanding and for added context, in December 2020 I wrote about Data Factory Activity Concurrency Limits – What Happens Next? and Pipelines – Understanding Internal vs External Activities. Both of which now add to a much clearer picture regarding the ability to scale pipelines for the purposes of large-scale extraction and transformation processes.

Read on for details about the scenario, as well as a design pattern to explain the process. This is a large solution for a large-scale problem.

Comments closed

Combining Change Data Capture with Azure Data Factory

Reitse Eskens continues a series on learning Azure Data Factory:

In my last blog, I pulled all the data from my table to my datalake storage. But, when data changes, I don’t want to perform a full load every time. Because it’s a lot of data, it takes time and somewhere down the line I’ll have to separate the changed rows from the identical ones. Instead of doing full loads every night or day or hour, I want to use a delta load. My pipeline should transfer only the new and changed rows. Very recently, Azure SQL DB finally added the option to enable Change Data Capture. This means after a full load, I can get the changed records only. And with changed records, it means the new ones, the updated ones and the deleted ones.

Let’s find out how that works.

Read on for the article and demonstration.

Comments closed