ETL / ELT – Page 23 – Curated SQL

Updates to Azure Synapse Link

Published 2021-11-12 by Kevin Feasel

Aria Jelinek outlines the value of Azure Synapse Link:

New as of Ignite 2021, customers can optimize queries by setting custom partitions for their Azure Cosmos DB analytical store using keys that are commonly used as query filters. This compacts and optimizes the analytical data written to the partitioned store, resulting in better query performance even when working with a high volume of update or delete operations.
Azure Synapse Link is also now available for Azure Cosmos DB serverless accounts, expanding the integration to cover data from workloads with bursts of traffic or uncertain traffic patterns.

This post mostly covers the Dataverse and Cosmos DB integrations rather than the integration with SQL Server 2022.

On the whole, I like Azure Synapse Link for Cosmos DB and will probably like it for SQL Server 2022, maybe even a bit more. It does simplify the ELT process by taking care of the E and handling the first half of the L (landing into a staging table). Though if data’s going into a dedicated SQL pool, I do hope the people doing this understand that dedicated SQL pools are intended for Kimball-style data warehousing scenarios and there can be a considerable performance (and therefore price) hit if you simply replicate a bunch of stuff without subsequent transformation.

Automating Single Table Refresh with Azure Data Factory and Azure Automation

Published 2021-10-29 by Kevin Feasel

Marc Lelijveld wants to refresh a single table:

Back in February, I wrote a blog on how you can trigger a single table to refresh in your Power BI data model. This blog described how you can achieve this goal using a PowerShell script and the ASCmd cmdlets for Analysis Services, which also works for Power BI Premium. In the wrap-up of that blog, I promised to follow-up with a blog on how to achieve the same goal with Azure Data Factory. It took a little bit longer than expected to finalize this post, but here it is!
In this blog, co-authored by my colleague Paulien van Eijk, we will describe how you can automate your single table refresh in the Power BI Service, including all dependencies with downstream dataflows using Azure Data Factory and Azure Automation. All this is based on real life scenarios and a solution built in collaboration between Dave Ruijter, Paulien and me.

Read on for Marc and Paulien’s solution.

From Kafka to Azure Data Explorer

Published 2021-10-29 by Kevin Feasel

Niels Berglund uses Kafka Connect to link an Apache Kafka topic to Azure Data Explorer:

If you follow my blog, you probably know that I am a huge fan of Apache Kafka and event streaming/stream processing. Recently Azure Data Explorer (ADX) has caught my eye. In fact, in the last few weeks, I did two conference sessions about ADX. A month ago, I published a blog post related to Kafka and ADX: Run Self-Managed Kusto Kafka Connector Serverless in Azure Container Instances.
As the title of that post implies, it looked at the ADX Kafka sink connector and how to run it in Azure. What the post did not look at was how to configure the connector and connect it to ADX. That is what we will do in this post (and maybe in a couple of more posts).

This post serves as a complete tutorial, though Niels does promise future posts on other ingestion methods, so stay tuned.

From Azure Data Factory to Synapse Pipelines

Published 2021-09-24 by Kevin Feasel

Kevin Chant copies and pastes:

In this post I want to share an alternative way to copy an Azure Data Factory pipeline to Synapse Studio. Because I think it can be useful.
For those who are not aware, Synapse Studio is the frontend that comes with Azure Synapse Analytics. You can find out more about it in another post I did, which was a five minute crash course about Synapse Studio.
By the end of this post, you will know one way to copy objects used for an Azure Data factory pipeline to Synapse Studio. Which works as long as both are configured to use Git.

Click through to see how.

ETL via PowerShell

Published 2021-09-23 by Kevin Feasel

Greg Moore builds a simple ETL process using PowerShell:

Recently a customer asked me to work on a pretty typical project to build a process to import several CSV files into new tables in SQL Server. Setting up a PowerShell script to import the tables is a fairly simple process. However, it can be tedious, especially if the files have different formats. In this article, I will show you how building an ETL with PowerShell can save some time.

It’s a simple process, but that’s a good reminder that simple processes can be good processes.
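
For a sense of how small such an import can be, here is a minimal sketch of the same idea. It is not Greg's script: it uses Python rather than PowerShell, and an in-memory SQLite database stands in for SQL Server. The `load_csv` helper, the table names, and the sample data are all hypothetical, and every column is loaded as text on the assumption that typing happens in a later transformation step.

```python
import csv
import io
import sqlite3


def load_csv(conn, table_name, csv_text):
    """Create a table from the CSV header row and insert every data row.

    Hypothetical helper: each file may have a different set of columns,
    so the table definition is derived from that file's own header.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)

    # Quote identifiers so headers containing spaces still work.
    # All columns are created as TEXT (an assumption of this sketch).
    columns = ", ".join('"{}" TEXT'.format(name.replace('"', '""')) for name in header)
    quoted_table = '"{}"'.format(table_name.replace('"', '""'))
    conn.execute("CREATE TABLE {} ({})".format(quoted_table, columns))

    # Skip blank lines, then bulk insert with one placeholder per column.
    placeholders = ", ".join("?" for _ in header)
    rows = [row for row in reader if row]
    conn.executemany(
        "INSERT INTO {} VALUES ({})".format(quoted_table, placeholders), rows
    )
    return len(rows)


# SQLite in memory stands in for the target SQL Server database.
conn = sqlite3.connect(":memory:")

# Two sample "files" with different formats, held as strings for illustration.
files = {
    "customers": "id,name\n1,Ada\n2,Grace\n",
    "orders": "order_id,customer_id,total\n10,1,9.50\n",
}

# One new table per file; counts records how many rows each load inserted.
counts = {name: load_csv(conn, name, text) for name, text in files.items()}
```

Deriving each table from its file's header is what keeps the loop short when the files differ in shape; a real import would also need type handling and error checks for malformed rows.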

Monitoring Azure Data Factory, Integration Runtimes, and Pipelines

Published 2021-09-16 by Kevin Feasel

Sandeep Arora monitors all the things:

For effective monitoring of ADF pipelines, we are going to use Log Analytics, Azure Monitor and Azure Data Factory Analytics. The above illustration shows the architectural representation of the monitoring setup.
The details of setting up log analytics, alerts and Azure Data Factory Analytics are further discussed in this section.

If you manage Azure Data Factory in your environment, give this a read.

Interchangeability between ADF and Synapse Integration Pipelines

Published 2021-09-09 by Kevin Feasel

Paul Andrew makes a discovery:

Inspired by an earlier blog where we looked at ‘How Interchangeable Delta Tables Are Between Databricks and Synapse‘ I decided to do a similar exercise, but this time with the integration pipeline components taking centre stage.
As I said in my previous blog post, the question in the heading of this blog should be incredibly pertinent to all solution/technical leads delivering an Azure based data platform solution so to answer it directly:

Read on to learn the answer.

Moving Data from Confluent Cloud to Cosmos DB

Published 2021-08-13 by Kevin Feasel

Nathan Ham announces the Azure Cosmos DB sink connector in Confluent Cloud:

Today, Confluent is announcing the general availability (GA) of the fully managed Azure Cosmos DB Sink Connector within Confluent Cloud. Now, with just a few simple clicks, you can link the power of Apache Kafka® together with Azure Cosmos DB to set your data in motion.

Click through for a marketing-heavy look at how this works.

Exporting a Hive Table to CSV

Published 2021-08-12 by Kevin Feasel

The Hadoop in Real World team shows how you can export data from a Hive table specifically into a file using comma-separated values:

It is a pretty common use case to export the contents of a Hive table into a CSV file. It’s pretty simple if you are using a recent version of Hive. In this post, we will see how to achieve this with both newer and older versions of Hive.

Read on to see both versions of the answer.

Scaling ADF and Synapse Analytics Pipelines

Published 2021-08-10 by Kevin Feasel

Paul Andrew has a process for us:

Back in May 2020 I wrote a blog post about ‘When You Should Use Multiple Azure Data Factory’s‘. Following on from this post, with a full year+ now passed and having implemented many more data platform solutions for some crazy massive (technical term) enterprise customers, I’ve been reflecting on these scenarios. Specifically considering:
– The use of having multiple regional Data Factory instances and integration runtime services.
– The decoupling of wider orchestration processes from workers.
Furthermore, to supplement this understanding and for added context, in December 2020 I wrote about Data Factory Activity Concurrency Limits – What Happens Next? and Pipelines – Understanding Internal vs External Activities. Both of which now add to a much clearer picture regarding the ability to scale pipelines for the purposes of large-scale extraction and transformation processes.

Read on for details about the scenario, as well as a design pattern to explain the process. This is a large solution for a large-scale problem.
