ETL / ELT – Page 15 – Curated SQL

Feeding Synapse Spark Info to On-Prem Kafka Clusters

Published 2022-08-04 by Kevin Feasel

Microsoft’s official documentation for Azure Data Factory contains a tutorial which explains how to access an On-Premises SQL Server from Azure Data Factory which is inside a Managed Vnet. You can go through that article here: Access on-premises SQL Server from Data Factory Managed Vnet using Private Endpoint – Azure Data Fac….
Although based upon the article’s solution, to meet our requirements we needed to substitute On-Prem Apache Kafka for On-Prem SQL Server and instead of an Azure Data Factory inside a Managed Vnet, we used a Synapse Workspace inside a Managed Vnet. The “Forwarding Vnet” concept explained in the above tutorial remains as-is in our approach.

As soon as you turn on Data Exfiltration Protection (DEP), the lockdown is real. Click through to see what the process of exfiltrating data through an approved mechanism looks like.

Comments closed

Argument {0} is Null or Empty in Azure Data Factory

Published 2022-07-18 by Kevin Feasel

Richard Swinbank diagnoses a problem:

I encountered this error recently while using Azure Data Factory’s (ADF) new(ish) Script activity to run a SQL query. It took me a while to find a fix, and when it happened again two weeks later I’d completely forgotten it. With a bit of luck, writing it down will help me remember next time – and if it helps you too, great

Read on to see what the problem was and how Richard solved it.

Comments closed

Working with Synapse Link for SQL

Published 2022-07-18 by Kevin Feasel

Steve Howard gives us an overview of a preview:

Azure Synapse Link for SQL greatly simplifies analytics pipelines as Microsoft manages the orchestration process for you. Since being announced at Microsoft Build, many of you have had the opportunity to try it out in a POC so now seems like a good time to take a deep dive on some implementation aspects that may save you time later.
This blog post will assume you have experience with Synapse dedicated SQL pools and that you have some basic working knowledge of Synapse Link for SQL from following the quick start or from doing an initial POC.

Looking at this, I am a bit concerned about what it means to sync actively changing tables, especially ones large enough to benefit from being in a dedicated SQL pool. “Just reload all the data” may be the right answer but it doesn’t sound like a convenient one.

Comments closed

Extracting Data from DAX Measures into CSV

Published 2022-06-17 by Kevin Feasel

Gilbert Quevauvilliers builds a process:

In this blog post I am going to demonstrate how to use the new Power Automate Flow to extract data from a DAX measure into a SharePoint CSV file.
I got this idea after reading the blog post from the Microsoft Power BI Team: Unlocking new self-service BI scenarios with ExecuteQueries support in Power Automate | Microsoft Power BI Blog | Microsoft Power BI
The great news is that this works on Power BI Pro, Premium Per User and Premium.

Read on to see how.

Comments closed

Testing Azure Synapse Link for SQL Server 2022

Published 2022-06-07 by Kevin Feasel

Kevin Chant gives Synapse Link for SQL Server a try:

Azure Synapse Link for SQL Server 2022 allows you to replicate your data from a SQL Server 2022 database to an Azure Synapse Analytics dedicated SQL Pool.
It is one of the options for the new Azure Synapse Link for SQL feature that was announced during Microsoft Build. You can read more about this in the Microsoft post which also announced the Public Preview of Azure Synapse Link for SQL.

Click through to see what Kevin has found so far. I think by the time this rolls out GA, it should be pretty good.

Comments closed

Processing Security Logs in Databricks with Delta Live Tables

Published 2022-06-06 by Kevin Feasel

Silvio Fiortio ingests some data:

Databricks recently introduced Workflows to enable data engineers, data scientists, and analysts to build reliable data, analytics, and ML workflows on any cloud without needing to manage complex infrastructure. Workflows allows users to build ETL pipelines that are automatically managed, including ingestion, and lineage, using Delta Live Tables. The benefits of Workflows and Delta Live Tables easily apply to security data sources, allowing us to scale to any volume or latency required for our operational needs.
In this article we’ll demonstrate some of the key benefits of Delta Live Tables for ingesting and processing security logs, with a few examples of common data sources we’ve seen our customers load into their cyber Lakehouse.

Click through to learn more.

Comments closed

Loading JSONB into Postgres via Azure Data Factory

Published 2022-06-01 by Kevin Feasel

Rayis Imayev is slinging JSON:

Requirements:
1. Sourcing data comes from a SQL Server database
2. The destination is a PostgreSQL database table
3. Transformation logic is to aggregate several rows from a sourcing table and populate the resulting JSON structured document into a single row JSONB type column

Read on for Rayis’s notes.

Comments closed

Cost Savings with Azure Data Factory

Published 2022-05-20 by Kevin Feasel

Koen Verbeeck maximizes the savings:

As you might’ve noticed, pricing in ADF is not the same as it was in SSIS for example. In SSIS, you pay your SQL Server license and you’re done (well, and you buy a server to run it on). It doesn’t matter what you do with SSIS, the cost is the same. If you run 1 package or 1000 packages, there’s no difference except in your electricity bill. However, in ADF you pay more if you use it more. You pay for each action you do, you pay for each activity you use and for how long things are running. There are a couple of guidelines you can follow to try to minimize costs:

Read on for those guidelines and some specific helpful items.

Comments closed

Databricks Workflows

Published 2022-05-16 by Kevin Feasel

Stacy Kerkela, et al, make an announcement:

Today we are excited to introduce Databricks Workflows, the fully-managed orchestration service that is deeply integrated with the Databricks Lakehouse Platform. Workflows enables data engineers, data scientists and analysts to build reliable data, analytics, and ML workflows on any cloud without needing to manage complex infrastructure. Finally, every user is empowered to deliver timely, accurate, and actionable insights for their business initiatives.

This looks a bit like Synapse pipelines. It’ll be interesting to see how this evovles.

Comments closed

Replacing Common Table Expressions in ADF Dataflows

Published 2022-05-16 by Kevin Feasel

Jeet Kainth needs an alternative:

At the time of writing, it is not possible to write a query using a CTE in the source of a dataflow. However, there are a few options to deal with this limitation:
– re-write the query using subqueries instead of CTEs
– use a stored procedure that contains the query and reference the stored proc in the source of the dataflow
– write the query as a view and reference the view in the source of the dataflow (this is my preferred method and the one I will demo here)

Jeet focuses on the third alternative. I’d lean toward the second or the third alternative, myself. Probably the second one (stored procedures) but both allow me to create an interface between ADF and the database. That way, underlying table changes will be less likely to require me to make code changes in ADF.

Comments closed

M	T	W	T	F	S	S
	1	2	3	4	5	6
7	8	9	10	11	12	13
14	15	16	17	18	19	20
21	22	23	24	25	26	27
28	29	30	31

Category: ETL / ELT