ETL / ELT – Page 8 – Curated SQL

Working with Synapse Link for SQL

Published 2022-07-18 by Kevin Feasel

Steve Howard gives us an overview of a preview:

Azure Synapse Link for SQL greatly simplifies analytics pipelines as Microsoft manages the orchestration process for you. Since being announced at Microsoft Build, many of you have had the opportunity to try it out in a POC so now seems like a good time to take a deep dive on some implementation aspects that may save you time later.
This blog post will assume you have experience with Synapse dedicated SQL pools and that you have some basic working knowledge of Synapse Link for SQL from following the quick start or from doing an initial POC.

Looking at this, I am a bit concerned about what it means to sync actively changing tables, especially ones large enough to benefit from being in a dedicated SQL pool. “Just reload all the data” may be the right answer, but it doesn’t sound like a convenient one.


Extracting Data from DAX Measures into CSV

Published 2022-06-17 by Kevin Feasel

Gilbert Quevauvilliers builds a process:

In this blog post I am going to demonstrate how to use the new Power Automate Flow to extract data from a DAX measure into a SharePoint CSV file.
I got this idea after reading the blog post from the Microsoft Power BI Team: Unlocking new self-service BI scenarios with ExecuteQueries support in Power Automate (Microsoft Power BI Blog)
The great news is that this works on Power BI Pro, Premium Per User and Premium.

Read on to see how.
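Gilbert builds this with a Power Automate flow. For anyone curious about the data that the underlying ExecuteQueries call hands back, here is a minimal Python sketch of the flattening step. It assumes the response shape of the Power BI REST API executeQueries endpoint: rows nested under results, then tables, then rows, with each row keyed by column names such as "Date[Year]" or "[Total Sales]". The function name and sample data are hypothetical, and this is an illustration of turning that JSON into CSV text, not Gilbert's method. Because blank values may be left out of a row unless includeNulls is set in the request's serializer settings, the sketch builds its header from the union of keys across all rows.

```python
import csv
import io


def execute_queries_result_to_csv(response: dict) -> str:
    """Flatten the first table of an executeQueries-style response into CSV text."""
    # Assumed shape: {"results": [{"tables": [{"rows": [{...}, ...]}]}]}
    rows = response["results"][0]["tables"][0]["rows"]
    if not rows:
        return ""
    # Keep the first row's column order, then append any keys that only show up later
    # (for example, columns that were blank and therefore omitted in the first row).
    fieldnames = list(rows[0].keys())
    for row in rows[1:]:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)  # keys missing from a row are written as empty strings
    return buffer.getvalue()


# Hypothetical sample response for a query over a Date table and a [Total Sales] measure.
sample = {"results": [{"tables": [{"rows": [
    {"Date[Year]": 2021, "[Total Sales]": 100.5},
    {"Date[Year]": 2022, "[Total Sales]": 120},
]}]}]}
print(execute_queries_result_to_csv(sample))
```

In a flow, the resulting text is what would be written to the SharePoint file; here it is simply returned as a string.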


Testing Azure Synapse Link for SQL Server 2022

Published 2022-06-07 by Kevin Feasel

Kevin Chant gives Synapse Link for SQL Server a try:

Azure Synapse Link for SQL Server 2022 allows you to replicate your data from a SQL Server 2022 database to an Azure Synapse Analytics dedicated SQL Pool.
It is one of the options for the new Azure Synapse Link for SQL feature that was announced during Microsoft Build. You can read more about this in the Microsoft post which also announced the Public Preview of Azure Synapse Link for SQL.

Click through to see what Kevin has found so far. I think that by the time this reaches general availability (GA), it should be pretty good.


Processing Security Logs in Databricks with Delta Live Tables

Published 2022-06-06 by Kevin Feasel

Silvio Fiorito ingests some data:

Databricks recently introduced Workflows to enable data engineers, data scientists, and analysts to build reliable data, analytics, and ML workflows on any cloud without needing to manage complex infrastructure. Workflows allows users to build ETL pipelines that are automatically managed, including ingestion, and lineage, using Delta Live Tables. The benefits of Workflows and Delta Live Tables easily apply to security data sources, allowing us to scale to any volume or latency required for our operational needs.
In this article we’ll demonstrate some of the key benefits of Delta Live Tables for ingesting and processing security logs, with a few examples of common data sources we’ve seen our customers load into their cyber Lakehouse.

Click through to learn more.


Loading JSONB into Postgres via Azure Data Factory

Published 2022-06-01 by Kevin Feasel

Rayis Imayev is slinging JSON:

Requirements:
1. Sourcing data comes from a SQL Server database
2. The destination is a PostgreSQL database table
3. Transformation logic is to aggregate several rows from a sourcing table and populate the resulting JSON structured document into a single row JSONB type column

Read on for Rayis’s notes.
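Rayis does this inside Azure Data Factory. To make the third requirement concrete, here is a minimal Python sketch of the transformation logic only: several flat source rows collapse into one JSON document per key, ready to land in a single JSONB column. The column names (order_id, product, qty) and the function name are hypothetical, and this is not how Rayis implements it in ADF.

```python
import json
from collections import defaultdict


def rows_to_jsonb_payload(rows: list[dict], key_field: str) -> dict:
    """Group flat rows by key_field and return {key: json_text}, one JSON document per key."""
    grouped = defaultdict(list)
    for row in rows:
        # Everything except the grouping key becomes part of a nested item.
        item = {k: v for k, v in row.items() if k != key_field}
        grouped[row[key_field]].append(item)
    # json.dumps produces text that PostgreSQL can cast to jsonb on insert.
    return {
        key: json.dumps({key_field: key, "items": items})
        for key, items in grouped.items()
    }


# Hypothetical source rows, as they might come out of a SQL Server table.
source_rows = [
    {"order_id": 1, "product": "A", "qty": 2},
    {"order_id": 1, "product": "B", "qty": 1},
    {"order_id": 2, "product": "C", "qty": 5},
]
payload = rows_to_jsonb_payload(source_rows, "order_id")
```

Three source rows become two destination rows, each carrying one JSON document.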


Cost Savings with Azure Data Factory

Published 2022-05-20 by Kevin Feasel

Koen Verbeeck maximizes the savings:

As you might’ve noticed, pricing in ADF is not the same as it was in SSIS for example. In SSIS, you pay your SQL Server license and you’re done (well, and you buy a server to run it on). It doesn’t matter what you do with SSIS, the cost is the same. If you run 1 package or 1000 packages, there’s no difference except in your electricity bill. However, in ADF you pay more if you use it more. You pay for each action you do, you pay for each activity you use and for how long things are running. There are a couple of guidelines you can follow to try to minimize costs:

Read on for those guidelines and some specific helpful items.
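The difference Koen describes comes down to arithmetic: an SSIS licence is a flat cost, while ADF meters usage. Here is a minimal Python sketch of that, loosely modeled on two of ADF's billing meters (orchestration billed per 1,000 activity runs and data movement billed per DIU-hour). The rates below are hypothetical placeholders rather than Azure's actual prices, and real bills include other meters this sketch ignores.

```python
from dataclasses import dataclass


@dataclass
class AdfRates:
    # HYPOTHETICAL placeholder rates; check the Azure pricing page for real values.
    per_1000_activity_runs: float
    per_diu_hour: float


def monthly_adf_cost(activity_runs: int, copy_diu_hours: float, rates: AdfRates) -> float:
    """Usage-based cost: orchestration per 1,000 runs plus data movement per DIU-hour."""
    orchestration = activity_runs / 1000 * rates.per_1000_activity_runs
    data_movement = copy_diu_hours * rates.per_diu_hour
    return round(orchestration + data_movement, 2)


rates = AdfRates(per_1000_activity_runs=1.0, per_diu_hour=0.25)
# One daily pipeline: 10 activity runs and 1 DIU-hour of copying per day, for 30 days.
small = monthly_adf_cost(activity_runs=300, copy_diu_hours=30, rates=rates)
# The same workload repeated 1,000 times over.
large = monthly_adf_cost(activity_runs=300_000, copy_diu_hours=30_000, rates=rates)
```

With these made-up rates, the small workload costs 7.80 and the large one 7,800.00. The cost grows in lockstep with usage, which is why trimming unnecessary activities and runtime matters far more in ADF than it ever did in SSIS.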


Databricks Workflows

Published 2022-05-16 by Kevin Feasel

Stacy Kerkela, et al, make an announcement:

Today we are excited to introduce Databricks Workflows, the fully-managed orchestration service that is deeply integrated with the Databricks Lakehouse Platform. Workflows enables data engineers, data scientists and analysts to build reliable data, analytics, and ML workflows on any cloud without needing to manage complex infrastructure. Finally, every user is empowered to deliver timely, accurate, and actionable insights for their business initiatives.

This looks a bit like Synapse pipelines. It’ll be interesting to see how this evolves.


Replacing Common Table Expressions in ADF Dataflows

Published 2022-05-16 by Kevin Feasel

Jeet Kainth needs an alternative:

At the time of writing, it is not possible to write a query using a CTE in the source of a dataflow. However, there are a few options to deal with this limitation:
– re-write the query using subqueries instead of CTEs
– use a stored procedure that contains the query and reference the stored proc in the source of the dataflow
– write the query as a view and reference the view in the source of the dataflow (this is my preferred method and the one I will demo here)

Jeet focuses on the third alternative. I’d lean toward the second or third alternative myself, probably the second (stored procedures), but both create an interface between ADF and the database. That way, changes to the underlying tables are less likely to require code changes in ADF.
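To show that the subquery and view options return the same thing as the original CTE, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for SQL Server. The sales table and big_regions view are hypothetical, and the stored procedure option is not shown because SQLite has no stored procedures. In ADF terms, the dataflow source would then only need the plain SELECT against the view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (region TEXT, amount INTEGER);
INSERT INTO sales VALUES ('East', 10), ('East', 5), ('West', 7);
""")

# The original query, written with a CTE.
cte_query = """
WITH region_totals AS (
    SELECT region, SUM(amount) AS total FROM sales GROUP BY region
)
SELECT region, total FROM region_totals WHERE total > 8
"""

# Option 1: rewrite the CTE as a derived table (subquery).
subquery_query = """
SELECT region, total
FROM (SELECT region, SUM(amount) AS total FROM sales GROUP BY region) AS region_totals
WHERE total > 8
"""

# Option 3: keep the CTE, but hide it inside a view.
conn.execute("CREATE VIEW big_regions AS " + cte_query)
view_query = "SELECT region, total FROM big_regions"

results = {
    name: conn.execute(query).fetchall()
    for name, query in [("cte", cte_query), ("subquery", subquery_query), ("view", view_query)]
}
```

All three return only the East region (total 15). The view also acts as the interface mentioned above: if the sales table changes, only the view definition needs updating.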


A Workaround to Parameterize Synapse Execute Pipeline Activities

Published 2022-04-14 by Kevin Feasel

Nick Edwards hits us with bad news:

Unfortunately, as of April 2022 there is not an option to parameterise or add dynamic content to an “Execute Pipeline” activity to invoke a pipeline run.
However, with the use of a Microsoft API there is a method which we can use to overcome this.

Click through to learn more about this API call and see an example of it in action.
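As a rough illustration of the idea, here is a minimal Python sketch that builds (but does not send) a request to start a pipeline run through the Synapse REST API. It assumes the data-plane "create pipeline run" endpoint (POST to /pipelines/{name}/createRun with api-version 2020-12-01 against https://{workspace}.dev.azuresynapse.net), with the pipeline's parameters passed as the JSON body and a bearer token obtained separately. I'm assuming this is the API in question. The workspace, pipeline, and parameter names are hypothetical, and Nick's post shows how he actually wires it up.

```python
import json
import urllib.parse
import urllib.request

# Assumed api-version for the Synapse data-plane pipeline endpoints.
API_VERSION = "2020-12-01"


def build_create_run_request(
    workspace: str, pipeline: str, parameters: dict, token: str
) -> urllib.request.Request:
    """Build, but do not send, a POST that starts `pipeline` with `parameters`."""
    # The pipeline name is just part of the URL, so it can be chosen at runtime,
    # which is the flexibility a static Execute Pipeline activity lacks.
    url = (
        f"https://{workspace}.dev.azuresynapse.net/pipelines/"
        f"{urllib.parse.quote(pipeline)}/createRun?api-version={API_VERSION}"
    )
    body = json.dumps(parameters).encode("utf-8")  # pipeline parameters as the JSON body
    return urllib.request.Request(
        url,
        data=body,
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical workspace, pipeline, and parameter values; the token is a placeholder.
req = build_create_run_request("myworkspace", "Child Load", {"LoadDate": "2022-04-14"}, "<token>")
# urllib.request.urlopen(req) would send it; the JSON response should include a runId.
```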


Costs for Managed Virtual Networks in Azure Data Factory

Published 2022-04-13 by Kevin Feasel

Martin Schoombee brings up an interesting point:

We were running SSIS in an Azure VM, spinning the VM up and down as required to run the ETL processes. A third-party SSIS component was used to extract data out of Dynamics 365 CRM, and accounted for a significant part of the yearly costs. I blogged about the reasons why I think it’s worth moving from Azure AS to Power BI PPU before, and combined with the move to Azure Data Factory I estimated a cost reduction of almost 35%.
After deploying the solution I noticed that our daily ETL costs were significantly higher than I thought it would be, and that started a little rabbit-hole exercise to figure out why.

I’m used to thinking about managed virtual networks in the case of Azure Synapse Analytics, where I think it makes a lot of sense as a default (especially because you can’t switch after you’ve made a decision).



Category: ETL / ELT