October 2021 – Curated SQL

A Primer on Kafka Streams

Published 2021-10-29 by Kevin Feasel

Bill Bejeck has an introduction to Kafka Streams:

Kafka Streams is an abstraction over Apache Kafka® producers and consumers that lets you forget about low-level details and focus on processing your Kafka data. You could of course write your own code to process your data using the vanilla Kafka clients, but the Kafka Streams equivalent will have far fewer lines, because it’s declarative rather than imperative. As a library, Kafka Streams lets you create a standalone application that can be run anywhere that can connect to a Kafka broker, whether that’s a laptop or a hefty cloud server. You just need to provide it with the host and port name of a broker. Combining Kafka Streams with Confluent Cloud grants you even more processing power with very little code investment.

Click through for a description as well as a whole series of embedded videos.

Getting Started with Spark in Azure Synapse Analytics

Published 2021-10-29 by Kevin Feasel

Hiram Fleitas has a guide for us:

Step 1 watch this video
Step 2 skim through these slides for more context:
The rest is all hands-on stuff – if you get stuck at any point lmk.

Click through for an overview video from Euan Garden and several resources and tutorials.

From Kafka to Azure Data Explorer

Published 2021-10-29 by Kevin Feasel

Niels Berglund uses Kafka Connect to link an Apache Kafka topic to Azure Data Explorer:

If you follow my blog, you probably know that I am a huge fan of Apache Kafka and event streaming/stream processing. Recently Azure Data Explorer (ADX) has caught my eye. In fact, in the last few weeks, I did two conference sessions about ADX. A month ago, I published a blog post related to Kafka and ADX: Run Self-Managed Kusto Kafka Connector Serverless in Azure Container Instances.
As the title of that post implies, it looked at the ADX Kafka sink connector and how to run it in Azure. What the post did not look at was how to configure the connector and connect it to ADX. That is what we will do in this post (and maybe in a couple of more posts).

This post serves as a complete tutorial, though Niels does promise future posts on other ingestion methods, so stay tuned.

Automating Single Table Refresh with Azure Data Factory and Azure Automation

Published 2021-10-29 by Kevin Feasel

Marc Lelijveld wants to refresh a single table:

Back in February, I wrote a blog on how you can trigger a single table to refresh in your Power BI data model. This blog described how you can achieve this goal using a PowerShell script and the ASCmd cmdlets for Analysis Services, which also works for Power BI Premium. In the wrap-up of that blog, I promised to follow-up with a blog on how to achieve the same goal with Azure Data Factory. It took a little bit longer than expected to finalize this post, but here it is!
In this blog, co-authored by my colleague Paulien van Eijk, we will describe how you can automate your single table refresh in the Power BI Service, including all dependencies with downstream dataflows using Azure Data Factory and Azure Automation. All this is based on real life scenarios and a solution built in collaboration between Dave Ruijter, Paulien and me.

Read on for Marc and Paulien’s solution.

Abnormal Tables and Skewed Data

Published 2021-10-29 by Kevin Feasel

Erik Darling reminds us to be vigilant in database design:

But the Posts table suffers from a serious design flaw in the public data dump: Questions and Answers are in the same table.
I’ve heard that it’s worse behind the scenes, but I don’t have any additional details on that.

Read on to understand why this is a problem and what the ramifications are.

An Overview of Azure Logic Apps

Published 2021-10-29 by Kevin Feasel

Elayne Jones takes us through the use case for Azure Logic Apps:

Relying on automated workflows, instead of human intervention, ensures data consistency and availability. Automated workflows are, therefore, an integral piece of a sophisticated Modern Data Platform. Now, thanks to Azure Logic Apps, creating a complex workflow is no longer a daunting technical challenge!

Read on to see how they work, what kinds of connectors are available, and the sorts of things you can build with them.

Multi-Threading with dbatools

Published 2021-10-28 by Kevin Feasel

Andy Levy has some lessons learned:

Over the summer, I spent some (a lot of) time working on updates to a script at work which runs multiple processes in parallel. Everything seemed to work OK for a while, but then everything broke. It broke right around the time dbatools 1.1 dropped, so I started thinking that something must have changed there. Turns out, it was entirely my fault and I hope this post will help you avoid the same trap.

Don’t fall into the same traps Andy did; read the whole thing.

System-Versioned Ledger Tables

Published 2021-10-28 by Kevin Feasel

Randolph West has a series on ledger tables in SQL Server. First up is a primer on the topic:

System-versioned ledger tables leverage the same technology: there is a table with current data in it, and an underlying history table which keeps track of changes. However, it uses a cryptographic chain that provides digital forensic evidence of tampering. Yes, if you’ll pardon the use of this phrase, I’m talking about a blockchain.
This is not a cryptocurrency. No one is using expensive graphics cards to produce a fiat currency in someone’s basement. Instead, each transaction affecting the database in question is cryptographically hashed using a SHA-256 algorithm and then stored somewhere off-site.
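
To make the "cryptographic chain" idea concrete, here is a toy sketch in Python. It is not how SQL Server implements ledger tables; the function names, the genesis value, and the row layout are all assumptions made up for illustration. It only shows the general principle: each entry's SHA-256 hash covers the previous hash, so changing an old row changes every hash after it.

```python
import hashlib
import json
from typing import Optional

# Hypothetical starting value for the first link in the chain.
GENESIS_HASH = "0" * 64


def entry_hash(previous_hash: str, payload: dict) -> str:
    """Hash one entry together with the hash of the entry before it."""
    body = json.dumps(payload, sort_keys=True)
    return hashlib.sha256((previous_hash + body).encode("utf-8")).hexdigest()


def build_chain(payloads: list) -> list:
    """Return the running hash after each payload is appended."""
    hashes = []
    previous = GENESIS_HASH
    for payload in payloads:
        previous = entry_hash(previous, payload)
        hashes.append(previous)
    return hashes


def first_tampered_index(payloads: list, recorded_hashes: list) -> Optional[int]:
    """Recompute the chain; return the first index that disagrees, or None."""
    recomputed = build_chain(payloads)
    for i, (actual, recorded) in enumerate(zip(recomputed, recorded_hashes)):
        if actual != recorded:
            return i
    return None


# Illustrative rows only; not taken from any real table.
rows = [
    {"id": 1, "amount": 100},
    {"id": 2, "amount": 250},
    {"id": 3, "amount": 75},
]

# In the scenario Randolph describes, hashes like these are kept off-site.
recorded = build_chain(rows)

# Simulate someone quietly editing the second row after the fact.
tampered = [dict(r) for r in rows]
tampered[1]["amount"] = 999
```

Comparing `tampered` against `recorded` with `first_tampered_index` flags the second row, and every later hash also stops matching, which is what makes the tampering evident rather than preventing it.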

Part two separates out the two types of ledger table:

This week we will look at the different types of ledger table: append-only and updatable.
Unlike temporal tables, a ledger table can be append-only which makes it immutable. You can only insert data and therefore it does not need a history table. In fact, you may be using append-only tables in your data warehouse already. While this is secure, it may not be practical.

Part three covers limitations:

Every choice we make is a trade-off. New features have limitations, and ledger tables are no exception.
Some of these limitations are perfectly sensible. For example, the whole point of ledger tables is to ensure that we can provide tamper evidence. This necessarily means you can’t turn it off once it’s enabled, unless you drop the database entirely — this is just one scenario where a full defence-in-depth strategy is required.

Part four is the one I’ve been waiting for, an explanation of why you probably don’t need this:

After writing several posts about a neat feature in Azure SQL called system-versioned ledger tables, it reminded me about something I’ve wanted to say for a number of years now, outside of snarky tweets.
Here goes:
You don’t need a blockchain.
In the vast majority of use cases, you need a properly audited relational database system with ACID compliance and a good recovery strategy.

There are very specific use cases in which data hashes and ledger tables make sense.

Adding an Animated GIF to Power BI Reports

Published 2021-10-28 by Kevin Feasel

Ed Hansberry works around a Power BI limitation:

It is easy to add an animated GIF to your Power BI Reports. However, if you just add it as an image, it won’t animate. You’ll just get a static image.
Animated GIFs can be useful in reports to explain new features to users, or on hidden developer pages showing how certain things were done.

Click through for instructions on how to include an animated GIF on your Power BI report. Just make sure to pronounce it the right way.

More Efficient Pivoting

Published 2021-10-28 by Kevin Feasel

Dave Mason is on the hunt:

While working with some poorly performing code in T-SQL that used a PIVOT operator, I wondered if there was a more efficient way to get a result set of pivoted data. It may have been a fool’s errand, but I still wanted to try. It dawned on me that I could use the STRING_AGG() function to build a delimited list of pivoted column names and values. From there, I’d have to “shred” the delimited data to rows and columns. But how?

Read on to see how.
