2023-08-29 – Curated SQL

Versioned State Store in Kafka Streams

Published 2023-08-29 by Kevin Feasel

Victoria Xia announces new functionality in Apache Kafka 3.5:

Since the introduction of stream processing, there have been three certainties in life: death, taxes, and out-of-order data. As a stream processing library built for Apache Kafka, Kafka Streams processes data in offset order. When out-of-order data is present, offset order differs from timestamp order and care must be taken to ensure that processing results respect timestamp order where appropriate. The introduction of versioned state stores to Kafka Streams in the Apache Kafka 3.5 release is a huge milestone in this direction.

In this blog post, I’ll address the what, why, and how of versioned stores in Kafka Streams, including what they are, why you might like to use them, how to get started, and a couple of things to watch out for when upgrading.

Read on to see what this entails and how you can try it out yourself.

Comments closed

Returning Matrix Elements in Spiral Order in R

Published 2023-08-29 by Kevin Feasel

Tomaz Kastrun forgot to remove The Club from his REPL:

Another one from the Leetcode challenge. This time, get the elements (single values) from the matrix in a spiral order with a starting position of [1,1].

So, the basic idea is to retrieve a vector of elements from a matrix in the following order:

Probably not something you’d use with any frequency, but it’s a fun way to learn how to operate within matrices.

Comments closed

Built-In R Datasets

Published 2023-08-29 by Kevin Feasel

Adrian Tam continues a series on getting started in R:

The ecosystem in R contains not only the function libraries to help you perform statistical analysis but also the data library that gives you some famous datasets to test out your program. There are a lot of built-in datasets in R. In this post, you will:

Learn some of the built-in datasets

Know how to use these datasets

Let’s get started.

Most of these built-in sets are fairly small and able to help you illustrate a specific point.

Comments closed

The Medallion Architecture in Data Modeling

Published 2023-08-29 by Kevin Feasel

Nikola Ilic gets the gold:

The most common pattern for modeling the data in the lakehouse is called a medallion. I love this name – it’s really easy to remember. But, why medallion? Tag along and you’ll soon find out why.

The same as for the lakehouse concept, credits for being pioneers in the medallion approach goes to Databricks.

What I’ve found interesting is the number of people who have taken to disliking the medallion architecture terms because Databricks pushed it so hard that their clients automatically assumed “medallion = using Databricks.”

Comments closed

Managing Security Roles for Hierarchical Organizations in Power BI

Published 2023-08-29 by Kevin Feasel

Marco Russo and Alberto Ferrari are working for The Man:

The security model in Tabular used by Power BI can filter rows of a table based on a DAX expression. When security is applied to a hierarchical structure, every hierarchy level is represented by a different column in the table. This structure can make it challenging to define a dynamic security filter based on the name of a node in the hierarchy, because the DAX expression must filter the column corresponding to the hierarchical level in which that node exists. If the security needs to be maintained dynamically in a configuration table, the resulting code may end up being extremely complex and hard to maintain, as well as create possible performance issues.

Without describing the complexity of solutions based on a filter applied directly to the appropriate hierarchy level, we want to describe a solution that minimizes the effort required in maintaining a configuration table for the dynamic security rules, while also providing good performance at execution time by minimizing the processing overhead required to apply the dynamic security.

Click through for the scenario and how you can implement this kind of security model in Power BI.

Comments closed

Group Replication in MySQL

Published 2023-08-29 by Kevin Feasel

Aisha Bukar continues a series on replication in MySQL:

MySQL Group replication is a remarkable feature introduced in MySQL 5.7 as a plugin. This technology allows you to create a reliable group of database servers. One of the most important features of MySQL’s group replication is that it allows these servers to store redundant data. This allows the database state to be replicated across multiple servers making it efficient in the situation where there is a server breakdown, the other servers in the cluster can agree to work together.

This technology is built on top of the MySQL InnoDB storage engine and employs a multi-source replication approach which we discussed in part 3 of the replication series. In this article, we’d be looking at an overview of the group replication technique, configuring and managing group replication, and also best practices for group replication. So, let’s get started!

Read on to see how it works and some recommendations around using it.

Comments closed

M	T	W	T	F	S	S
	1	2	3	4	5	6
7	8	9	10	11	12	13
14	15	16	17	18	19	20
21	22	23	24	25	26	27
28	29	30	31

Day: August 29, 2023

Versioned State Store in Kafka Streams

Returning Matrix Elements in Spiral Order in R

Built-In R Datasets

The Medallion Architecture in Data Modeling

Managing Security Roles for Hierarchical Organizations in Power BI

Group Replication in MySQL