Author: Kevin Feasel

Versioned State Store in Kafka Streams

Victoria Xia announces new functionality in Apache Kafka 3.5:

Since the introduction of stream processing, there have been three certainties in life: death, taxes, and out-of-order data. As a stream processing library built for Apache Kafka, Kafka Streams processes data in offset order. When out-of-order data is present, offset order differs from timestamp order and care must be taken to ensure that processing results respect timestamp order where appropriate. The introduction of versioned state stores to Kafka Streams in the Apache Kafka 3.5 release is a huge milestone in this direction.

In this blog post, I’ll address the what, why, and how of versioned stores in Kafka Streams, including what they are, why you might like to use them, how to get started, and a couple of things to watch out for when upgrading.

Read on to see what this entails and how you can try it out yourself.
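
To make the offset-order versus timestamp-order distinction concrete, here is a toy sketch in Python. It is not the Kafka Streams API (the real versioned stores are Java and also deal with history retention and tombstones); the `VersionedStore` class and its `put`/`get` methods are hypothetical names for illustration. It keeps every version of a key sorted by record timestamp, so a late-arriving record lands in its correct place in history instead of overwriting a newer value.

```python
import bisect
from typing import Dict, List, Optional


class VersionedStore:
    """Toy versioned key-value store (hypothetical, not the Kafka Streams API).

    Each key keeps every version, ordered by record timestamp rather than by
    arrival order, so late records slot into their correct place in history.
    """

    def __init__(self) -> None:
        self._timestamps: Dict[str, List[int]] = {}        # sorted per key
        self._values: Dict[str, List[Optional[str]]] = {}  # parallel to timestamps

    def put(self, key: str, value: Optional[str], timestamp: int) -> None:
        ts = self._timestamps.setdefault(key, [])
        vals = self._values.setdefault(key, [])
        i = bisect.bisect_left(ts, timestamp)
        if i < len(ts) and ts[i] == timestamp:
            vals[i] = value          # same timestamp: replace that version
        else:
            ts.insert(i, timestamp)  # keep history sorted by timestamp
            vals.insert(i, value)

    def get(self, key: str, as_of: Optional[int] = None) -> Optional[str]:
        """Return the value valid at `as_of`, or the latest version if None."""
        ts = self._timestamps.get(key, [])
        if as_of is None:
            return self._values[key][-1] if ts else None
        i = bisect.bisect_right(ts, as_of)  # number of versions with ts <= as_of
        return self._values[key][i - 1] if i > 0 else None


# Records in offset (arrival) order; the second one is late.
arrivals = [("k", "v2", 20), ("k", "v1", 10)]

store = VersionedStore()
naive: Dict[str, Optional[str]] = {}  # plain last-write-wins store for comparison
for key, value, ts in arrivals:
    store.put(key, value, ts)
    naive[key] = value

print(store.get("k"))            # latest version by timestamp
print(store.get("k", as_of=15))  # version valid at timestamp 15
print(naive["k"])                # whatever was written last
```

The plain dictionary ends up holding `v1` because that record arrived last, even though `v2` is newer by timestamp; the versioned sketch reports `v2` as the latest value and still answers `v1` for a lookup as of timestamp 15.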

The Medallion Architecture in Data Modeling

Nikola Ilic gets the gold:

The most common pattern for modeling the data in the lakehouse is called a medallion. I love this name – it’s really easy to remember. But, why medallion? Tag along and you’ll soon find out why.

As with the lakehouse concept, credit for pioneering the medallion approach goes to Databricks.

What I’ve found interesting is the number of people who have taken to disliking the medallion architecture terms because Databricks pushed it so hard that their clients automatically assumed “medallion = using Databricks.”

Built-In R Datasets

Adrian Tam continues a series on getting started in R:

The ecosystem in R contains not only the function libraries to help you perform statistical analysis but also the data library that gives you some famous datasets to test out your program. There are a lot of built-in datasets in R. In this post, you will:

  • Learn some of the built-in datasets
  • Know how to use these datasets

Let’s get started.

Most of these built-in sets are fairly small and able to help you illustrate a specific point.

Group Replication in MySQL

Aisha Bukar continues a series on replication in MySQL:

MySQL Group Replication is a remarkable feature introduced in MySQL 5.7 as a plugin. It allows you to create a reliable group of database servers. One of its most important features is that these servers store redundant data: the database state is replicated across multiple servers, so if one server breaks down, the remaining servers in the cluster can agree to keep working together.

This technology is built on top of the MySQL InnoDB storage engine and employs the multi-source replication approach we discussed in part 3 of the replication series. In this article, we’ll look at an overview of the group replication technique, how to configure and manage group replication, and best practices for using it. So, let’s get started!

Read on to see how it works and some recommendations around using it.
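
The "agree to keep working together" part comes down to a majority rule: Group Replication needs a majority of members to agree, so a group of 2f+1 servers can lose f of them and carry on. Here is a minimal Python sketch of that arithmetic, assuming a simple strict-majority rule and ignoring operations such as manually forcing a new membership; the function names are hypothetical.

```python
def majority(group_size: int) -> int:
    """Smallest number of members that forms a strict majority."""
    return group_size // 2 + 1


def has_quorum(group_size: int, reachable: int) -> bool:
    """True if the reachable members can still form a majority."""
    return reachable >= majority(group_size)


def tolerable_failures(group_size: int) -> int:
    """How many members can fail while a majority remains."""
    return group_size - majority(group_size)


for n in range(1, 8):
    print(f"group of {n}: majority {majority(n)}, "
          f"survives {tolerable_failures(n)} failure(s)")
```

One consequence of the rule: an even-sized group buys little, since four servers tolerate the same single failure as three.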

Managing Security Roles for Hierarchical Organizations in Power BI

Marco Russo and Alberto Ferrari are working for The Man:

The security model in Tabular used by Power BI can filter rows of a table based on a DAX expression. When security is applied to a hierarchical structure, every hierarchy level is represented by a different column in the table. This structure can make it challenging to define a dynamic security filter based on the name of a node in the hierarchy, because the DAX expression must filter the column corresponding to the hierarchical level in which that node exists. If the security needs to be maintained dynamically in a configuration table, the resulting code may end up being extremely complex and hard to maintain, as well as create possible performance issues.

Without describing the complexity of solutions based on a filter applied directly to the appropriate hierarchy level, we want to describe a solution that minimizes the effort required in maintaining a configuration table for the dynamic security rules, while also providing good performance at execution time by minimizing the processing overhead required to apply the dynamic security.

Click through for the scenario and how you can implement this kind of security model in Power BI.
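
To see why filtering on a node name gets awkward, here is a small Python sketch of the problem only, not of Marco and Alberto’s DAX solution. The table, the `Level1` to `Level3` columns, and the configuration entries are all hypothetical. Because a configured node such as "Europe" can sit at any level, the filter has to check every level column for every row.

```python
from typing import Dict, List, Set

# Hypothetical flattened hierarchy: one column per level.
LEVELS = ["Level1", "Level2", "Level3"]
rows: List[Dict[str, str]] = [
    {"Level1": "Contoso", "Level2": "Europe", "Level3": "Italy"},
    {"Level1": "Contoso", "Level2": "Europe", "Level3": "France"},
    {"Level1": "Contoso", "Level2": "Americas", "Level3": "Canada"},
]

# Hypothetical configuration table: user -> node names they may see.
security_config: Dict[str, Set[str]] = {
    "alice": {"Europe"},   # a level-2 node
    "bob": {"Canada"},     # a level-3 node
    "carol": {"Contoso"},  # the root
}


def visible_rows(user: str) -> List[Dict[str, str]]:
    """Rows where any level column matches a node the user is allowed to see."""
    allowed = security_config.get(user, set())
    # The configured name can live at any level, so check every level column.
    return [r for r in rows if any(r[level] in allowed for level in LEVELS)]


for user in ["alice", "bob", "carol", "dave"]:
    print(user, [r["Level3"] for r in visible_rows(user)])
```

Every additional level adds another column to test, which hints at why a dynamic filter driven by a configuration table can grow complex and costly when written directly against the level columns.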

Plotting Multiple Histograms in R

Steven Sanderson shows us two libraries to plot two histograms:

Histograms are a powerful tool for visualizing the distribution of numerical data. They allow us to quickly understand the frequency distribution of values within a dataset. In this tutorial, we’ll explore how to create multiple histograms using two popular approaches in R: base R graphics and the ggplot2 package. By the end of this guide, you’ll be able to confidently display multiple histograms on a single graph using both methods.

Click through for more than two examples.

File Not Found in SQL Server 2022 with Distributed AG and Filestream

Sean Gallardy goes sleuthing:

I don’t often find many people using FileStream in their databases (which isn’t a bad or good thing, in my opinion, just a statement of fact). Some technologies in SQL Server use it behind the scenes, such as FileTable or Hekaton, and there isn’t really any getting around it in those cases. However, a friend on Database Administrators Stack Exchange, Hannah Vernon, brought me an interesting issue: a database that had been in a Distributed Availability Group on SQL Server 2019 with no issues started having a major problem after the upgrade to SQL Server 2022.

Read on for Sean’s analysis of the problem and solution.

PowerShell Quizzes

Jeff Hicks wants you to think fast:

Time to get back to the blog. I’ve been working through my backlog of projects: things I started writing or updating that then got pushed to the back of the line. One of these projects is a PowerShell module I wrote as a teaching tool. The idea was to create short quizzes on PowerShell topics that someone could take in a PowerShell session, although you can create a quiz on anything. If you want to try things out, install the PSQuizMaster module from the PowerShell Gallery. The module works in Windows PowerShell and PowerShell 7, including cross-platform.

Read on to see what’s in a quiz and how to create your own quiz.

ADX Date and Time Representations in Power Query and Power BI

Dany Hoter does some explaining:

Data in ADX (aka Kusto aka RTA in Fabric) almost always has columns that contain datetime values like 2023-08-01 16:45 and sometimes timespan values like 2 hours or 36 minutes.

In this article, I’ll describe how these values are represented in ADX, in Power Query, and in Power BI.

Notice that I don’t just say “Power BI”: timespan values have different types in Power Query and in Power BI.

Read on for those details.
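
As background, Power BI’s data type documentation describes a Power Query duration being converted to a Decimal Number when it is loaded into the model, so it can still be added to a date/time value. Assuming that number counts days (the unit is an assumption here, not something stated above), a Python sketch of the round trip looks like this; `duration_to_days` and `days_to_duration` are hypothetical helpers.

```python
from datetime import datetime, timedelta

SECONDS_PER_DAY = 86_400


def duration_to_days(d: timedelta) -> float:
    """Express a duration as a decimal number of days (assumed model unit)."""
    return d.total_seconds() / SECONDS_PER_DAY


def days_to_duration(days: float) -> timedelta:
    """Turn a decimal number of days back into a duration."""
    return timedelta(days=days)


# The timespan examples from the quoted text: 2 hours and 36 minutes.
two_hours = duration_to_days(timedelta(hours=2))
thirty_six_minutes = duration_to_days(timedelta(minutes=36))
print(two_hours, thirty_six_minutes)

# Adding the decimal-day value back onto the datetime example from the quote.
start = datetime(2023, 8, 1, 16, 45)
print(start + days_to_duration(thirty_six_minutes))
```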
