Press "Enter" to skip to content

Author: Kevin Feasel

Streaming Data to Azure Event Hub via Mockaroo and Kafka API

Jasleen Kaur Wahi generates some data:

In a recent project, I faced the need to generate randomized data for transmission to the Azure Event Hub. This hub is a key component of Microsoft Azure, used for real-time data ingestion and processing.

First, let’s take a look at how I created this random data. I wanted to come up with a way to make data that looks like what we see in the real world, but without using any real information from users. This made-up data was really important for a bunch of things, like checking if our software works well.

Read on to see how Mockaroo works and the end result. Creating tests for streaming services like Event Hubs is a challenge, so this is an interesting approach to the task.
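If you want to try something in the same spirit, here is a minimal Python sketch of the idea: generate randomized records locally (Jasleen uses Mockaroo for that part) and push them to an Event Hub through its Kafka-compatible endpoint. The namespace, event hub name, and connection string below are placeholders, not anything from the original post.

```python
import json
import random
import uuid

from confluent_kafka import Producer

# Placeholder names -- substitute your own Event Hubs namespace, hub, and connection string.
EVENTHUB_NAMESPACE = "my-namespace"
EVENTHUB_NAME = "my-event-hub"          # maps to the Kafka topic name
CONNECTION_STRING = "Endpoint=sb://my-namespace.servicebus.windows.net/;SharedAccessKeyName=...;SharedAccessKey=..."

# Event Hubs exposes a Kafka endpoint on port 9093. The SASL username is
# literally "$ConnectionString" and the password is the connection string itself.
producer = Producer({
    "bootstrap.servers": f"{EVENTHUB_NAMESPACE}.servicebus.windows.net:9093",
    "security.protocol": "SASL_SSL",
    "sasl.mechanism": "PLAIN",
    "sasl.username": "$ConnectionString",
    "sasl.password": CONNECTION_STRING,
})

def fake_reading() -> dict:
    """Randomized record; a service like Mockaroo could supply this instead."""
    return {
        "device_id": str(uuid.uuid4()),
        "temperature": round(random.uniform(15.0, 35.0), 2),
        "status": random.choice(["ok", "warn", "error"]),
    }

for _ in range(100):
    producer.produce(EVENTHUB_NAME, value=json.dumps(fake_reading()).encode("utf-8"))

producer.flush()
```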


Transitioning from Elasticsearch to OpenSearch

Nileh Jain has a guide for us:

Elasticsearch and OpenSearch are powerful tools for handling search and analytics workloads, offering scalability, real-time capabilities, and a rich ecosystem of plugins and integrations. Elasticsearch is widely used for full-text search, log monitoring, and data visualization across industries due to its mature ecosystem. OpenSearch, a community-driven fork of Elasticsearch, provides a fully open-source alternative with many of the same capabilities, making it an excellent choice for organizations prioritizing open-source principles and cost efficiency. 

Migration to OpenSearch should be considered if you are using Elasticsearch versions up to 7.10 and want to avoid licensing restrictions introduced with Elasticsearch’s SSPL license. It is also ideal for those seeking continued access to an open-source ecosystem while maintaining compatibility with existing Elasticsearch APIs and tools. Organizations with a focus on community-driven innovation, transparent governance, or cost control will find OpenSearch a compelling option.

Click through for the prep work and the guide.
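As one illustration of what a migration step can look like (not necessarily the path the linked guide takes), here is a hedged Python sketch using the reindex-from-remote API to pull an index from an existing Elasticsearch cluster into OpenSearch. The endpoints, index name, and credentials are placeholders, and the remote host must be allow-listed in the OpenSearch node settings before this will run.

```python
import requests

# Placeholder endpoints and index name; adjust auth and TLS settings for your clusters.
OPENSEARCH_URL = "https://opensearch.example.com:9200"
ELASTICSEARCH_URL = "https://elastic.example.com:9200"
INDEX = "logs-2024"

# Reindex-from-remote: OpenSearch pulls documents from the old Elasticsearch cluster.
# The remote host must be allow-listed on the OpenSearch side
# (the reindex.remote allowlist/whitelist setting, depending on version).
body = {
    "source": {
        "remote": {"host": ELASTICSEARCH_URL},
        "index": INDEX,
    },
    "dest": {"index": INDEX},
}

resp = requests.post(
    f"{OPENSEARCH_URL}/_reindex",
    json=body,
    auth=("admin", "admin"),   # placeholder credentials
    verify=False,              # lab setting only; verify certificates in production
)
resp.raise_for_status()
print(resp.json())
```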


A Quick Primer on KQL

Reitse Eskens takes us through a language:

This post can come as a shock if you’re used to writing T-SQL, because not only is there more than one useful language to process data (real-time data in this case), but it also has enough similarities to SQL to look familiar and is different enough to leave you flustered.

Now, to get a complete introduction to KQL, the Kusto Query Language, one blog post (or video) would never be enough. There are so many operators that they could fill an entire series on their own.

In this blog, the focus will be on the basic structure of KQL and a number of common operators. They will be compared with their counterparts in SQL for reference.

I fully agree with Reitse. I’ve put together a full-length talk on KQL and still feel like I’m covering the basics. It’s not that KQL is some monstrously complicated language, but it is different enough from other languages like T-SQL that you cannot easily apply knowledge from one to the other.
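To give a flavor of that SQL-to-KQL comparison, here is a small sketch using the Python Kusto client against Microsoft's public help cluster and its Samples database. The query is illustrative, not something from Reitse's post.

```python
from azure.kusto.data import KustoClient, KustoConnectionStringBuilder

# The public "help" cluster and its Samples database are Microsoft's demo environment.
cluster = "https://help.kusto.windows.net"
kcsb = KustoConnectionStringBuilder.with_aad_device_authentication(cluster)
client = KustoClient(kcsb)

# KQL reads top to bottom as a pipeline of operators, each feeding the next.
kql = """
StormEvents
| where State == "TEXAS"
| summarize Events = count() by EventType
| order by Events desc
| take 5
"""

# Rough T-SQL equivalent, for comparison:
#   SELECT TOP 5 EventType, COUNT(*) AS Events
#   FROM StormEvents
#   WHERE State = 'TEXAS'
#   GROUP BY EventType
#   ORDER BY Events DESC;

response = client.execute("Samples", kql)
for row in response.primary_results[0]:
    print(row["EventType"], row["Events"])
```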


Multi-Column Statistics in PostgreSQL

Hans-Jürgen Schönig creates new statistics:

If you are using PostgreSQL for analytics or large-scale aggregations, you might occasionally notice the planner making false assumptions regarding the number of rows. While this isn’t a problem for small aggregates, it is indeed an issue for large-scale aggregations featuring many different dimensions.

In short: The more columns your GROUP BY statement contains, the more likely it is that the optimizer overestimates the row count.

This blog explains how this can be handled in PostgreSQL.

Maybe it’s just me, but I don’t recall many instances in which adding multi-column statistics without any sort of index change significantly improved a query’s performance. I can understand how it could improve things like memory grants, so perhaps that’s how I’m selling it short. But I struggle to recall a specific case in which a query got measurably faster as a result.
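For reference, creating these extended statistics is a one-liner plus an ANALYZE. Here is a quick sketch via psycopg2 against a hypothetical sales table, with an EXPLAIN afterward to check whether the GROUP BY row estimate actually improves.

```python
import psycopg2

# Hypothetical connection details and table/column names.
conn = psycopg2.connect("dbname=analytics user=postgres")
conn.autocommit = True
cur = conn.cursor()

# Extended statistics teach the planner about correlation between columns,
# so multi-column GROUP BY estimates stop multiplying per-column ndistinct values.
cur.execute("""
    CREATE STATISTICS IF NOT EXISTS sales_region_product_stats (ndistinct, dependencies)
    ON region, product_id
    FROM sales;
""")
cur.execute("ANALYZE sales;")

# Compare the planner's row estimate before and after with EXPLAIN.
cur.execute("""
    EXPLAIN
    SELECT region, product_id, count(*)
    FROM sales
    GROUP BY region, product_id;
""")
for (line,) in cur.fetchall():
    print(line)
```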


Trusted Servers for Power BI TLS Connections

Andy Brownsword works around an issue:

I recently had an issue when sourcing data in Power BI from a server which was accessed by a DNS alias. Here I’ll demonstrate the issue and how to resolve it.

After entering the server details, we could be greeted with the message below:

The server name provided does not match the server name on the SQL Server SSL Certificate. Please contact your administrator or try changing your Connection encryption settings

Click through for a solution if you cannot re-issue the certificate with the relevant DNS alias.


Building Flowcharts in R

Pau Satorra makes a chart:

Fortunately, there are several packages in R for drawing flowcharts using different approaches. The problem is that the programming is generally quite complex, and the numbers have to be entered manually or parameterized beforehand. These flowcharts can have reproducibility problems because, if the data changes, we have to manually change the parameters again.

To make our lives easier, there’s a new {flowchart} package that uses the tidyverse workflow, which allows us to create many different types of flowcharts in just a few steps.

Read on to learn more about the package. Based on the way the final product looked, I originally thought it was built on mermaid.js, but a quick code review has disabused me of that notion. H/T R-Bloggers.


Musings on the State of Apache Kafka and Apache Flink

Adron Hall shares some thoughts:

I’ve worked with (** references at end of article) a number of Apache projects over the years, often pretty closely; Apache Cassandra, Apache Flink, Apache Kafka, Apache Zookeeper and numerous others. But the last few years I’ve not been immediately hands on with the technology. A few questions popped up recently, that fortunately I was able to answer based on existing knowledge, but it made me real curious about what the SITREP (Situational Report) is for the Apache Kafka and Flink Projects for TODAY, i.e. rolling into 2025! The following is a quick dive into the history and then the latest details (and drama?) with Apache Kafka, Flink, and tangentially some other projects (Zookeeper?).

Click through to see how the pieces fit together.


An Overview on Spinlocks in SQL Server

Stephen Planck talks spinlocks:

High concurrency can expose subtle performance bottlenecks in SQL Server, particularly those stemming from spinlocks and latch contention. Both mechanisms exist to synchronize access to shared data structures, yet they operate differently and require distinct troubleshooting approaches. By recognizing how they work and knowing what causes them to overload a system, DBAs can reduce CPU spikes, timeouts, and overall application slowdowns.

Read on to learn more about spinlocks and latch contention. My experiential bias is that spinlocks are the actual problem in approximately 5% of the cases where DBAs believe spinlocks are the actual problem.
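If you do suspect spinlock contention, sys.dm_os_spinlock_stats is the usual starting point. Here is a small diagnostic sketch via pyodbc; the connection string is a placeholder, and keep in mind that the counters are cumulative since instance startup, so deltas around the problem window tell you more than a single snapshot.

```python
import pyodbc

# Placeholder connection string; adjust driver, server, and authentication for your environment.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};SERVER=localhost;"
    "Trusted_Connection=yes;TrustServerCertificate=yes;"
)
cursor = conn.cursor()

# Spinlock counters accumulate from instance start, so capture a baseline
# and compare deltas during the CPU spike rather than reading them once.
cursor.execute("""
    SELECT TOP (10) name, collisions, spins, spins_per_collision, backoffs
    FROM sys.dm_os_spinlock_stats
    ORDER BY spins DESC;
""")
for row in cursor.fetchall():
    print(row.name, row.spins, row.backoffs)
```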


An Example of TMDL View in Action

Chris Webb puts the pieces together for us:

For me the biggest new feature in the January 2025 release of Power BI Desktop is the new TMDL View; many other people like Marco are excited about it too. For more advanced Power BI developers (and honestly, I don’t think you need to be that advanced to get value out of it) it makes certain editing tasks for semantic models much simpler, and while I won’t be abandoning the main Power BI Desktop UI completely or stopping using external tools like Tabular Editor it is something I see myself using on a regular basis from now on.

Click through to see one thing you can do with it.


CPU and Memory Configuration for MySQL

Chisom Kanu continues a series on MySQL:

In the first part of this series, we introduced MySQL Shell as a tool for managing and optimizing MySQL configurations. We discussed how to install the Shell, connect to the MySQL server, and modify basic configuration parameters. Now we move into performance optimization, focusing specifically on memory and CPU configurations. These two components are important because they directly impact how efficiently your database processes queries, handles connections, and stores data in memory. In this article, we will look at techniques using MySQL Shell to help you optimize both memory and CPU usage to ensure smooth and fast database performance.

Read on to learn more.
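As a point of reference, here is a short sketch of the sorts of knobs involved, using plain mysql-connector-python rather than MySQL Shell and placeholder connection details. Chisom's article works through this via the Shell itself, so treat this only as an illustration of the variables in play.

```python
import mysql.connector

# Placeholder connection details.
conn = mysql.connector.connect(host="localhost", user="root", password="...")
cur = conn.cursor()

# The InnoDB buffer pool is usually the single biggest memory knob;
# a common rule of thumb is 50-75% of RAM on a dedicated database server.
cur.execute("SELECT @@innodb_buffer_pool_size / 1024 / 1024 AS buffer_pool_mb;")
print("buffer pool (MB):", cur.fetchone()[0])

# The buffer pool can be resized online in MySQL 5.7+; SET PERSIST (MySQL 8.0)
# would also keep the change across restarts. 4294967296 bytes = 4 GB.
cur.execute("SET GLOBAL innodb_buffer_pool_size = 4294967296;")

# max_connections and thread settings affect per-connection memory and CPU churn.
cur.execute("SHOW GLOBAL VARIABLES LIKE 'max_connections';")
print(cur.fetchone())
```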
