Press "Enter" to skip to content

Day: January 21, 2025

Streaming Data to Azure Event Hub via Mockaroo and Kafka API

Jasleen Kaur Wahi generates some data:

In a recent project, I faced the need to generate randomized data for transmission to the Azure Event Hub. This hub is a key component of Microsoft Azure, used for real-time data ingestion and processing.

First, let’s take look at how I created this random data. I wanted to come up with a way to make data that looks like what we see in the real world, but without using any real information from users. This made-up data was really important for a bunch of things, like checking if our software works well.

Read on to see how Mockaroo works and the end result. Creating tests for streaming services like Event Hubs is a challenge, so this is an interesting approach to the task.

Leave a Comment

Transitioning from Elasticsearch to OpenSearch

Nileh Jain has a guide for us:

Elasticsearch and OpenSearch are powerful tools for handling search and analytics workloads, offering scalability, real-time capabilities, and a rich ecosystem of plugins and integrations. Elasticsearch is widely used for full-text search, log monitoring, and data visualization across industries due to its mature ecosystem. OpenSearch, a community-driven fork of Elasticsearch, provides a fully open-source alternative with many of the same capabilities, making it an excellent choice for organizations prioritizing open-source principles and cost efficiency. 

Migration to OpenSearch should be considered if you are using Elasticsearch versions up to 7.10 and want to avoid licensing restrictions introduced with Elasticsearch’s SSPL license. It is also ideal for those seeking continued access to an open-source ecosystem while maintaining compatibility with existing Elasticsearch APIs and tools. Organizations with a focus on community-driven innovation, transparent governance, or cost control will find OpenSearch a compelling option.

Click through for the prep work and the guide.

Leave a Comment

A Quick Primer on KQL

Reitse Eskens takes us through a language:

This post can come as a shock if you’re used to writing T-SQL. Because not only is there more than one useful language to process data, realtime data in this case, but it also has enough similarities to SQL to look familiar and is different enough to leave you flustered.

Now, to get a complete introduction into KQL or the Kusto Query Language, one blogpost (or video) would never be enough. There are so many operators that can fill an entire series on their own.

In this blog, the focus will be on the basic structure of KQL and a number of common operators. They will be compared with the counterparts in SQL for reference.

I fully agree with Reitse. I’ve put together a full-length talk on KQL and still feel like I’m covering the basics. It’s not that KQL is some monstrously complicated language, but it is different enough from other languages like T-SQL that you cannot easily apply knowledge from one to the other.

Leave a Comment

Multi-Column Statistics in PostgreSQL

Hans-Jürgen Schönig creates new statistics:

If you are using PostgreSQL for analytics or large-scale aggregations, you might occasionally notice the planner making false assumptions regarding the number of rows. While this isn’t a problem for small aggregates, it is indeed an issue for large-scale aggregations featuring many different dimensions.

In short: The more columns your GROUP BY statement contains, the more likely it is that optimizer overestimates the row count.

This blog explains how this can be handled in PostgreSQL.

Maybe it’s just me, but I don’t recall many instances in which adding multi-column statistics without any sort of index change significantly improved a query’s performance. I can understand how it could improve things like memory grants, so perhaps that’s how I’m selling it short. But I struggle to recall a specific case in which a query got measurably faster as a result.

Leave a Comment

Trusted Servers for Power BI TLS Connections

Andy Brownsword works around an issue:

I recently had an issue when sourcing data in Power BI from a server which was accessed by a DNS alias. Here I’ll demonstrate the issue and how to resolve it.

After entering the server details, we could be greeted with the message below:

The server name provided does not match the server name on the SQL Server SSL Certificate. Please contact your administrator or try changing your Connection encryption settings

Click through for a solution if you cannot re-issue the certificate with the relevant DNS alias.

Leave a Comment