
Author: Kevin Feasel

Dynamic Routing with Kafka Streams

Yeva Byzek explains how you can use Kafka Streams to perform dynamic routing of messages:

A cleaner way is to provide the service with a separate stream that contains only the relevant subset of events that the microservice cares about. To achieve this, a streaming application can branch the original event stream into different substreams using the method KStream#branch(). This results in new Kafka topics, so the microservice can subscribe to one of the branched streams directly.

For example, in the finance domain, consider a fraud remediation microservice that should process only the subset of events suspected of being fraudulent. The original stream of events is branched into two new streams: one for suspicious events and one for validated events. This enables the fraud remediation microservice to process just the stream of suspicious events, without ever seeing the validated events.
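To make the routing rule concrete, here is a minimal Python sketch of the semantics KStream#branch() applies: each record is tested against the predicates in order, goes to the first branch whose predicate matches, and is dropped if none match. This is an in-memory stand-in, not the Kafka Streams API; the event fields and the 10,000 threshold in `is_suspicious` are hypothetical. In a real Java topology, each resulting KStream would typically be written to its own topic with `to()`, and Kafka 2.8 and later deprecate `branch()` in favor of `split()`.

```python
from typing import Callable, Dict, List, Sequence

# Toy event type for illustration only; the field names are hypothetical.
Event = Dict[str, object]
Predicate = Callable[[Event], bool]


def branch(events: Sequence[Event], *predicates: Predicate) -> List[List[Event]]:
    """Mimic the routing rule of KStream#branch() on an in-memory list.

    Each event goes to the first branch whose predicate returns True;
    events that match no predicate are dropped, as with branch().
    """
    branches: List[List[Event]] = [[] for _ in predicates]
    for event in events:
        for i, predicate in enumerate(predicates):
            if predicate(event):
                branches[i].append(event)
                break  # first match wins; later predicates are not evaluated
    return branches


def is_suspicious(event: Event) -> bool:
    # Hypothetical rule: flag large transactions for fraud remediation.
    return float(event["amount"]) > 10_000


def is_validated(event: Event) -> bool:
    # Catch-all, so every non-suspicious event lands in the second branch.
    return True


events = [
    {"id": 1, "amount": 250.0},
    {"id": 2, "amount": 15_000.0},
    {"id": 3, "amount": 9_999.0},
]
suspicious, validated = branch(events, is_suspicious, is_validated)
print([e["id"] for e in suspicious])  # [2]
print([e["id"] for e in validated])   # [1, 3]
```

Putting the catch-all predicate last matters: because the first match wins, placing it first would send every event to the validated branch.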

Read on to learn more.


The (Un)Importance of Index Fragmentation

Tibor Karaszi argues that index fragmentation is a less serious issue than most DBAs think:

You know the story. Every week or so, we defragment the indexes. Many of us use Ola Hallengren’s great script for this, some use Maintenance Plans, and there are of course other alternatives as well. But are we just wasting time and effort? Quite probably we are. I’m going to start with some basics, then do some reasoning, and finally give you some numbers from a very simple test that I ran. The T-SQL code is available. If you give it a try, please let us know your findings for your environment by adding a comment. I will make some generalizations and simplifications, to avoid this post being 10 times longer.

Jeff Moden has a couple of great talks on the topic which really pushed me in this direction. Grab his slides from the SQL Saturday site for a much deeper look at this topic.
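For readers wondering what the fragmentation percentage actually counts, here is a simplified Python sketch of logical fragmentation as the sys.dm_db_index_physical_stats documentation describes it: the share of leaf pages whose next-page pointer does not lead to the next physical page allocated to the index. The input format (a list of physical page IDs in key order) and the page numbers are hypothetical, and SQL Server's own calculation may differ in its details.

```python
from typing import List


def logical_fragmentation(leaf_pages: List[int]) -> float:
    """Approximate logical fragmentation (%) of an index's leaf level.

    leaf_pages lists physical page IDs in logical (key) order, i.e. the
    order obtained by following each page's next-page pointer. A page is
    out of order when the page its pointer references is not the next
    page allocated to the index in physical order.
    """
    if len(leaf_pages) < 2:
        return 0.0
    physical_order = sorted(leaf_pages)
    # Map each page to its physical successor within this index.
    next_physical = {
        page: physical_order[i + 1] for i, page in enumerate(physical_order[:-1])
    }
    out_of_order = sum(
        1
        for current, logical_next in zip(leaf_pages, leaf_pages[1:])
        if next_physical.get(current) != logical_next
    )
    return 100.0 * out_of_order / len(leaf_pages)


print(logical_fragmentation([100, 101, 102, 103]))       # 0.0
print(logical_fragmentation([100, 104, 101, 102, 103]))  # 40.0
```

In the second call, page 104 came from a hypothetical page split and sits logically right after page 100, so two of the five pages point somewhere other than their physical successor. Whether a number like that matters for performance is exactly the question the post examines.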


Learning with Limited Data

Shioulin Sam and Nisha Muktewar have new research on machine learning when getting labeled data is time-consuming or difficult:

We are excited to release Learning with Limited Labeled Data, the latest report and prototype from Cloudera Fast Forward Labs.

Being able to learn with limited labeled data relaxes the stringent labeled data requirement for supervised machine learning. Our report focuses on active learning, a technique that relies on collaboration between machines and humans to label smartly.

Active learning makes it possible to build applications using a small set of labeled data, and enables enterprises to leverage their large pools of unlabeled data. In this blog post, we explore how active learning works. (For a higher-level introduction, please see our previous blog post.)
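Uncertainty sampling is one common way an active learner decides what to send to a human: score the unlabeled pool with the current model and ask for labels on the examples it is least sure about. The Python sketch below shows only that selection step for a binary classifier. The example IDs and probabilities are made up, and this is not taken from the Cloudera report, which may discuss other query strategies.

```python
from typing import Dict, List


def least_confident(pool_probs: Dict[str, float], k: int) -> List[str]:
    """Pick the k unlabeled examples the model is least sure about.

    pool_probs maps a (hypothetical) example ID to the model's predicted
    probability of the positive class. For a binary classifier, the most
    uncertain predictions are the ones closest to 0.5.
    """
    ranked = sorted(pool_probs, key=lambda ex: abs(pool_probs[ex] - 0.5))
    return ranked[:k]


pool = {"doc-a": 0.97, "doc-b": 0.52, "doc-c": 0.10, "doc-d": 0.41}
print(least_confident(pool, 2))  # ['doc-b', 'doc-d']
```

In a full loop, the selected examples would be labeled by a person, added to the training set, the model retrained, and the remaining pool re-scored, repeating until the labeling budget runs out.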

The research itself is behind a paywall but you can see their write-up to get an idea of the topic.


Understanding the Page Resource Cracker

John Morehouse looks at the new sys.fn_PageResCracker() function in SQL Server 2019:

In a previous blog post, I discussed two new methods in SQL Server 2019 to determine exactly which page a request might be waiting for when there is contention.  One of these new methods involves a new function, fn_pagerescracker.   Naturally, I wanted to see how this function operates.  Let’s look at the Master database to investigate how it works!
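The function's job is to split an 8-byte page_resource value (such as the page_resource column of sys.dm_exec_requests) into a database ID, file ID, and page ID. Here is a rough Python equivalent, assuming the layout is a 4-byte page ID followed by a 2-byte file ID and a 2-byte database ID, each stored little-endian. Treat that layout as an assumption and confirm it against the function definition on your instance; the sample bytes are made up.

```python
import struct
from typing import NamedTuple


class PageResource(NamedTuple):
    db_id: int
    file_id: int
    page_id: int


def crack_page_resource(page_resource: bytes) -> PageResource:
    """Decode an 8-byte page_resource value into its parts.

    ASSUMED layout (verify against sys.fn_PageResCracker on your build):
    bytes 0-3 page_id, bytes 4-5 file_id, bytes 6-7 db_id, little-endian.
    """
    if len(page_resource) != 8:
        raise ValueError("page_resource must be exactly 8 bytes")
    # "<IHH": little-endian 4-byte unsigned int, then two 2-byte unsigned shorts.
    page_id, file_id, db_id = struct.unpack("<IHH", page_resource)
    return PageResource(db_id=db_id, file_id=file_id, page_id=page_id)


# Hypothetical value: page 264 in file 1 of database 2.
print(crack_page_resource(bytes.fromhex("0801000001000200")))
# PageResource(db_id=2, file_id=1, page_id=264)
```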

Click through for the function definition and what it all means.


Desired State Configuration Resources

Jess Pomfret takes us through resources in Desired State Configuration:

A critical part of our DSC configuration is made up of resources. These are the building blocks we need to define our desired state. There are two kinds of resources that we can use: class-based and MOF-based (the most common). We are going to focus our efforts today on looking at MOF-based resources.

Resources come packaged up as modules and our servers, which use at least WMF 4.0, come with several built-in. We have two main options for additional resources; we can find DSC resource modules in the PowerShell Gallery or we can write our own.

Jess wraps up the post with five useful resources for database administrators.


Limitations on Inlining Functions

Erik Darling takes us through a limitation to SQL Server 2019 scalar function inlining:

There’s a lot of excitement (alright, maybe I’m sort of in a bubble with these things) about SQL Server 2019 being able to inline most scalar UDFs.

But there’s a sort of weird catch with them. It’s documented, but still.

If you use GETDATE in the function, it can’t be inlined.

GETDATE() and its brethren are non-deterministic, so I figured that would be an issue. Check out the documentation for the other limitations.


Getting Started with Azure Databricks

Brad Llewellyn has a tutorial for Azure Databricks:

Databricks is a managed Spark framework, similar to what we saw with HDInsight in the previous post.  The major difference between the two technologies is that HDInsight is more of a managed provisioning service for Hadoop, while Databricks is more like a managed Spark platform.  In other words, HDInsight is a good choice if we need the ability to manage the cluster ourselves, but don’t want to deal with provisioning, while Databricks is a good choice when we simply want to have a Spark environment for running our code with little need for maintenance or management.

Azure Databricks is not a Microsoft product. It is owned and managed by the company Databricks and available in Azure and AWS. However, Databricks is a “first party offering” in Azure. This means that Microsoft offers the same level of support, functionality and integration as it would with any of its own products. You can read more about Azure Databricks here, here, and here.

Click through for a demonstration of the product.


Solving Logistic Regression Problems with Python

Hardik Jaroli shows how we can solve logistic regression problems in Python, using the Titanic data set as an example:

We will be working with the Titanic Data Set from Kaggle. We’ll be trying to predict a classification: survived or deceased.

Let’s begin by implementing Logistic Regression in Python for classification. We’ll use a “semi-cleaned” version of the Titanic data set; if you use the data set hosted directly on Kaggle, you may need to do some additional cleaning.
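As a dependency-free illustration of what fitting a logistic regression does under the hood, here is a from-scratch sketch in Python using batch gradient descent on the log loss. It runs on a tiny made-up data set, not the Titanic data, and it is not the approach from the linked post; in practice you would more likely reach for a library class such as scikit-learn's LogisticRegression. The learning rate and epoch count are arbitrary choices for this toy example.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(
    X: Sequence[Sequence[float]], y: Sequence[int], lr: float = 0.1, epochs: int = 2000
) -> Tuple[List[float], float]:
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(X[0])
    m = len(X)
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # p - y is the gradient of the log loss with respect to the logit.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Step against the average gradient.
        w = [wj - lr * gj / m for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    # Predict class 1 when the estimated probability is at least 0.5.
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5)


# Toy, made-up data: one numeric feature, label 1 for the larger values.
X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
y = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(X, y)
print([predict(w, b, x) for x in X])
```

With real data such as the Titanic set, the same idea applies once categorical columns (for example, passenger class or sex) have been encoded as numbers.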

Click through for the demo.
