Kafka Connect Converters And Serialization

Kevin Feasel


ETL, Hadoop

Robin Moffatt goes into great detail on Apache Kafka Connect converters and serialization techniques:

Kafka Connect is modular in nature, providing a very powerful way of handling integration requirements. Some key components include:

  • Connectors – the JAR files that define how to integrate with the data store itself
  • Converters – handling serialization and deserialization of data
  • Transforms – optional in-flight manipulation of messages

One of the more frequent sources of mistakes and misunderstanding around Kafka Connect involves the serialization of data, which Kafka Connect handles using converters. Let’s take a good look at how these work, and illustrate some of the common issues encountered.

Read on for a good overview of the topic.

Separating Totals In DAX With IsInScope

Kasper de Jonge shows how we can use the IsInScope function in DAX to separate out parts of hierarchies like totals:

Finally I drag in the Values column and the Hierarchy in a matrix (I also turned on the +/- icons so we can expand collapse which is another November feature). The first thing we see is that the matrix looks weird with blanks and all.

The reason we are seeing this is this is how the data is set up, we have stored totals and grand totals at the lowest level in the hierarchy. Instead we want to show to them at the level where they are correct. Also we don’t want to show the aggregation created by the SUM.

Click through for the full example.

Automatically Enabling SQLCMD Mode In SSMS

Greg Low shows how to have every Management Studio tab open in SQLCMD mode:

Note the :CONNECT command is used to connect to another server.

Because everything else works pretty much the same, and you get a whole lot of additional options, you might choose to open all your new queries in SQLCMD mode. That’s easy to do.

SQLCMD mode is one of those things where I thought I’d use it a lot, but aside from deploying database projects, I don’t.  Granted, this could be a failure of imagination on my part.

Converting Binary To Hex With T-SQL

Kevin Feasel



Dave Mason uses STRING_SPLIT to convert binary values to their hex equivalents:

I started pondering it for a bit and began to wonder if I could use the new for SQL Server 2016 STRING_SPLIT function to convert a binary string to decimal. The thought process was to split the string into rows of CHAR(1) values, along with an in-string character position. Then I could take the “1” values, calculate the 2ᵡ value, and sum them up for a decimal value.

I quickly realized I’d need to reverse the binary string. I also discovered the STRING_SPLIT function eliminated 0’s adjacent to 1’s. I don’t have much use for this code at the moment, but maybe there’s something here that you can take away.

Given some of the issues Dave ran into, it seems that a tally table-based solution might be a better choice.

Tempdb Blocking With Non-Clustered Columnstore Indexes

Ned Otter runs into a tricky issue:

I have a client that used Itzik Ben-Gan’s solution of creating a filtered nonclustered columnstore index to achieve batch mode on a rowstore (in fact I proposed that the client consider it). They have an OLTP system, and often perform YTD calculations. When they tested, processing time was reduced by 30 to 50 percent, without touching a single line of application code. If that ain’t low hanging fruit, I don’t know what is —

However, during testing, I noticed some intermittent blocking that didn’t make sense to me. But I couldn’t nail it down, and they went live with the “filtered nonclustered columnstore index” solution.

Once they deployed – and there was a lot of concurrency – I could see what had eluded me during my proof of concept: blocking in tempdb.

Read on for the repro and check out Ned’s UserVoice bug report.

Priority Queuing In Azure SQL Data Warehouse

Matt How walks us through an improvement to Azure SQL Data Warehouse:

The concept of workload management is a key factor for Azure SQL DW as there is only limited concurrency slots available and depending on the resource class, these slots can fill up pretty quickly. Once the concurrency slots are full, queries are queued until a sufficiently sized slot is opened up. Let’s recap what Resource Classes are and how they affect workload management.

A Resource Class is a pre-configured database role that determines how much resource is allocated to queries coming from users that belong to that role. For example, an ETL service account may use a “large” resource class and be allocated a generous amount of the server, however an analyst may use a “small” resource class and therefore only use up a small amount of the server with their queries. There are actually 2 types of resource class, Dynamic and Static. The Dynamic resource classes will grant a set percentage of memory to a query and actual value of this percentage will vary as the Warehouse scales up and down. The key factor is that an xLargeRc (extra-large resource class) will always take up 70% of the Server and will not allow any other queries to be run concurrently. No matter how much you scale up the Warehouse, queries run with an xLargeRc will run one at a time. Conversely, queries run with a smallrc will only be allocated 4% of the Server and therefore as a Warehouse scales up, this 4% becomes a larger amount of resource and can therefore process data quicker.

This looks like a useful addition.  Click through for a few examples of how it will work.

Using DATEADD In SQL Server

Randolph West continues a series on date and time types in SQL Server with the DATEADD function:

As with similar functions, DATEADD can do arithmetic on dates as well as times. The syntax is straightforward:

DATEADD (datepart, number, date)

The number portion must be an integer, and it must be within the acceptable range of values for the date part.

Click through for a few examples.


November 2018
« Oct Dec »