Press "Enter" to skip to content

Author: Kevin Feasel

Sticky Partitioning in Kafka 2.4

Justine Olshan takes us through sticky partitioning in Kafka 2.4:

The sticky partitioner addresses the problem of spreading out records without keys into smaller batches by picking a single partition to send all non-keyed records. Once the batch at that partition is filled or otherwise completed, the sticky partitioner randomly chooses and “sticks” to a new partition. That way, over a larger period of time, records are about evenly distributed among all the partitions while getting the added benefit of larger batch sizes.

It looks like this is an improvement with few downsides.

Azure Data Factory Templates and Source Control

Cathrine Wilhelmsen continues a series on Azure Data Factory. First up is source control:

And yeah, I usually recommend that you set up source control early in your project, and not on day 18… However, it does require some external configuration, and in this series I wanted to get through the Azure Data Factory basics first. But by now, you should know enough to decide whether or not to commit to Azure Data Factory as your data integration tool of choice.

Next up is using the template gallery:

You can also create custom templates and share them with your team – or share them externally with others. Custom templates are saved in your code repository and will show up in the template gallery for you and your team. If you want to share them externally, you can easily export them, so others can import them in their Azure Data Factory.

Let’s take a look!

Read on to learn more.

No-Longer-Necessary Trace Flags

Monica Rathbun points out some of the trace flags which are no longer important in SQL Server:

If you have ever attended one of my performance tuning sessions, you know I tend to talk about trace flags. Trace flags can help fix performance issues, and some are now defaulted in later SQL Server versions. In my opinion, when a trace flag’s behavior becomes the default in a version, you should potentially put it in place within environments that do not have it implemented. Below are a few of these particular trace flags, along with Microsoft’s definition of what each trace flag does, taken straight from MS documents. I have also included a brief commentary on each one. As with any change, you should be sure to thoroughly test before implementing these trace flags in any production environment.

Read the whole thing, especially because at least one of them is still optional and defaults to off (but can be changed at a different scope).
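As an illustration (a sketch of mine, not from Monica’s post): trace flag 4199, which gates query optimizer hotfixes, is the classic example of behavior that moved to a different scope. In SQL Server 2016 and later you can flip it per database instead of instance-wide:

-- A sketch, not taken from Monica's post.
-- List all globally enabled trace flags:
DBCC TRACESTATUS(-1);

-- The old way: enable optimizer hotfixes instance-wide via trace flag 4199.
DBCC TRACEON(4199, -1);

-- The newer way (SQL Server 2016+): the same behavior, scoped to one database.
ALTER DATABASE SCOPED CONFIGURATION SET QUERY_OPTIMIZER_HOTFIXES = ON;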

SQL Server Truncating Numbers to Asterisks

Bert Wagner points out that some numeric types handle overflow in a weird way:

Why does SQL Server sometimes error when converting a number into a string, but other times succeed and return an asterisk?

I don’t know.

The best (and logical) answer I could find online is from Robert Sheldon, who attributes it to poor error handling practices, “…before error handling got a more reputable foothold.”

This makes it important to check your results. I imagine that there’s somebody who relies upon this exact functionality, but it’s pretty weird.
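As a quick illustration (my minimal repro, not Bert’s exact example): an int that doesn’t fit in the target string comes back as an asterisk, while the equivalent decimal conversion raises an error:

-- A minimal repro (not Bert's exact example).
SELECT CONVERT(VARCHAR(2), 100);    -- int doesn't fit: returns '*'
SELECT CONVERT(VARCHAR(2), 100.0);  -- decimal doesn't fit: Msg 8115, arithmetic overflow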

The Fickleness of Batch Mode on Rowstore

Erik Darling points out how difficult it can sometimes be to get batch mode processing on rowstore tables:

I’m excited about this feature. I’m not being negative, here. I just want you, dear reader, to have reasonable expectations about it.

This isn’t a post about it making a query slower, but I do have some demos of that happening. I want to show you an example of it not kicking in when it probably should. I’m going to use an Extended Events session that I first read about on Dmitry Pilugin’s blog here. It’ll look something like this.

Read on for a demonstration of the point.
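Erik’s exact session isn’t reproduced here, but a minimal sketch of that kind of session, assuming the sqlserver.batch_mode_heuristics Extended Event that Dmitry describes for SQL Server 2019, might look like this:

-- A sketch only; the event name comes from Dmitry Pilugin's post.
CREATE EVENT SESSION batch_mode_heuristics ON SERVER
    ADD EVENT sqlserver.batch_mode_heuristics
    ADD TARGET package0.ring_buffer
    WITH (MAX_DISPATCH_LATENCY = 5 SECONDS);

ALTER EVENT SESSION batch_mode_heuristics ON SERVER STATE = START;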

TRY_PARSE and NaN

Slava Murygin finds a nasty bug in SQL Server:

The database in trouble has a table with a FLOAT column. Its front-end application verifies users’ input and inserts the data into that column using the TRY_PARSE function.
The developer’s intention was that any “Not-a-Number” or “Out-of-Range” values would be automatically converted to NULL, and it would be at the user’s discretion to verify and fix those values.

However, one of the application’s users was very educated and, instead of an empty space, NULL, or any other bad non-numeric value, supplied the value “NaN” for empty cells, which simply stands for “Not a Number”.
That action caused database corruption!

Click through for a demo which you should not repeat on a work server.
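If you want a safer taste of the behavior, here is a sketch of mine (not Slava’s demo) that stops short of touching any table:

-- A sketch only; do not insert the NaN result into a real table.
DECLARE @garbage VARCHAR(10) = 'banana',
        @nan     VARCHAR(10) = 'NaN';

-- Garbage input behaves as the developer expected:
SELECT TRY_PARSE(@garbage AS FLOAT) AS garbage_result;  -- NULL

-- 'NaN' parses under .NET rules to a floating-point NaN,
-- a value SQL Server's FLOAT type cannot legitimately hold:
SELECT TRY_PARSE(@nan AS FLOAT) AS nan_result;

-- One defensive option: screen out NaN-style strings before parsing.
SELECT TRY_PARSE(CASE WHEN @nan = 'NaN' THEN NULL ELSE @nan END AS FLOAT) AS guarded_result;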

Functional Java

Rishi Khandelwal lays out imperative versus functional Java with several examples:

As a Java developer, you must have wondered whether you should move to the functional programming paradigm. What benefits does it provide? People are talking about it everywhere. So let’s give it a try, and then you can decide whether you should go functional or not.

We will look at the functional programming features one by one, with code examples, and compare them with the imperative way of Java programming.

The snide part of me says “Hey, look, Java’s almost caught up to C# 3.0!” But that’s pushing it a little far. I think these functional pieces improve the language similarly to how they did C#, but if I were a regular Java developer, I’d probably look to Scala or Kotlin instead (says the guy who won’t shut up already about F#).

DBLog: CDC for MySQL and Postgres

Andreas Andreakis and Ioannis Papapanagiotou announce a new change data capture tool for open source databases:

In databases like MySQL and PostgreSQL, transaction logs are the source of CDC events. As transaction logs typically have limited retention, they aren’t guaranteed to contain the full history of changes. Therefore, dumps are needed to capture the full state of a source. There are several open source CDC projects, often using the same underlying libraries, database APIs, and protocols. Nonetheless, we found a number of limitations that could not satisfy our requirements, e.g. stalling the processing of log events until a dump is complete, a missing ability to trigger dumps on demand, or implementations that block write traffic by using table locks.

This motivated the development of DBLog, which offers log and dump processing under a generic framework. In order to be supported, a database is required to fulfill a set of features that are commonly available in systems like MySQL, PostgreSQL, MariaDB, and others.

It looks like DBLog is not open source just yet, but that’s forthcoming.

Changing Synapse Analytics Resource Classes with Azure Automation

John McCormack wins today’s edition of Cloud Bingo:

I was approached by a data warehouse developer who wanted to be able to take advantage of the feature in Azure Synapse Analytics (formerly Azure SQL Data Warehouse) called ‘Resource classes’. Resource classes allow users in certain roles to have access to extra compute resources for their queries. This can increase their max memory per query (although there is an overall trade-off on concurrency). Unfortunately, due to this being managed via roles, the person looking to change resource classes needs to be in the db_owner role so they can manage database role membership. As I didn’t want to add the developer to the db_owner role, a different solution was required.

John gives us a nice answer to this problem. Click through for the script.
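For context, the mechanics underneath look something like this (a sketch of mine with a hypothetical user name, not John’s script): resource classes are just database roles, so assigning one is a role-membership change, which is why db_owner-level rights enter the picture.

-- A sketch with a hypothetical user name; John's post wraps this kind of
-- logic in an Azure Automation runbook instead of granting db_owner.
EXEC sp_addrolemember 'largerc', 'LoadUser';   -- more memory per query, lower concurrency
-- ...run the memory-hungry workload...
EXEC sp_droprolemember 'largerc', 'LoadUser';  -- back to the default resource class (smallrc)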
