2022-06-03 – Curated SQL

Creating Reproducible Examples with CI

Published 2022-06-03 by Kevin Feasel

Colin Gillespie and Jack Walton tackle a common training problem:

As the number of courses we offer increased, so did the maintenance burden of our associated training materials (lecture notes, slides, exercises, and more). To ease this burden, and to assist in ensuring that our training materials build consistently, we developed an R package called {jrNotes2}. Amongst other things, this package ensures that all courses:
– have identical “template files”: .gitlab-ci.yml, .gitignore, Makefiles, index.Rmd, …;
– have the same directory structure, and
– pass a set of quality-assurance checks.

This is smart, but read on to see why it’s still a challenge to keep examples reproducible. That is especially true in the R and Python worlds, where breaking changes seem to be so common.

A Transaction Log in Apache Flink

Published 2022-06-03 by Kevin Feasel

Roman Khachatryan and Yuan Mei deal with transaction log issues:

State backends don’t start any snapshotting work until the task receives at least one checkpoint barrier, increasing the effective checkpoint duration. This is suboptimal if the upload time is comparable to the checkpoint interval; instead, a snapshot could be uploaded continuously throughout the interval.
This work discusses the mechanism introduced in Flink 1.15 to address the above cases by continuously persisting state changes on non-volatile storage while performing materialization in the background. The basic idea is described in the following section, and then important implementation details are highlighted. Subsequent sections discuss benchmarking results, limitations, and future work.

Read on to see what they did.
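
To make the idea in the excerpt concrete, here is a toy sketch of a store that records every state change in a log as it happens and occasionally materializes a full snapshot. It is not Flink's API: the class `ChangelogStore` and its methods are hypothetical names, and plain in-memory objects stand in for non-volatile storage.

```python
class ChangelogStore:
    """Toy model: a continuously written change log plus periodic materialization."""

    def __init__(self):
        self.state = {}      # live working state
        self.snapshot = {}   # last materialized snapshot (stand-in for durable storage)
        self.changelog = []  # changes recorded since the last snapshot

    def put(self, key, value):
        self.state[key] = value
        # Every change is appended right away, so a checkpoint never has to
        # wait for a large upload of the whole state.
        self.changelog.append((key, value))

    def materialize(self):
        # In a real system this would run in the background; here it is a
        # plain method call. Once the snapshot exists, the log can be truncated.
        self.snapshot = dict(self.state)
        self.changelog.clear()

    def recover(self):
        # Recovery = last snapshot + replay of the changes logged after it.
        restored = dict(self.snapshot)
        for key, value in self.changelog:
            restored[key] = value
        return restored


store = ChangelogStore()
store.put("a", 1)
store.put("b", 2)
store.materialize()
store.put("a", 5)
recovered = store.recover()
print(recovered)
```

The point of the design is that the amount of work left to do at checkpoint time depends on the size of the recent change log rather than on the size of the full state.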

Alternatives to the Dead Letter Queue in Apache Kafka

Published 2022-06-03 by Kevin Feasel

Kai Waehner can’t return to sender:

This article focuses on the data streaming platform Apache Kafka. The main reason for putting a message into a DLQ in Kafka is usually a bad message format or invalid/missing message content. For instance, an application error occurs if a value is expected to be an Integer, but the producer sends a String. In more dynamic environments, a “Topic does not exist” exception might be another error why the message cannot be delivered.
Therefore, as so often, don’t use the knowledge from your existing middleware experience. Message Queue middleware, such as JMS-compliant IBM MQ, TIBCO EMS, or RabbitMQ, works differently than a distributed commit log like Kafka. A DLQ in a message queue is used in message queuing systems for many other reasons that do not map one-to-one to Kafka. For instance, the message in an MQ system expires because of per-message TTL (time to live).
Hence, the main reason for putting messages into a DLQ in Kafka is a bad message format or invalid/missing message content.

Read on to learn the Kafka-based approach to dealing with bad messages rather than using a Dead Letter Queue.
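
The failure case from the excerpt (an Integer is expected but the producer sends a String) can be sketched without any Kafka client. In this minimal example the `Router` class, the `quantity` field, and the two in-memory lists are all hypothetical stand-ins: `processed` plays the role of normal processing and `dead_letters` plays the role of a dead-letter topic.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Router:
    """Toy consumer that sets aside records it cannot parse or validate."""
    processed: list = field(default_factory=list)
    dead_letters: list = field(default_factory=list)

    def handle(self, raw: bytes) -> None:
        try:
            record = json.loads(raw)
            quantity = record["quantity"]
            # bool is a subclass of int in Python, so reject it explicitly.
            if not isinstance(quantity, int) or isinstance(quantity, bool):
                raise TypeError(
                    f"quantity must be an integer, got {type(quantity).__name__}"
                )
        except (ValueError, KeyError, TypeError) as err:
            # Keep the original payload and the reason it was rejected so the
            # record can be inspected or replayed later.
            self.dead_letters.append({"payload": raw, "error": repr(err)})
            return
        self.processed.append(quantity)


router = Router()
messages = [
    b'{"quantity": 3}',        # valid
    b'{"quantity": "three"}',  # wrong type: String instead of Integer
    b"not json",               # bad message format
    b"{}",                     # missing content
]
for raw in messages:
    router.handle(raw)

print(router.processed)
print(len(router.dead_letters))
```

Only bad format and invalid or missing content are handled here, which matches the reasons the excerpt gives for dead-lettering in Kafka; concerns such as per-message TTL from traditional message queues are deliberately left out.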

Understanding Missing Index Impact

Published 2022-06-03 by Kevin Feasel

Erik Darling delves into the depths of missing indexes:

Breaking each of those down, the only one that has a concrete meaning is Uses, but that of course doesn’t mean that a query took a long time or is even terribly inefficient.
That leaves us with Average Query Cost, which is the sum of each operator’s estimated cost in the query plan, and Impact.
But where does Impact come from?

Read on to learn where, as well as why you shouldn’t blindly trust that number.

T-SQL Language Enhancements in SQL Server 2022

Published 2022-06-03 by Kevin Feasel

Chad Baldwin checks out what’s new:

I’ve been excited to play around with some of the new features and language enhancements that are available in SQL Server 2022 so I’ve been keeping an eye on the Microsoft Docker repository for a new 2022 image. Well, they finally added it to Docker Hub! I immediately pulled the image and started playing with it.
I want to focus on the language enhancements as those are the easiest to demonstrate, and I feel that’s what you’ll be able to take advantage of the quickest after upgrading.

Read on for a dozen or so language enhancements. This isn’t as big a change as what 2012 brought, but there is a lot of useful stuff in here, as well as more that has been publicly announced, like APPROX_PERCENTILE_CONT() (and _DISC(), yeah, but bah humbug).

SQL Server 2022 Public Preview on Linux

Published 2022-06-03 by Kevin Feasel

Amit Khandelwal has notes on SQL Server 2022 on Linux:

In continuation of last week’s announcement of SQL Server 2022 public preview, we are pleased to announce availability of SQL Server 2022 on Linux/Containers for public preview. Here are the details for getting started with the SQL Server 2022 public preview packages on Linux/Containers.

As usual, the officially supported distributions are Red Hat Enterprise Linux and Ubuntu.

