Press "Enter" to skip to content

Curated SQL Posts

Exactly-Once Semantics in Kafka

Boyang Chen and Bob Barrett look at some changes to exactly-once semantics in Apache Kafka:

When using EOS, the producer and broker both have logic to determine whether it is safe for a producer to continue to send data without violating the exactly-once guarantees. Prior to Kafka 2.5, if either the producer or broker was ever not able to make this determination, the producer would enter a fatal error state. The only way to continue processing was to close the producer and create a new one. This process is generally very disruptive to client applications. For example, if a producer fails in Kafka Streams, then the associated task needs to be migrated, which causes a rebalance of the full workload. This results in throughput drop until the rebalance is complete.

To address this issue, KIP-360 added a mechanism for producers to automatically recover when they encounter these cases and continue processing. To better understand how it works, the following describes some of the situations that can cause fatal errors.

There have been several improvements to the process. Though to be honest, when I hear someone mention exactly-once in a distributed system, it sets off my spidey senses.

Comments closed

Enforcing Database Practices with Policy-Based Management

Nisarg Upadhyay hits on one of my favorite features Microsoft doesn’t care about anymore:

In this article, I am going to explain how we can enforce the SQL database best practices using Policy-Based Management. The policy-based management feature of SQL Server was introduced in SQL Server 2005. This feature was useful because it helps database administrators to define and enforce the database policies based on the organizations’ requirements.

Back when I was a DBA, I really appreciated Policy-Based Management. Unfortunately, although my job has changed a few times since then, Policy-Based Management hasn’t.

Comments closed

Calculating Cloud App Availability

Dave Bermingham gives you a way to calculate how available you should expect your application to be given SLAs:

When deploying business critical applications in the cloud you want to make sure they are highly available. The good news is that if you plan properly, you can achieve 99.99% (4-nines) of availability or more. However, calculating your true availability may not be as straightforward as it seems.

When considering availability you must consider the key components that make access to your application possible, which I’ll call the availability chain. Component of the availability chain are:

– Compute
– Network 
– Storage
– Application
– Dependent services

Your application is only as available as your weakest link, and your downtime increases exponentially with each additional link you add to the chain.  Let’s examine each of the links. 

Read on for a breakdown of these items.

Comments closed

Bug with Halloween Protection and the OUTPUT Clause

Paul White writes up a bug report:

Looking at the execution plan, it is hard to see how deleting a row (at the Clustered Index Delete) then inserting it again (at the Clustered Index Insert) could possibly result in a duplicate key in the index. Remember there is only one row, one column, and one index.

Logically, the only way this error can occur is if the Delete operator does not delete the row.

Read the whole thing. It’s probably not something you’ll ever come across yourself, hopefully.

Comments closed

Building a Windows Terminal Theme

Chrissy LeMaire has a nice-looking Windows Terminal theme:

I was sooo excited when Microsoft first made the announcement but was disappointed when I found out I’d have to run a specific version of Windows and compile the app myself. Whaat? No way, too much work. Now, it’s more widely available, so I decided to jump in and try it out. I love it and even miss Windows Terminal when I develop PowerShell on my Mac.

So here’s the Theme I’m contributing, which is based off of my favorite VS Code Theme, 1984 Unbolded. I call it Retrowave.

I went with essentially a black-and-grey theme for cmd + PowerShell and a bit more color (but still grey background) for WSL, but that’s so I can create screenshots easily without having to worry about color contrast on the printed page.

Comments closed

T-SQL Tuesday Roundup: Incident Reports

Kerry Tyler rounds up the usual suspects:

Hello T-SQL Tuesday Readers! I’m sorry for being really late in getting this post out this week.

So! A couple of weeks ago, for this month’s topic, I asked everyone to post about something that broke or went wrong, and what it took to fix it. Last week, fourteen of you responded with your stories of woe so we could all learn from your incidents and recoveries in a constructive way, like pilots do. Here’s the recap of those posts, in the order that they came in.

Read on for a summary of all of the entrants for this month.

Comments closed

Using Dynamic Datasets in Power BI

Jose Mendes implements the screening pattern in Power BI:

I recently came across the need to build a screening pattern based on the Kimball “Screening” concept. One of the desired outputs was a Power BI report that allowed a data steward to easily identify the failed screen, drill down to the detail and show the row(s) rejected by the data quality rule. To achieve this goal, I had to mimic in Power BI an SSRS functionality called dynamic dataset, which allowed me to, using a single matrix, show different source columns based on a selected screen. Feeling curious already? Then, let’s dive into the details.

A screen is designed to operate on a single input file or database table and contains the data quality condition to check. For example, there could be a Missing Customer Postcode screen which would test for any customers who are missing a postcode.

Read on for an implementation.

Comments closed

Optimizing a Poisson Survival Model

Joshua Entrop shows off optimx() in R to perform a survival analysis:

In this blog post, we will fit a Poisson regression model by maximising its likelihood function using optimx() in R. As an example we will use the lung cancer data set included in the {survival} package. The data set includes information on 228 lung cancer patients from the North Central Cancer Treatment Group (NCCTG). Specifically, we will estimate the survival of lung cancer patients by sex and age using a simple Poisson regression model. You can download the code that I will use throughout post here

Read the whole thing. H/T R-bloggers

Comments closed

Apache Kafka in the Gaming Industry

Kai Wähner walks us through a few use cases for Apache Kafka in online gaming:

This blog post explores how event streaming with Apache Kafka provides a scalable, reliable, and efficient infrastructure to make gamers happy and Gaming companies successful. Various use cases and architectures in the gaming industry are discussed, including online and mobile games, betting, gambling, and video streaming.

Learn about:

– Real-time analytics and data correlation of game telemetry
– Monetization network for real-time advertising and in-app purchases
– Payment engine for betting
– Detection of financial fraud and cheating
– Chat function in games and cross-games
– Monitor the results of live operations like weekend events or limited time offers
– Real-time analytics on metadata and chat data for marketing campaigns

It’s an interesting overview of where this platform fits in the industry.

Comments closed

Why IS NOT NULL is a Range Predicate

Erik Darling drops knowledge on us:

Why is IS NULL (not to be confused with ISNULL, the function) considered in equality predicate, and IS NOT NULL considered an inequality (or range) predicate?

It seems like they should be fairly equivalent, though opposite. One tests for a lack of values, and one tests for the presence of values, with no further examination of what those values are.

The trickier thing is that we can seek to either condition, but what happens next WILL SHOCK YOU.

This is my shocked face.

Comments closed