2021-07-28 – Curated SQL

Inferring Median from a Few Values

Published 2021-07-28 by Kevin Feasel

Holger von Jouanne-Diedrich is stuck in the middle with you:

Let us dive directly into the matter, the Small Data Rule states:
In a sample of five numerical values from any unknown population, the median of this population lies between the smallest and the largest sample value with 94 percent certainty.
The “population” can be anything, like data about age in a population, income in a country, television consumption, donation amounts, body sizes, temperatures and so on.

This is a very interesting concept. Five values won’t give you the median, but it will give you a bounded expectation with high likelihood. And check out the comments: adding a few more data points increases the expected likelihood even further.

1 Comment

Explaining Chatbots

Published 2021-07-28 by Kevin Feasel

Will Harris takes us through the basics of chatbots:

You can develop your business processes into conversational workflows to help people perform tasks. This can be wide ranging; from looking up records to do with their accounts, through to engaging in new services. There are many processes that can be turned into an effective conversational workflow. This typically helps people perform activities more inclusively and conveniently or helps reduce grunt work for employees.
Brisa2, an automotive company, developed a bot to help find company data and perform tasks like password resets, helping to free up the IT team for other tasks.

Click through for an overview of the concept. I’ve been down on this generation of chatbot because, as a user, they usually show up in places where a well-designed UI would be faster and more effective—as well as less prone to failures in understanding.

Comments closed

Tips for Decreasing the Impact of Rebalancing in Kafka Streams

Published 2021-07-28 by Kevin Feasel

Vasyl Sarzhynskyi has some techniques to make rebalancing in Kafka less of a big deal:

Kafka Rebalance happens when a new consumer is either added (joined) into the consumer group or removed (left). It becomes dramatic during application service deployment rollout, as multiple instances restarted at the same time, and rebalance latency significantly increasing. During rebalance, consumers stop processing messages for some period of time, and, as a result, processing of events from a topic happens with some delay. Some business cases could tolerate rebalancing, meanwhile, others require real-time event processing and it’s painful to have delays in more than a few seconds. Here we will try to figure out how to decrease rebalance for Kafka-Streams clients (even though some tips will be useful for other Kafka consumer clients as well).

Read on for an example of the problem, as well as several practical tips for mitigating the issue.

Comments closed

When DBCC_OBJECT_METADATA becomes a Bottleneck

Published 2021-07-28 by Kevin Feasel

Paul Randal takes us through another latch:

Continuing my series of articles on latches, this time I’m going to discuss the DBCC_OBJECT_METADATA latch and show how it can be a major bottleneck for consistency checks prior to SQL Server 2016 under certain circumstances. The issue affects DBCC CHECKDB, DBCC CHECKTABLE, and DBCC CHECKFILEGROUP, but for clarity I’ll just reference DBCC CHECKDB for the rest of this post.
You might wonder why I’m writing about an issue that affects older versions, but there are still a huge number of SQL Server 2014 and older instances out there, so it’s a valid topic for my series.

Read on to understand what DBCC_OBJECT_METADATA does and how it can become a bottleneck on those older versions of SQL Server.

Comments closed

Use Cases for Extended Properties

Published 2021-07-28 by Kevin Feasel

Eitan Blumin has a few non-standard use cases for extended properties:

Let’s say that you’re using the CHANGE TRACKING feature in SQL Server for the purpose of synchronizing one or more tables from one database to another.
For this purpose, you would have to keep track of the value of CHANGE_TRACKING_CURRENT_VERSION() when you last synchronized your data, so that you’d know which value you should use with the CHANGETABLE function during the next synchronization.
Most DBAs would think of creating a dedicated table to manage this synchronization per each table.
But why should you, when you can simply use extended properties for this purpose?

I’ve used extended properties primarily for documentation, so it’s interesting to see a couple of use cases which are definitively not about documenting objects.

Comments closed

Finding the Culprit in a Database Timeout

Published 2021-07-28 by Kevin Feasel

David Fowler sets us straight on query timeouts:

Not very helpful, right? And unless you’ve got some sort of monitoring in place, for example you can find these in SQL Sentry by looking for aborted queries, or you’re picking these up in your applicaiton logging, you’re going to have a hard time tracking down what query actually tripped the timeout. As you’re reading this post, I’m going to assume that you haven’t got monitoring.
So how do we go about figuring this out?

Click through for the answer.

Comments closed

Differences in Power BI PPU Models vs Azure Analysis Services

Published 2021-07-28 by Kevin Feasel

Gilbert Quevauvilliers continues a series on migrating from Azure Analysis Services to Power BI Premium Per User:

When creating a new dataset in AAS this is typically done with Visual Studio, there are a lot of similar items between Visual Studio and Power BI desktop, the biggest difference is how you access them are very different as well as their look and feel.
When starting with a new dataset the process is to create a new data source, import some data and then to create the tables.

Click through for the differences. This is one area where Power BI drags in comparison to Azure Analysis Services.

Comments closed

Azure Test Plan Terminology

Published 2021-07-28 by Kevin Feasel

Kevin Chant is here with a language lesson:

In this post I want to cover some Azure Test Plans jargon for Data Platform professionals. Because I understand it can be confusing.
In addition, I did say I would explain some jargon in my last post about using Azure Test Plans for Data Platform deployments. Of course, these explanations will help with other kinds of deployments as well as Data Platform ones.
By the end of this post, you will have a better understanding of some of the jargon involved in Azure Test Plans. Plus, a good recommendation of a lab to use.

Click through for that depiction.

Comments closed

M	T	W	T	F	S	S
			1	2	3	4
5	6	7	8	9	10	11
12	13	14	15	16	17	18
19	20	21	22	23	24	25
26	27	28	29	30	31

Day: July 28, 2021