Press "Enter" to skip to content

Curated SQL Posts

Understanding Probe Residuals

Daniel Janik explains what a probe residual is in an execution plan:

A probe residual is important because they can indicate key performance problems that might not otherwise be brought to your attention.

What is a probe residual?

Simply put, a probe residual is an extra operation that must be performed to compete the matching process. Extra being left over things to do.

Click through for an example brought about by implicit conversion.

Comments closed

The Story Of Nick

Kenneth Fisher tells the story of where the optimizer’s cost value comes from:

Obviously, it’s an important subject, right? And yet we keep seeing comments about how the cost is in seconds.

And to be fair, it is. It’s an estimate of how many seconds a query would take, if it was running on a developers workstation from back in the 90’s. At least that’s the story. In fact Dave Dustin (t) posted this interesting story today:

The best way to think of cost is as a probabilistic, ordinal, unitless value:  3 might be greater than 2; 1000 is almost certainly greater than 2; and “2 what?” is undefined.

Comments closed

Case-Insensitive Power Query Sorts

Cedric Charlier points out a comaprisonCriteria on Table.Sort in Power Query:

Have you already tried to sort a table based on a text field? The result is usually a surprise for most people. M language has a specific implementation of the sort engine for text where upper case letters are always ordered before lower case letters. It means that Z is always before a. In the example (here under), Fishing Rod is sorted before Fishing net.

The classical trick to escape from this weird behavior is to create a new column containing the upper case version of the text that will be used to sort your table, then configure the sort operation on this newly created column. This is a two steps approach (Three steps, if you take into account the need to remove the new column). Nothing bad with this except that it obfuscates the code and I hate that.

Click through to learn a more elegant way of sorting.

Comments closed

T-SQL Tuesday Roundup

Koen Verbeeck has his roundup of this month’s T-SQL Tuesday:

I asked the SQL Server community to write about their experience/opinion about the changing world we live in and how it impacts their daily job. The response was overwhelming: we had 30 participants in this months blog party!

Here’s an overview of everyone who participated. Take your time to read their stories, as they are very insightful, interesting or just plain fun to read.

There’s a lot of reading this month, with a well above-average turnout.

Comments closed

CPU Hot-Add And NUMA

Frank Denneman discusses VMware NUMA behavior when you hot-add more CPUs:

But what happens when the VM is configured with less vCPUs than the core count of the physical CPU package and CPU Hot-Add is enabled? Will there be performance impact? And the answer is no. The VPD configured for the VM fits inside a NUMA node, and thus the CPU scheduler and the NUMA scheduler optimizes memory operations. It’s all about memory locality. Let’s make use of some application workload test to determine the behavior of the VMkernel CPU scheduling.

For this test, I’ve installed DVD Store 3.0 and ran some test loads on the MS-SQL server. To determine the baseline, I’ve logged in the ESXi host via an SSH session and executed the command: sched-stats -t numa-pnode. This command shows the CPU and memory configuration of each NUMA node in the system. This screenshot shows that the system is only running the ESXi operating system. Hardly any memory is consumed. TotalMem indicates the total amount of physical memory in the NUMA node in kb. FreeMem indicates the amount of free physical memory in the NUMA node in kb.

Interesting reading.

Comments closed

Logistic Regression With R

Raghavan Madabusi runs through a sample logistic regression:

Input Variables: These variables are called as predictors or independent variables.

  • Customer Demographics (Gender and Senior citizenship)
  • Billing Information (Monthly and Annual charges, Payment method)
  • Product Services (Multiple line, Online security, Streaming TV, Streaming Movies, and so on)
  • Customer relationship variables (Tenure and Contract period)

Output Variables: These variables are called as response or dependent variables. Since the output variable (Churn value) takes the binary form as “0” or “1”, it will be categorized under classification problem in the supervised machine learning.

One of the interesting things in this post was the use of missmap, which is part of Amelia.

Comments closed

Powershell Difficulties

Dave Mason shares some difficulties he has had grokking Powershell:

The developer in me thinks this is nuts. Run the same few lines of code twice, with no changes in between, and get different outputs? Madness!

Here’s another example. Nothing too complex here: I connect to an instance of SQL, SELECT CURRENT_TIMESTAMP, and show the returned value in the output window. (There’s a fixable issue here that I would go on to discover later. But hold that thought for now.)

Even when you’re conceptually familiar with a language, getting into the particular foibles of that language can expose all sorts of behavior which is strange to newcomers.

Comments closed

Pipeline Architecture With Kafka

Alexandra Wang describes how Pandora Media has used Apache Kafka for real-time ad serving using Kafka Connect:

Our ad server publishes billions of messages per day to Kafka. We soon realized that writing a proprietary Kafka consumer able to handle that amount of data with the desired offset management logic would be non-trivial, especially when requiring exactly-once-delivery semantics. We found that the Kafka Connect API paired with the HDFS connector developed by Confluent would be perfect for our use case.

We’ve also found it painful not having a central authority on data structures that can share their respective schemas across all services and applications. Without a central registry for message schemas, data serialization and deserialization for a variety of applications are troublesome and the pipeline is fragile when schema evolution happens. We found Schema Registry is a great solution for this problem.

To address the above two problems, we integrated the Kafka Connect API and Schema Registry into our Kafka-centered data pipeline.

Well worth reading, especially the difficulties that they’ve had during maintenance periods and in lower environments.

Comments closed

Community Localization For Crossplatform Tools

Mona Nasr and Andy Gonzalez are looking for tool translation support:

Community has completed the translations for VScode SQL Server extension for six languages: Brazilian, French, Japanese, Italian, Russian, and Spanish.

We still need help with other languages. If you know anyone with language expertise, refer them to the Team Page.

Your contributions are valuable and will help us improve the product in your languages. We hope to continue working with the community in future projects.

Hit up the Team Page link to learn more about how to contribute.

Comments closed