Press "Enter" to skip to content

Month: February 2021

Sorting Pre-Sorted Data

Daniel Hutmacher has an idea:

Whenever SQL Server needs to sort a data stream, it will use the Sort operator to reorder the rows of the stream. Sorting data is an expensive operation because it entails loading part or all of the data into memory and shifting that data back and forth a couple of times. The only time SQL Server doesn’t sort the data is when it already knows the data to be ordered correctly, like when it has already passed a Sort operator or it’s reading from an appropriately sorted index.

But what happens if the data is ordered correctly, but SQL Server doesn’t know about it? Let’s find out.

Click through for the answer.

Comments closed

Filtering by Cluster in Power BI

Joseph Yeates starts a two-parter:

This is a technique that I have used in reports that analyze a feature in a data set that is at the bottom of a hierarchy. For example, customer that belong to a larger customer segment or accounts that belong to a grouping. The report analyzes information for an individual customer or account; however, we want to bring in some comparisons for other customers or accounts that belong to the same segment, grouping or cluster.

Click through for the technique and stay tuned for part two.

Comments closed

Polychoric Correlation in Practice

Jack Davis explains the concept of polychoric correlation:

In polychoric correlation, we don’t need to know or specify where the boundary between “good” and “very good” is, just that it exists. The distribution of the ordinal responses, along with the assumption that the latent values follow a normal distribution, is enough that the polychor() function in the polycor R package can do that for us. In most practical cases, you don’t even need to know where the cutoffs are, but they are useful for demonstration that the method works.

Polychoric correlation estimates the correlation between such latent variables as if you actually knew what those values were. In the examples given, we start with the latent variables and use cutoffs to set them into bins, and then use polychoric on the artificially binned data. In any practical use case, the latent data would be invisible to you, and the cutoffs would be determined by whoever designed the survey.

Read on for a demonstration of the process in R.

Comments closed

Adding Data Labels in Excel

Elizabeth Ricks shows how to include data labels on a line chart in Excel:

Today’s post is a tactical one for folks creating visuals in Excel: how to embed labels for your data series in your graphs, instead of relying on default Excel legends.

To illustrate, let’s look at an example from storytelling with data: Let’s Practice!. The graph below shows demand and capacity (in project hours) over time.

It is significantly more complicated than you might first think.

Comments closed

Power Query Design the Right Way

Paul Turley continues a series:

Life is full of choices and trade-off decisions. Let’s say that you need to create a lookup table containing sales regions and this information doesn’t exist in the source database. You could easily create a new table in Power Query using the “Enter Data” feature, and just manually enter the sales regions. This would solve the immediate problem with very little effort, but how will the table be maintained if sales regions are added or change in the future? We could keep the list of values in an Excel file stored in SharePoint for a business owner to maintain when information changes. You could also go all the back back to the beginning of the process and load a dimension table in the source database. IT professionals might take the hardline and say that all data must flow through the data warehouse regardless of the effort and cost. Which of these is the right choice? …it depends on the scope reporting project, and the long-term ownership of the solution. Sometimes quick and convenient are OK but particularly tables that need to scale and handle larger data volumes in the future, the following guidelines are critical.

Read on for Paul’s recommended practices.

Comments closed

Installing SQL Server on an Azure VM

Niels Berglund takes us through the steps of creating an Azure VM running SQL Server:

A while ago, I wanted to do a quick test on a new SQL installation, and I wanted the SQL installation to be on a “pristine” server. I was not keen on creating a new virtual machine on my local dev-box, as for that I would need to create a VM image etc., and it seemed like too much hassle for a lazy person like me. The obvious choice then is to do it in the cloud. How hard can that be, what could possibly go wrong?!

It turned out to not be as straight-forward as I thought it would be, but eventually, I managed to get it right. Since I probably need to do it again some time, I thought I’d write a post about it, so I have something to go back to. So here we go …

Niels goes through this in meticulous detail, as is the norm.

Comments closed

Availability Groups and the Shakes

Niko Neugebauer coins a term:

Disclaimer: I am using the word shake by my own initiative and no Microsoft Documentation ever to my knowledge ever mentioned that situation. Those shakes are represented most of the time as health events to the cluster, such as the Lease Timeout resulting in a sudden attempt of Failover.
Why did I choose that word ? I don’t know. Honestly. 🙂

Read on to see it in context around hosts, CPU, and especially I/O.

Comments closed

Using containerd as a Kubernetes Container Runtime

Anthony Nocentino does a switcheroo:

In this post, I’m going to show you how to install containerd as the container runtime in a Kubernetes cluster. I will also cover setting the cgroup driver for containerd to systemd which is the preferred cgroup driver for Kubernetes. In Kubernetes version 1.20 Docker was deprecated and will be removed after 1.22. containerd is a CRI compatible container runtime and is one of the supported options you have as a container runtime in Kubernetes in this post Docker Kubernetes world. I do want to call out that you can use containers created with Docker in containerd.

There are a few steps, but it’s not crazy by any stretch, and that’s part of why the post-Docker-in-Kubernetes world won’t be chaos.

Comments closed

Using the Pipe Operator in ggplot2

Tomaz Kastrun reduces the number of pipe-like operators:

Using pipe %>% or chaining commands is very intuitive for creating following set of commands for data preparation. Visualization library ggplot in this manner uses sign “+” (plus) to do all the chaining. What if we would have to replace with the pipe sign?

This is because ggplot was developed prior to magrittr took over the piping world in R, so there wasn’t a “normal” pipe. I had been hopeful that ggvis would take over, as it does use the %>% pipe, but that project has gone dormant.

Comments closed