Press "Enter" to skip to content

Author: Kevin Feasel

So You Want to Index

Erik Darling has an indexing strategy for querulous normies:

Most queries will have a where clause. I’ve seen plenty that don’t. Some of’em have surprised the people who developed them far more than they surprised me.

But let’s start there, because it’s a pretty important factor in how you design your indexes. There are all sorts of things that indexes can help, but the first thing we want indexes to do in general is help us locate data.

None of this is groundbreaking but Erik does a really good job of laying out the order in which you want to consider specific factors.

Comments closed

Polling Loops in Powershell

Aaron Nelson has one method for creating a polling loop in Powershell:

Originally I had used the Start-Sleep command to wait 3 seconds ( Start-Sleep 3
). That worked fine on my machine, but when I deployed it to the server, I found I needed to bump it up to 6 seconds. At first that worked, but then a week later I needed to bump it up to 9 seconds. The problem here is obvious, if we force it to wait 9 seconds every time, even if the task was updated after 4 second, we’re wasting extra time. And those seconds are going to add up.

Read on for a smarter approach. Ideally we’d be able to use asynchronous event handling with awaits for all of this, but the real world is not always so nice.

Comments closed

DAX and Case Sensitivity

Marco Russo and Alberto Ferrari talk about case sensitivity:

Every new language defines its own rules of case-sensitivity. R and Python are case-sensitive, DAX is not. It is not that one is right and the others are not; it is really a matter of personal taste of the author of the language. We would say that there is an equal number of pros and cons in both choices. Therefore, there is no definitive choice. That said, a choice needs to be made on two aspects: the language itself and the way it considers strings. Pascal, for example, is case-insensitive as a language, but string comparison is case-sensitive. The M language, in Power Query, is case-sensitive despite living in the same environment as DAX. DAX is case-insensitive as a formula language. 

Maybe it’s because I like living in the SQL world so much, but I highly prefer case-insensitivity as the default and case-sensitivity only when necessary.

Comments closed

Kafka in .NET

Diogo Souza walks us through building an application which produces and consumes messages using Apache Kafka:

Kafka is just the broker, the stage in which all the action takes place. The producers send messages to the world while the consumers read specific chunks of data. How do you differentiate one specific portion of data from the others? How do consumers know what data to consume? To understand this, you need a new actor in the play: the topics.

Kafka topics are the channels, the carriage that transport messages around. Kafka records produced by producers are organized and stored into topics.

This is a nice overview of Kafka followed by the basics of building a consumer and a producer in C#. I just wish that there was more community usage of Kafka so that the Confluent .NET driver would include some of the really cool stuff they’ve added to Kafka over the past couple of years.

Comments closed

The Data Mesh

James Serra explains what a data mesh is:

Its goal is to treat data as a product, with each source having its own data product manager/owner (who are part of a cross-functional team of data engineers) and being its own clearly-focused domain that has an autonomous offering, becoming the fundamental building blocks of a mesh, leading to a domain-driven distributed architecture. Note that for performance reasons, you could have a domain that aggregates data from multiple sources. Each domain should be discoverable, addressable, self-describing, secure (governed by global access control), trustworthy, and interoperable (governed by an open standard). Each domain will store its data in a data lake and in many cases will also have a copy of some of the data in a relational database (see Data Lakehouse defined for why you still want a relational database in most cases).

I’ll have to think more about this before I’m convinced. I’ll also need to think about the Aristotelian opposite of the data mesh.

Comments closed

How to Update Statistics Manually

Matthew McGiffen takes us through the process of updating statistics:

At the heart of all the methods we’ll look at is the UPDATE STATISTICS command. There are a lot of options for using this command, but we’ll just focus on the ones you’re most likely to use. For full documentation here is the official reference:
https://docs.microsoft.com/en-us/sql/t-sql/statements/update-statistics-transact-sql

Even if you have an automated system, knowing how to update statistics is a great thing because you might need to run a one-off update to help a poorly-performing query. Or you’re using PolyBase, which doesn’t have the capability to perform statistics updates automatically because the data isn’t actually in SQL Server.

Comments closed

Building a Function to Get the Next Date by Date Name or Offset

Louis Davidson has a function for us:

As I have been building my Twitter management software, I have been doing a lot more ad-hoc, repetitive coding using T-SQL directly. When I was generating new tweets for upcoming days, a bit of the process that got old quick was getting the date for an upcoming day (the primary key for my tweet table is the date, the type of the tweet, and a sequence number). After having to pick the date of next Tuesday… I had to write some more code (because a true programmer doesn’t do repetitive work when code can be written… even if sometimes the code doesn’t save you time for days or weeks.

So this following function was born from that need, and it is something I could imagine most anyone using semi-regularly, especially when testing software. 

This is definitely fancy. My inclination would be to create a calendar table, as that’ll solve this particular issue as well as other complex variants (like, I want the next Tuesday which doesn’t fall on a holiday).

Comments closed

So You’ve Run Out of Memory

Randolph West explains how the buffer pool handles low-memory situations:

One of the bigger clichés in the data professional vocabulary (behind “it depends”) is that you always give SQL Server as much RAM as you can afford, because it’s going to use it. But what happens when SQL Server runs out of memory?

Recently a question appeared on my post about how the buffer pool works, asking the following (paraphrased):

What happens if a data page doesn’t exist in the buffer pool, and the buffer pool doesn’t have enough free space? Does the buffer pool use TempDB, [and] does TempDB put its dirty pages into the buffer pool?

This is an excellent question (thank you for asking!). I spent 30 minutes writing my reply and then figured it would make a good blog post this week if I fleshed it out a little.

Read the whole thing.

Comments closed