Press "Enter" to skip to content

Curated SQL Posts

Debugging DAX via Tooltips

Matt Allington has a workaround for us:

DAX is a tricky language; on the surface it is deceptively simple, but under the hood it can quickly become complex and it can take many years to master. If you have ever typed a formula and crossed your fingers when you press Enter, then you know what I mean. If you are reading this article, you are no doubt already on your own DAX learning journey.

As you become more competent at DAX, you will start to write more complex formulas that behave differently depending on the filters in your visuals.  One such example is the P&L report that I shared in 2020.  This report contains quite a tricky formula. It was built slowly and methodically, one step at a time, until it was working as desired.  This is the best way to write DAX.  If you want to see how I do it, then go back to the article linked above and watch the video.  While I recommend this as the best approach to writing formulas, the reality is that sometimes you will need to go back to a formula you wrote previously and debug that formula.  Maybe it is not working correctly, or maybe you need to enhance it for some reason.  Whatever the reason, today I am sharing with you my technique to debug complex formulas using tool tips.

Read the whole thing.

Comments closed

Faster String Concatenation with SQL Server

Steve Stedman has some tips for people who need to combine strings:

What the previous show as that the longer the string gets the slower the concatenation is. So instead we declare a second VARCHAR(MAX) variable called @stringBuilder, and each time through the loop we concatenate to that, then every thousandth time through the loop we take the @stringBuilder variable and concatenate it on to the @bigString, then clear out the @stringBuilder variable. This keeps the @stringBuilder variable relatively short, and reduces the number of concatenations to the @bigString to roughly 1/1000th the original.

Click through for a demo of the process. I don’t think I’ve been in too many situations where string concatenation was a performance killer in SQL Server, but it’s good to know.

Comments closed

App Locks and Read Committed Snapshot Isolation

Michael J Swart has a tip for those who have RCSI turned on and are using app locks:

The procedure sp_getapplock is a system stored procedure that can be helpful when developing SQL for concurrency. It takes a lock on an imaginary resource and it can be used to avoid race conditions.

But I don’t use sp_getapplock a lot. I almost always depend on SQL Server’s normal locking of resources (like tables, indexes, rows etc…). But I might consider it for complicated situations (like managing sort order in a hierarchy using a table with many different indexes).

Click through to see how it normally works and how you should switch things up if you’re using Read Committed Snapshot Isolation.

Comments closed

Using the Model Database

Garry Bargsley explains the utility of the model database:

Below I will outline the model database, which has a unique purpose for SQL Server.

The model database is an empty shell of a database that has the sole purpose of providing a database template for any new User Database created in SQL Server. With that being said, you can make configuration changes to the model database and then expect those baseline settings to be applied to all new user databases.

Read on to see some of the things you can set. I was a bit disappointed in some of the things which model doesn’t apply across to other databases, but it does carry over quite a bit.

Comments closed

Living in the Lakehouse

James Serra defines the term “data lakehouse”:

As a follow-up to my blog Data Lakehouse & Synapse, I wanted to talk about the various definitions I am seeing about what a data lakehouse is, including a recent paper by Databricks.

Databricks uses the term “Lakehouse” in their paper (see Lakehouse: A New Generation of Open Platforms that Unify Data Warehousing and Advanced Analytics), which argues that the data warehouse architecture as we know it today will wither in the coming years and be replaced by a new architectural pattern, the Lakehouse. Instead of the two-tier data lake + relational data warehouse model, you will just need a data lake, which is made possible by implementing data warehousing functionality over open data lake file formats.

While I agree there may be some uses cases where technical designs may allow Lakehouse systems to completely replace relational data warehouses, I believe those use cases are much more limited than this paper suggests.

James is a sharp and perceptive fellow, so read the whole thing.

Comments closed

Power BI: New Features for Data Analysts

Tomaz Kastrun looks at some new functionality in Power BI which might interest data analysts:

Small multiples is a layout of small charts over a grouping variable, aligned side-by-side, sharing common scale, that is scaled to fit all the values (by grouping or categorical variable) on multiple smaller graphs. Analyst should immediately see and tell the difference between the grouping variable (e.g.: city, color, type,…) give a visualized data.

In Python, we know this as trellis plot or FacetGrid (seaborn) or simply subplots (Matplotlib).

In R, this is usually referred to as facets (ggplot2).

Read on for an example of this, as well as two other features, as well as how you might have worked with these ideas in Python and R.

Comments closed

Estimating Row Counts without Statistics

Matthew McGiffen dives into rules of thumb:

I find this is a question that comes up again and agan. What estimate for the number of rows returned does SQL Server use if you’re selecting from a column where there are no statistics available?

There are a few different algorithms used depending on how you’re querying the table. In this post we’ll look at where we have a predicate looking for a fixed value.

Read on for a few examples, noting that this specifically relates to tables and not things like table-valued parameters.

Comments closed