Press "Enter" to skip to content

Author: Kevin Feasel

Derived Table Nesting and Performance

Itzik Ben-Gan digs into some of the performance considerations around nested derived tables:

Unnesting/substitution of table expressions is a process of taking a query that involves nesting of table expressions, and as if substituting it with a query where the nested logic is eliminated. I should stress that in practice, there’s no actual process in which SQL Server converts the original query string with the nested logic to a new query string without the nesting. What actually happens is that the query parsing process produces an initial tree of logical operators closely reflecting the original query. Then, SQL Server applies transformations to this query tree, eliminating some of the unnecessary steps, collapsing multiple steps into fewer steps, and moving operators around. In its transformations, as long as certain conditions are met, SQL Server can shift things around across what were originally table expression boundaries—sometimes effectively as if eliminating the nested units. All of this in attempt to find an optimal plan.

In this article I cover both cases where such unnesting takes place, as well as unnesting inhibitors. That is, when you use certain query elements it prevent SQL Server from being able to move logical operators in the query tree, forcing it to process the operators based on the boundaries of the table expressions used in the original query.

That’s on my list for a second reading.

Comments closed

Local Variables with TOP and ORDER BY

Erik Darling points out issues with using local variables. First up is with TOP:

In case you missed it for some reason, check out this post of mine about local variables. Though it’s hard to imagine how you missed it, since it’s the single most important blog post ever written, even outside of SQL Server. It might even be more important than SQL Server. Time will tell.

While live streaming recently about paging queries, I thought that it might make an interesting post to see what happens when you use variables in places other than the where clause.

After several seconds of thinking about it, I decided that TOP would be a good enough place to muck around.

After that is ORDER BY:

I see this kind of pattern a lot in paging queries where people are doing everything in their power to avoid writing dynamic SQL for some reason.

It’s almost as if an entire internet work of SQL Server knowledge and advice doesn’t exist when they’re writing these queries.

Quite something. Quite something indeed.

I’d call out Erik’s ORDER BY examples by saying “C’mon, nobody does that!” if I hadn’t actually seen people do that…

Comments closed

Understanding LOOKUPVALUE in DAX

Alberto Ferrari explains the LOOKUPVALUE function:

LOOKUPVALUE is one of the most widely used functions, especially for DAX developers who come from an Excel background. Indeed, the behavior of LOOKUPVALUE is very close to the behavior of the widely-adopted VLOOKUP function in Excel. Yet, there are important differences between the two; quite often, newbies use LOOKUPVALUE instead of creating a relationship between tables that ensures higher flexibility and higher performance.

This article is an introduction to LOOKUPVALUE. If you are interested in studying the internals of LOOKUPVALUE in more detail, make sure to check the advanced LOOKUPVALUE article; in that article, we describe the behavior of the function in detail along with several performance considerations.

Click through for the article.

Comments closed

Kubernetes, SQL Server, and Kerberos

Raul Gonzalez walks us through one problem with configuring SQL Server to run in an Availability Group over Kubernetes:

The problem of Kerberos is that is not easy to configure and multiple times results in the well known Anonymous Logon Error, aka Double Hop.

That’s why you will find plenty of IIS and other applications out there, using SQL logins (Impersonation Users), because Windows Authentication can be really frustrating and applications won’t be able to connect to SQL Server otherwise.

There are multiple resources in the internet that explain the Double-Hop issue, so that won’t be the scope of this post, but I will show how to correctly configure SPNs to SQL Server Availability Groups, which is the first link in the Kerberos chain.

Kubernetes isn’t the only place where you’ll find the need to set SPNs, either.

Comments closed

What MAXDOP Controls

Pedro Lopes gives us an explanation of what MAXDOP really does for us:

There are plenty of blogs on these topics, and the official documentation does a good job of explaining these (in my opinion). If you want to know more about the guidelines and ways to override for specific queries, refer to the Recommendations section in the Configure the max degree of parallelism Server Configuration Option documentation page.

But what does MAXDOP control? A common understanding is that it controls the number of CPUs that can be used by a query – previous revisions of the documentation used this abstraction. And while that is a correct abstraction, it’s not exactly accurate.

This is definitely a nice companion piece to Paul White’s article on how MAXDOP works.

Comments closed

Drill-Down Tables in Cube.js

Artyom Keydunov shows off drill-down tables in Cube.js:

Since the release of drill down support in version 0.19.23, you can build interfaces to let users dive deeper into visualizations and data tables. The common use case for this feature is to let users click on a spike on the chart to find out what caused it, or to inspect a particular step of the funnel — who has converted and who has not.

In this blog post, I’ll show you how to define drill downs in the data schema and build an interface to let users explore the underlying chart’s data. If you’re just starting with Cube.js, I highly recommend beginning with this Cube.js 101 tutorial and then coming back here. Also, if you have any questions, don’t hesitate to ask them in our Slack community.

Click through for the demo, as well as links to the source code and an online example.

Comments closed

Comparing Koalas to PySpark

Tori Tompkins gives us an understanding of where Koalas fits in the Spark world:

One significant difference between Spark’s implementation of Dataframes and pandas is its immutability.

With Spark dataframes, you are unable to make changes to the existing object but rather create a brand new dataframe based on the old one. Pandas dataframes, however, allow you to edit the object in place. With Koalas, whilst still spark Dataframes under the hood, have kept the mutable syntax of pandas.

It does this by introducing this concept of an ‘Internal Frame’. This holds the spark immutable dataframe and manages the mapping between the Koalas column names and Spark column names. It also manages the Koalas index names to spark column name to replicate the index functionality in pandas (covered below). It acts as a bridge between Spark and Koalas by mimicking the pandas API with Spark. This Internal Frame replicates the mutable functionality of pandas by creating copies of the internal frame but appearing to be mutable.

Read the whole thing.

Comments closed

Comparing JSON Documents with SQL Server

Phil Factor has a new function for us to try out:

It is often necessary to compare objects, particularly when you are working on a routine that generates an object and you want to check that an improvement you’ve made is actually returned the same result. An XML or JSON document is the nearest we get to an object in SQL Server. This is probably as near as you’d want.

Although this routine is designed to compare JSON documents, and takes them as input, it can be used for comparing any results.

The style takes from the diff command in Unix.

Comments closed

Capturing the T-SQL of Prepared Statements

Grant Fritchey shows us how to use Extended Events to capture the T-SQL of a prepared statement:

Why would I be concerned with prepared statements? Wouldn’t sql_batch_completed and rpc_completed cover us? Well, no.

What happens when you use sp_prepare? What happens when you’re using an ORM tool that’s using prepared statements? You may see queries that look like this:

EXEC sp_execute 5, 48766;

What the heck code is that executing? Let’s find out.

Read on for the code. It’s not the type of thing you want running 24/7 against all of your prepared statements (certainly not on a busy system!), but very useful for troubleshooting when you don’t have easy access to that next level up in the application chain.

Comments closed

Delaying Loops in Python

Jon Fletcher gives us three methods to delay a Python loop:

In this blog post, I will run through 3 different ways to execute a delayed Python loop. In these examples, we will aim to run the loop once every minute.
To show the delay, we will print out the current datetime using the datetime module.

I think I’d be a bit concerned about all three potentially failing—option 1 can get out of sync if it takes more than 1 second to process inside the loop, option 2 might miss a minute, and option 3 can also sometimes miss. But it’s interesting to think about.

Comments closed