Press "Enter" to skip to content

Curated SQL Posts

Locking Issue with Columnstore Indexes

Joe Obbish troubleshoots an issue on tables with columnstore indexes:

I recently ran into a production issue where a SELECT query that referenced a NOLOCK-hinted table was hitting a 30 second query timeout. Query store wait stats suggested that the issue was blocking on a table with a nonclustered columnstore index (NCCI). This was quite unexpected to me and I was eventually able to produce a reproduction of the issue. I believe this to be a bug in SQL Server that’s present in both RTM and the current CU as of this blog post (CU14). The issue also impacts CCIs as well but I did significantly less testing with that index type.

Read on for the issue, how you can replicate it, and a couple ways to work around it.

Comments closed

Why the Optimizer Doesn’t Look at Buffer Pool Data

Paul Randal has an explanation for us:

SQL Server has a cost-based optimizer that uses knowledge about the various tables involved in a query to produce what it decides is the most optimal plan in the time available to it during compilation. This knowledge includes whatever indexes exist and their sizes and whatever column statistics exist. Part of what goes into finding the optimal query plan is trying to minimize the number of physical reads needed during plan execution.

One thing I’ve been asked a few times is why the optimizer doesn’t consider what’s in the SQL Server buffer pool when compiling a query plan, as surely that could make a query execute faster. In this post, I’ll explain why.

This is an interesting post because it explains why the developers of the database engine would purposefully ignore something that could make things faster, but at a potentially devastating cost.

Comments closed

AD Authentication with SQL Server on Linux

Anthony Nocentino will have none of your SQL authentication:

In this post, we’re going to walk through configuring Active Directory authentication for SQL Server on Linux. We will start by joining the Linux server to the domain, configuring SQL Server on Linux to communicate to the domain, and then use adutil to create our AD users and set up Kerberos for SQL Server login authentication.

This does take a bit more effort than using Windows authentication, but if you want to use SQL Server on Linux, I’d consider it a worthwhile investment of time.

Comments closed

ElasticMapReduce Serverless

Damon Cortesi, et al, announce serverless EMR is now in preview:

Today we’re happy to announce Amazon EMR Serverless, a new option in Amazon EMR that makes it easy and cost-effective for data engineers and analysts to run petabyte-scale data analytics in the cloud. With EMR Serverless, you can run applications built using open-source frameworks such as Apache Spark, Hive, and Presto, without having to configure, manage, optimize, or secure clusters. EMR Serverless automatically provisions and scales the compute and memory resources required by your applications, and you only pay for the resources that your applications use.

In this post, we discuss the benefits of EMR Serverless, walk you through the core concepts of EMR Serverless and how you can use it, and show you a quick demo.

If you’re already using EMR for ephemeral work—that is, using a Spark cluster to perform data transformations and then shutting it down—this makes a lot of sense as long as there’s not a major difference in cost.

Comments closed

A Primer on Apache Spark

Tomaz Kastrun has started a new series:

Apache Spark is an open-source unified analytics engine for large-scale data processing. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance. Originally it was developed at the Berkeley’s AMPLab, and later donated to the Apache Software Foundation, which has maintained it since.

Click through to learn more about the product.

Comments closed

Creating an Availability Group on Linux in Azure with Pacemaker

Andrew Pruski slams in all of the exciting nouns:

There are new Ubuntu Pro 20.04 images available in the Azure marketplace with SQL Server 2019 pre-installed so I thought I’d run through how to create a three node pacemaker cluster with these new images in order to deploy a SQL Server availability group.

Disclaimer – The following steps will create the cluster but will not have been tested in a production environment. Any HA configuration for SQL Server needs to be thoroughly tested before going “live”.

Click through to see how.

Comments closed

To and From Date Filtering in one Slicer

Reza Rad uses a date slicer:

Power BI from and to date slicer

It happens that you might have two fields as From and To date (or Start and End date) in your dataset, and you want a date slicer in the report. The date slicer has to filter records in a way that the FROM and the TO dates are in the range of dates selected in the slicer. There are multiple ways of doing this. In this article and video, I’ll explain a simple but effective method for that. I have explained in another article, how this can be done using two date slicers, you can read that from here.

Click through to see what you need and how you can put one of these in place.

Comments closed

Diagnosing Stalled Dispatchers

Sean Gallardy has another memory dump to investigate:

If you’ve been following with the series of demystifying dumps then you’re probably thinking or have thought something along the lines of, “All of these issues seem to be around stalled or non-yielding things…” and you’d be correct. If you revisit the non-yielding scheduler dump post, I explain a little about cooperative scheduling and how things need to place nice, if one piece isn’t playing nice by doing its part in a timely manner or not giving up time to others, that’s going to be a very large problem for the system in general. The stalled dispatcher falls into the same set of issues.

There are various dispatcher pools in SQL Server, though the most widely known one is for extended events. Dispatcher pools are items that hold pieces of work to be completed, like a dumping ground for “someone pick this up and do this” type of tasks. If it helps, you can think of them in a similar way as connection pooling, where there are background threads which will do something (like making or updating connections) when needed. It can respond to pressure by making new threads if needed or trim threads when the work slows down. In SQL Server dispatchers set their own attributes including when an item is considered stuck, the number of threads to use, etc., which means some may never be considered stalled as they are able to disable their own timeouts.

Read on to see what might cause a stalled dispatcher and what a memory dump for one can tell us.

Comments closed