Press "Enter" to skip to content

Category: Bugs

Capturing a Call Stack without a Dump

Bob Dorr shows off Arvind’s SQL Call Stack Resolver:

Some outputs, such as the XEvent call stack action output the raw stack frame information and require a rebase to loaded module information in order to symbolize.   The security feature for random address virtualization loads images at different addresses each time the image is loaded.   This requires the module base address and the raw address in order to calculate the relative virtual address for symbolization.

Click through for more information.

Comments closed

Physical Read Double-Counting in Query Stats

David Alcock reviews the latest SQL Server 2019 cumulative update:

Microsoft recently released Cumulative Update 15 for SQL Server 2019. It contains a bunch of fixes and some improvements, I get a bit geeky with updates like this and love to have a look through the different fixes to see “Physical reads for read-ahead reads are counted incorrectly (two times) when you run queries. Therefore, the information in sys.query_store_runtime_stats and sys.dm_exec_query_stats shows incorrect values.”

Read on to see what this means and a quick test to see if it works as expected.

Comments closed

Replication Error 20084 on SQL Server 2019

I ran into a weird issue:

Iwas helping out with a SQL Server upgrade recently, going from 2016 to 2019. We ran into a problem when trying to run replmerg.exe for a merge replication subscription. Specifically, we were getting error code 20084, which means that the replication process couldn’t connect to one of the instances. Interestingly, the process couldn’t connect to the local instance, and the failure was immediate—that is, within a couple of milliseconds. There was nothing in the management logs on either the distributor server or the subscriber server which indicated a problem. We were able to connect both sides together just fine—from the subscriber, we could connect to the distributor, and from the distributor, we could connect to the subscriber.

Click through for what error code 20084 typically means, as well as what turned out to be the problem here.

Comments closed

Locking Issue with Columnstore Indexes

Joe Obbish troubleshoots an issue on tables with columnstore indexes:

I recently ran into a production issue where a SELECT query that referenced a NOLOCK-hinted table was hitting a 30 second query timeout. Query store wait stats suggested that the issue was blocking on a table with a nonclustered columnstore index (NCCI). This was quite unexpected to me and I was eventually able to produce a reproduction of the issue. I believe this to be a bug in SQL Server that’s present in both RTM and the current CU as of this blog post (CU14). The issue also impacts CCIs as well but I did significantly less testing with that index type.

Read on for the issue, how you can replicate it, and a couple ways to work around it.

Comments closed

Error Messages during Change Tracking Cleanup

Lee Markum troubleshoots some error messages:

You’re a data professional and you’re trying to keep up with patching a wide range of SQL Server versions and editions. How do you know what’s in the CU and whether you should apply it or not? My favorite way to read up on CUs is to go to SQLServerUpdates and click around there.  It will take you to the latest CU pages as well as give you a way to see previous CUs that are available.

While doing this recently, I discovered this for CU 26 on SQL Server 2017.

These sorts of regressions do slip in, so keep an eye on them before (and after) upgrading. Lee gives us a concreate example of one in a recent CU of SQL Server 2017.

Comments closed

Adding Debug Logic to T-SQL Procedures

Deborah Melkin does not take kindly to bugs:

I often find that I have to write complicated stored procedures where I need to check things as I go along. My go-to for using this snippet is when I write stored procedures that use dynamic SQL. You’d be surprised (or not) at how often I have had to do this over the years. There’s been functionality where the user gets to choose the columns being used, rewriting ORM data layer “catch-all” queries to improve performance, and cross database queries where the name of the database may not be the standard name (think development and QA databases living on the same SQL instance.)

Click through for an example of where the @Debug parameter pays off. My recollection was that, for really long NVARCHAR(MAX) strings, running PRINT by itself might cut off the code after ~4000 characters, but that could be a historical recollection.

1 Comment

Persist Sample Percent Bugfix in SQL Server

John Sterritt has good news for us:

Hi Everyone, this is John Sterrett. I am a SQL Server Consultant in Austin, TX. Last year I blogged about a feature called Persist Sample Percent. It had a nasty bug that could negatively impact performance. I have great news to share. The fix is now rolled into SQL 2016 SP2 CU17 and SQL 2019 CU10Pedro Lopes let me know that with the fix now queued for SQL 2017 CU26, this becomes fixed in all versions.

Read on to see what this mean and why it’s important.

Comments closed

Debugging Memory Access Violation Errors

Sean Gallardy explains access violations:

Access Violations (AV) are another common error that will cause SQL Server to take a memory dump. These can occur for a variety of reasons, and unlike the last one (Non-Yielding Scheduler) it’s quite the task, especially given public symbols don’t have structure data and offsets.

Read on to see what they are, what a dump file may include, and why they can be so difficult to debug.

Comments closed

Security Breach in Cosmos DB: ChaosDB

Nir Ohfeld and Sagi Tzadik discovered a flaw in Azure Cosmos DB:

Nearly everything we do online these days runs through applications and databases in the cloud. While leaky storage buckets get a lot of attention, database exposure is the bigger risk for most companies because each one can contain millions or even billions of sensitive records. Every CISO’s nightmare is someone getting their access keys and exfiltrating gigabytes of data in one fell swoop.

So you can imagine our surprise when we were able to gain complete unrestricted access to the accounts and databases of several thousand Microsoft Azure customers, including many Fortune 500 companies. Wiz’s security research team (that’s us) constantly looks for new attack surfaces in the cloud, and two weeks ago we discovered an unprecedented breach that affects Azure’s flagship database service, Cosmos DB.

Read on for details about the attack. Microsoft has already mitigated the issue by disabling the functionality necessary to pull off the attack. H/T Ben Stegink.

Comments closed

Dealing with Non-Yielding Schedulers

Sean Gallardy breaks up the party:

One of the most common items that will cause a memory dump in SQL Server is a non-yielding scheduler (generally referred to as NYS). What the heck does that mean? Why would it cause a memory dump? Is there anything that can be investigated? Good questions, let’s take a look.

Read on to learn what these are, why they’re not something you want to deal with on a regular basis, and how you can get more information on what happened out of a dump file. Which is also going to be helpful for Microsoft staff to diagnose and correct the underlying issue (if possible).

Comments closed