Microsoft set the default value for the Cost Threshold for Parallelism back in the 1990s. They were developing SQL Server 2000. That means this value was determined as a good starting point for query plan costs over 17 years ago. In case you’re not aware, technology, T-SQL, SQL Server, and all the databases and database objects within them shifted, just a little, in the intervening 17 years. We can argue whether or not this value made sense as a starting point (and remember, the default settings are meant to be starting points covering a majority of cases, not a final immutable value) for determining your Cost Threshold for Parallelism 17 years ago. I think we can agree that it’s no longer even a good starting point.
For more thoughts, check out a prior post on figuring out the cost threshold.
I’ve recently been doing some work with Hadoop using the Hortonworks distribution. Most recently I configured Knox to integrate with Active Directory. The end goal was to be able to authenticate with Active Directory via Knox (a REST API Gateway) and then on to other services like Hive. I also configured Knox to point to Zookeeper (HA service discovery) vs. Hive directly, but that’s really more detail than we need for integrating Knox with AD.
The Knox documentation is really good and very helpful:
Worth the read if you’re putting together a Hadoop cluster.
I dug into the script – it was simple – it pulled an alphabetical list of databases from system metadata and proceeded to back them up. It didn’t do this one simple thing – leave TEMPDB off the list. So when the backups got down to TEMPDB, they promptly failed. Now as a smart person – I should have just communicated this to her and had it fixed quietly. But, I was young and rather hot headed at that time. It amazed me that a DBA with several years of experience did not know that TEMPDB cannot be backed up. So, I waited until the team meeting the next day. And when the said job failure came up again – I said that I knew the reason and stated why. I also added that this was a ‘very basic thing’ that junior DBAs are supposed to know. I was stopped right there. It did not go down well. Her face was flaming red because a consultant showed her up in front of her boss in a bad light. She said she would talk to her boss and collegues the next day (several of whom were nodding their heads disapprovingly at me) and meeting was abruptly adjourned.
In this case, I don’t think there were any good actors.
What I found was that the list of IDs was being stored in a table without a clustered index. A heap. Now – I’m not opposed to heaps at all. Heaps are often very good, and shouldn’t be derided. But you need to understand something about heaps – which is that they’re not suited to tables that have a large amount of deletes. Every time you insert a row into a heap, it goes into the first available slot on the last page of the heap. If there aren’t any slots available, it creates a new page, and the story continues. It doesn’t keep track of what’s happened earlier. They can be excellent for getting data in – and Lookups are very quick because every row is addressed by the actual Row ID, rather than some key values which then require a Seek operation to find them (that said, it’s often cheap to avoid Lookups, by adding extra columns to the Include list of a non-clustered index). But because they don’t think about what kind of state the earlier pages might be in, you can end up with heaps that are completely empty, a bunch of pointers from page to page, with header information, but no actual rows therein. If you’re deleting rows from a heap, this is what you’ll get.
This guy’s heap had only a few rows in it. 8 in fact, when I looked – although I think a few moments later those 8 had disappeared, and were replaced by 13 others.
But the table was more than 400MB in size. For 8 small rows.
Read the whole thing, including Rob’s reluctance to post on this topic.
Monday morning came along and as scheduled, our Production DBAs executed my script before start of business. 30 minutes later, the frantic calls started to reach us. Seems some of our clients could no longer “make widgets!” Accounts that they needed to route data were gone! My manager and I looked at one another in horror – we were only deleting internal accounts!!! We didn’t hesitate and immediately had our Prod DBAs back out the change with my backout script, before the rest of the United Stated started business. The backout was executed immediately and all was back to normal, but business folks were pissed and wanted to know what happened.
It was good foresight to have a backout script.
People who have been granted Power BI administrator rights will also notice a modification to the Admin screen. The March 2017 update to Power BI provides a major change to the security model in Power BI. Previously all the security settings were set at the Tenant Level, meaning that all the privileges were granted to all users. If I wanted to allow one group within the organization to be able to publish reports to the web, but I did not want to allow everyone to publish reports to the web, there was no way that this could be accomplished. All that has changed. It is now possible to include or exclude groups of users from having rights in Power BI. Users can be classified into security groups in Azure Active Directory, either through the Office 365 Admin Center or via the Azure AD Admin Center. Once created the security groups can be used in Power BI. Security Groups are not the same thing as the groups created in Power BI when a new work group is created.
Read the whole thing.
Looks like I had open transactions while my transaction log got lost during an outage. I tried switching the database online but that failed.
Msg 5181, Level 16, State 5, Line 4 Could not restart database “FAT”. Reverting to the previous status. Msg 5069, Level 16, State 1, Line 4 ALTER DATABASE statement failed.
Accessing the database is the real challenge now.
Moral of the story: have backups and have good luck.
The below is a link to my GitHub repo of my personal collection of scripts. BTW this isn’t all of the scripts, but is probably 50% of the ones I have lurking around.
These are either scripts I have developed in my own time over the years, or adapted from various other websites which I found handy at the time for whatever reason.
Where I have remembered that web site (like most people, which is unfortunately almost never) it was typically from sites like stack overflow etc, and if so I have tried to cite it as such to provide credits.
Click through for the link to Rolf’s repo.
Notionally we can imagine a global portfolio of active memory allocations, each chunk uniquely identified by its starting address. When we want memory, we ask the global memory manager to lend us some from the unused pool, and when we’re done with it, we hand it back to that memory manager, who carefully locks its internal structures during such operations, because we should only access mutable global state in a single-threaded manner, and…. Oops. No, no, double no. That is not how SQL Server does things, right?
Okay, we know that there are actually a variety of memory allocators out there. If nothing else, this avoid the single bottleneck problem. But now the question becomes one of knowing which allocator to return a chunk of memory to after we’re done with it.
As usual, this is a deep and interesting blog post from Ewald.