We can use minidumps to generate small files that contain SQL Server call stacks. Note that if you aren’t careful you can end up writing the full contents of memory to a dump file. I believe that writing the file is a single-threaded process, so this can take hours and hundreds of GBs on a real server. Also SQL Server is frozen while the dump is happening, so don’t do it in production.
Dumps are most useful when you want to examine a long running, very simple query or a query that seems to be stuck at a certain point. They aren’t useful to see a full call stack of a non-simple query or to see something that happens just once during execution, like a specific task during query compilation. All that you can get is a snapshot and the snapshot may not contain the information that you’re looking for.
There are multiple ways to generate and read dump files. For this blog post I’ll be using sqldumper.exe and WinDbg.
Click through to read the whole thing.
Intelligent automation is critical under these circumstances, which is why we developed Cruise Control: a general-purpose system that continually monitors our clusters and automatically adjusts the resources allocated to them to meet pre-defined performance goals. In essence, users specify goals, Cruise Control monitors for violations of these goals, analyzes the existing workload on the cluster, and automatically executes administrative operations to satisfy those goals. You can see a video here about Cruise Control at the Stream Processing Meet Up last fall.
Today we are pleased to announce that we have open sourced Cruise Control and it is now available on Github. In this post, we’ll describe Cruise Control’s uses both generally and at LinkedIn, its architecture, and some unique challenges we faced when creating it. For further details about Kafka terminology used throughout this post, this reference can be a helpful guide.
This isn’t a monitoring tool per se, but rather a resource balancing tool. And it’s now freely available to all.
Since we are storing the date the records are added to the table, this query will always return the latest set of failures. This is a simple example, but the possibilities are endless:
Send the results of this query via database mail
dbo.syscategories, alerting on different thresholds per job category
TOP (1)to include multiple capture periods and alert on average failures per capture
Check it out. This is particularly helpful if you get blasted with thousands of error messages per minute because somebody made a bunch of untested changes and broke every job in your environment and caused the mail server to throttle your account for a multi-day period. Not that this has ever happened to me, of course…
For the sake of this post, I am talking specifically about SQL Server 2012 and higher.
Let’s break this down.
On all editions of SQL Server (Enterprise, Standard, Web, Express), there is a configuration setting called Max Server Memory. This value should not be left at the default 2.1 petabytes, because at that value SQL Server will use all the memory you allow it (plus a little bit more if it needs to), and once it has it, it doesn’t release it back to the operating system without a fight.
Read on to see what Max Server Memory actually includes, as well as additional limitations on editions other than Enterprise.
Recovering database with memory-optimized tables involves hydrating the contents of checkpoint files (data/delta files) into memory and then replaying the tail of the log (see this link for more details). One of the important difference between disk based tables and memory-optimized store is frequency of checkpoints. Automatic checkpointing for in-memory tables occurs every 1.5GB of log records unlike traditional or indirect checkpoints (where checkpointing is done more often) leading to longer tail of log for in-memory tables. The 1.5 GB log flush is chosen to strike the right balance between flooding the IO subsystem with too many small inefficient IO operations and too few large IOPs. In most scenarios observed by our CSS teams, long recovery times for memory optimized databases is caused by the long tail of log which needs to be recovered for in-memory tables in the database. For these scenarios, running a manual checkpoint before a restart can reduce recovery times as manual checkpoint forces the checkpoint for memory optimized tables in addition to disk based tables.
If you’re looking at creating memory-optimized tables, these are important administrative notes.
What causes High VLFs?
As transactions force growth of the log file, inappropriate log file sizing or auto-growth settings can cause a high number of VLFs to occur. Each growth event adds VLFs to the log file. The more often you grow in conjunction with smaller growth segments, the more VLFs your transaction log will have.
If you grow your log by the default 1 MB you may end up with thousands of VLFs as opposed to growing by 1GB increments. MSDN does a great job on explaining how a transaction logs work for a deeper dive I recommend reading it.
Read on to see how many VLFs your databases have, as well as how to reduce the number should it grow excessive.
Yesterday, I was running a health assessment for a client. They are running a weekly maintenance plan that is shrinking all of their data files. After I picked myself up off the floor, I searched the web for “Paul Randal shrink” and hit on Paul’s excellent post Why you should not shrink your data files. In the post, Paul (b|t) demonstrates the effect of DBCC SHRINKDATABASE on index fragmentation. After the demo script, Paul writes, “As well as introducing index fragmentation, data file shrink also generates a lot of I/O, uses a lot of CPU and generates *loads* (emphasis Paul’s) of transaction log.”
This led me to ask the question, “How much is *loads*?”. To find an answer, I made the following modification to Paul’s script:
Read on for the answer. There are legitimate reasons to shrink data files, but it comes at a very high cost.
In this example, I used xp_regread to read the direct registry path. If you remember from earlier, there are SQL Server instance-aware versions of each registry procedure. A comparable statement using the instance-aware procedure would be:
1234 EXECUTE master.sys.xp_instance_regread‘HKEY_LOCAL_MACHINE’,‘Software\Microsoft\MSSQLSERVER\SQLServerAgent’,‘WorkingDirectory’;
This statement returns the exact same information. Let’s look at the difference between these – in the first query, the registry path is the exact registry path needed, and it includes “\Microsoft SQL Server\MSSQL12.SQL2014\”. In the latter query, this string is replaced with “\MSSQLSERVER\”. Since the latter function is instance aware, it replaces the “MSSQLSERVER” with the exact registry path necessary for this instance of SQL Server. Pretty neat, isn’t it? This allows you to have a script that will run properly regardless of the instance that it is being run on. The rest of the examples in this post will utilize the instance-aware procedures to make it easier for you to follow along and run these yourself.
Sometimes you just have to change something in the registry from SQL Server. Hopefully that “sometimes” is rare.
The key for me has been to “embrace the change”. I have come from a traditional DBA background, backups, consistency checks, server level configuration at the heart of things. Yes, I no longer care about SQL backups and things of that nature when operating within Azure but I have different tasks and to think about.
Sceptical? Please read on.
Do read on.
In a perfect world, your SQL Server would be so secure that you would not need antivirus software, you would have behind layers of firewalls, nobody would ever connect with remote desktop to install anything, and it would always have all of the latest security patches… But that is not the real world.
Given that your SQL Server often times contains extremely valuable information, and that the damage that could be done by virus software, malware, and ransomware could be so great then it is strongly recommended that you run antivirus software on your SQL Server. There are some files that you will want to exclude from the virus check.
I’m not a big fan of running anti-virus software on database instances, but if you have to run it for whatever reason, be sure to check out Steve’s advice.