Author: Kevin Feasel

Using Availability Groups For Upgrades

Adrian Buckman has a fun post on upgrading to SQL Server 2017 (CTP) using Availability Groups to minimize downtime:

Don’t panic; this is still going as planned, as this is totally expected, and this is the reason why:

We are now in a situation where we have the Primary server running 2017 but one (for us), or possibly more than one for you, running on 2016. It’s not possible for the 2016 server to synchronize, as its databases have not been upgraded yet; they will therefore be stuck in recovery, but we are about to fix that very soon.

This is a viable upgrade option:  we did it when upgrading from 2014 to 2016.  There are a lot of steps, but in the end, it worked fine.
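
If you want to see the critical moment in code, here is a minimal sketch (the availability group and database names are hypothetical, and this is not Adrian’s script) of failing over to an already-upgraded secondary and then resuming data movement once the remaining replicas have been upgraded:

-- Run on the upgraded secondary replica to make it the new primary
ALTER AVAILABILITY GROUP [MyAG] FAILOVER;

-- After upgrading the old primary, its databases sit in recovery until
-- data movement is resumed on that replica
ALTER DATABASE [MyDatabase] SET HADR RESUME;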

Tracking Database Restorations

Erik Darling points out that figuring out when a database restoration occurs is much more difficult than you’d hope:

Astute SQL-ers may attempt to add a trigger to the restorehistory table over in msdb. It’s in the dbo schema, which might make you hopeful. We all know triggers in that pesky sys schema don’t do a darn thing.

You guessed it, restores get tracked there. So there’s, like, something inside SQL telling it when a restore happens.

Guess what, though? A trigger on that table won’t fire. Not FOR INSERT, not AFTER INSERT, and not nothin’ in between.

Read on for more things that don’t work…  Also check out the comments; I think Dave Mason has the best answer there.
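
If polling is good enough for your purposes, you can read the history directly from msdb instead of trying to trap the event; a rough sketch (column list trimmed to the interesting bits):

-- Most recent restores recorded in msdb; a polling approach rather than a trigger
SELECT TOP (20)
    rh.destination_database_name,
    rh.restore_date,
    rh.restore_type,      -- D = database, I = differential, L = log, F = file
    rh.user_name,
    bmf.physical_device_name
FROM msdb.dbo.restorehistory AS rh
    JOIN msdb.dbo.backupset AS bs
        ON bs.backup_set_id = rh.backup_set_id
    JOIN msdb.dbo.backupmediafamily AS bmf
        ON bmf.media_set_id = bs.media_set_id
ORDER BY rh.restore_date DESC;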

Mann-Whitney U Test in SQL

Phil Factor continues his Statistics in SQL series with the Mann-Whitney U test:

There are several ways that you can test this, but nobody is going to argue with you if you use a Mann–Whitney U test to test whether two samples come from the same distribution. It doesn’t require that the data has any particular type of distribution. It just requires that each observation is done by a different member of the population so that all the observations from both groups are independent of each other. It is really just a test of differences in mean-rank between two populations’ pooled ranking. To test this difference, it has to be possible to compare any of the observations with any of the others and say which of the two is greater. Your objective is to disprove the assumption that the distributions of both populations are equal. Calculating a measure of the difference is simple, and it was designed to be done easily by hand before computers. The probability that the observed difference occurred by chance is easily calculated for large samples, because U then approximates to the normal distribution, but it is complex for small samples. Here, we have a small sample and are just interested in whether the two-tailed test is significant at the five percent level, so we dodge the bullet by using a significance lookup table for the critical value of U.

Read on for Phil’s implementation of the test.
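
As a rough illustration of the arithmetic (this is not Phil’s implementation; #Samples, GroupName, Observation, and the group label 'A' are all hypothetical), U can be computed from pooled ranks with a window function:

-- U1 = n1*n2 + n1*(n1+1)/2 - R1, where R1 is the rank sum of group A;
-- compare MIN(U1, U2) against a critical-value table for small samples
WITH Ranked AS
(
    SELECT GroupName,
           -- average rank, so tied observations are handled correctly
           RankValue = RANK() OVER (ORDER BY Observation)
                     + (COUNT(*) OVER (PARTITION BY Observation) - 1) / 2.0
    FROM #Samples
),
Totals AS
(
    SELECT n1 = COUNT(CASE WHEN GroupName = 'A' THEN 1 END),
           n2 = COUNT(CASE WHEN GroupName <> 'A' THEN 1 END),
           R1 = SUM(CASE WHEN GroupName = 'A' THEN RankValue END)
    FROM Ranked
)
SELECT U1 = n1 * n2 + (n1 * (n1 + 1)) / 2.0 - R1   -- U2 = n1*n2 - U1
FROM Totals;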

Comparing Memory-Optimized Versus On-Disk Performance

Erin Stellato has a performance comparison between disk-based and memory-optimized tables:

I developed the following test cases:

  1. A disk-based table with traditional stored procedures for DML.
  2. An In-Memory table with traditional stored procedures for DML.
  3. An In-Memory table with natively compiled procedures for DML.

I was interested in comparing performance of traditional stored procedures and natively compiled procedures, because one restriction of a natively compiled procedure is that any tables referenced must be In-Memory. While single-row, solitary modifications may be common in some systems, I often see modifications occurring within a larger stored procedure with multiple statements (SELECT and DML) accessing one or more tables. The In-Memory OLTP documentation strongly recommends using natively compiled procedures to get the most benefit in terms of performance. I wanted to understand how much it improved performance.

Read on for the results.
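
For reference, the syntactic difference between the cases boils down to something like the sketch below (hypothetical names, assuming a database that already has a memory-optimized filegroup; case 2 pairs the table with ordinary interpreted procedures, case 3 with the natively compiled one):

-- Memory-optimized table
CREATE TABLE dbo.InMemoryExample
(
    ID INT IDENTITY(1, 1) NOT NULL
        PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1000000),
    Payload VARCHAR(100) NOT NULL
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
GO

-- Natively compiled procedure; it may reference only memory-optimized tables
CREATE PROCEDURE dbo.InsertExample
    @Payload VARCHAR(100)
WITH NATIVE_COMPILATION, SCHEMABINDING
AS
BEGIN ATOMIC WITH (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = N'us_english')
    INSERT INTO dbo.InMemoryExample (Payload) VALUES (@Payload);
END;
GO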

Batched Data Archival

Anders Pedersen has a script to archive data in smaller batches:

Seeing as the data had to be retrievable for any date, I could not simply delete the very old data.  These tables also had constant inserts and updates into them, so making sure the tables remained available became important, i.e., the tables could only be locked for an acceptable amount of time, leaving time for waiting transactions to finish.

The solution I came up with does this with variable-size batches.  Now, with modern versions of SQL, there are other ways to do this, but the good thing about this method is that it works regardless of the version or edition of SQL Server.  Azure SQL DB would need some modification to archive to a separate database.

Click through for the script.
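
This is not the script from the post, and the table names, retention cutoff, and batch size are hypothetical, but the general shape of a batched archive looks something like this:

DECLARE @BatchSize INT = 5000,
        @RowsAffected INT = 1;

WHILE @RowsAffected > 0
BEGIN
    -- Move one small batch at a time so locks are held only briefly
    DELETE TOP (@BatchSize) src
    OUTPUT DELETED.* INTO ArchiveDB.dbo.OrdersArchive
    FROM dbo.Orders AS src
    WHERE src.OrderDate < DATEADD(YEAR, -2, GETDATE());

    SET @RowsAffected = @@ROWCOUNT;

    -- Give waiting transactions a chance to get in before the next batch
    WAITFOR DELAY '00:00:01';
END;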

S3Guard

Mingliang Liu and Rajesh Balamohan explain why you shouldn’t use S3 as your primary Hadoop data store, as well as a tool which helps mitigate those problems:

Some of the real world use cases which can be impacted due to the S3 eventual consistency model are:

  1. Listing Files. Newly created files might not be visible for data processing. In Hive, Spark and MapReduce, this can lead to erroneous results from incomplete source data or failure to commit all intermediate results.

  2. ETL Workflow. Systems like Oozie rely on marker files to trigger the subsequent workflows. Any delay in the visibility of these files can lead to delays in the subsequent workflows.

  3. Existence-guarded path operations. Any action which fails if the destination path is present may see a deleted file in a listing, and so fail — even though the file has already been deleted.

Read on to see how S3Guard works and how to enable it in HDP 2.6.

Finding Last DBCC Command Runs

Andrew Kelly has a script to find the last time somebody ran a DBCC command like DBCC FREEPROCCACHE:

Let me explain a few things about the script. I am getting the path of the current trace file and placing it into a variable. The current file name will almost certainly have a suffix of _nn just before the .trc extension. If I were to run the script as-is, I would only be reading the current log file and not the other 4 that preceded it. If all you care about is the current log file, then fine, but most will want to search all of the existing log files. One way to do this is to simply replace the current file name with just log.trc and use default as the 2nd parameter, as I did above in the fn_trace_gettable function. The default parameter value tells the function to read all files from that one onward; even though log.trc doesn’t actually exist, it knows how to handle it and reads all of the existing trace files in order.

So if the string that we search on (here we use ‘dbcc free%’) is in any of the files it will return the matching rows. You may have to adjust the wildcards and such but I think you get the idea. Again remember that the data is transient so always look at the StartTime column in the logs to ensure you know which Date and Time range you are looking at. You can do something like this but I will leave that up to you.

SELECT MIN(StartTime) AS [Begin], MAX(StartTime) AS [End]  FROM ::fn_trace_gettable(@Path,default)

A word of caution in that I never bothered to see just how resource-intensive this function is. While I don’t expect any issues with normal use, it is not something you want to be searching on every second. Be sensible and you should have no problems.

Click through for more details, including the script Andy uses to do this search.
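
Pulling the pieces Andy describes together, a rough sketch (not his script; adjust the wildcard to whichever DBCC command you are after) might look like this:

DECLARE @Path NVARCHAR(260);

-- Current default trace file; swapping the _nn suffix for log.trc plus the
-- default parameter below makes fn_trace_gettable read every rollover file
SELECT @Path = REVERSE(SUBSTRING(REVERSE([path]), CHARINDEX(N'\', REVERSE([path])), 260)) + N'log.trc'
FROM sys.traces
WHERE is_default = 1;

SELECT t.StartTime, t.TextData, t.HostName, t.ApplicationName, t.LoginName
FROM ::fn_trace_gettable(@Path, default) AS t
WHERE t.TextData LIKE N'dbcc free%'
ORDER BY t.StartTime DESC;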

What You Need To Know About DTC

Allan Hirt gives some important information regarding the Distributed Transaction Coordinator:

What exactly is a distributed transaction? It’s one where the work needs to be completed in more than one database so data is kept in sync everywhere. For example, if you need to update data in Database A and in Database B, and they need to be kept in sync, that’s a distributed transaction. Database A and Database B can be in the same SQL Server instance, or they could be in different instances … or even in different data sources, such as Oracle or DB2. This whole shebang is often referred to as a cross-database transaction. DTC is based on the principle of a two phase commit – for the whole thing to get done, all the little bits need to be committed everywhere before claiming complete success. Otherwise stuff needs to be rolled back so that things stay in sync and all is right in the world.

Read on for Allan’s thoughts and guidance.
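
To make the “work in more than one data source” idea concrete, a toy distributed transaction (hypothetical linked server, database, and table names) looks like this; both updates commit or neither does, with MSDTC coordinating the remote side:

BEGIN DISTRIBUTED TRANSACTION;

    UPDATE dbo.Accounts
    SET Balance = Balance - 100
    WHERE AccountID = 42;

    UPDATE RemoteServer.FinanceDB.dbo.Accounts
    SET Balance = Balance + 100
    WHERE AccountID = 42;

COMMIT TRANSACTION;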

Using Event Notifications To E-Mail Deadlock Graphs

Dave Mason captures details whenever a deadlock occurs and uses Event Notifications to e-mail them to himself:

As noted, there are other ways to handle deadlocks in SQL Server. The approach presented here may have some drawbacks compared to others. There is an authorization issue for msdb.dbo.sp_send_dbmail that will need to be addressed for logins without elevated permissions. Additionally, you might get hit with an unexpected deluge of emails. (The first time I got deadlock alerts, there were more than 500 of them waiting for me in my Inbox.) Lastly, there’s the XML issue: it’s not everyone’s cup of tea. On the plus side, I really like the proactive nature: an event occurs, I get an email. I think most would agree it’s better to know something (bad) happened before the customers start calling. The automated generation of Deadlock Graph (*.xdl) files is convenient. And event notifications have been available since SQL Server 2005. As far as I know, the feature is available in all editions, including Express Edition.

Click through for all of the code Dave used to set this up.
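
For a sense of the moving parts, the core event notification objects (hypothetical names, created in a database with Service Broker enabled; Dave’s post has the complete setup, including the activation procedure that reads the queue and sends the e-mail) are along these lines:

-- Queue and service that will receive the deadlock event messages
CREATE QUEUE dbo.DeadlockQueue;

CREATE SERVICE DeadlockService
    ON QUEUE dbo.DeadlockQueue
    ([http://schemas.microsoft.com/SQL/Notifications/PostEventNotification]);

-- Server-level notification: every deadlock graph lands in the queue as XML
CREATE EVENT NOTIFICATION DeadlockNotification
    ON SERVER
    FOR DEADLOCK_GRAPH
    TO SERVICE 'DeadlockService', 'current database';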

EXISTS Is Self-Contained

Shane O’Neill ponders an existential problem:

So, drinking my first (of many) coffee of the day, I asked him what was wrong with it.

I have two tables. 1 with values 1,2,3 & the other with values 1,2,3,4,5. When I use delete exists, it should just delete 1,2,3 but table1 is always empty.

Hmmm, not an unreasonable assumption I suppose so I asked him for his code.

Read on for Shane’s explanation, though he doesn’t like the verbosity.  My version is, what happens in EXISTS stays in EXISTS.  It just returns a signal to the outer query saying yea or nay and the outer query does its thing accordingly.  In this case, if you want to tie results back to the delete operation, use IN (the ANSI standard way) or JOIN (typically my preferred way, given that IN can get dicey with more complex criteria).
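
A quick illustration of that point (hypothetical tables, not necessarily the code from Shane’s post):

-- Uncorrelated EXISTS: the subquery is true for every outer row,
-- so everything in Table1 is deleted
DELETE FROM dbo.Table1
WHERE EXISTS (SELECT 1 FROM dbo.Table2);

-- Tying the delete back to the outer table removes only matching rows
DELETE FROM dbo.Table1
WHERE SomeValue IN (SELECT SomeValue FROM dbo.Table2);

-- Or correlate the EXISTS explicitly
DELETE t1
FROM dbo.Table1 AS t1
WHERE EXISTS (SELECT 1 FROM dbo.Table2 AS t2 WHERE t2.SomeValue = t1.SomeValue);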
