But what if you can’t fix the source data? I was asked this question the other week, and since I had been asked about it before and not come up with a good answer, I decided to spend some time researching the problem.
What I found was that it was relatively easy to write some M code that gave me the correct results, but very hard to write code that performed acceptably well on a large data set (I was testing on a CSV file containing almost half a million rows). Here’s the code for the function I ended up with:
It’s nice to see that Power Query & Power BI have methods to get around this sort of issue, but it sounds like even those methods are limited in value.
The most critical thing as a SQL Server DBA is to ensure that your databases can be restored in the event of the loss of a server for whatever reason: disk crash, fire in the server room, tribble invasion, whatever. To do this, not only do you have to back up your databases, you also have to test restores! Create a database and restore the backups of your production DB to them. It’s the safest way to make sure that everything works. This test restore can be automated to run every night, but that’s outside the scope of what I want to talk about right now.
There are lots of places that problems can creep in, this is just one part of how you’ll need to monitor systems. This is how I’ve done things for a number of years, and thus far it has served me well.
Depending upon your instance count, average database size, maintenance windows, etc. etc. etc., some of these things may or may not work, but the principle is the same: protect the data, and automate your processes to protect that data. This is a good article to read for ideas, and then from there dig into other administrative blog posts, videos, and books to become better versed in the tools and techniques available to protect your data.
Fortunately we can find queries with high CPU time using sys.dm_exec_query_stats DMV. This DMV, created in SQL Server 2008, keeps performance statistics for cached query plans, allowing us to find the queries and query plans that are most harming our system.
Glenn Berry’s fantastic set of diagnostic queries also includes a couple for finding CPU consumers.
There are many ways of doing it in SQL. (and Rosetta Code is a good place to view solutions in various other languages). I believe that Peter Larsson holds the record for the fastest calculation of the Luhn test for a sixteen-digit credit card, with this code. As it stands, it isn’t a general solution, but it can be modified for different lengths of bank card.
Phil has two interesting T-SQL functions in the code and wants to find more.
Whilst working on an upcoming post I realised that I needed a decent data set on which to test some of the new visualisations in SSRS 2016. I remembered this post on open data sets by Kendra Little that mentioned Data.gov. There are a lot of different data sets here but I settled on a set describing social media usage by organisations in New York.
The Import-Export wizard is nice for these types of one-off data loads—it’s the gateway into SSIS.
However I wanted to add some tests. Does this really work? What if I don’t have a backslash? I thought the best way to do this was with a few tSQLt tests, which I quickly built. The entire process was 5-10 minutes, which isn’t a lot longer than if I had been running random tests myself with a variety of strings.
The advantage of tests is that if I come up with a new case, or another potential bug, I copy the test over, change the string and I have a new test, plus all the regressions. I’m not depending on my memory to run the test cases.
I first put the code in a function, which makes it easier to test.
tsqlt is a great tool for database unit testing.