2016-02-17 – Curated SQL

But what if you can’t fix the source data? I was asked this question the other week, and since I had been asked about it before and not come up with a good answer, I decided to spend some time researching the problem.

What I found was that it was relatively easy to write some M code that gave me the correct results, but very hard to write code that performed acceptably well on a large data set (I was testing on a CSV file containing almost half a million rows). Here’s the code for the function I ended up with:

It’s nice to see that Power Query and Power BI have ways to work around this sort of issue, but it sounds like even those methods struggle to perform well on large files.
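The M function from the original post isn’t reproduced here. The Python sketch below illustrates the underlying problem. It is not a translation of that function. It shows one common heuristic for rejoining records that unquoted line breaks have split: keep appending physical lines until the record holds the expected number of delimiters. The sketch assumes a hypothetical comma-delimited file with a known column count, no delimiters inside values, and no stray breaks in the last column. If a break falls in the final column, the record already looks complete at the break, and this heuristic cannot detect it.

```python
from typing import Iterable, List, Optional


def rejoin_broken_rows(lines: Iterable[str], expected_fields: int,
                       delimiter: str = ",") -> List[List[str]]:
    """Reassemble records split across physical lines (hypothetical helper).

    Assumes no value contains the delimiter and that stray line breaks
    never fall inside the last column.
    """
    records: List[List[str]] = []
    pending: Optional[str] = None  # physical lines gathered for the current record
    for line in lines:
        # Re-insert the line break that split the value.
        pending = line if pending is None else pending + "\n" + line
        # The record is complete once it has enough delimiters for every column.
        if pending.count(delimiter) + 1 >= expected_fields:
            records.append(pending.split(delimiter))
            pending = None
    if pending is not None:
        # Trailing lines that never reached the expected width are kept as-is.
        records.append(pending.split(delimiter))
    return records
```

For example, `rejoin_broken_rows(["id,comment,amount", "1,first line", "second line,10"], 3)` yields the header row and a single data row whose comment is `"first line\nsecond line"`. The function makes one pass over the input. It has not been benchmarked against the M approach.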


Database Maintenance

Published 2016-02-17 by Kevin Feasel

SQLWayne describes his maintenance routines:

The most critical thing as a SQL Server DBA is to ensure that your databases can be restored in the event of the loss of a server for whatever reason: disk crash, fire in the server room, tribble invasion, whatever. To do this, not only do you have to back up your databases, you also have to test restores! Create a database and restore the backups of your production DB to it. It’s the safest way to make sure that everything works. This test restore can be automated to run every night, but that’s outside the scope of what I want to talk about right now.

There are lots of places where problems can creep in, and this is just one part of how you’ll need to monitor systems. This is how I’ve done things for a number of years, and thus far it has served me well.

Depending upon your instance count, average database size, maintenance windows, etc. etc. etc., some of these things may or may not work, but the principle is the same: protect the data, and automate your processes to protect that data. This is a good article to read for ideas, and then from there dig into other administrative blog posts, videos, and books to become better versed in the tools and techniques available to protect your data.


Finding High-CPU Queries

Published 2016-02-17 by Kevin Feasel

Dennes Torres has a script to check CPU-intense queries:

Fortunately we can find queries with high CPU time using the sys.dm_exec_query_stats DMV. This DMV, available since SQL Server 2005, keeps performance statistics for cached query plans, allowing us to find the queries and query plans that are most harming our system.

Glenn Berry’s fantastic set of diagnostic queries also includes a couple for finding CPU consumers.
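To make the DMV approach concrete, here is a sketch that is neither Dennes’s script nor one of Glenn Berry’s queries. It pairs a common sys.dm_exec_query_stats / sys.dm_exec_sql_text query with a small Python helper. The helper ranks the returned rows either by total CPU or by CPU per execution. The `QueryStat` class and `rank_queries` function are hypothetical, and fetching the rows from the server (for example with a database driver) is left out. Note that the query reads every cached plan, which can return a lot of rows on a busy server.

```python
from dataclasses import dataclass
from typing import List

# Join cached-plan statistics to the text of each individual statement.
# total_worker_time is reported in microseconds.
CPU_STATS_QUERY = """
SELECT
    SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
        ((CASE qs.statement_end_offset
              WHEN -1 THEN DATALENGTH(st.text)
              ELSE qs.statement_end_offset
          END - qs.statement_start_offset) / 2) + 1) AS statement_text,
    qs.execution_count,
    qs.total_worker_time
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st;
"""


@dataclass
class QueryStat:
    """One row returned by CPU_STATS_QUERY (hypothetical container)."""
    statement_text: str
    execution_count: int
    total_worker_time: int  # microseconds

    @property
    def avg_cpu_ms(self) -> float:
        # CPU per execution in milliseconds; max() guards against a zero count.
        return self.total_worker_time / max(self.execution_count, 1) / 1000.0


def rank_queries(rows: List[QueryStat], by: str = "total", top: int = 10) -> List[QueryStat]:
    """Return the top rows by total CPU ("total") or CPU per execution ("avg")."""
    keys = {
        "total": lambda r: r.total_worker_time,
        "avg": lambda r: r.avg_cpu_ms,
    }
    if by not in keys:
        raise ValueError("by must be 'total' or 'avg'")
    return sorted(rows, key=keys[by], reverse=True)[:top]
```

The two rankings answer different questions. Ranking by total tends to surface cheap statements that run constantly, while ranking by average surfaces statements that are individually expensive, so both lists are worth reviewing.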


Luhn Testing In T-SQL

Published 2016-02-17 by Kevin Feasel

Phil Factor shows us the Luhn algorithm, a quick test to determine if a credit card number is potentially valid:

There are many ways of doing it in SQL (and Rosetta Code is a good place to view solutions in various other languages). I believe that Peter Larsson holds the record for the fastest calculation of the Luhn test for a sixteen-digit credit card, with this code. As it stands, it isn’t a general solution, but it can be modified for different lengths of bank card.

Phil has two interesting T-SQL functions in the code and wants to find more.
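For reference, here is the Luhn check itself as a short Python sketch. It is not a port of Phil’s or Peter Larsson’s T-SQL. Starting from the rightmost digit, every second digit is doubled, with 9 subtracted whenever the doubled value exceeds 9. The number passes if the total is a multiple of 10. Unlike a fixed sixteen-digit version, this sketch accepts any length. Treating spaces and hyphens as separators is an assumption of the sketch.

```python
def luhn_valid(card_number: str) -> bool:
    """Return True if the digits pass the Luhn (mod 10) check."""
    # Accept spaces and hyphens as separators; reject anything else.
    cleaned = card_number.replace(" ", "").replace("-", "")
    if not cleaned or any(ch not in "0123456789" for ch in cleaned):
        return False
    total = 0
    # Walk from the rightmost (check) digit, doubling every second digit.
    for position, ch in enumerate(reversed(cleaned)):
        digit = int(ch)
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    return total % 10 == 0
```

For example, `luhn_valid("79927398713")` returns `True`, and `luhn_valid("79927398710")` returns `False`. Passing only means the number is well-formed. It says nothing about whether the card exists.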


Using The Import-Export Wizard

Published 2016-02-17 by Kevin Feasel

James Anderson uses the Import-Export wizard to import data:

Whilst working on an upcoming post I realised that I needed a decent data set on which to test some of the new visualisations in SSRS 2016. I remembered this post on open data sets by Kendra Little that mentioned Data.gov. There are a lot of different data sets here but I settled on a set describing social media usage by organisations in New York.

The Import-Export wizard is nice for these types of one-off data loads, and it’s also a gateway into SSIS, since the wizard can save what it builds as an SSIS package.


Unit Testing A Function

Published 2016-02-17 by Kevin Feasel

Steve Jones walks through a practical example of unit testing T-SQL with tSQLt:

However, I wanted to add some tests. Does this really work? What if I don’t have a backslash? I thought the best way to do this was with a few tSQLt tests, which I quickly built. The entire process took 5-10 minutes, which isn’t a lot longer than if I had been running random tests myself with a variety of strings.

The advantage of tests is that if I come up with a new case, or another potential bug, I copy the test over, change the string and I have a new test, plus all the regressions. I’m not depending on my memory to run the test cases.

I first put the code in a function, which makes it easier to test.

tSQLt is a great tool for database unit testing.
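Steve’s tests are written in tSQLt, but the same table-driven idea is easy to see in Python’s unittest. The `strip_domain` function below is hypothetical. It keeps the text after the last backslash, loosely echoing the “what if I don’t have a backslash?” case, and it is not Steve’s code. Each tuple in `CASES` is one test, so a newly discovered edge case becomes one more line, and every earlier case keeps running as a regression check.

```python
import unittest


def strip_domain(login: str) -> str:
    """Hypothetical function under test: return the text after the last backslash."""
    return login.rsplit("\\", 1)[-1]


class StripDomainTests(unittest.TestCase):
    # One (input, expected) pair per case; a new bug report becomes a new row.
    CASES = [
        ("CORP\\jdoe", "jdoe"),
        ("jdoe", "jdoe"),       # no backslash at all
        ("CORP\\", ""),         # trailing backslash
        ("A\\B\\jdoe", "jdoe"), # more than one backslash
    ]

    def test_cases(self):
        for login, expected in self.CASES:
            # subTest reports each failing row separately instead of stopping at the first.
            with self.subTest(login=login):
                self.assertEqual(strip_domain(login), expected)
```

To run it, save the code as a module and run `python -m unittest` against that module.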



Day: February 17, 2016

CSVs With Line Breaks In Power BI

Database Maintenance

Finding High-CPU Queries

Luhn Testing In T-SQL

Using The Import-Export Wizard

Unit Testing A Function