Fortunately we can find queries with high CPU time using sys.dm_exec_query_stats DMV. This DMV, created in SQL Server 2008, keeps performance statistics for cached query plans, allowing us to find the queries and query plans that are most harming our system.
Glenn Berry’s fantastic set of diagnostic queries also includes a couple for finding CPU consumers.
There are many ways of doing it in SQL. (and Rosetta Code is a good place to view solutions in various other languages). I believe that Peter Larsson holds the record for the fastest calculation of the Luhn test for a sixteen-digit credit card, with this code. As it stands, it isn’t a general solution, but it can be modified for different lengths of bank card.
Phil has two interesting T-SQL functions in the code and wants to find more.
Whilst working on an upcoming post I realised that I needed a decent data set on which to test some of the new visualisations in SSRS 2016. I remembered this post on open data sets by Kendra Little that mentioned Data.gov. There are a lot of different data sets here but I settled on a set describing social media usage by organisations in New York.
The Import-Export wizard is nice for these types of one-off data loads—it’s the gateway into SSIS.
However I wanted to add some tests. Does this really work? What if I don’t have a backslash? I thought the best way to do this was with a few tSQLt tests, which I quickly built. The entire process was 5-10 minutes, which isn’t a lot longer than if I had been running random tests myself with a variety of strings.
The advantage of tests is that if I come up with a new case, or another potential bug, I copy the test over, change the string and I have a new test, plus all the regressions. I’m not depending on my memory to run the test cases.
I first put the code in a function, which makes it easier to test.
tsqlt is a great tool for database unit testing.
I added a column: RollinAvgSteps = AVERAGEX(FILTER(fitbit_export_20160214, EARLIER(fitbit_export_20160214[Date])>=fitbit_export_20160214[Date]),fitbit_export_20160214[Steps])
…which takes the average of my steps to date. There are a bunch of ways to achieve this, but this is the way that I chose. And you can see that the average line is (happily) improving! Oh, and because I pulled down the extract on the 14th, there’s a dip at the end. My numbers were much healthier by the end of the day, and despite spending way too long NOT walking, I did about 7244 steps that day.
You can see the result at http://bit.ly/RobFitbit
I like the rolling average that Rob added in.
The first thing you’ll notice is that a single space is stored the same way in both columns. With an empty string, on the other hand, we see a difference. Char columns are fixed length. So even though we inserted an empty string into it we get back a single space.
The next major difference is that varchar columns require an extra two bytes of storage. So a varchar(1) column actually uses three bytes not just the one byte that char(1) does.
This is exactly the type of scenario row-level compression improves.
Let’s bottom line this. Data is dirty. Any ETL process is likely to expose the trashy data, so what are the real issues here? First up, Powershell is my bestest buddy. Scripting everything through Posh right out of the gate made a huge difference in my ability to constantly reset. The fact that our most common processes spit out UTF16, while not a problem, is something you need to get on top of (see Posh above). AZCopy works really well and the command line is easy to implement, but you’ll need to go through the process of setting up all the appropriate connections. Polybase is wicked cool. Yes, I had to hop through hoops to get the data clean, but, what you should note in the above descriptions, at no point did I need to go and recreate my EXTERNAL TABLE from Azure SQL Data Warehouse. I just kept updating the files through my scripts and as soon as they were loaded up to blob storage, I ran a query against them. No additional work required. That’s nice. You’re going to want to go through your data types on your data before you start the migration. You’re going to have to plan for a lot of time cleaning your data if you haven’t already.
Grant’s method is the right way, especially for early tests. In practice, for the type of data you’d put into Polybase, you might want to create the external table to allow rejecting a certain number of rows—Grant didn’t specify the REJECT_TYPE and REJECT_VALUE attributes creating his external table, so the default of 0 rows was used. In a warehouse with billions of rows, hand-fixing all of that data is a nasty proposition, and if you’re writing queries whose results likely won’t change if a few (dozen?) records get dropped, rejecting bad data might be a good way to keep some of your sanity.
At the most basic level, the precedence rule states that once there is an override row for a database, that database will never leave that level…it will never default back to the default row. So in this example, MinionDev is at the database level for its settings, so it will never go back up to the more generic MinionDefault row. Once you’re at a level, you stay at that level.
Even if you don’t use Minion Backup, this is an interesting post because you can walk through Sean’s design process and think about the approach he took to get to his final result.
You can think of row compression as working by treating certain fixed-length data types as variable-length data types. By removing certain metadata, NULL and 0 values, and the padding of fixed-length values, SQL Server can reduce the total size of a row.
The easiest way to think of it is that char(n) no longer takes n bytes for every row, but instead gets treated more like varchar(n) where the storage used varies for each value. The behavior for each data type varies, with some data types getting more or less (or no) savings compared to others.
Row-level compression is the “safer” of the two primary compression options, but I almost never use it. That might just be a function of the my particular workloads, of course.
The management and maintenance of servers quickly becomes complex without standardisation. PowerShell allows us many ways of responding to different problems and occasionally bypassing certain restrictive techniques. The danger is that we end up with a plethora of scripts in order to manage a server, all more complicated than the other, and this works. We do much better with a standard way of automating a task
DSC gives us a declarative model for system configuration management. What that really means is that we can specify how we want a workstation or server (a ‘node’) to be configured and we leave it to PowerShell and the Windows Workflow engine to make it happen on those target ‘nodes’. We don’t have to specify how we want it to happen.
DSC is great for keeping servers and server configurations in sync.