So now that (a) we have our table and (b) we have an index we can use, we can run the developer’s query and be SARGable, right?
([Test_Date]) = @
Nope! Table scan, ignores our index, and reads all 15M rows (too lazy for all the zeros) just to return a measly 127,782 of them! It’s not the slowest, taking around 3.960 seconds, but still, we want SARGable!!!
Watch for the surprise twist at the end.
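For illustration, here’s a minimal sketch of the kind of rewrite that makes a date predicate SARGable; the table, column, and parameter names are my own stand-ins rather than the ones from the post:

DECLARE @Year int = 2016;

-- Non-SARGable: applying a function to the indexed column hides it
-- from the optimizer and forces a scan of every row.
SELECT COUNT(*)
FROM dbo.Test_Table
WHERE YEAR([Test_Date]) = @Year;

-- SARGable: an equivalent half-open range on the bare column allows
-- an index seek.
SELECT COUNT(*)
FROM dbo.Test_Table
WHERE [Test_Date] >= DATEFROMPARTS(@Year, 1, 1)
  AND [Test_Date] <  DATEFROMPARTS(@Year + 1, 1, 1);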
Azure Data Catalog is a Software as a Service (SaaS) offering in Azure, part of the Cortana Intelligence Suite, for registering metadata about data sources. Check this post for an overview of Azure Data Catalog’s key features. (I’m a big fan of what Azure Data Catalog is trying to accomplish.)
There are a couple of particulars about Azure Data Catalog which make it a bit more difficult to set up a Demo/Test/Learning type of environment, including:
You are required to sign into Azure Data Catalog with an organizational account. Signing in with a Microsoft account (formerly known as a Live account) won’t work for Azure Data Catalog authentication, even if that’s what you normally use for Azure.
One Azure Data Catalog may be created per organization. Note this is *not* per Azure subscription – if your account has access to multiple subscriptions, it’s still one catalog per organization.
This method seems, shall we say, overly complicated. Here’s hoping that the Azure Data Catalog team finds a way to improve this experience.
It is worth knowing that in order to get Bash, you need to install it first; it’s an optional feature. The following is the series of steps I used to enable and install Bash on my desktop. And after enabling Bash, I started using it from within the PowerShell console.
Apropos of this, I read a very interesting article by Alex Clemmer yesterday on how terrible the Windows command line is. PowerShell and Bash are way, way better for pretty much any purpose, other than perhaps experiencing masochism.
BimlExpress is a free Visual Studio add-in for working with Biml in your SSIS projects. It allows you to add and edit Biml files and generate SSIS packages from Biml, and its code editor is fully featured, with syntax coloring, error highlighting, and IntelliSense.
If you are already using BIDS Helper, you will see that BimlExpress works the same way and includes all the same Biml features as BIDS Helper – just with a new and improved code editor. No more squiggly red lines, yay!
For normal Biml purposes, I see some benefit. But to me, the biggest benefit is when presenting: a common question which comes up during Biml talks is, “Why does Intellisense say you have so many errors?” A superior code editor is well worth the download.
His approach was to use a cursor to cycle through all the columns in the provided table, analyze each column, determine the new data type, and store the information in a table variable. Once the cursor completed, the data in the table variable was written to a permanent table for the next process to use.
This approach isn’t necessarily bad. If you are only running it infrequently and you need to write the stored procedure quickly, then it’s fine. But if this type of stored procedure needs to be run frequently, then it should be rewritten.
Set-based code tends to be easier to read and more compact than cursor-based code, so even setting aside the performance improvements, there are benefits.
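As a rough sketch of the set-based idea (not the original procedure; the table and column names are hypothetical), the per-column cursor work can often collapse into a single query against the catalog views:

DECLARE @TableName sysname = N'dbo.SalesStaging';  -- hypothetical input table

-- One pass over the catalog views replaces the per-column cursor loop
-- and the table-variable staging step.
SELECT c.name        AS ColumnName,
       ty.name       AS CurrentType,
       c.max_length  AS MaxLength
FROM sys.columns AS c
JOIN sys.types   AS ty
  ON ty.user_type_id = c.user_type_id
WHERE c.object_id = OBJECT_ID(@TableName);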
Probably the most common distinct aggregate is COUNT(DISTINCT). In some respects, it’s similar to the windowed function DENSE_RANK(). DENSE_RANK(), as opposed to ROW_NUMBER(), will only increment the row counter when the ordering column(s) actually change from one row to the next, meaning that we can use DENSE_RANK() as a form of windowed distinct count.
This is a very interesting approach to the problem. Read the whole thing.
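Since COUNT(DISTINCT ...) OVER (...) isn’t supported directly, here’s a minimal sketch of the DENSE_RANK() workaround, with assumed table and column names:

SELECT o.CustomerID,
       o.OrderDate,
       -- The highest dense rank within the partition equals the number
       -- of distinct OrderDate values for that customer. Caveat: unlike
       -- COUNT(DISTINCT), this counts NULL as a value.
       MAX(o.dr) OVER (PARTITION BY o.CustomerID) AS DistinctOrderDates
FROM (
    SELECT CustomerID,
           OrderDate,
           DENSE_RANK() OVER (PARTITION BY CustomerID
                              ORDER BY OrderDate) AS dr
    FROM dbo.Orders
) AS o;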
But what about if you want the chart title to change depending on what is selected? For example, you might be using slicers or filters to allow a user to choose which days of the week they want to see data for. In that situation, you might want to add a title that shows which days of the week have actually been selected; this would be particularly important if the report uses filters, or if the report is going to be printed. Unfortunately, the built-in Title Text property can’t be used to display dynamic values, but in this blog post I’ll show you how to solve this problem using DAX.
The solution isn’t trivial, but it is cool.
We have said that a NULL value for s.DateOfSale is not in the range we are interested in. This means the rows with NULLs in the s.DateOfSale column (our employees yet to make a sale) will be filtered out. It will also filter out employees with sales in months other than March. We have converted the LEFT JOIN into an INNER JOIN.
James’s fix is to move the filter to the join clause, which eliminates the implicit inner join. When I see a condition like this in a code review, the first question on my mind is whether the correct fix is James’s fix or whether the developer really meant to do an inner join. There’s a potential performance gain from using an inner join over a left outer join (because the optimizer can drive from either table, giving it a larger number of potential execution plans) if it turns out you really do want to filter all rows rather than just make the join criterion more specific.
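To make the distinction concrete, here’s a hedged sketch; only s.DateOfSale comes from the excerpt, and the rest of the schema is assumed:

-- Implicit inner join: the WHERE clause discards the NULLs that the
-- LEFT JOIN produced for employees with no March sales.
SELECT e.EmployeeName, s.SaleAmount
FROM dbo.Employees AS e
LEFT JOIN dbo.Sales AS s
    ON s.EmployeeID = e.EmployeeID
WHERE s.DateOfSale >= '20160301'
  AND s.DateOfSale <  '20160401';

-- James’s fix: filter inside the ON clause, so employees without March
-- sales survive with NULL sale values and the join stays a true outer join.
SELECT e.EmployeeName, s.SaleAmount
FROM dbo.Employees AS e
LEFT JOIN dbo.Sales AS s
    ON s.EmployeeID = e.EmployeeID
   AND s.DateOfSale >= '20160301'
   AND s.DateOfSale <  '20160401';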
I stumbled upon this MSDN blog post: SQL Server 2016 SSIS Data Flow Buffer Auto Sizing capability benefits data loading on Clustered Columnstore tables (catchy title). It explains how you can set the buffer properties of the data flow to try to insert data directly into compressed row groups instead of into the delta store. They fail to achieve this using SSIS 2014, and then they explain how the new AutoAdjustBufferSize property of SSIS 2016 works miracles and everything is loaded directly into compressed row groups. Hint: you want to avoid loading data into the delta store, as it is row storage and you need to wait for the tuple mover to load the data into the CCI in the background.
However, it’s still possible to achieve the same using SSIS 2014 (or earlier). Niko Neugebauer (blog | twitter) shows this in his post Clustered Columnstore Indexes – part 51 (“SSIS, DataFlow & Max Buffer Memory”). It still depends on the estimated row size, but using these settings you should get better results:
This advice is a bit different from loading standard rowstore-based tables, but serves to pack as many rows into each columnstore row group as possible.
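If you want to verify where loaded rows actually landed, one option (sketched here with a hypothetical table name) is the sys.column_store_row_groups catalog view, available since SQL Server 2014:

SELECT OBJECT_NAME(object_id) AS table_name,
       row_group_id,
       state_description,   -- 'COMPRESSED' is what you want; 'OPEN' means delta store
       total_rows
FROM sys.column_store_row_groups
WHERE object_id = OBJECT_ID('dbo.FactSales')  -- hypothetical CCI table
ORDER BY row_group_id;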
For this specific test, with a specific data size, distribution, and number of parameters, and on my particular hardware, JSON was a consistent winner (though marginally so). For some of the other tests in previous posts, though, other approaches fared better. Just an example of how what you’re doing and where you’re doing it can have a dramatic impact on the relative efficiency of various techniques, here are the things I’ve tested in this brief series, with my summary of which technique to use in that case, and which to use as a 2nd or 3rd choice (for example, if you can’t implement CLR due to corporate policy or because you’re using Azure SQL Database, or you can’t use JSON or STRING_SPLIT() because you aren’t on SQL Server 2016 yet). Note that I didn’t go back and re-test the variable assignment and SELECT INTO scripts using TVPs – these tests were set up assuming you already had existing data in CSV format that would have to be broken up first anyway. Generally, if you can avoid it, don’t smoosh your sets into comma-separated strings in the first place, IMHO.
That’s a rather interesting result, given how poorly JSON fared in some of the previous tests.
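For reference, here’s a minimal sketch of the two SQL Server 2016 splitting options mentioned above; the variable name and sample string are mine:

DECLARE @csv nvarchar(max) = N'1,2,3,4,5';

-- JSON-based split: wrap the CSV in array brackets and shred with
-- OPENJSON (requires database compatibility level 130).
SELECT value
FROM OPENJSON(N'[' + @csv + N']');

-- The built-in alternative, also new in SQL Server 2016.
SELECT value
FROM STRING_SPLIT(@csv, N',');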