Press "Enter" to skip to content

Curated SQL Posts

BimlExpress

Catherine Wilhelmsen introduces us to BimlExpress:

BimlExpress is a free Visual Studio add-in for working with Biml in your SSIS projects. It allows you to add and edit Biml files, generate SSIS packages from Biml, and the code editor is fully featured with syntax coloring, error highlighting and intellisense.

If you are already using BIDS Helper, you will see that BimlExpress works the same way and includes all the same Biml features as in BIDS Helper – just with a new and improved code editor. No more squiggly red lines, yay!

For normal Biml purposes, I see some benefit.  But to me, the biggest benefit is when presenting:  a common question which comes up during Biml talks is, “Why does Intellisense say you have so many errors?”  A superior code editor is well worth the download.

Comments closed

Transforming Cursors

Mickey Stuewe has a post in which she transforms a cursor into a set-based procedure:

His approach was to use a cursor to cycle through all the columns in the provided table, analyze each column, determine the new data type, and store the information in a table variable. After the cursor was completed, the data in the table variable was written to a permanent table for the next process to use.

This approach isn’t necessarily bad. If you are only running it infrequently and you needed to write this stored procedure quickly, then it’s fine. But if this type of stored procedure needs to be run frequently, then it should be rewritten.

Set-based code tends to be easier to read and more compact than cursors, so even without the performance improvements they bring, there are benefits.

Comments closed

DISTINCT Windows

Daniel Hutmacher looks at ways to get windows of distinct elements from a table:

Probably the most common distinct aggregate is COUNT(DISTINCT). In some respects, it’s similar to the windowed function DENSE_RANK(). DENSE_RANK(), as opposed to ROW_NUMBER(), will only increment the row counter when the ordering column(s) actually change from one row to the next, meaning that we can use DENSE_RANK() as a form of windowed distinct count

This is a very interesting approach to the problem.  Read the whole thing.

Comments closed

Power BI: Dynamic Chart Titles

Chris Webb shows how to generate dynamic chart titles using Power BI:

But what about if you want the chart title to change depending on what is selected? For example, you might be using slicers or filters to allow a user to choose which days of the week they want to see data for. In that situation you might want to add a title that shows which days of the week have actually been selected; this would be particularly important if the report uses filters, or if the report is going to be printed. Unfortunately the built in Title Text property can’t be used to display dynamic values but in this blog post I’ll show you how to solve this problem using DAX.

The solution isn’t trivial, but it is cool.

Comments closed

“Broken” Left Joins

James Anderson reminds you to check those WHERE clauses:

We have said that a NULL value for s.DateOfSale is not in the range we are interested in. This means the rows with NULLs in the s.DateOfSale column (our employees yet to make a sale) will be filtered out. It will also filter out employees with sales in months other than March. We have converted the LEFT JOIN into an INNER JOIN.

James’s fix is to move the filter to the join clause, which eliminates the implicit inner join.  When I see a condition like this in a code review, the first question on my mind is whether the correct fix is James’s fix or whether the developer really meant to do an inner join.  There’s a potential performance gain from using an inner join over a left outer join (due to being able to drive from either table and thus having a larger number of potential execution plans) if it turns out you really do want to filter all rows and not just making the join criterion more specific.

Comments closed

Clustered Columnstore Index Load With SSIS

Koen Verbeeck looks at loading a clustered columnstore index using SSIS:

I stumbled upon this MSDN blog post: SQL Server 2016 SSIS Data Flow Buffer Auto Sizing capability benefits data loading on Clustered Columnstore tables (catchy title). It explains how you can set the buffer properties of the data flow to try to insert data directly into compressed row groups instead of in the delta store. They fail to achieve this using SSIS 2014 and then they explain how using the new AutoAdjustBufferSize property of SSIS 2016 works miracles and everything is loaded directly into compressed row groups. Hint: you want to avoid loading data into the delta store, as it is row storage and you need to wait for the  tuple mover to load the data to the CCI in the background.

However, it’s still possible to achieve the same using SSIS 2014 (or earlier). Niko Neugebauer (blog |twitter) shows this in his post Clustered Columnstore Indexes – part 51 (“SSIS, DataFlow & Max Buffer Memory”). It still depends on the estimated row size, but using these settings you should get better results:

This advice is a bit different from loading standard rowstore-based tables, but serves to pack as many rows into each columnstore row group as possible.

Comments closed

More On String_Split

Aaron Bertrand has another update on the String_Split function, specifically how it compares to user-defined table types:

For this specific test, with a specific data size, distribution, and number of parameters, and on my particular hardware, JSON was a consistent winner (though marginally so). For some of the other tests in previous posts, though, other approaches fared better. Just an example of how what you’re doing and where you’re doing it can have a dramatic impact on the relative efficiency of various techniques, here are the things I’ve tested in this brief series, with my summary of which technique to use in that case, and which to use as a 2nd or 3rd choice (for example, if you can’t implement CLR due to corporate policy or because you’re using Azure SQL Database, or you can’t use JSON or STRING_SPLIT() because you aren’t on SQL Server 2016 yet). Note that I didn’t go back and re-test the variable assignment and SELECT INTO scripts using TVPs – these tests were set up assuming you already had existing data in CSV format that would have to be broken up first anyway. Generally, if you can avoid it, don’t smoosh your sets into comma-separated strings in the first place, IMHO.

That’s a rather interesting result, given how poorly JSON fared in some of the previous tests.

Comments closed

The Importance Of Integration Testing

Michael Bourgon shows an example of why integration testing is important:

We are in process of doing a migration from an ancient creaky server to a shiny new VM.  Rather than just rebuild it and restore everything, we’re taking the (painful) opportunity to clean things up and improve several systems.

As part of this, we’re replicating data from the old server to the new server, so that we can migrate processes piecemeal, so that rollback is not “OH CRAP TURN IT OFF TURN IT OFF ROLL BACK TO THE OLD SERVER”.

But we ran into a weird problem.  On the target server, we had a many-to-many table that sits between, let’s say, stores and orders.  We have a stores table, we have an orders table, and this one (call it STORE_ORDERS for simplicity) is just a linking table between the two.  ID, stores_id, orders_id.  Everything scripted identically between the two databases (aside from the NOT FOR REPLICATION flag).

This is a case where action A works fine and action B works fine, but the combination of actions A and B leads to sadness.

Comments closed

Powershell Remoting

Andrew Pruski demonstrates Powershell remoting:

Hey guys, differing from usual this is a quick post on setting up powershell remote sessions. I know you can remotely connect to powershell sessions using the Server Manager that comes with Windows Remote Administration Tools but it’s a bit of a clicky process and I like to eliminate using the mouse as much as possible.

Disclaimer! I’m not a scripter, there are probably much better ways of doing this but I’ll show you the way I set it up and how to fix any errors you may come across.

If you’re using Remote Desktop to connect to servers, especially for regular actions, you should definitely check out Powershell remoting.

Comments closed

Azure SQL Database Pricing

James Serra explains Azure SQL Database pricing:

DTU’s are explained at here.  To help, there is a Azure SQL Database DTU Calculator.  This calculator will help you determine the number of DTUs being used for your existing on-prem SQL Server database(s) as well as a recommendation of the minimum performance level and service tier that you need before you migrate to Azure SQL Database.  It does this by using performance monitor counters.

After you use a SQL Database for a while, you can use a pricing tier recommendation tool to determine the best service tier to switch to.  It does this by assessing historical resource usage for a SQL database.

For further information, check out this interesting article from a few months ago on V12 performance by Chris Bailiss.

Comments closed