Press "Enter" to skip to content

Author: Kevin Feasel

Clustered Columnstore Index Load With SSIS

Koen Verbeeck looks at loading a clustered columnstore index using SSIS:

I stumbled upon this MSDN blog post: SQL Server 2016 SSIS Data Flow Buffer Auto Sizing capability benefits data loading on Clustered Columnstore tables (catchy title). It explains how you can set the buffer properties of the data flow to try to insert data directly into compressed row groups instead of into the delta store. They fail to achieve this using SSIS 2014, and then they explain how using the new AutoAdjustBufferSize property of SSIS 2016 works miracles and everything is loaded directly into compressed row groups. Hint: you want to avoid loading data into the delta store, as it is row storage and you need to wait for the tuple mover to load the data to the CCI in the background.

However, it’s still possible to achieve the same using SSIS 2014 (or earlier). Niko Neugebauer (blog | twitter) shows this in his post Clustered Columnstore Indexes – part 51 (“SSIS, DataFlow & Max Buffer Memory”). It still depends on the estimated row size, but using these settings you should get better results:

This advice is a bit different from loading standard rowstore-based tables, but serves to pack as many rows into each columnstore row group as possible.
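
If you want to verify where your rows actually landed after a load, the columnstore DMVs will tell you. A quick sketch (dbo.FactSales is a placeholder for your CCI target; the DMV below is SQL Server 2016+, so on 2014 query sys.column_store_row_groups instead):

-- Did the load land in compressed row groups or the delta store?
SELECT rg.row_group_id,
       rg.state_desc,       -- COMPRESSED vs. OPEN/CLOSED (delta store)
       rg.total_rows,
       rg.trim_reason_desc  -- why a row group holds fewer than 1,048,576 rows
FROM sys.dm_db_column_store_row_group_physical_stats AS rg
WHERE rg.object_id = OBJECT_ID(N'dbo.FactSales')
ORDER BY rg.row_group_id;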


More On String_Split

Aaron Bertrand has another update on the String_Split function, specifically how it compares to user-defined table types:

For this specific test, with a specific data size, distribution, and number of parameters, and on my particular hardware, JSON was a consistent winner (though marginally so). For some of the other tests in previous posts, though, other approaches fared better.

Just an example of how what you’re doing and where you’re doing it can have a dramatic impact on the relative efficiency of various techniques, here are the things I’ve tested in this brief series, with my summary of which technique to use in that case, and which to use as a 2nd or 3rd choice (for example, if you can’t implement CLR due to corporate policy or because you’re using Azure SQL Database, or you can’t use JSON or STRING_SPLIT() because you aren’t on SQL Server 2016 yet).

Note that I didn’t go back and re-test the variable assignment and SELECT INTO scripts using TVPs – these tests were set up assuming you already had existing data in CSV format that would have to be broken up first anyway. Generally, if you can avoid it, don’t smoosh your sets into comma-separated strings in the first place, IMHO.

That’s a rather interesting result, given how poorly JSON fared in some of the previous tests.
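
For reference, the two SQL Server 2016-era approaches being compared look roughly like this; a minimal sketch, not Aaron’s actual test harness (@csv and the bracket-wrapping trick are just illustrative):

DECLARE @csv nvarchar(max) = N'red,green,blue';

-- STRING_SPLIT (SQL Server 2016+, database compatibility level 130)
SELECT value FROM STRING_SPLIT(@csv, N',');

-- JSON: wrap the CSV into a JSON array, then shred it with OPENJSON
SELECT [value]
FROM OPENJSON(N'["' + REPLACE(@csv, N',', N'","') + N'"]');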


The Importance Of Integration Testing

Michael Bourgon shows an example of why integration testing is important:

We are in the process of doing a migration from an ancient creaky server to a shiny new VM.  Rather than just rebuild it and restore everything, we’re taking the (painful) opportunity to clean things up and improve several systems.

As part of this, we’re replicating data from the old server to the new server, so that we can migrate processes piecemeal, so that rollback is not “OH CRAP TURN IT OFF TURN IT OFF ROLL BACK TO THE OLD SERVER”.

But we ran into a weird problem.  On the target server, we had a many-to-many table that sits between, let’s say, stores and orders.  We have a stores table, we have an orders table, and this one (call it STORE_ORDERS for simplicity) is just a linking table between the two.  ID, stores_id, orders_id.  Everything scripted identically between the two databases (aside from the NOT FOR REPLICATION flag).

This is a case where action A works fine and action B works fine, but the combination of actions A and B leads to sadness.
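
One classic example of that combination (an illustration, not necessarily the exact culprit in Michael’s post): with NOT FOR REPLICATION, replicated inserts keep their source identity values but don’t advance the subscriber’s identity seed, so a later local insert can generate a duplicate key. Using the post’s table names:

-- Compare where the identity seed sits versus the actual data.
SELECT IDENT_CURRENT(N'dbo.STORE_ORDERS') AS current_seed,
       MAX(ID)                            AS max_id
FROM dbo.STORE_ORDERS;

-- One fix: reseed past the highest replicated value.
DECLARE @max bigint = (SELECT MAX(ID) FROM dbo.STORE_ORDERS);
DBCC CHECKIDENT (N'dbo.STORE_ORDERS', RESEED, @max);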


PowerShell Remoting

Andrew Pruski demonstrates PowerShell remoting:

Hey guys, differing from the usual, this is a quick post on setting up PowerShell remote sessions. I know you can remotely connect to PowerShell sessions using the Server Manager that comes with Windows Remote Administration Tools, but it’s a bit of a clicky process and I like to eliminate using the mouse as much as possible.

Disclaimer! I’m not a scripter; there are probably much better ways of doing this, but I’ll show you the way I set it up and how to fix any errors you may come across.

If you’re using Remote Desktop to connect to servers, especially for regular actions, you should definitely check out PowerShell remoting.
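
The core of it is only a few commands; a minimal sketch, assuming a target called SQLSERVER01 (not Andrew’s exact script):

# On the target server (elevated): allow incoming remote sessions.
Enable-PSRemoting -Force

# On your workstation, if the target isn't domain-trusted:
Set-Item WSMan:\localhost\Client\TrustedHosts -Value 'SQLSERVER01' -Force

# Open an interactive session...
Enter-PSSession -ComputerName SQLSERVER01 -Credential (Get-Credential)

# ...or run a one-off command and get the results back locally.
Invoke-Command -ComputerName SQLSERVER01 -ScriptBlock { Get-Service -Name 'MSSQLSERVER' }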


Azure SQL Database Pricing

James Serra explains Azure SQL Database pricing:

DTUs are explained here.  To help, there is an Azure SQL Database DTU Calculator.  This calculator will help you determine the number of DTUs being used by your existing on-prem SQL Server database(s), as well as a recommendation of the minimum performance level and service tier that you need before you migrate to Azure SQL Database.  It does this by using performance monitor counters.

After you use a SQL Database for a while, you can use a pricing tier recommendation tool to determine the best service tier to switch to.  It does this by assessing historical resource usage for a SQL database.

For further information, check out this interesting article from a few months ago on V12 performance by Chris Bailiss.
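
Once you’re on Azure SQL Database, you can sanity-check the tier yourself: sys.dm_db_resource_stats keeps roughly an hour of usage history in 15-second slices, expressed as a percentage of your current tier’s limits, so sustained values near 100 suggest you’re pressing against the ceiling:

SELECT end_time,
       avg_cpu_percent,
       avg_data_io_percent,
       avg_log_write_percent,
       avg_memory_usage_percent
FROM sys.dm_db_resource_stats
ORDER BY end_time DESC;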


Echoing Variable Values

Bill Fellows shows how to send informational messages in SSIS and gives an example in Biml:

To aid in debugging, it’s helpful to have a “flight recorder” running to show you the state of variables. When I was first learning to program, the debugger I used was a lot of PRINT statements. “Verify your inputs before you assume anything” is a distillation of my experience debugging.

While some favor using MessageBox, I hate finding the popup window, closing it and then promptly forgetting what value was displayed. In SSIS, I favor raising events, FireInformation specifically, to emit the value of variables. This approach allows me to see the values in both the Progress/Execution Results tab as well as the Output window.

There’s value in putting in code like this as part of generic processing.  Flip the debug bit to true whenever you need this detailed information.  You can also think about calling the method multiple times, such as before and after an expected change block.
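
Inside a Script Task this is essentially a one-liner against the events interface; a minimal C# sketch (the variable names and the debug check are illustrative, not Bill’s Biml):

// In a Script Task's Main(): emit a variable's value as an Information
// event, visible in Progress/Execution Results and the Output window.
// (List the variables in the task's ReadOnlyVariables first.)
bool fireAgain = true;  // allow the runtime to keep firing this event
if ((bool)Dts.Variables["User::IsDebugMode"].Value)
{
    Dts.Events.FireInformation(0, "Flight recorder",
        string.Format("RowCount = {0}", Dts.Variables["User::RowCount"].Value),
        string.Empty, 0, ref fireAgain);
}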


Power BI KPIs And Colorblindness

Meagan Longoria points out that the choice of red and green for KPIs isn’t ideal:

And which colors do we love to use with KPIs? Red and green, of course! Color is a very powerful tool in data viz. We use it to indicate meaning and to draw attention to something important. KPI boxes are used to display key metrics in an efficient manner. These key metrics are usually rather important, and our users need to be able to see their status at a glance.

I quite like the design of the KPI boxes in Power BI, but for some reason they were created without the ability to adjust the color associated with the state (good/bad). Shown below, they use the common red/green color scheme.

It sounds like Microsoft is already working on fixing the issue.


Where Was I?

Shane O’Neill shows how to see the queries which were run in Management Studio:

Suddenly you’re not sure if you really ran the SELECT statement at all.
Maybe you ran the INSERT statement and 2089 rows were marked to never be seen again!
Or maybe that other table only had 2089 rows in it and you’ve now deleted every one!!

Now this blog post is not going to deal with fail-safes for preventing those scenarios because a) you should already know how to do that, and b) if you don’t know, then maybe back away until you research it… It’s only going to deal with a nice little way you can figure out what it was that you just ran.

I don’t think this will go into my everyday processes, but it’s handy to have when you absolutely need to make sure you’re running the correct line in a script.
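
One way to pull that off (not necessarily Shane’s exact approach) is to ask SQL Server for the last batch your session submitted:

-- Text of the most recent batch this session ran.
-- Caveat: run by itself, the most recent batch is this query, so either
-- append it to the end of the script you're testing, or run it from a
-- second window with the suspect window's SPID in place of @@SPID.
SELECT t.[text] AS last_batch
FROM sys.dm_exec_connections AS c
CROSS APPLY sys.dm_exec_sql_text(c.most_recent_sql_handle) AS t
WHERE c.session_id = @@SPID;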


Messing With Statistics

Erik Darling shows how to fake stats:

One thing I’ve found is that the inflated counts don’t seem to change anything for Identities or Primary Keys. You’ll always get very reasonable plans and estimates regardless of how high you set row and page counts for those. Regular old clustered indexes are fair game.

Some really interesting things can start to happen to execution plans when SQL thinks there’s this many rows in a table. The first is that SQL will use a rare (in my experience) plan choice: Index Intersection. You can think of this like a Key Lookup but with two nonclustered indexes rather than from one nonclustered index to the clustered index.

This is very useful when you don’t have many rows in dev, can’t put many rows in dev, and can’t restore a stats-only database from prod.
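
The levers in question are the ROWCOUNT and PAGECOUNT options on UPDATE STATISTICS, which is the usual way to fake table size; a sketch with a placeholder table, strictly for dev/test:

-- Convince the optimizer that dbo.Orders is enormous without loading rows.
UPDATE STATISTICS dbo.Orders
WITH ROWCOUNT = 100000000, PAGECOUNT = 10000000;

-- Undo it: recount the real pages/rows, then refresh the stats.
DBCC UPDATEUSAGE (0, 'dbo.Orders') WITH COUNT_ROWS;
UPDATE STATISTICS dbo.Orders;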


More On String Splitting

Aaron Bertrand has a follow-up post on STRING_SPLIT():

So here, the JSON and STRING_SPLIT methods took about 10 seconds each, while the Numbers table, CLR, and XML approaches took less than a second. Perplexed, I investigated the waits, and sure enough, the four methods on the left incurred significant LATCH_EX waits (about 25 seconds) not seen in the other three, and there were no other significant waits to speak of.

And since the latch waits were greater than total duration, it gave me a clue that this had to do with parallelism (this particular machine has 4 cores). So I generated test code again, changing just one line to see what would happen without parallelism:

There’s a lot going on in that post, so I recommend checking it out.
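
The one-line change is the sort of thing a MAXDOP hint gives you; a hedged sketch of that kind of test (the input string is a stand-in for his generated data):

DECLARE @csv nvarchar(max) = N'a,b,c,d';  -- stand-in for the real test data

SELECT COUNT(*)
FROM STRING_SPLIT(@csv, N',')
OPTION (MAXDOP 1);  -- force a serial plan to take parallelism out of the picture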
