Sunil Agarwal has a two-part series on columnstore data elimination. First up is column elimination:
Now, let us run the same query on the table with clustered columnstore index as shown in the picture below. Note, that the logical IOs for the LOB data is reduced by 3/4th for the second query as only one column needs to be fetched. You may wonder why LOB? Well, the data in each column is compressed and then is stored as BLOB. Another point to note is that the query with columnstore index runs much faster, 25x for the first query and 4x for the second query.
In the context of rowgroup elimination, let us revisit the previous example with sales data
- You may not even need partitioning to filter the rows for the current quarter as rows are inserted in the SalesDate order allowing SQL Server to pick the rowgroups that contain the rows for the requested date range.
- If you need to filter the data for a specific region within a quarter, you can partition the columnstore index at quarterly boundary and then load the data into each partition after sorting on the region. If the incoming data is not sorted on region, you can follow the steps (a) switch out the partition into a staging table T1 (b) drop the clustered columnstore index (CCI) on the T1 and create clustered btree index on T1 on column ‘region’ to order the data (c) now create the CCI while dropping the existing clustered index. A general recommendation is to create CCI with DOP=1 to keep the prefect ordering.
From these two articles, queries which hit a small percentage of columns and stick to a relatively small number of rowgroups will likely perform better. For people who understand normal B-tree indexes, the second point seems clear enough, but the first point is at least as important.
Balanced power mode has a major impact on SQL Server performance. Simply put, you’ve told Windows Server (through Control Panel) or your server hardware (through BIOS settings) to sacrifice a few performance-minded things for the sake of using a little less power. In fact, in plenty of studies and blog posts by folks in the community (including this post by Glenn Berry), you can see the effect of CPU power saving, especially. In essence, the CPUs will run at a lower clock multiplier when demand isn’t deemed high enough, and that clock speed will only increase when demand is high enough for long enough. This results in a slower CPU speed during normal operations. That setting actually works well on my laptop or tablet when I want to conserve battery life and don’t have a workload that is sensitive to CPU speed.
For a SQL Server, though? That is horrible for performance. Windows balanced mode also can cause other components to run slower or behave differently than when the server is running in High Performance mode. For instance, USB ports can be underpowered and network interfaces can be under-powered or even go to sleep. Frankly, for a SQL Serve, nothing good comes out of these modes.
Read on for a few methods for checking whether your servers are affected.
By leveraging Spark for distribution, we can achieve the same results much more quickly and with the same amount of code. By keeping data in HDFS throughout the process, we were able to ingest the same data as before in about 36 seconds. Let’s take a look at Spark code which produced equivalent results as the bash script shown above — note that a more parameterized version of this code code and of all code referenced in this article can be found down below in the Resources section.
Read the whole thing.
Instantly it is apparent that the most resource intensive query was the same query across both workload tests and note that the query hash is consistent too. It is also apparent that this query performs worse under the new cardinality estimator model version 120. To investigate and understand why this particular query behaves differently under the different cardinality estimators we’ll need to look at the actual query and the execution plans.
Looking at the information in #TempCEStats and the execution plans, the problematic query below belongs to the SLEV stored procedure.
There’s also a discussion of Query Store in there, but it’s important to understand how to figure this out even if you’re on 2014 and don’t have access to Query Store.
DBCC OPTIMIZER_WHATIF can be used to pull down your resources or augment them. Often the differences in the execution plans have to do with parallelism and memory grants. This is an example of an execution plan running on an under powered development machine.
This is a good tool to help figure out what an execution plan probably would look like in production when your test environment is much smaller.
The results are fun.
Execution Type Average Duration sp_executesql AVG: 57946.03187251 Ad Hoc AVG: 14788.8924302789
What’s going on? Is the conclusion that, in fact, ad hoc queries are faster than sp_executesql?
Using sp_executesql is about protecting yourself when executing dynamic SQL, not about performance gains.
So here are two different ways that we can use SQL Servers wait statistics for troubleshooting purposes. Both views give us really useful information but both have different purposes. If we wanted to look back over time then the sys.dm_os_wait_stats will give us a view of wait time totals. Typically we would capture the information via a scheduled job and analyse the data for spikes during periods where issues might be suspected.
For performing real-time analysis of wait statistics then we should base queries on the sys.dm_os_waiting_tasks view where we can see accurate wait duration values as they are happening within our instance.
In my opinion wait statistics are the most important piece of information when troubleshooting SQL Server so learning about the different types is vital for anyone using SQL. Thankfully there is a wealth of really useful information about wait statistics out there; I’ve listed some of my favourite posts below.
Click through for an example, as well as links to more resources.
The primary method in which data moves from one process to another is through buffers. We break up data into smaller portions and push them to their destination. In Integration Services, we have buffers. When passing data through TCP, we use packets.
Okay, so what’s the trade-off? The trade-off is between latency and throughput. Let’s take TCP packets as an example. Say you have a series of 50-byte messages you want to send from a source to a destination. We have two primary options: push messages as fast as possible, or hold off until you have the most data you can store in a packet and send it along. For simplicity’s sake, we’ll say that we can fit about 1350 bytes in a packet, so we can store 27 messages in a packet. We’ll also assume that it takes 10 milliseconds to send a packet from the source to the destination (regardless of packet size, as we’re using powerful connections) and 1 millisecond to produce a message.
We use pipes as metaphors in IT, especially around data transfer, and I think it’s a solid metaphor because it intuitively includes most of the important concepts we need to worry about with data. We have latency (how long it takes something to go from one end of the pipe to the other), throughput (how much we can move at any point in time, which is determined by things like the diameter of the pipe), back pressure (in the pipe scenario, resistance caused by pipe directional changes; in the data world, when downstream operators are slower than upstream operators), etc.
The difference in the performance including compile time for the procedure alone is 700mc better on average than the view. That’s an 8% difference. It was almost that high for the view that used the procedure at 7%.
If we’re just talking compile time then, there is a significant win if we avoid the view. This is no doubt because of the extra work involved in unpacking the view and going through the simplification process within the optimizer. Plus, the view alone in our query was parameterized by the optimizer in order to assist it’s performance over time (as we saw in the average results without the recompile). All that extra work explains the 8% difference.
Read the whole thing.
I’ve written before about the concept of cargo cult data professionals. They see one issue, one time, and consequently extrapolate that to all issues, all the time. It’s the best explanation I have for why someone would suggest that a sub-query is flat out wrong and will hurt performance.
Let me put a caveat up front (which I will reiterate in the conclusion, just so we’re clear), there’s nothing magically good about sub-queries just like there is nothing magically evil about sub-queries. You can absolutely write a sub-query that performs horribly, does horrible things, runs badly, and therefore absolutely screws up your system. Just as you can with any kind of query. I am addressing the bad advice that a sub-query is to be avoided because they will inherently lead to poor performance.
There are times not to use subqueries, but this post is absolutely correct: understand the reasons why things may or may not perform well, and don’t be afraid to try things out.