The query runs faster, make no mistake – but check out the estimates:
- Estimated number of rows = 1
- Actual number of rows = 165,367
Those estimates are built by SQL Server’s cardinality estimator (CE), and there have been major changes to it over the last couple of versions. You can control which CE you’re using by changing the database’s compatibility level. This particular StackOverflow database is running in 2016 compat mode – so what happens if we switch it back to 2012 compat mode?
Based on this result, there might be further optimizations available. Read on for more of Brent’s thoughts.
Well obviously something somewhere is different. Start by comparing everything on both servers and both databases down to… hang on, here, we’ll write a PowerShell script and then….
Wait, wait, wait!
You have the execution plans? Before we start digging through all the properties everywhere and comparing everything to everything, updating statistics 14 times, and all the rest, what if we look at the execution plans. They’re different, so we should start looking at scans & indexes & statistics &….
Sometimes, it’s the little things that matter.
Two important criteria must be met to allow parallel execution of an INSERT … SELECT statement.
The database compatibility level must be 130. Execute “SELECT name, compatibility_level FROM sys.databases” to determine the compability level of your database, and if it is not 130, execute “ALTER DATABASE <MyDB> SET COMPATIBILITY_LEVEL = 130” to set it to 130. Changing the compatibility level of a database influences some behavior changes. You should test and ensure that your overall application works well with the new compatibility level.
Must use the TABLOCK hint with the INSERT … SELECT statement. For example: INSERT INTO table_1 WITH (TABLOCK) SELECT * FROM table_2.
This is a limited use case, but it does sound very useful for large staging table loads or backfills when you can control table access.
We’ve been busy modifying SQL Sentry Plan Explorer so it will work seamlessly with the changed add-in model in the new Management Studio. (As a bit of background, SSMS is now based on the Visual Studio 2015 shell, and some work is required to transition existing add-ins.)
I am happy to announce that, as of today, Plan Explorer 2.8 (build 10.1.94), with add-in support for SSMS 2016, is now available!
Huzzah. That’s two plug-ins down and two more to go before I’m 100% ready for SSMS 2016.
To sum up, the workflow for tuning your query is:
Make some changes to the LongQuery query that hopefully make it faster
Update the Trace Message parameter with some notes about which version of the LongQuery query it is that you’ll be testing
Click the Refresh Preview button for the Diagnostics query to test how long LongQuery now runs for
Refresh, or load, the query that reads the data from the trace logs so you can see how all of your changes have affected query execution times
I give it two months before the Power BI team releases a change to make this easier…
While presenting at SQLDay in Wroclaw, Poland, on the Query Store, I was asked a pretty simple question, which takes precedence, the Query Store or a Plan Guide?
One of my favorite answers to questions is “I don’t know” because it gives me the opportunity to learn. Let’s figure this one out together.
I’ll post the code to recreate this experiment within AdventureWorks at the end of the article. I’m doing this because the code for forcing execution plans using Plan Guides can be pretty doggone long (you may need to generate your own XML from a plan on your own system, fair warning).
The answer is not quite as clear-cut as I would have expected, and I’ll be interested to see what others find.
In order for SQL Server to read and update data, the data must be in the buffer. SQL Server does not work directly with the data in the files on disk. Once the pages of data are in the buffer, they can be used for multiple queries. This means that the data doesn’t have to be retrieved from disk every time it’s needed, thereby decreasing the amount of I/O work required.
You may have seen this yourself when selecting all the rows of a large table twice. The second time, the query can run much faster because the data does not have to be copied from the disk to the buffer. If you run another query from a different large table, it may cause the pages from the first table to be removed to make room. If there is not enough memory, pages will have to be read from disk more frequently causing your queries to be slow.
Even with extremely fast SSDs and flash storage arrays, RAM is still typically an order of magnitude faster, so having enough RAM and using it wisely is critical to a well-functioning SQL Server instance.
Letting SQL Server change data types automatically can seriously impact performance in a negative way. Because a calculation has to be run on each column, you can’t get an index seek. Instead, you’re forced to use a scan. I can demonstrate this pretty simply. Here’s a script that sets up a test table with three columns and three indexes and tosses a couple of rows in:
You might get lucky and have the database engine realize that it doesn’t need to give you a horrible execution plan, but it’s sound advice to ensure that data types match on joins and filters.
When defining table variables, avoid primary key or unique key constraints. Opt instead for named indexes if you’re using SQL Server 2014 or later. Otherwise, be aware that plan forcing is limited to queries that don’t use these table variables.
Helpful advice when dealing with user-defiened table types. Read the whole thing.
For this specific test, with a specific data size, distribution, and number of parameters, and on my particular hardware, JSON was a consistent winner (though marginally so). For some of the other tests in previous posts, though, other approaches fared better. Just an example of how what you’re doing and where you’re doing it can have a dramatic impact on the relative efficiency of various techniques, here are the things I’ve tested in this brief series, with my summary of which technique to use in that case, and which to use as a 2nd or 3rd choice (for example, if you can’t implement CLR due to corporate policy or because you’re using Azure SQL Database, or you can’t use JSON or
STRING_SPLIT()because you aren’t on SQL Server 2016 yet). Note that I didn’t go back and re-test the variable assignment and
SELECT INTOscripts using TVPs – these tests were set up assuming you already had existing data in CSV format that would have to be broken up first anyway. Generally, if you can avoid it, don’t smoosh your sets into comma-separated strings in the first place, IMHO.
That’s a rather interesting result, given how poorly JSON fared in some of the previous tests.