Category: Indexing

Hypothetical Indexes

Kenneth Fisher discusses hypothetical indexes:

I saw something like this the other day. My first thought was “Huh, never seen that before.” My second thought was “Wow, that’s really cool. I wonder what a hypothetical index is?” A quick search later and I discovered that the DTA (Database Engine Tuning Advisor) uses them to test out what indexes will work best. A pretend (one might almost say hypothetical) index is created, with statistics, but without the actual index structure. Then a query plan is created allowing for that index.

This is pretty cool since creating a real index can take quite a bit of time, particularly on a really large table. It would be nice to be able to tell SQL that an index exists and try it out before actually spending the time creating it. I’d learned about a DB2 method of doing this a while back but wasn’t aware of one for SQL Server. In part that’s because it’s undocumented. Because the commands I’m going to use here are undocumented, standard warnings apply.

That’s completely new to me.

Finding Duplicate Indexes With Biml

Ben Weissman shows how to find duplicate indexes using Biml:

This little piece of Biml will check all your tables for indices sharing the same columns.
It does not generate any SSIS tasks etc. but might be a good starting point to build your own Index-Monitoring or Index-Clinic – because Biml is NOT just for SSIS.

Depending upon your definition of a duplicate index, this might generate false positives.  Regardless, it’s a nice way of showing that Biml is about more than SSIS.
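
To make the "definition of a duplicate" point concrete, here is a minimal Python sketch (not Ben's Biml, and not tied to any SQL Server API). The index metadata, table name, and index names are all hypothetical. It compares a strict definition (same key columns in the same order) against a loose one (same set of columns in any order); the loose one is where false positives tend to come from.

```python
from collections import defaultdict

# Hypothetical index metadata: (table, index name, ordered key columns).
INDEXES = [
    ("dbo.Orders", "IX_Orders_Customer_Date", ("CustomerID", "OrderDate")),
    ("dbo.Orders", "IX_Orders_Customer_Date_2", ("CustomerID", "OrderDate")),
    ("dbo.Orders", "IX_Orders_Date_Customer", ("OrderDate", "CustomerID")),
    ("dbo.Orders", "IX_Orders_Status", ("Status",)),
]


def find_duplicates(indexes, ordered=True):
    """Group index names that share key columns on the same table.

    ordered=True  -> columns must match in the same order (strict).
    ordered=False -> columns only need to match as a set (loose).
    """
    groups = defaultdict(list)
    for table, name, columns in indexes:
        key = columns if ordered else tuple(sorted(columns))
        groups[(table, key)].append(name)
    # Only groups with more than one index count as duplicates.
    return [sorted(names) for names in groups.values() if len(names) > 1]


strict = find_duplicates(INDEXES, ordered=True)
loose = find_duplicates(INDEXES, ordered=False)
print("strict:", strict)
print("loose: ", loose)
```

With this sample data, the strict check flags only the two indexes with identical key order, while the loose check also flags the index whose columns are reversed, even though that index can serve different queries.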

Page Free Space On Heaps

Raul Gonzalez shows a case in which writing to a heap can be more efficient than writing to a clustered index:

A lot has been written, including by me, about the advantages and disadvantages of the different types of tables that exist in SQL Server: clustered tables and heaps.

Today I will again, because at the end of the day, a table is the most basic structure in a database and we need them to store the data, which is indeed the purpose of having a database, right?

Interesting reading.  Check it out.

Computed Columns And Indexes

Derik Hammer looks at computed columns with a bonus section on unique indexes (or maybe vice versa):

What can an index do that a constraint cannot?

  • Set FILL FACTOR.
  • Add non-key (INCLUDED) columns.
  • Data compression.
  • Lock management settings.

What can a constraint do that an index cannot?

The only potential benefit I was able to find, or think of, is that constraints can be disabled. If you could enable and disable a unique constraint, that could be one feature that the unique index does not have. This is counter-intuitive knowing that the unique constraint is enforced with a unique index behind the scenes.

For me, the big benefit of unique constraints over unique indexes is that they provide a logical separation.  Unique constraints show up in the Keys section in Management Studio and let you say, “Yes, I made this thing because the data model requires uniqueness here.”  By contrast a unique index can tell the same thing, but could also say “for the subset of data in a filtered index, I can assume uniqueness” or “for performance reasons, this combination is unique, but its uniqueness is not important to the logical data model.”  I will happily admit that I’m splitting the hair pretty thin on this one and that in practice, there are benefits to using unique indexes across the board.
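
As a rough illustration of the "uniqueness only for a subset" case, here is a small sketch using Python's built-in sqlite3 module. SQLite is a stand-in here, not SQL Server: its partial indexes play the role of filtered indexes, and the CREATE UNIQUE INDEX ... WHERE syntax happens to look similar. The table, column, and index names are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (an assumption
# made so the example is runnable anywhere).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "id INTEGER PRIMARY KEY, email TEXT NOT NULL, is_active INTEGER NOT NULL)"
)

# Uniqueness is enforced only for rows where is_active = 1.
conn.execute(
    "CREATE UNIQUE INDEX ux_accounts_active_email "
    "ON accounts (email) WHERE is_active = 1"
)

conn.execute("INSERT INTO accounts (email, is_active) VALUES ('a@example.com', 1)")
# Same email, but inactive: outside the filter, so this is allowed.
conn.execute("INSERT INTO accounts (email, is_active) VALUES ('a@example.com', 0)")

try:
    # A second active row with the same email violates the unique index.
    conn.execute("INSERT INTO accounts (email, is_active) VALUES ('a@example.com', 1)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM accounts").fetchone()[0]
print(duplicate_rejected, row_count)
```

A rule like this says something about a slice of the data rather than about the logical model as a whole, which is the distinction I'm drawing between a unique index and a unique constraint.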

Rebuild Or Reorganize?

Kendra Little answers the age-old question:

With very large indexes, rebuilds take longer, generate more log, and impact performance more while they’re running.

If you’re using high availability features like Availability Groups or database mirroring that stream the log between replicas, generating a lot of log data very quickly can create problems.

Your replicas/mirrors may fall behind. Depending on the latency between replicas, the size of the indexes rebuilt, and other operations in the database, they may be out of your Recovery Point Objective / Recovery Time Objective for a long time.

In this situation, it’s particularly attractive to drip changes into the log more slowly. One of the ways to do this is to use REORGANIZE for those large indexes.

There’s a lot of nuance here, so give it a read (or watch the video).

Don’t Over-Index

Kenneth Fisher shows the cost of non-clustered indexes:

I want to make a couple of final points. I realize 99 indexes is a lot. It’s to emphasize the differences. However they were also fairly small indexes and this is a single table where a normal database might easily have hundreds. So take these results as an example. They aren’t going to match real life but will hopefully show you how all of this can play out.

Indexes are awesome but you want to be smart about adding them. My personal rule of thumb, with no scientific evidence behind it, is 5 indexes or less and I’m pretty easy. 5-10 indexes and you’ll have to convince me. I’m going to be reviewing the existing indexes and see what I can get rid of, or maybe I can combine something. Past 10 indexes and it had best be for a query that’s running 100+ times a minute or something for the CEO.

Read on for demo code and specific results.

Nonclustered Columnstore Indexes On Indexed Views

Niko Neugebauer notes that non-clustered columnstore indexes can now sit on top of indexed views, as of SQL Server 2016:

From the perspective of disk access, this is where you will definitely win by at least a couple of times: in the amount of disk access while processing the information, in the amount of memory that you will need to store and process (think hashing and sorting for the late materialisation phases), and in paying less for the occupied storage.

Another noticeable thing was that the memory grant for the Indexed Views query was smaller compared to the query that was processing the original columnstore table FactOnlineSales.

Clustered columnstore indexes are currently not available as an option; we’ll see if that changes in the next version of SQL Server.

Sequentially Increasing Indexes

Joe Chang discusses benchmarking and looks at a particular scenario around maximizing insert performance:

The test environment here is a single socket Xeon E3 v3, quad-core, hyper-threading enabled. Turbo-boost is disabled for consistency. The software stack is Windows Server 2016 TP5, and SQL Server 2016 CU2 (build 2164). Some tests were conducted on a single socket Xeon E5 v4 with 10 cores, but most are on the E3 system. In the past, I used to maintain two-socket systems for investigating issues, but only up to the Core2 processor, which was not NUMA.

The test table has 8 fixed length not null columns, 4 bigint, 2 guids, 1 int, and a 3-byte date. This adds up to 71 bytes (4×8 + 2×16 + 4 + 3). With file and row pointer overhead, this works out to 100 rows per page at 100% fill-factor.

Both heap and clustered index organized tables were tested. The indexes tested were 1) single column key sequentially increasing and 2) two column key leading with a grouping value followed by a sequentially increasing value. The grouping value was chosen so that inserts go to many different pages.

The test was for a client to insert a single row per call. Note that the recommended practice is to consolidate multiple SQL statements into a single RPC, aka network roundtrip, and if appropriate, bracket multiple Insert, Update and Delete statements with a BEGIN and COMMIT TRAN. This test was contrived to determine the worst case insert scenario.

With that setup in mind, click through to learn his results.
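
For anyone who wants to sanity-check the rows-per-page figure, here is a back-of-the-envelope sketch in Python. The column widths are the standard sizes for the listed types and sum to 71 bytes. The overhead numbers are my assumptions about a typical fixed-length data row on an 8 KB page (96-byte page header, 4-byte row header, 2-byte column count, a null bitmap of one bit per column, and a 2-byte slot array entry per row), not figures taken from Joe's post.

```python
# Assumed page layout (not from the original post).
PAGE_SIZE = 8192    # bytes per data page
PAGE_HEADER = 96    # bytes reserved for the page header
SLOT_ENTRY = 2      # bytes per row in the slot (row offset) array

# Storage size in bytes for each fixed-length type in the test table.
COLUMN_BYTES = {"bigint": 8, "uniqueidentifier": 16, "int": 4, "date": 3}
# How many columns of each type the test table has.
COLUMNS = {"bigint": 4, "uniqueidentifier": 2, "int": 1, "date": 1}

fixed_data = sum(COLUMN_BYTES[t] * n for t, n in COLUMNS.items())
column_count = sum(COLUMNS.values())
null_bitmap = (column_count + 7) // 8      # one bit per column, rounded up
row_overhead = 4 + 2 + null_bitmap         # row header + column count + bitmap
row_size = fixed_data + row_overhead

rows_per_page = (PAGE_SIZE - PAGE_HEADER) // (row_size + SLOT_ENTRY)
print(fixed_data, row_size, rows_per_page)
```

Under these assumptions, the estimate lands within a row or so of the roughly 100 rows per page quoted above. Extras such as row versioning information would lower it.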

Aborting Index Rebuilds

Arun Sirpal shows how to use the ABORT_AFTER_WAIT attribute on an index rebuild command:

Looking into the locking you will see that the ONLINE operation uses a Schema Modification Lock (Sch-M) on the corresponding table as part of the process (it actually takes a Shared Table Lock (S) at the beginning of the operation and a Schema Modification Lock (Sch-M) at the end).

So to be granted a Sch-M lock you can’t have any conflicting locks, so what happens when / if you have a process that is updating the table and you want to use the ONLINE rebuild? Yes, you will be blocked. With 2014 onwards we can control what happens if we get into this situation and for this post I am going to abort the other query causing me to wait.

Not sure I like the “Kick the other guy(s) off” part that much, but I can see uses.  It’s probably more likely to go the opposite route, cancelling the rebuild if the server’s too hot.
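
For reference, here is a small Python sketch that builds the T-SQL for both routes. The WAIT_AT_LOW_PRIORITY, MAX_DURATION, and ABORT_AFTER_WAIT options are part of the documented ALTER INDEX syntax in SQL Server 2014 and later. The helper function and the index and table names are hypothetical, and the sketch only produces the statement text; it does not connect to a server or quote identifiers.

```python
# Documented values for ABORT_AFTER_WAIT.
VALID_ABORT_ACTIONS = {"NONE", "SELF", "BLOCKERS"}


def online_rebuild_sql(index_name, table_name, max_minutes, abort_after_wait):
    """Return an ALTER INDEX ... REBUILD statement with low-priority waiting.

    Hypothetical helper: names are interpolated as-is, so pass trusted input only.
    """
    action = abort_after_wait.upper()
    if action not in VALID_ABORT_ACTIONS:
        raise ValueError(f"ABORT_AFTER_WAIT must be one of {sorted(VALID_ABORT_ACTIONS)}")
    return (
        f"ALTER INDEX {index_name} ON {table_name} REBUILD WITH ("
        f"ONLINE = ON (WAIT_AT_LOW_PRIORITY ("
        f"MAX_DURATION = {int(max_minutes)} MINUTES, "
        f"ABORT_AFTER_WAIT = {action})));"
    )


# Kick the blockers off after waiting one minute.
kill_blockers = online_rebuild_sql("IX_Orders_Customer", "dbo.Orders", 1, "BLOCKERS")
# The opposite route: give up on the rebuild itself after five minutes.
give_up = online_rebuild_sql("IX_Orders_Customer", "dbo.Orders", 5, "SELF")
print(kill_blockers)
print(give_up)
```

SELF is the gentler option I had in mind: the rebuild cancels itself after the wait instead of killing other sessions.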

Using Statistics For Index Design

Kendra Little argues that you should not use automatically created statistics as a guide for index creation:

We’ve talked a lot so far about how much statistics and indexes are related. This is why it seems like statistics might be useful for designing indexes!

But here’s the thing — SQL Server doesn’t track and report on how many times a statistic was used during optimization.

This is an interesting discussion.
