Using DMVs To Plan Out Your Indexes

Eric Blinn explains how to use two particular DMVs to see which index changes you might want to make:

Missing Indexes

This group of DMVs records every scan and large key lookups.  When the optimizer declares that there isn’t an index to support a query request it generally performs a scan.  When this happens a row is created in the missing index DMV showing the table and columns that were scanned.  If that exact same index is requested a second time, by the same query or another similar query, then the counters are increased by 1.  That value will continue to grow if the workload continues to call for the index that doesn’t exist.  It also records the cost of the query with the table scan and a suspected percentage improvement if only that missing index had existed.  The below query calculated those values together to determine a value number.

Click through for sample scripts for this and the index usage stats DMV.  The tricky part is to synthesize the results of these DMVs into the minimum number of viable indexes.  Unlike the optimizer—which is only concerned with making the particular query that ran faster—you have knowledge of all of the queries in play and can find commonalities.

What Update Locks Do

Guy Glantser explains the process around updating data in SQL Server, particularly the benefit of having update locks:

In order to update a row, SQL Server first needs to find that row, and only then it can perform the update. So every UPDATE operation is actually split into two phases – first read, and then write. During the read phase, the resource is locked for read, and then it is converted to a lock for write. This is better than just locking for write all the way from the beginning, because during the read phase, other sessions might also need to read the resource, and there is no reason to block them until we start the write phase. We already know that the SHARED lock is used for read operations (phase 1), and that the EXCLUSIVE lock is used for write operations (phase 2). So what is the UPDATE lock used for?

If we used a SHARED lock for the duration of the read phase, then we might run into a deadlock when multiple sessions run the same UPDATE statement concurrently.

Read on for more details.

More On Microsoft SQL Operations Studio

Dan Guzman shares some thoughts on Microsoft SQL Operations Studio:

Microsoft made the new cross-platform SQL Operations Studio (SOS) tool available on Github this week as a free open-source project. This SOS preview allows one to develop and manage SQL Server and Azure SQL Database from Windows, Linux, and macOS. The current preview can be downloaded from the SOS portal page, which also contains links to impressive quick start guides, how-to, and tutorials. I encourage you to try out the preview and improve it by reporting issues and offering suggestions.

If you are a developer, consider contributing to this project on Github. SOS is built on the Electron framework, which leverages JavaScript, HTML, and Node.js technologies to build rich cross-platform desktop applications. This is the same stack that the popular VS Code IDE employs so it’s not surprising SOS has a similar look and feel.

Click through for Dan’s thoughts and also a link to try it yourself.

Using The Restore-DbaDatabase Pipeline

Stuart Moore describes the updated Restore-DbaDatabase cmdlet:

The biggest change is that Restore-DbaDatabase is now a wrapper around 5 public functions. The 5 functions are:

  • Get-DbabackupInformation
  • Select-DbabackupInformation
  • Format–DbabackupInformation
  • Test–DbabackupInformation
  • Invoke-DbaAdvancedRestore

These can be used individually for advanced restore scenarios, I’ll go through some examples in a later post.

Stuart then provides additional information at the various steps, explaining at a high level how things work.

What Happens In Deep Neural Networks?

Adrian Colyer has a two-parter summarizing an interesting academic paper regarding deep neural networks.  Part one introduces the theory:

Section 2.4 contains a discussion on the crucial role of noise in making the analysis useful (which sounds kind of odd on first reading!). I don’t fully understand this part, but here’s the gist:

The learning complexity is related to the number of relevant bits required from the input patterns X for a good enough prediction of the output label Y, or the minimal I(X; \hat{X}) under a constraint on I(\hat{X}; Y) given by the IB.

Without some noise (introduced for example by the use of sigmoid activation functions) the mutual information is simply the entropy H(Y)independent of the actual function we’re trying to learn, and nothing in the structure of the points p(y|x) gives us any hint as to the learning complexity of the rule. With some noise, the function turns into a stochastic rule, and we can escape this problem. Anyone with a lay-person’s explanation of why this works, please do post in the comments!

Part two digs in deeper:

The different colours in the chart represent the different hidden layers (and there are multiple points of each colour because we’re looking at 50 different runs all plotted together). On the x-axis is I(X;T), so as we move to the right on the x-axis, the amount of mutual information between the hidden layer and the input X increases. On the y-axis is I(T;Y), so as we move up on the y-axis, the amount of mutual information between the hidden layer and the output Y increases.

I’m used to thinking of progressing through the network layers from left to right, so it took a few moments for it to sink in that it’s the lowest layer which appears in the top-right of this plot (maintains the most mutual information), and the top-most layer which appears in the bottom-left (has retained almost no mutual information before any training). So the information path being followed goes from the top-right corner to the bottom-left traveling down the slope.

This is worth a careful reading.

Troubleshooting Memory-Optimized Index Performance

Kunal Karoth has a post up on performance troubleshooting with In-Memory OLTP:

In the previous blog post In-Memory OLTP Indexes – Part 1: Recommendations, we gave you an update on the latest features of In-Memory OLTP technology. We also summarized the key characteristics of memory-optimized indexes and shared some guidelines and recommendations on how to best choose and configure an index for your memory-optimized table. At this point, if you haven’t read through the previous blog post, we strongly recommend you do so. In this blog post we continue onwards; take the learnings from the previous blog (Part 1) and using some sample examples, apply them in practice. The learnings from this blog post (Part 2) will be particularly useful if you are experiencing query performance issues with memory-optimized tables; either after migration from disk-based tables or in general, with your production workload leveraging memory-optimized tables.

To summarize this blog post covers the following:

  • Common mistakes and pitfalls to avoid when working with memory-optimized indexes.

  • Best practices to follow when configuring your memory-optimized indexes for optimal performance.

  • Troubleshooting and Mitigating your query performance issues with memory-optimized indexes.

  • Monitoring your query performance with memory-optimized indexes.

There’s a lot of detail in this post, and tuning these types of indexes isn’t quite the same as normal, disk-based indexes.

Alerting In Azure SQL Database

Arun Sirpal shows how to set up an alert for an Azure SQL Database:

I keep things simple and like to look at certain performance based metrics but before talking about what metrics are available let’s step through an example.

For this post I want to setup an alert for CPU percentage utilised that when it is greater than 50% over the last 5 minutes I would like to know about it. First step is to navigate to your Azure SQL Database.

Click through for a screenshot-driven guided tour.

Indexing Tradeoffs

Jeff Schwartz continues his series on index tuning, including a section on what happens when you have too many indexes on a table:

Larger numbers of indices create exponentially more query plan possibilities. When too many choices exist, the Optimizer will give up partway through and just pick the best plan thus far. As more indices are added the problem worsens and compilation times, i.e., processor times, increase to a point. This can be illustrated best by reviewing an actual customer example. In this case, one table had 144 indices attached to it and several others had between 20 and 130 indices. The queries were quite complex with as many as fifteen joins, many of which were outer joins. Query and index tuning were impossible because query performance was often counterintuitive and sometimes nonsensical. Adding an index that addressed a specific query need often made the query run worse one time and better the next. Note: Cached query plan issues, e.g., parameter sniffing or plan reuse were not problems in this case. The only solution was to tear down ENTIRE indexing structure and rebuild it with SQL Server’s guidance and nine days’ worth of production queries. Table 5 summarizes the results of the index restructuring project. The performance of 98 percent of the queries was comparable to or better than it was when the large numbers of indices were present.

Don’t be that company.

Connect(); Announcements, Including Azure Databricks

James Serra has a wrapup of Microsoft Connect(); announcements around the data platform space:

Microsoft Connect(); is a developer event from Nov 15-17, where plenty of announcements are made.  Here is a summary of the data platform related announcements:

  • Azure Databricks: In preview, this is a fast, easy, and collaborative Apache Spark based analytics platform optimized for Azure. It delivers one-click set up, streamlined workflows, and an interactive workspace all integrated with Azure SQL Data Warehouse, Azure Storage, Azure Cosmos DB, Azure Active Directory, and Power BI.  More info

  • Azure Cosmos DB with Apache Cassandra API: In preview, this enables Cassandra developers to simply use the Cassandra API in Azure Cosmos DB and enjoy the benefits of Azure Cosmos DB with the familiarity of the Cassandra SDKs and tools, with no code changes to their application.  More info.  See all Cosmos DB announcements

  • Microsoft joins the MariaDB Foundation: Microsoft is a platinum sponsor – MariaDB is a community of the MySQL relational database management system and Microsoft will be actively contributing to MariaDB and the MariaDB community.  More info

Click through for more.  And if you want more info on Azure Databricks, Matei Zaharia and Peter Carlin have more information:

So how is Azure Databricks put together? At a high level, the service launches and manages worker nodes in each Azure customer’s subscription, letting customers leverage existing management tools within their account.

Specifically, when a customer launches a cluster via Databricks, a “Databricks appliance” is deployed as an Azure resource in the customer’s subscription.   The customer specifies the types of VMs to use and how many, but Databricks manages all other aspects. In addition to this appliance, a managed resource group is deployed into the customer’s subscription that we populate with a VNet, a security group, and a storage account. These are concepts Azure users are familiar with. Once these services are ready, users can manage the Databricks cluster through the Azure Databricks UI or through features such as autoscaling. All metadata (such as scheduled jobs) is stored in an Azure Database with geo-replication for fault tolerance.

I’ve been a huge fan of the Databricks Community Edition.  We’ll see if there will be a Community Edition version for Azure as well.

Custom SQL Operations Studio Dashboard Widgets

Drew Furgiuele shows how easy it is to create a dashboard widget in SQL Operations Studio:

Before we go on, now’s your chance to name your widget. In my code above, I highlighted some of the changes I made. Also notice that this widget has a path to the query file we created; if this file doesn’t exist (or you can’t read from the path it exists on), it’ll stop working. Just a heads-up.

I named it something meaningful (but hey, do you), and I changed the vertical and horizontal size of the widget to be a little easier to read. Once you’re done, it’s time to add your widget to the dashboard.

I think people are going to like this product once it matures a bit.  This kind of flexibility without having to drop into .NET is great for DBAs for whom C# is a little intimidating.


November 2017
« Oct