Category: Administration

Plan Cache Sizes

Erin Stellato shares information on plan cache sizes:

If you have an ad hoc workload, you’ll often hit the max number of entries before you hit the space limit, particularly if you have the optimize for ad hoc workloads server option enabled, which stores the plan stub for an ad hoc query on initial execution, rather than the entire plan (the plan stub consumes less space, so this is an attempt to reduce bloat).

Click through for the numbers and a couple of options you have around plan cache size.
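
To make that concrete, here is a minimal sketch (not taken from Erin's post): it enables the optimize for ad hoc workloads setting and then breaks the plan cache down by object type, counting single-use plans, which are the entries the setting is meant to keep small.

EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'optimize for ad hoc workloads', 1;
RECONFIGURE;

-- Plan cache space and entry counts by object type; single-use ad hoc plans
-- are the ones that tend to bloat the cache in an ad hoc workload.
SELECT
    objtype,
    COUNT(*) AS plan_count,
    SUM(CAST(size_in_bytes AS bigint)) / 1024 / 1024 AS size_mb,
    SUM(CASE WHEN usecounts = 1 THEN 1 ELSE 0 END) AS single_use_plans
FROM sys.dm_exec_cached_plans
GROUP BY objtype
ORDER BY size_mb DESC;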

More on Index Fragmentation

Tibor Karaszi revises and extends some remarks on index fragmentation:

In my last blog post, I wanted to focus on the sequential vs random I/O aspect and how that part should be more or less irrelevant with modern hardware. So I did a test that did a full scan (following the linked list of an index) to see if I could notice any performance difference on my SSD. I couldn’t.

That isn’t the end of the story, it turns out. Another aspect is how the data is brought into memory. You might know that SQL Server can do “read ahead”, meaning it does larger reads per I/O instead of single-page I/O. I.e., fewer but larger I/O operations. This sounds fine, but what happens when we have fragmentation?

Read on for a situation in which fragmentation does matter.
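
If you want to measure this on your own system, here is a quick sketch (the table name is a placeholder): sys.dm_db_index_physical_stats reports both logical fragmentation and page density, which together hint at how effective read-ahead can be for a given index.

SELECT
    index_id,
    index_level,
    avg_fragmentation_in_percent,
    avg_page_space_used_in_percent,
    page_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(),
         OBJECT_ID(N'dbo.MyTable'),   -- placeholder table name
         NULL, NULL, 'DETAILED');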

SQL Server Settings Blade in Azure

Dave Bermingham notes a recent change to the Azure Portal when creating a new VM with SQL Server pre-installed:

As you slide the IOPS slider to the right you will see the number of data disks increase, the Storage Size increase, and the Throughput increase. You will be limited to the max number of IOPS and disks supported by that instance size. You see in the screenshot below I am able to go as high as 80,000 IOPS when provisioning storage for a Standard E64-16s_v3 instance.

It sounds like they did a pretty good job of things there.

The Cost of Ad Hoc Queries

Erin Stellato gives us a simple demonstration of why parameterization is important for performance:

From this screenshot you can see that we have about 3GB total dedicated to the plan cache, and of that 1.7GB is for the plans of over 158,000 adhoc queries. Of that 1.7GB, approximately 500MB is used for 125,000 plans that execute ONE time only. About 1GB of the plan cache is for prepared and procedure plans, and they only take up about 300MB worth of space. But note the average use count – well over 1 million for procedures. In looking at this output, I would categorize this workload as mixed – some parameterized queries, some adhoc.

Kimberly’s blog post discusses options for managing a plan cache filled with a lot of adhoc queries. Plan cache bloat is just one problem you have to contend with when you have an adhoc workload, and in this post I want to explore the effect it can have on CPU as a result of all the compilations that have to occur. When a query executes in SQL Server, it goes through compilation and optimization, and there is overhead associated with this process, which frequently manifests as CPU cost. Once a query plan is in cache, it can be re-used. Queries that are parameterized can end up re-using a plan that’s already in cache, because the query text is exactly the same. When an adhoc query executes it will only re-use the plan in cache if it has the exact same text and input value(s).

Read on to see an example of how long it takes a set of ad hoc queries to finish versus their parameterized equivalents. Erin’s test is at the behavioral extreme (100% parameterized versus 100% ad hoc) so real-world results won’t be quite this good.
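
For reference, the two patterns look roughly like this (dbo.Sales is a hypothetical table, not one from Erin's demo). The ad hoc form embeds the literal in the statement text, so every distinct value compiles a new plan; the sp_executesql form re-uses a single cached plan for any value.

-- Ad hoc: the literal is part of the statement text, so each value gets its own plan.
SELECT OrderID, OrderDate
FROM dbo.Sales
WHERE CustomerID = 42;

-- Parameterized: one plan in cache, re-used regardless of the value passed in.
EXEC sys.sp_executesql
    N'SELECT OrderID, OrderDate FROM dbo.Sales WHERE CustomerID = @CustomerID;',
    N'@CustomerID int',
    @CustomerID = 42;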

Azure SQL Database and Extended Events

Dave Bland shows how to set up and read an extended event file on Azure SQL Database:

The first step when using T-SQL to read Extended Event files that are stored in an Azure Storage Account is to create a database credential. Of course the credential will provide essential security information to connect to the Azure Storage Account. The first data point you will need is the URL to a blob storage container in your storage account. If you look below, you can see where you would place your storage account name and the blob storage container name.

Dave gives us the grand tour of the configuration process, including where things differ between on-prem SQL Server and Azure SQL Database (which is quite a bit).
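
For orientation, the pieces Dave walks through look roughly like this; the storage account, container, and SAS token below are placeholders, so treat it as a sketch rather than his exact steps.

-- A database master key must exist before creating a database scoped credential
-- (skip this statement if the database already has one).
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong password here>';

-- The credential is named after the blob container URL; the secret is a SAS token
-- for that container, without the leading '?'.
CREATE DATABASE SCOPED CREDENTIAL [https://mystorageaccount.blob.core.windows.net/xe-files]
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
     SECRET = 'sv=...&ss=b&sp=rwl&sig=...';   -- placeholder SAS token

-- Read the .xel files the event session wrote to that container.
SELECT CAST(event_data AS xml) AS event_data
FROM sys.fn_xe_file_target_read_file(
         'https://mystorageaccount.blob.core.windows.net/xe-files/MySession*.xel',
         NULL, NULL, NULL);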

Collecting Hadoop Metrics from Multiple Clusters

Dmitry Tolpeko shows how you can collate Hadoop metrics from several ElasticMapReduce clusters:

The first step is to dynamically get the list of clusters and their IPs. Hadoop clusters are often reprovisioned, added and terminated, so you cannot use the static list and addresses. In case of Amazon EMR, you can use the following Linux shell command to get the list of active clusters:

aws emr list-clusters --active

From its output you can get the cluster IDs and names. While a cluster’s ID and IP can change over time, its name is usually permanent (like a DEV or Adhoc-Analytics cluster), so it can be useful for various aggregation reports.

Read on to see what you can do with this list of clusters.

Distributed Computing Fallacies

Samir Behara takes us through a few fallacies with distributed computing:

The network is reliable
Service calls made over the network might fail. There can be congestion in the network or a power failure impacting your systems. The request might reach the destination service but it might fail to send the response back to the primary service. The data might get corrupted or lost during transmission over the wire. While architecting distributed cloud applications, you should assume that these types of network failures will happen and design your applications for resiliency.

To handle this scenario, you should implement automatic retries in your code when such a network error occurs. If one of your services is not able to establish a connection because of a network issue, you can implement retry logic to automatically re-establish the connection.

There are some very good points in here.
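
The retry advice translates to just about any stack; purely as a hypothetical sketch, here is the same idea in T-SQL, re-running a statement a few times with a delay when it fails with what might be a transient error.

DECLARE @attempt int = 1, @max_attempts int = 3;

WHILE @attempt <= @max_attempts
BEGIN
    BEGIN TRY
        EXEC dbo.usp_DoRemoteWork;   -- placeholder for work that can fail transiently
        BREAK;                       -- success: stop retrying
    END TRY
    BEGIN CATCH
        IF @attempt = @max_attempts
            THROW;                   -- out of retries: surface the error
        WAITFOR DELAY '00:00:05';    -- crude backoff before the next attempt
        SET @attempt += 1;
    END CATCH
END;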

Finding Three-Part and Four-Part Names

Pamela Mooney shows how you can find three-part or four-part naming on a SQL Server instance:

The script below searches the metadata for views, sprocs, and functions for occurrences of three- and four-part names. Three-part names consist of databasename.schemaname.objectname, and four-part names consist of servername.databasename.schemaname.objectname. Because the code searches metadata, it isn’t always perfect. If your comments mention a servername followed by a period, for example, it will be caught. Nevertheless, it’s a great place to begin looking, and a real help in getting rid of problems before they really bite you.

Click through for the script.
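
Pamela's script searches the module text itself; as a related sketch (not her script), sys.sql_expression_dependencies can also surface cross-database and cross-server references from the parsed metadata, without the comment false positives.

SELECT
    OBJECT_SCHEMA_NAME(d.referencing_id) AS referencing_schema,
    OBJECT_NAME(d.referencing_id)        AS referencing_object,
    d.referenced_server_name,
    d.referenced_database_name,
    d.referenced_schema_name,
    d.referenced_entity_name
FROM sys.sql_expression_dependencies AS d
WHERE d.referenced_database_name IS NOT NULL   -- three-part reference
   OR d.referenced_server_name IS NOT NULL;    -- four-part reference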
