Press "Enter" to skip to content

Category: Administration

Renaming Servers

Steve Jones shows how to rename a server using Powershell:

I’m sure many people are experimenting with VMs and SQL Server. If you’re like me, many of you just default to installing Windows 7/10 or Windows Server xx Standard for your testing. Those systems work fine, but I’ve been trying to build slimmer systems, which means looking at Server Core. Installing Server Core is much the same as other versions, though you end up with only a command line. If you’re like me, using VMWare, you also might end up with a server name like “WIN-LKR3R4FfL5T”.

I want to change that. It’s a fine name if I’m working locally. It’s not to much fun connecting across a network. This post looks at how to rename that machine.

This is probably a good idea to do before installing any major software.  Renaming a server under SQL Server is possible, but there are a few extra steps to the process.

Comments closed

Resource Governor

Mark Wilkinson walks through Resource Governor:

Now that we know which statistics we can gather, we need to actually start gathering them. While the DMVs for Resource Governor are great, they will only give you an aggregate of the usage information since the last time the statistics were reset, or the last time services were restarted.

In most cases it makes sense to store your data in a separate table so you can calculate differentials between two time periods. For our example we are only going to be interested in request counts and CPU usage. For this, we will create the following table

Read the whole thing.

Comments closed

Blocking Notifications

Kendra Little shows how to set up blocking and deadlock notifications using base SQL Server components:

OK, we’ve got notifications. We need SQL Server to give us more information on who is involved in the blocking.

I like to use the built-in Blocked Process Report for this. This has been in SQL Server for a long time, and it’s extremely useful.

The Blocked Process Report shows you the “input buffer” of the commands involved – it may be partial information and not the full text of the query. It will also show you the login name for who is running what, and the type of lock requests involved.

You don’t have to spend extra money to get good diagnostic information, at least about these items.

Comments closed

Monitoring Elasticsearch Performance

Emily Chang has a big, four-part series on monitoring Elasticsearch performance.  Part 1 is a nice introduction to Elasticsearch and important metrics out of the box:

The three most common types of nodes in Elasticsearch are:

  • Master-eligible nodes: By default, every node is master-eligible unless otherwise specified. Each cluster automatically elects a master node from all of the master-eligible nodes. In the event that the current master node experiences a failure (such as a power outage, hardware failure, or an out-of-memory error), master-eligible nodes elect a new master. The master node is responsible for coordinating cluster tasks like distributing shards across nodes, and creating and deleting indices. Any master-eligible node is also able to function as a data node. However, in larger clusters, users may launch dedicated master-eligible nodes that do not store any data (by adding node.data: false to the config file), in order to improve reliability. In high-usage environments, moving the master role away from data nodes helps ensure that there will always be enough resources allocated to tasks that only master-eligible nodes can handle.

  • Data nodes: By default, every node is a data node that stores data in the form of shards (more about that in the section below) and performs actions related to indexing, searching, and aggregating data. In larger clusters, you may choose to create dedicated data nodes by addingnode.master: false to the config file, ensuring that these nodes have enough resources to handle data-related requests without the additional workload of cluster-related administrative tasks.

  • Client nodes: If you set node.master and node.data to false, you will end up with a client node, which is designed to act as a load balancer that helps route indexing and search requests. Client nodes help shoulder some of the search workload so that data and master-eligible nodes can focus on their core tasks. Depending on your use case, client nodes may not be necessary because data nodes are able to handle request routing on their own. However, adding client nodes to your cluster makes sense if your search/index workload is heavy enough to benefit from having dedicated client nodes to help route requests.

Part 2 shows how to collect metrics using various APIs:

The Node Stats API is a powerful tool that provides access to nearly every metric from Part 1, with the exception of overall cluster health and pending tasks, which are only available via the Cluster Health API and the Pending Tasks API, respectively. The command to query the Node Stats API is:

curl localhost:9200/_nodes/stats

The output includes very detailed information about every node running in your cluster. You can also query a specific node by specifying the ID, address, name, or attribute of the node. In the command below, we are querying two nodes by their names, node1 and node2 (node.name in each node’s configuration file):

curl localhost:9200/_nodes/node1,node2/stats

Each node’s metrics are divided into several sections, listed here along with the metrics they contain from Part 1.

Part 3 is a brief for using Datadog for metrics collection and display:

The Datadog Agent is open source software that collects and reports metrics from each of your nodes, so you can view and monitor them in one place. Installing the Agent usually only takes a single command. View installation instructions for various platforms here. You can also install the Agent automatically with configuration management tools like Chef orPuppet.

Part 4 walks through some common Elasticsearch performance issues:

How to solve 5 Elasticsearch performance and scaling problemsseries /

This post is the final part of a 4-part series on monitoring Elasticsearch performance. Part 1 provides an overview of Elasticsearch and its key performance metrics, Part 2 explains how to collect these metrics, and Part 3 describes how to monitor Elasticsearch with Datadog.

Like a car, Elasticsearch was designed to allow its users to get up and running quickly, without having to understand all of its inner workings. However, it’s only a matter of time before you run into engine trouble here or there. This article will walk through five common Elasticsearch challenges, and how to deal with them.

Problem #1: My cluster status is red or yellow. What should I do?

es-cluster-status.png

If you recall from Part 1, cluster status is reported as red if one or more primary shards (and its replicas) is missing, and yellow if one or more replica shards is missing. Normally, this happens when a node drops off the cluster for whatever reason (hardware failure, long garbage collection time, etc.). Once the node recovers, its shards will remain in an initializing state before they transition back to active status.

The number of initializing shards typically peaks when a node rejoins the cluster, and then drops back down as the shards transition into an active state, as shown in the graph below.

initializing-shards.png

During this initialization period, your cluster state may transition from green to yellow or red until the shards on the recovering node regain active status. In many cases, a brief status change to yellow or red may not require any action on your part.

node-status.png

However, if you notice that your cluster status is lingering in red or yellow state for an extended period of time, verify that the cluster is recognizing the correct number of Elasticsearch nodes, either by consulting Datadog’s dashboard or by querying the Cluster Health API detailed in Part 2.

es-num-of-nodes.png

If the number of active nodes is lower than expected, it means that at least one of your nodes lost its connection and hasn’t been able to rejoin the cluster. To find out which node(s) left the cluster, check the logs (located by default in the logs folder of your Elasticsearch home directory) for a line similar to the following:

[TIMESTAMP] ... Cluster health status changed from [GREEN] to RED

Reasons for node failure can vary, ranging from hardware or hypervisor failures, to out-of-memory errors. Check any of the monitoring tools outlined here for unusual changes in performance metrics that may have occurred around the same time the node failed, such as a sudden spike in the current rate of search or indexing requests. Once you have an idea of what may have happened, if it is a temporary failure, you can try to get the disconnected node(s) to recover and rejoin the cluster. If it is a permanent failure, and you are not able to recover the node, you can add new nodes and let Elasticsearch take care of recovering from any available replica shards; replica shards can be promoted to primary shards and redistributed on the new nodes you just added.

However, if you lost both the primary and replica copy of a shard, you can try to recover as much of the missing data as possible by using Elasticsearch’s snapshot and restore module. If you’re not already familiar with this module, it can be used to store snapshots of indices over time in a remote repository for backup purposes.

Problem #2: Help! Data nodes are running out of disk space

If all of your data nodes are running low on disk space, you will need to add more data nodes to your cluster. You will also need to make sure that your indices have enough primary shards to be able to balance their data across all those nodes.

However, if only certain nodes are running out of disk space, this is usually a sign that you initialized an index with too few shards. If an index is composed of a few very large shards, it’s hard for Elasticsearch to distribute these shards across nodes in a balanced manner.

This is the most thorough look at Elasticsearch internals that I’ve seen (although admittedly that’s not something I’m usually on the lookout for).

Comments closed

Using Pester To Validate Script Installations

Rob Sewell wants to use Pester to guarantee that he has Ola’s maintenance scripts installed on a server:

First I thought about what I would look for in SSMS when I had installed the maintenance solution and made a list of the things that I would check which looked something like this. This would be the checklist you would create (or have already created) for yourself or a junior following this install. This is how easy you can turn that checklist into a Pester Test and remove the human element and open your install for automated testing
  • SQL Server Agent is running – Otherwise the jobs won’t run🙂

  • We should have 4 backup jobs with a name of

  • DatabaseBackup – SYSTEM_DATABASES – FULL

  • DatabaseBackup – USER_DATABASES – FULL

  • DatabaseBackup – USER_DATABASES – DIFF

  • DatabaseBackup – USER_DATABASES – LOG

  • We should have Integrity Check and Index Optimisation Jobs

  • We should have the clean up jobs

  • All jobs should be scheduled

  • All jobs should be enabled

  • The jobs should have succeeded

There’s a very nice script and walkthrough of the process if you click through.

Comments closed

SSISDB Management

Andy Leondard describes steps you can take to maintain the SSIS Catalog:

Back It Up

As with all SQL Server database, please back up SSISDB. What follows is a (very) basic guide describing one simple method to backup your SSISDB database. Please, please, please learn more about SQL Server backup and restore options and their implications before backing up an SSISDB database in your enterprise. Feel free to use the steps I describe on your laptop or a virtual machine. And please remember…

Backups are useless. Restores are priceless. Conduct practice Disaster Recovery exercises in which you restore databases and then test functionality. You’ll be glad you did. Here is a link containing Microsoft’s advice on restoring the SSISDB database in SQL Server 2016.

The advice is pretty similar to what you’d expect for any other database, but there are a couple twists around SSISDB functionality, so do read on.

Comments closed

Automating Deadlock Execution Plan Collection

Michael J. Swart comes up with a system to collect execution plans at the time of a deadlock and log them to a table for further research:

So How Do I Get To The Execution Plans?
So when I look at a deadlock graph, I can see there are sql_handles. Given that, I can grab the plan_handle and then the query plan from the cache, but I’m going to need to collect it automatically at the time the deadlock is generated. So I’m going to need

  • XML shredding skills

  • Ability to navigate DMVs to get at the cached query plans

  • A way to programatically respond to deadlock graph events (like an XE handler or a trigger)

If you don’t have the funding to get a third-party tool in place which collects this information, this could be a good fit.

Comments closed

TCP Chimney Offloading

Wayne Sheffield offers advice on TCP Chimney Offloading:

With all of these changes to the OS, which setting should we use for SQL Server? In general, for all of these operating systems, I recommend that TCP Chimney Offload be disabled – because you can see odd connectivity problems in any other state. Notice in the above quote that Microsoft says that this feature is best used for applications with long-lived connections that transfer large amounts of data – hopefully your OLTP database is performing lots of short-lived connections and they are not transferring large amounts of data (if they are, I can help you with that!).

Definitely worth a read.

Comments closed

Changing Identity Start Value

Kenneth Fisher has a good post on what happens when you change the seed value of an identity column:

Well Paul told me this wasn’t the case. Now when Paul tells me something I believe him, but I also like to run tests. So I decided to usesys.fn_PhysLocCracker(%%physloc%%). %%physloc%% returns a varbinary that gives you the location of the row. When passed tosys.fn_PhysLocCracker(%%physloc%%) it returns the database file, page in the file, and slot number where the row can be found. So to start with I create an identity(1,1) and I run 20 inserts, one at a time, checking row locations each time. This is to confirm I’m right about this part.

Clicking through is worth it for the hypnotizing animated GIFs.

Comments closed

Why Force Query Store Plans

Grant Fritchey explains the wherefore behind query store plan forcing:

But, what else does Force Plan do for you? What if you never experience bad parameter sniffing (you do, but I’m not going to argue the point)? Is there something else that Force Plan can do for you? Heck yes! The whole point of creating the Query Store was in order to address Plan Regression. What the heck is plan regression? When Microsoft makes any change to the Query Optimizer, and those changes come all the time, it’s possible that you might see a change in your execution plans. Most of the time, it’s going to be a positive change. That’s why they’re changing the Optimizer after all, to improve it. However, sometimes, you’re benefiting from the old behavior of the Optimizer and that new plan doesn’t work as well as the old plan. This is plan regression. When Microsoft changed the Cardinality Estimation engine in SQL Server 2014, more than a few people experienced the new estimator giving row estimates that resulted in a different execution plan that didn’t perform as well as the old plan. This is plan regression. What to do?

This is a good read.

Comments closed