Press "Enter" to skip to content

Author: Kevin Feasel

Shred That XML

Steve Jones has an intro-level post on shredding an extended event to get to the relevant portion:

I was playing with some Extended Events recently. If you haven’t tried, I’d encourage you to do so. However, working with XML is not my favorite. I know I can get the GUI in SSMS 16.x to show me events, but I sometimes want to query.

Here was my quick adventure in XML and XQUERY. I should know this stuff better, but I think I’m working with XML so rarely that I’m constantly re-learning things.

Read on for the code.

Comments closed

Finding All Sysadmins

Chris Bell has a Powershell script to find all sysadmins on a SQL Server instance:

The script below identifies the accounts on your SQL Server that have full sysadmin rights, either on their own or via an Active Directory Group.

To run this, you need a few things setup first.

  1. A file named Instances.txt that has each instance you are going to check on its own line. Just the name, nothing more. You can see the reference to the location at the beginning of the script, just change it to wherever you put your file.

  2. Rights to read the AD information for the domain. This way we can get the members of any groups granted access to your SQL environment.

Click through for the script.

Comments closed

Biml Project Level Connection Issue

Bill Fellows explains a workaround he uses to set the project-level connection GUID in Biml:

There is no attribute in the Connections collection to assign a guid. It’s simply not there. If you want to associate an Id with an instance of a Connection your choices are the Project node and the Package node. Since we’re dealing with project level connection managers, we best cover both bases to ensure Ids synchronize across our project. If you wish, you could have embedded this Projects node in with the Connections but then you’d have to statically set these Ids. I feel like showing off so we’ll go dynamic.

To start, I define a list of static GUID values in the beginning of my file. Realistically, we have these values in a table and we didn’t go with “known” values. The important thing is that we will always map a guid to a named connection manager. If you change a connection manager’s definition from being project level to non, or vice versa, this will result in the IDs shifting and you’ll see the same symptoms as above.

There’s plenty of code over on Bill’s site to help you as well.

Comments closed

Blocking Notifications

Kendra Little shows how to set up blocking and deadlock notifications using base SQL Server components:

OK, we’ve got notifications. We need SQL Server to give us more information on who is involved in the blocking.

I like to use the built-in Blocked Process Report for this. This has been in SQL Server for a long time, and it’s extremely useful.

The Blocked Process Report shows you the “input buffer” of the commands involved – it may be partial information and not the full text of the query. It will also show you the login name for who is running what, and the type of lock requests involved.

You don’t have to spend extra money to get good diagnostic information, at least about these items.

Comments closed

Cortana Intelligence Solutions

James Serra gives an introductory walkthrough to Cortana Intelligence Solutions:

Cortana Intelligence Solutions is a new tool just released in public preview that enables users to rapidly discover, easily provision, quickly experiment with, and jumpstart production grade analytical solutions using the Cortana Intelligence Suite (CIS).  It does so using preconfigured solutions, reference architectures and design patterns (I’ll just call all these solutions “patterns” for short).  At the heart of each Cortana Intelligence Solution pattern is one or more ARM Templates which describe the Azure resources to be provisioned in the user’s Azure subscription.  Cortana Intelligence Solution patterns can be complex with multiple ARM templates, interspersed with custom tasks (Web Jobs) and/or manual steps (such as Power BI authorization in Stream Analytics job outputs).

So instead of having to manually go to the Azure web portal and provision many sources, these patterns will do it for you automatically.  Think of a pattern as a way to accelerate the process of building an end-to-end demo on top of CIS.  A deployed solution will provision your subscription with necessary CIS components (i.e. Event Hub, Stream Analytics, HDInsight, Data Factory, Machine Learning, etc.) and build the relationships between them.

James also walks through an entire solution, so check it out.

Comments closed

Division By Zero

Erik Darling warns us against tearing the fabric of time by dividing by zero:

There are several options for fixing this. For instance, you can use a CASE expression, or COALESCE, but I find they get a tad muddled to write after a while, especially if we’re safeguarding multiple columns from our mathematical disaster. Plus,under the covers, the functions I like to use are just case expressions anyway. Isn’t it nice that SQL Server will save you a touch of typing? I think so. What a considerate piece of software!

This is a bit of a beginner tip, but it came up while we were at DBA Days, so I figured I’d write about it in case anyone runs across it.

Read on for Erik’s favorite solution to the problem.

Comments closed

Errors With Linked Server AG Connections Over ODBC

Andrea Allred has set up a linked server connection (using ODBC) to talk to an Availability Group, but certain columns fail:

Some of the problems that we have noticed are querying tables that have big datatypes like time(3-7), timestamp, and a few others.  Casting or converting the datatypes doesn’t help. If we pull the table into a view without the big datatype columns, we are able to query the view from another server, but never the base table. It has been a bit frustrating, but we are still hopeful that we can find a solution or that Microsoft with fix ODBC connections. If there is a better way to do this, please reach out to me.  We have things we need to solve and could use some help.

This is a nice walkthrough of how she set it up, but that sounds like a rather frustrating error.

Comments closed

Monitoring Elasticsearch Performance

Emily Chang has a big, four-part series on monitoring Elasticsearch performance.  Part 1 is a nice introduction to Elasticsearch and important metrics out of the box:

The three most common types of nodes in Elasticsearch are:

  • Master-eligible nodes: By default, every node is master-eligible unless otherwise specified. Each cluster automatically elects a master node from all of the master-eligible nodes. In the event that the current master node experiences a failure (such as a power outage, hardware failure, or an out-of-memory error), master-eligible nodes elect a new master. The master node is responsible for coordinating cluster tasks like distributing shards across nodes, and creating and deleting indices. Any master-eligible node is also able to function as a data node. However, in larger clusters, users may launch dedicated master-eligible nodes that do not store any data (by adding node.data: false to the config file), in order to improve reliability. In high-usage environments, moving the master role away from data nodes helps ensure that there will always be enough resources allocated to tasks that only master-eligible nodes can handle.

  • Data nodes: By default, every node is a data node that stores data in the form of shards (more about that in the section below) and performs actions related to indexing, searching, and aggregating data. In larger clusters, you may choose to create dedicated data nodes by addingnode.master: false to the config file, ensuring that these nodes have enough resources to handle data-related requests without the additional workload of cluster-related administrative tasks.

  • Client nodes: If you set node.master and node.data to false, you will end up with a client node, which is designed to act as a load balancer that helps route indexing and search requests. Client nodes help shoulder some of the search workload so that data and master-eligible nodes can focus on their core tasks. Depending on your use case, client nodes may not be necessary because data nodes are able to handle request routing on their own. However, adding client nodes to your cluster makes sense if your search/index workload is heavy enough to benefit from having dedicated client nodes to help route requests.

Part 2 shows how to collect metrics using various APIs:

The Node Stats API is a powerful tool that provides access to nearly every metric from Part 1, with the exception of overall cluster health and pending tasks, which are only available via the Cluster Health API and the Pending Tasks API, respectively. The command to query the Node Stats API is:

curl localhost:9200/_nodes/stats

The output includes very detailed information about every node running in your cluster. You can also query a specific node by specifying the ID, address, name, or attribute of the node. In the command below, we are querying two nodes by their names, node1 and node2 (node.name in each node’s configuration file):

curl localhost:9200/_nodes/node1,node2/stats

Each node’s metrics are divided into several sections, listed here along with the metrics they contain from Part 1.

Part 3 is a brief for using Datadog for metrics collection and display:

The Datadog Agent is open source software that collects and reports metrics from each of your nodes, so you can view and monitor them in one place. Installing the Agent usually only takes a single command. View installation instructions for various platforms here. You can also install the Agent automatically with configuration management tools like Chef orPuppet.

Part 4 walks through some common Elasticsearch performance issues:

How to solve 5 Elasticsearch performance and scaling problemsseries /

This post is the final part of a 4-part series on monitoring Elasticsearch performance. Part 1 provides an overview of Elasticsearch and its key performance metrics, Part 2 explains how to collect these metrics, and Part 3 describes how to monitor Elasticsearch with Datadog.

Like a car, Elasticsearch was designed to allow its users to get up and running quickly, without having to understand all of its inner workings. However, it’s only a matter of time before you run into engine trouble here or there. This article will walk through five common Elasticsearch challenges, and how to deal with them.

Problem #1: My cluster status is red or yellow. What should I do?

es-cluster-status.png

If you recall from Part 1, cluster status is reported as red if one or more primary shards (and its replicas) is missing, and yellow if one or more replica shards is missing. Normally, this happens when a node drops off the cluster for whatever reason (hardware failure, long garbage collection time, etc.). Once the node recovers, its shards will remain in an initializing state before they transition back to active status.

The number of initializing shards typically peaks when a node rejoins the cluster, and then drops back down as the shards transition into an active state, as shown in the graph below.

initializing-shards.png

During this initialization period, your cluster state may transition from green to yellow or red until the shards on the recovering node regain active status. In many cases, a brief status change to yellow or red may not require any action on your part.

node-status.png

However, if you notice that your cluster status is lingering in red or yellow state for an extended period of time, verify that the cluster is recognizing the correct number of Elasticsearch nodes, either by consulting Datadog’s dashboard or by querying the Cluster Health API detailed in Part 2.

es-num-of-nodes.png

If the number of active nodes is lower than expected, it means that at least one of your nodes lost its connection and hasn’t been able to rejoin the cluster. To find out which node(s) left the cluster, check the logs (located by default in the logs folder of your Elasticsearch home directory) for a line similar to the following:

[TIMESTAMP] ... Cluster health status changed from [GREEN] to RED

Reasons for node failure can vary, ranging from hardware or hypervisor failures, to out-of-memory errors. Check any of the monitoring tools outlined here for unusual changes in performance metrics that may have occurred around the same time the node failed, such as a sudden spike in the current rate of search or indexing requests. Once you have an idea of what may have happened, if it is a temporary failure, you can try to get the disconnected node(s) to recover and rejoin the cluster. If it is a permanent failure, and you are not able to recover the node, you can add new nodes and let Elasticsearch take care of recovering from any available replica shards; replica shards can be promoted to primary shards and redistributed on the new nodes you just added.

However, if you lost both the primary and replica copy of a shard, you can try to recover as much of the missing data as possible by using Elasticsearch’s snapshot and restore module. If you’re not already familiar with this module, it can be used to store snapshots of indices over time in a remote repository for backup purposes.

Problem #2: Help! Data nodes are running out of disk space

If all of your data nodes are running low on disk space, you will need to add more data nodes to your cluster. You will also need to make sure that your indices have enough primary shards to be able to balance their data across all those nodes.

However, if only certain nodes are running out of disk space, this is usually a sign that you initialized an index with too few shards. If an index is composed of a few very large shards, it’s hard for Elasticsearch to distribute these shards across nodes in a balanced manner.

This is the most thorough look at Elasticsearch internals that I’ve seen (although admittedly that’s not something I’m usually on the lookout for).

Comments closed

Analyzing The Simpsons

Todd Schneider has a fun analysis of the Simpsons:

Per Wikipedia:

While later seasons would focus on Homer, Bart was the lead character in most of the first three seasons

I’ve heard this argument before, that the show was originally about Bart before switching its focus to Homer, but the actual scripts only seem to partially support it.

Bart accounted for a significantly larger share of the show’s dialogue in season 1 than in any future season, but Homer’s share has always been higher than Bart’s. Dialogue share might not tell the whole story about a character’s prominence, but the fact is that Homer has always been the most talkative character on the show.

My reading is that it took a couple seasons for show writers to realize that Homer is the funniest character and that Bart’s character was too context-sensitive to be consistently funny.  It took quite a bit more time before merchandisers figured that out, to the extent that they ever did.

Comments closed

Using Pester To Validate Script Installations

Rob Sewell wants to use Pester to guarantee that he has Ola’s maintenance scripts installed on a server:

First I thought about what I would look for in SSMS when I had installed the maintenance solution and made a list of the things that I would check which looked something like this. This would be the checklist you would create (or have already created) for yourself or a junior following this install. This is how easy you can turn that checklist into a Pester Test and remove the human element and open your install for automated testing
  • SQL Server Agent is running – Otherwise the jobs won’t run🙂

  • We should have 4 backup jobs with a name of

  • DatabaseBackup – SYSTEM_DATABASES – FULL

  • DatabaseBackup – USER_DATABASES – FULL

  • DatabaseBackup – USER_DATABASES – DIFF

  • DatabaseBackup – USER_DATABASES – LOG

  • We should have Integrity Check and Index Optimisation Jobs

  • We should have the clean up jobs

  • All jobs should be scheduled

  • All jobs should be enabled

  • The jobs should have succeeded

There’s a very nice script and walkthrough of the process if you click through.

Comments closed