Monitoring Big Data Clusters

Mohammad Darab continues a series on Big Data Clusters:

There are many ways to view the health of your Big Data Cluster. As of CTP 3.0, there are kubectl commands, mssqlctl commands as well as dashboards. For the sake of this series, I will focus on the dashboards. I will blog about some of the useful kubectl and mssqlctl commands in later posts.

The first dashboard is the Microsoft Cluster Administration portal (see below snapshot). This is a view into the Big Data Cluster Controller. As you can see from the image below, the Overview pane shows the Controller, Master Instance and all the pools. On the left hand side you can see more details. If you click on the “Service Endpoint” option, you will see a list of endpoints that you can bookmark.

Something I appreciate is that Microsoft thought ahead on what the monitoring story should look like rather than waiting until the end and slapping something together.

Breaking Down the MAXDOP Guidance Change

Joe Obbish digs into Microsoft’s new guidance for maximum degree of parallelism:

I’ve heard some folks claim that keeping all parallel workers on a single hard NUMA nodes can be important for query performance. I’ve even seen some queries experience reduced performance when thread 0 is on a different hard NUMA node than parallel worker threads. I haven’t heard of anything about the importance of keeping all of a query’s worker threads on a single soft-NUMA node. It doesn’t really make sense to say that query performance will be improved if all worker threads are on the same soft-NUMA node. Soft-NUMA is a configuration setting. Suppose I have a 24 core hard NUMA node and my goal is to get all of a parallel query’s worker threads on a single soft-NUMA node. To accomplish that goal the best strategy is to disable auto soft-NUMA because that will give me a NUMA node size of 24 as opposed to 8. So disabling auto soft-NUMA will increase query performance?

Joe takes a careful look at the documentation and brings up some really good questions.

Offloading Code Review Burdens with Automation

Ed Elliott argues that automation and testing can make code reviews easier:

OK so if we break this down into what a DBA should be doing as part of a code review:

– Ensure formatting is correct and any standards followed
– Have they introduces a SQL injection vulnerability?
– Consider any side effects of the actual change, for instance altering a clustered key on a 1 billion row table will take time – is this possible on a live system?
– Consider any performance effects – is this more prone to tempdb spills? How about deadlocks? Is the plan going to be terrible?
– Is the code going to do what the developer wants? Do they have the update statement correct in the merge statement?

That’s a lot, how can we help developers understand enough so that they can review their own code and cause fewer issues in production?

I believe this is a bit aspirational. Nevertheless, if you do get there, life gets easier.

Tracking xp_cmdshell Executions

Jason Brimhall shows how you can see when someone calls xp_cmdshell, including the call details:

What was the wait_type? Well, the obscure wait_type was called PREEMPTIVE_OS_PIPEOPS. What causes this wait? As it turns out, this is a generic wait that is caused by SQL pipe related activities such as xp_cmdshell.

Knowing this much information however does not get us to the root cause of this particular problem for this client. How do we get there? This is another case for Extended Events (XEvents).

Read on for two ways to approach this, both using Extended Events.

DBAs in the Cloud

Brent Ozar argues that production DBAs will still be important even at cloud-only companies:

One of my favorite recent examples was a company who came to me saying, “We’re spending about $2M per year in the cloud just on our databases alone. Can you help us reduce those costs?” Absolutely: with just a couple of days spent query & index tuning, we chopped their biggest database expenses in half while increasing performance.

At the end of that engagement, the CTO told me, “I thought I’d save money in the cloud by not having a DBA, but what I’m learning is that in the cloud, I actually get a return on my DBA investments.

I completely agree with this post. The exact tools DBAs use will change, but the role will still be around decades from now. And that’s at the companies which move quickly.

Comparing On-Prem To Managed Instance Performance

Jovan Popovic has an article explaining how you can compare your current on-premises SQL Server’s performance to an Azure SQL Managed Instance’s performance:

In this post you will see some recommended tools and best practices that you should apply while doing performance comparison. The recommended performance comparison process has three stages:

1. Compare the environment settings on SQL Server and Managed Instance. 
2. Create performance baseline on source SQL Server
3. Compare performance on Managed Instance with the baseline

In the following sections will be described the best practices and the recommended approaches 

This is a good bit more involved than installing some product, clicking a few buttons, and comparing numbers.

Kafka In Front of ELK

Daniel Berman sets up a simple Elasticsearch-Logstash-Kibana (ELK) stack and throws Kafka in front of it:

To perform the steps below, I set up a single Ubuntu 16.04 machine on AWS EC2 using local storage. In real-life scenarios you will probably have all these components running on separate machines.

I started the instance in the public subnet of a VPC and then set up a security group to enable access from anywhere using SSH and TCP 5601 (for Kibana). Finally, I added a new elastic IP address and associated it with the running instance.

The example logs used for the tutorial are Apache access logs.

This is a great walkthrough on setup and basic configuration. If you don’t have something in place to manage logs, the ELK stack is fine.

Finding Recently Created Objects

Max Vernon has a script to help us find what new objects now exist on your database:

The code below provides a list of all SQL Server objects created in the past “x” number of days. Dynamic T-SQL is used to construct a query for each database, including system databases. Each query provides the schema, name, and date created for each object listed, along with the object type description.

This looks quite useful for auditing. You might want to filter out tempdb on a real system, though.

Tracking Database Changes with DDL Triggers

Lori Brown shows how you can use DDL triggers to track database or instance-level changes:

I have been working on some improvements to some of the regular ways we monitor for important changes.  We always have to be on the lookout for unexpected changes being made in the SQL instances that we monitor since often times we are not the only team who has sysadmin access to the instance.  We are always the best trained to take care of and configure things but we sometimes find that someone makes a change either to the SQL or database configuration without telling us.  We want to know when things like this happen!

I’m a big fan of these. Of course you need to get the code right, as a bad trigger can be devastating but you can get a lot of useful information out of it and figure out who’s hand was in the cookie jar.

Persistent Memory and SQL Server

Ned Otter gives us the rundown on Persistent Memory and how it can make life smoother:

SQL 2017 on Windows Server 2016 behaves the same as SQL 2016 on Windows Server 2016 – “tail of the log” is supported. However, there is no support for PMEM with SQL 2017 on supported Linux distributions (except as a traditional block store). Using PMEM with SQL 2019 on Linux supports what’s known as “enlightenment”, which allows us to place data and log files on DAX formatted volumes, thereby reducing latency considerably. SQL 2019 on Linux also support “tail of the log”.

This is one of those areas where understanding Linux versus Windows administration really pays off, at least until Windows Server supports something like enlightenment.

Categories

July 2019
MTWTFSS
« Jun  
1234567
891011121314
15161718192021
22232425262728
293031