New Diagnostics For Synchronous Statistics Updates

Joe Sack announces a new wait type and request command:

Consider the following query execution scenario:

  • You execute a SELECT query that triggers an automatic synchronous statistics update.
  • The synchronous statistics update begins execution and your query waits (is essentially blocked) until the fresh statistics are generated.
  • The query compilation and execution does not resume until the synchronous statistics update operation completes.

During this time, there are no external signs via common troubleshooting channels that the query is specifically waiting for the synchronous statistics update operation to complete.  If the statistics update takes a long time (due to a large table and\or busy system), there is no easy way to determine root cause of the high duration.

This is now available in SQL Server 2019 CTP 2.1.  Read the whole thing.

Deep Dive On Log Buffer Flushes

Itzik Ben-Gan delves into log buffer flushes and how SQL Server maintains durability without giving up too much performance:

The way SQL Server enforces transaction durability, in part, is by ensuring that all of the transaction’s changes are written to the database’s transaction log on disk before returning control to the caller. In a case of a power failure after a transaction’s commit was acknowledged, you know that all those changes were at least written to the on-disk transaction log. That’s the case even if the related data pages were modified only in the data cache (the buffer pool) but not yet flushed to the data files on disk. When you restart SQL Server, during the redo phase of the recovery process, SQL Server uses the information recorded in the log to replay changes that were applied after the last checkpoint and that haven’t made it to the data files. There’s a bit more to the story depending on the recovery model that you’re using and on whether bulk operations were applied after the last checkpoint, but for the purposes of our discussion, suffice to focus on the part that involves hardening the changes to the transaction log.

The tricky part in SQL Server’s logging architecture is that log writes are sequential. Had SQL Server not used some sort of a log buffer to alleviate log writes to disk, write-intensive systems—especially ones that involve lots of small transactions—would quickly run into terrible log-write-related performance bottlenecks.

To alleviate the negative performance impact of frequent sequential log writes to disk, SQL Server uses a log buffer in memory.

Itzik also covers a few technologies which can help if you’re experiencing log buffer flush-related slowness, including enabling delayed durability and purchasing nonvolatile memory (NVDIMM-N) for storage.

Breaking Changes Coming To dbatools

Chrissy LeMaire warns us about breaking changes coming to dbatools with release 1.0:

Sometime in the next month, I’ll also be updating Start-DbaMigration to more closely match the parameters of Export-DbaInstance. Parameters like NoDatabases and NoLogins will be replaced by -Exclude Databases, Logins.

So the functionality won’t necessarily change, but if you have scheduled tasks or scripts that perform migrations, you will need to update your parameters once you update dbatools once these changes are made.

Keep an eye out for all of these changes if you’re a regular dbatools user or have processes scripted.

Storage Forecasting In SentryOne

Steven Wright shows off a neat feature in the SentryOne product:

With that in mind, SentryOne has been working toward integrating advanced predictive analytics into our products. SentryOne 18.5 includes the first feature set to make use of advanced analytics and machine learning technology—introducing advanced Storage Forecasting.

In its most advanced configuration, SentryOne now applies machine learning algorithms to produce daily usage forecasts for all logical disks on your Windows servers. A forecast is customized for each volume, and each day the system will analyze its previous forecasts to learn how to adjust with each new run. This is done by generating forecasts using multiple algorithms and combining the results into an ensemble forecast. Each day, the system will review how each component forecast performed and weigh it proportionally within the ensemble. Over time, the system learns the unique workload of each volume and fine-tunes its forecasts accordingly.

There are several places within the SentryOne client where you can review and act on this information to prevent outages or perform capacity planning. Let’s start with the Disk Space tab.

With my only information being the blog post (and especially the pictures), I’d guess that they’re probably using some variant of ARIMA to calculate disk utilization.

Creating A SQL Server 2019 Big Data Cluster On Azure

Niels Berglund walks us through the setup for SQL Server 2019 Big Data Clusters:

If you, like me, are a SQL Server guy, you are probably quite familiar with installing SQL Server instances by mounting an ISO file, and running setup. Well, you can forget all that when you deploy a SQL Server 2019 Big Data Cluster. The setup is all done via Python utilities, and various Docker images pulled from a private repository. So, you need Python3. On my box I have Python 3.5, and – according to Microsoft – version 3.7 also works. Make you that you have your Python installation on the path.

When you deploy you use a Python utility: mssqlctl. To download mssqlctl, you need Python’s package management system pip installed. During installation you also need a Python HTTP library: Requests. If you do not have it you need to install it:

python -m pip install requests

This isn’t available to the general public quite yet, but when it is publicly available (or if you are part of the Early Access Program), the instructions are nice and clear.

Installing Kubernetes On-Prem

Anthony Nocentino shows us how to install Kubernetes on-prem:

Kubernetes is a distributed system, you will be creating a cluster which will have a master node that is in charge of all operations in your cluster. In this walkthrough we’ll create three workers which will run our applications. This cluster topology is, by no means, production ready. If you’re looking for production cluster builds check out Kubernetes documentation. Here and here. The primary components that need high availability in a Kubernetes cluster are the API Server which controls the state of the cluster and the etcddatabase which stores the persistent state of the cluster. You can learn more about Kubernetes cluster components here.

In our demonstration here, the master is where the API Server, etcd, and the other control plan functions will live. The workers, will be joined to the cluster and run our application workloads.

This is an area I need to focus on, given my almost total lack of knowledge in the world of container orchestration.

Accelerated Database Recovery In SQL Server 2019

Frank Gill notes an exciting new feature in SQL Server 2019:

“Any sufficiently advanced technology is indistinguishable from magic.” -Arthur C. Clarke

In this morning’s keynote session at PASS Summit 2018, public preview of a new feature in Azure SQL Database and SQL Server 2019 called Accelerated Database Recovery (ADR) was announced.  This changes the way that SQL Server handles recovery of a SQL Server instance on start up.

This looks really good for large databases, where recovery can sometimes be measured in hours.

SQL Server IaaS Versus PaaS On AWS

John McCormack identifies some differences between running SQL Server in EC2 versus RDS on Amazon Web Services:

How do I run SQL Server on AWS?

Running SQL Server on AWS can be done in 2 ways.

  • Relation Database Service (RDS): AWS’s managed solution where some of the administration (maintenance, backups and patching) is handled for you.

  • EC2: Your very own virtual machine in the cloud. With EC2, you manage SQL Server, just like you would do on-premises. This gives you full control over your SQL instance.

Click through for the comparison.

Using Datadog To Monitor Spark Clusters On EMR

Priya Matpadi walks us through one way to monitor Spark clusters on Amazon ElasticMapReduce:

We recently implemented a Spark streaming application, which consumes data from from multiple Kafka topics. The data consumed from Kafka comprises different types of telemetry events generated by mobile devices. We decided to host the Spark cluster using the Amazon EMR service, which manages a fleet of EC2 instances to run our data-processing pipelines.

As part of preparing the cluster and application for deployment to production, we needed to implement monitoring so we could track the streaming application and the Spark infrastructure itself. At a high level, we wanted ensure that we could monitor the different components of the application, understand performance parameters, and get alerted when things go wrong.

In this post, we’ll walk through how we aggregated relevant metrics in Datadog from our Spark streaming application running on a YARN cluster in EMR.

Check it out.  If this is interesting, Priya’s blog has the full series.

Finding Who Changed Auto-Tuning Settings On Azure SQL DB

Arun Sirpal shows us the extended event to watch to learn who changed that auto-tuning setting:

It is said to be safe, reliable and proven using complex algorithms and built-in intelligence where it can do the following (see this link for more details: https://docs.microsoft.com/en-us/azure/sql-database/sql-database-automatic-tuning)

  1. CREATE INDEX – identifies indexes that may improve performance of your workload, creates indexes, and automatically verifies that performance of queries has improved.
  2. DROP INDEX – identifies redundant and duplicate indexes daily, except for unique indexes, and indexes that were not used for a long time (>90 days). Please note that at this time the option is not compatible with applications using partition switching and index hints.
  3. FORCE LAST GOOD PLAN – identifies SQL queries using execution plan that is slower than the previous good plan, and queries using the last known good plan instead of the regressed plan.

Personally I don’t  enable the option where it is allowed a “free-for-all” when creating/dropping indexes and forcing certain query plans. I like controlling the change, especially for production databases. To force this concept I wanted to use Extended Events to know when / if someone changed my settings for automatic tuning against my database.

Click through for the script.

Categories

January 2019
MTWTFSS
« Dec  
 123456
78910111213
14151617181920
21222324252627
28293031