Press "Enter" to skip to content

Month: November 2020

Understanding Skewness and Kurtosis

George Pipis explains two key concepts of a distribution:

Most commonly a distribution is described by its mean and variance which are the first and second moments respectively. Another less common measures are the skewness (third moment) and the kurtosis (fourth moment). Today, we will try to give a brief explanation of these measures and we will show how we can calculate them in R.

Click through for an explanation. H/T R-Bloggers

Comments closed

An Intro to Backup Strategy with SQL Server

Pamela Mooney takes us through backup and restoration as part of a DBA in training series:

A DBA’s job is getting the right data to the right people as quickly as possible.

Consider that first part – getting the right data. How do you do that? By having the data in the first place. The best way to ensure that you have the data is to look at your company’s needs, recommend availability options, and most importantly, do backups. Few things will get a DBA fired more quickly than not having backups available when they are needed. It may be years before you ever have to restore a database from a backup, but the backups had better be there when you need to do one.

Conference sessions and books have been written on the subjects of backup/restores, SLAs, and availability options, and they are so good that I’ll just give you a brief overview here. You’ll gravitate to your need-to-learns soon enough.

This is a nice overview for a beginner. One semi-related piece of advice that I picked up from Sean McCown: know that backup and restore syntax cold. Take backups manually (even if they’re just on a test server intended for this purpose) every day until you feel comfortable typing out backup and restore syntax from memory. There will come a time when three levels of management are standing over your shoulder waiting for a database to restore and SSMS’s GUI is crawling. At that point, knowing the syntax cold will be completely worthwhile.

Comments closed

Policy-Based Management for Availability Groups

Rajendra Gupta takes us through the policies in Policy-Based Management which relate to Availability Groups:

SQL Server Always On uses the Policy-Based Management(PBM) for determining its health. In the earlier articles, we discovered AG dashboard features to monitor synchronization status, data loss, replica states. It executes the PBM policies on availability replicas (primary and secondary), availability group database and organizes the results in a dashboard.

The primary replica contains information for all replicas, their synchronization states. It has sufficient information to compute the health for all availability groups. If we launch the AG dashboard from the primary and secondary replica, we can note the difference in monitoring.

This is perhaps the most under-developed and under-utilized feature in SQL Server. It is conceptually so powerful and what’s in there shows some of that power, but to this day, so much is still lacking in PBM.

Comments closed

Alternatives to an Agentless Azure SQL DB

Reitse Eskens gives us a few alternatives to use when we need something like SQL Agent but are running in Azure SQL Database:

What i got into was the following. For a project we’re loading an Azure sql database (serverless) with a lot of data (think billions of rows) that has to come from an on-premises Oracle server. We’re using a vpn connection with network peering to connect to the on-premises server and using a VM with a third-party tool to load the data.

Normally we’re delta-loading the database but because it’s a new project we need to perform an initial load. Nothing really weird, just a huge number of records that needs to pass through. And every now and then the application freezes and refuses to thaw. Because it’s hard to find out when the freezing will start, we want to monitor some processes on the database.

Now on a normal SQL Server i’d create a job in the Agent and be done with that part. But not on Azure. Because the agent doesn’t exist there. In SSMS you’ll see a huge empty space where the agent ought to be.

Reitse lists five separate options. A sixth would be to spin up SQL Server in a VM and use its agent for scheduling. And there are a few more alternatives as well in the ‘outside scheduler’ realm.

Comments closed

Accelerated Database Recovery in SQL Server 2019

Jamie Wick walks us through the key concepts with Accelerated Database Recovery:

Beginning with SQL Server 2019, Microsoft has redesigned the database recovery process (ie. crash recovery and rollback) to improve availability and performance. This new feature is called Accelerated Database Recovery (ADR).

Prior to SQL Server 2019, recovering a database (after a crash or restart) consisted of 3 phases that followed the ARIES (Algorithm for Recovery and Isolation Exploiting Semantics) recovery model. The phases of this model are Analysis, Redo & Undo. The Analysis phase begins with the last successful database checkpoint and forward-scans the transaction log to determine the state of each transaction. The Redo phase begins with the oldest uncommitted transaction (that was active after the checkpoint) and rolls-forward, bringing the database state to the same point it was at immediately prior to the crash. The Undo phase then goes backward from the end of the transaction log to reverse all transactions that were active (uncommitted) at the time of the crash. With this process, the database recovery time is roughly the same as the longest running query that was active at the time of the crash.

Click through for the full story.

Comments closed

The Unbearable Slowness of Full Text Queries

Brent Ozar explains why full-text search in SQL Server can be so slow:

SQL Server’s full text search is amazing. Well, it amazes me at least – it has so many cool capabilities: looking for prefixes, words near each other, different verb tenses, and even thesaurus searches. However, that’s not how I see most people using it: I’ve seen so many shops using it for matching specific strings, thinking it’s going to be faster than LIKE ‘%mysearch%’. That works at small scale, but as your data grows, you run into a query plan performance problem.

When your query uses CONTAINS, SQL Server has a nasty habit of doing a full text search across all of the rows in the table rather than using the rest of your WHERE clause to reduce the result set first.

Read on for the full impact as well as some alternatives. I agree that those alternatives come with costs (whether that be monetary or conceptual), but I’ve used both n-grams and Elasticsearch with some success.

Comments closed

Auto-Checking Azure Data Factory Setup

Paul Andrew is at it again:

Building on the work done and detailed in my previous blog post (Best Practices for Implementing Azure Data Factory) I was tasked by my delightful boss to turn this content into a simple check list of what/why that others could use…. I slightly reluctantly did so. However, I wanted to do something better than simply transcribe the previous blog post into a check list. I therefore decided to breakout the Shell of Power and attempt to automate said check list.

Sure, a check list could be picked up and used by anyone – with answers manually provided by the person doing the inspection of a given ADF resource. But what if there was a way to have the results given to you a plate and inferring things that aren’t always easy to spot via the Data Factory UI.

Paul uses an ARM template rather than hitting your Data Factory directly, so there’s a little bit more work for you the user, but Paul explains why it’s both necessary and proper.

Comments closed