Amazon RDS Supports Native Backup And Restore

Brent Ozar reports that Amazon now allows you to use native SQL Server backups for getting data in and out of RDS:

Until now, the single biggest problem has been that both Azure SQL DB and Amazon RDS SQL Server don’t give you access to backup files. If you wanted to get your data out, you were hassling with things like import/export wizards, BCP, or sync apps.

Today, Amazon RDS SQL Server announced support for native backup/restore to Amazon S3.

This is a really, really, really big deal, something Azure SQL DB doesn’t support (and I dearly wish it did). I get even more excited reading this because now Microsoft has to do it in order to remain competitive, and that’ll make Azure SQL DB a much more attractive product for traditional DBAs.

This makes the migration strategy to and from RDS significantly easier.  Brent gives a few examples of how this will help customers.

Parallel Backups Using Powershell Workflows

Sander Stad wants to use Powershell workflows to back up databases in parallel:

It’s possible to use a workflow to execute your backups. You have to take into consideration that there is a downside. If you execute all the backups at once you’ll probably get issues with throughput if you’re dependent on a slow network for example.

You could always add another parameter to filter out specific databases to make sure you execute it as a specific set.

Sander does include his script, so check that out.

Finding Database Restoration Times

Kenneth Fisher wants to know the last time his database was restored:

I frequently need to know where backups went and I restore those backups for operational recovery on a regular basis. Would you believe in 20+ years as a DBA I can count the number of database restores for a disaster on my fingers? (Which is good because taking off your shoes at the office is considered bad form.)

I think the important take-away from this post is that you should leave your shoes on at work.  You don’t know what kind of disgusting things are in that carpet.  Also, read on to learn where to find database restoration history.

Parallel Insert-Select

Arvind Shyamsundar looks at parallel insertion using the INSERT SELECT pattern:

For row store targets, it is important to note that the presence of a clustered index or any additional non-clustered indexes on the target table will disable the parallel INSERT behavior. For example, here is the query plan on the same table with an additional non-clustered index present. The same query takes 287 seconds without a TABLOCK hint and the execution plan is as follows

This post goes into detail on when you can expect parallelism in rowstore and columnstore insertions.  I highly recommend reading it.

CLR Survey

Kevin Feasel



Michael J. Swart wants to know if you’re using CLR in your environment:

CREATE ASSEMBLY supports specifying a CLR assembly using bits, a bit stream that can be specified using regular T-SQL. The full method is described in Deploying CLR Database Objects. In practice, the CREATE ASSEMBLY statement looks something like:

After learning about assembly deployment, check out Michael’s one-question survey.

NodeGroup Performance Issues

Babak Behzad explains potential Hadoop NodeGroup performance bottlenecks:

As can be seen in the logs, the localityWaitFactor value is 1, but the delay that this code causes grows linearly with the number of required containers. Since our DFSIO-large benchmark creates 1,024 files, each 1 GB in size, it requests 1,024 YARN containers. Therefore, the code has to miss at least 1,024 scheduling opportunities until it schedules containers on this (wrongly assumed) OFF_SWITCH node.

But why is this delay enforced? This idea falls into a big area of scheduling research. The Delay Scheduling algorithm was introduced by Matei Zaharia’s EuroSys ’10 paper titled “Delay Scheduling: A Simple Technique for Achieving Locality and Fairness in Cluster Scheduling”.

That post is a bit deeper than my Hadoop administration comfort level, but if you’re given the task of performance tuning a cluster, this might be one place to look.

UN Voting Clusters

En El Margen charts voting clusters in UN data:

After some more digging, and a suggestion by @theMexIndian I decided to see more in the depth the unvotes database that I wrote about some weeks ago.

This time, amit suggested I do some hierarchical clustering of the votes. So here goes a very dirty first attempt…

Check this out as a case study in data analysis.

Migrating To Azure SQL Database

Kevin Feasel



John Sterrett discusses ways to migrate an on-prem database to Azure SQL Database:

This is great, except for the case I want to talk about today. What if you need to do a live migration with as little downtime (business wants no downtime) as possible with bigger databases. For example, say and existing 50 GB to 500GB database? Your only, option today is a good old friend of mine called transactional replication. You see, you can configure transactional replication and have the snapshot occur and all data in your current production system can be syncing live with your Azure SQL Database until it’s time to cutover which will make your cutover downtime as short as possible.

Below I will give you step by step instructions on how you can configure your subscriber. This would be your Azure SQL Database. The publisher would be your existing production database which could either be on-premise or an Azure VM.

Ah, replication:  the cause of, and solution to, all of life’s problems.  Or something.  Do read the whole thing.

Azure Iaas Disk Differences

Rolf Tesmer looks at whether to use SSDs or Premium Disk for SQL Server instances on Azure IaaS:

To test the throughput I will run a set test with the TempDB database on D:\ (local SSD) and then rerun the test again with the TempDB moved onto F:\ (P30 premium disk).  Between the tests SQL Server is restarted so we’re starting with a clean cache and state.

The test SQL script will create a temporary table and then run a series of insert, update, select and delete queries against that table.  We’ll then capture and record the statistics and time.

The results were interesting; read on to learn more.

Manning’s Equation

John Yagecic has a Shiny app which gives a Monte Carlo analysis of Manning’s Equation:

Monte Carlo analysis is a great way to explore the impact of input variable uncertainty on the results of engineering equations, and with vector variables and distribution and sampling functions at its core, R is a natural platform for this analysis.

Check out his app, which has a link to the code.  Amazingly, this is only 107 lines of code.


December 2018
« Nov