Press "Enter" to skip to content

Author: Kevin Feasel

Parallel Backups Using Powershell Workflows

Sander Stad wants to use Powershell workflows to back up databases in parallel:

It’s possible to use a workflow to execute your backups. You have to take into consideration that there is a downside. If you execute all the backups at once you’ll probably get issues with throughput if you’re dependent on a slow network for example.

You could always add another parameter to filter out specific databases to make sure you execute it as a specific set.

Sander does include his script, so check that out.

Comments closed

Finding Database Restoration Times

Kenneth Fisher wants to know the last time his database was restored:

I frequently need to know where backups went and I restore those backups for operational recovery on a regular basis. Would you believe in 20+ years as a DBA I can count the number of database restores for a disaster on my fingers? (Which is good because taking off your shoes at the office is considered bad form.)

I think the important take-away from this post is that you should leave your shoes on at work.  You don’t know what kind of disgusting things are in that carpet.  Also, read on to learn where to find database restoration history.

Comments closed

Parallel Insert-Select

Arvind Shyamsundar looks at parallel insertion using the INSERT SELECT pattern:

For row store targets, it is important to note that the presence of a clustered index or any additional non-clustered indexes on the target table will disable the parallel INSERT behavior. For example, here is the query plan on the same table with an additional non-clustered index present. The same query takes 287 seconds without a TABLOCK hint and the execution plan is as follows

This post goes into detail on when you can expect parallelism in rowstore and columnstore insertions.  I highly recommend reading it.

Comments closed

NodeGroup Performance Issues

Babak Behzad explains potential Hadoop NodeGroup performance bottlenecks:

As can be seen in the logs, the localityWaitFactor value is 1, but the delay that this code causes grows linearly with the number of required containers. Since our DFSIO-large benchmark creates 1,024 files, each 1 GB in size, it requests 1,024 YARN containers. Therefore, the code has to miss at least 1,024 scheduling opportunities until it schedules containers on this (wrongly assumed) OFF_SWITCH node.

But why is this delay enforced? This idea falls into a big area of scheduling research. The Delay Scheduling algorithm was introduced by Matei Zaharia’s EuroSys ’10 paper titled “Delay Scheduling: A Simple Technique for Achieving Locality and Fairness in Cluster Scheduling”.

That post is a bit deeper than my Hadoop administration comfort level, but if you’re given the task of performance tuning a cluster, this might be one place to look.

Comments closed

Migrating To Azure SQL Database

John Sterrett discusses ways to migrate an on-prem database to Azure SQL Database:

This is great, except for the case I want to talk about today. What if you need to do a live migration with as little downtime (business wants no downtime) as possible with bigger databases. For example, say and existing 50 GB to 500GB database? Your only, option today is a good old friend of mine called transactional replication. You see, you can configure transactional replication and have the snapshot occur and all data in your current production system can be syncing live with your Azure SQL Database until it’s time to cutover which will make your cutover downtime as short as possible.

Below I will give you step by step instructions on how you can configure your subscriber. This would be your Azure SQL Database. The publisher would be your existing production database which could either be on-premise or an Azure VM.

Ah, replication:  the cause of, and solution to, all of life’s problems.  Or something.  Do read the whole thing.

Comments closed

Azure Iaas Disk Differences

Rolf Tesmer looks at whether to use SSDs or Premium Disk for SQL Server instances on Azure IaaS:

To test the throughput I will run a set test with the TempDB database on D:\ (local SSD) and then rerun the test again with the TempDB moved onto F:\ (P30 premium disk).  Between the tests SQL Server is restarted so we’re starting with a clean cache and state.

The test SQL script will create a temporary table and then run a series of insert, update, select and delete queries against that table.  We’ll then capture and record the statistics and time.

The results were interesting; read on to learn more.

Comments closed

Manning’s Equation

John Yagecic has a Shiny app which gives a Monte Carlo analysis of Manning’s Equation:

Monte Carlo analysis is a great way to explore the impact of input variable uncertainty on the results of engineering equations, and with vector variables and distribution and sampling functions at its core, R is a natural platform for this analysis.

Check out his app, which has a link to the code.  Amazingly, this is only 107 lines of code.

Comments closed

Intro To Data Analytics

Stacia Varga has a follow-up to her Gentle Introduction to Data Analytics:

Q: How is data analytics on SAS different from R support for SQL2016?

A: I’ve not used SAS (although I did go to their campus to teach MDX to their engineers once upon a time… beautiful campus!), so I can’t say with any specificity. SAS also has multiple products so it would be difficult to describe differences. In general – and please understand there may be something new in SAS that I don’t know about – I think the difference is that R in SQL 2016 allows you to run functions on data inside the database, thereby leveraging database resources and also server resources for parallelization. I’d love to get input from readers to expand on this topic.

I can confirm that SAS has a beautiful campus.

Comments closed