2016-09-19 – Curated SQL

In the last post, we explored a couple of examples of using Extended Events to enhance T-SQL error handling. There was some potential there. But a hard-coded SPID was necessary: we couldn’t use the code examples for anything automated. It was cumbersome, too. Let’s change that, shall we?

To make the code easier to work with, I moved most of it into three stored procs: one each to create an XEvent session, get the XEvent session data, and drop the XEvent session. There’s also a table type. This will negate the need to declare a temp table over and over. The four objects can be created in any database you choose. I opted to create them in [tempdb]. The code for each is below in the four tabs.

This is a very interesting solution.

Comments closed

Kinesis Analytics

Published 2016-09-19 by Kevin Feasel

Ryan Nienhuis shows how to implement Amazon Kinesis Analytics:

As I covered in the first post, streaming data is continuously generated; therefore, you need to specify bounds when processing data to make your result set deterministic. Some SQL statements operate on individual rows and have natural bounds, such as a continuous filter that evaluates each row based upon a defined SQL WHERE clause. However, SQL statements that process data across rows need to have set bounds, such as calculating the average of particular column. The mechanism that provides these bounds is a window.

Windows are important because they define the bounds for which you want your query to operate. The starting bound is usually the current row that Amazon Kinesis Analytics is processing, and the window defines the ending bound.

Windows are required with any query that works across rows, because the in-application stream is unbounded and windows provide a mechanism to bind the result set and make the query deterministic. Analytics supports three types of windows: tumbling, sliding, and custom.

The concepts here are very similar to Azure’s Stream Analytics.

Comments closed

Improving HBase Cluster Restart Time

Published 2016-09-19 by Kevin Feasel

Nitin Verma explains how to re-create an HBase cluster a bit faster:

When flush ‘table’ operation is triggered, all the regions belonging to that table will flush independently. Once the HFile corresponding to a region is flushed, it records the max sequence id in metadata and notifies the WAL corresponding to the regionserver. WAL maintains a mapping table for regions and their corresponding flushed sequence id’s. When the HBase cluster restarts, the hMaster will distribute flushed sequence id’s per region to the recovery threads splitting the WAL, so that they can skip the edits which have already been persisted in HFiles.

This is particularly important for clusters which frequently spin up and down, a feature of Platform-as-a-Service solutions like HDInsight.

Comments closed

S3 Or EBS?

Published 2016-09-19 by Kevin Feasel

Devadutta Ghat, et al, compare Amazon S3 versus Elastic Block Storage (EBS) on the basis of cost and Apache Impala performance:

EBS is attached to the AWS compute node as a fully-functional filesystem (similar to an attached SSD on an on-premise node), and Impala makes use of several filesystem features to deliver higher throughput and lower latency. These features include:

HDFS short-circuit reads to bypass HDFS and read files directly from the filesystem

OS buffer cache to read frequently accessed files directly from the cache instead of fetching it again

Fixed-cost file renames through metadata operations

In contrast, S3 is an object store that is accessed over the network. However, with S3, throughput is better than simple network-attached storage because of its dedicated, high-performance networks. In Cloudera’s internal benchmark testing (detailed below), on an r3.2xlarge, we saw a consistent throughput of about 100MB/s. Furthermore, in S3, there is currently no equivalent to HDFS short-circuit reads. Move/rename operations for data stored in S3 is a copy followed by a delete, while a file move on HDFS is a metadata operation—which is usually problematic for ETL workloads, as they create large number of small files that are typically moved.

It looks like EBS is a solid choice for many workloads.

Comments closed

Azure Automation

Published 2016-09-19 by Kevin Feasel

Melissa Coates explains Azure Automation:

Azure Automation is a cloud service in Microsoft Azure which let you schedule execution of PowerShell cmdlets and PowerShell workflows. Azure Automation uses the concept of runbooks to execute a set of repeatable, repetitive tasks via PowerShell. Consistency in execution, reduction of errors, and of course saving time, are all key objectives – which makes DBAs and system admins happy, eh?

This is a higher-level discussion including some good tips on the product.

Comments closed

Query Store And Forcing Plans

Published 2016-09-19 by Kevin Feasel

Andy Kelly explains that forcing query plans using Query Store doesn’t always result in exactly the same plan being used:

Let’s summarize the situation. We have 2 query plans in the Query Store and the most recent one is also the current plan in the plan cache that is being used for all new executions of this procedure. But that plan is bad for all but .1% of the values we may pass in to the procedure. The previous plan in the Query Store is a much better plan overall and that is the one we want to ensure is used regardless of the value passed in. As such we go ahead and force the plan using the provided tools or TSQL which sets the is_forced_plan to 1 for the 1st plan in sys.query_store_plan. As a simplified explanation this action invokes a recompile and the current plan (which was bad) is replaced with a new plan that is based on the one we forced. That new plan now becomes the current one in the cache and is now the one in the Query Store that all new statistics are tied to as well.

Most people would think that if they forced a particular plan that was in the Query Store and it was marked as forced we would in fact be using that very same plan identified by the plan_id and query_plan_hash which is tied to the plan we forced. Keep in mind that if there were problems with the recompile such as it was missing an index that was there when the original plan was created we would get an error which would be listed in the force failure columns and a different plan would obviously need to be used. Errors aside most of the time when we force a plan it gets recompiled and we end up with the same plan as that which we forced. If that plan is the same as the original one we forced it will have the same query_plan_hash and thus the same plan_id. All future executions will now use that plan and all statistics will be tied to it as well. This is exactly what we would expect once we forced a plan in the Query Store.

If you’re looking at using Query Store, definitely read this post.

Comments closed

Automating DMV Scripts

Published 2016-09-19 by Kevin Feasel

Sander Stad has a Powershell script to automate using Glenn Berry’s excellent DMV queries:

I’ve used Glenn’s DMV scripts for years but always found them tedious to execute because there are about 70 individual scripts that either query instance or retrieve database information. Glenn did make it easier for you by creating Excel templates to save the information in.
There are separate scripts for each version of SQL Server that updated every month. Glenn only updates the versions for 2012 to 2016 with new features. The scripts are very well documented and even contain hints about how to solve some issues.

Click through for more information on how to install this Powershell module.

Comments closed

Migrating To Azure SQL Database

Published 2016-09-19 by Kevin Feasel

Niko Neugebauer is building a compendium of methods to migrate an on-prem database to Azure SQL Database:

I decided to put a list of the migration methods that can be useful for migrating to Azure SQLDatabase. By all means it is not complete and if you have any suggestions to expand it – do not be shy.

The current list of the ways that I am considering is here:

SQL Server Management Studio (SSMS)
BACPAC + SSMS/Portal/Powershell
SQL Azure Migration Wizard (SAMW)
SQL Server Data Tools (Visual Studio) + BCP/SSIS
Azure Data Factory
Transactional Replication
Linked Server

Read on for the details on each method.

Comments closed

COUNT Versus EXISTS

Published 2016-09-19 by Kevin Feasel

Lukas Eder explains COUNT versus EXISTS:

COUNT(*) needs to return the exact number of rows. EXISTS only needs to answer a question like:

“Are there any rows at all?”

In other words, EXISTS can short-circuit after having found the first matching row. If your client code (e.g. written in Java or in PL/SQL, or any other client language) needs to know something like:

“Did actors called “Wahlberg” play in any films at all?”

Lukas shows how it works in Oracle and Postgres; the result is still basically the same for SQL Server.

Comments closed

Trustworthiness

Published 2016-09-19 by Kevin Feasel

Dennes Torres warns us of the dangers of setting a database’s Trustworthy flag on:

Trustworthy database setting is an easy solution for some problems in the server, especially cross database access. However, this setting has also a security problem in some scenarios.

For more on this setting, Erland Sommarskog’s essay on permissions granting has one of the best explanations I’ve read.

Comments closed

M	T	W	T	F	S	S
			1	2	3	4
5	6	7	8	9	10	11
12	13	14	15	16	17	18
19	20	21	22	23	24	25
26	27	28	29	30

Day: September 19, 2016

Error Handling With Extended Events, Part 2

Kinesis Analytics

Improving HBase Cluster Restart Time

S3 Or EBS?

Azure Automation

Query Store And Forcing Plans

Automating DMV Scripts

Migrating To Azure SQL Database

COUNT Versus EXISTS

Trustworthiness