In the last post, we explored a couple of examples of using Extended Events to enhance T-SQL error handling. There was some potential there. But a hard-coded SPID was necessary: we couldn’t use the code examples for anything automated. It was cumbersome, too. Let’s change that, shall we?
To make the code easier to work with, I moved most of it into three stored procs: one each to create an XEvent session, get the XEvent session data, and drop the XEvent session. There’s also a table type. This will negate the need to declare a temp table over and over. The four objects can be created in any database you choose. I opted to create them in [tempdb]. The code for each is below in the four tabs.
This is a very interesting solution.
As I covered in the first post, streaming data is continuously generated; therefore, you need to specify bounds when processing data to make your result set deterministic. Some SQL statements operate on individual rows and have natural bounds, such as a continuous filter that evaluates each row based upon a defined SQL WHERE clause. However, SQL statements that process data across rows need to have set bounds, such as calculating the average of particular column. The mechanism that provides these bounds is a window.
Windows are important because they define the bounds for which you want your query to operate. The starting bound is usually the current row that Amazon Kinesis Analytics is processing, and the window defines the ending bound.
Windows are required with any query that works across rows, because the in-application stream is unbounded and windows provide a mechanism to bind the result set and make the query deterministic. Analytics supports three types of windows: tumbling, sliding, and custom.
The concepts here are very similar to Azure’s Stream Analytics.
When flush ‘table’ operation is triggered, all the regions belonging to that table will flush independently. Once the HFile corresponding to a region is flushed, it records the max sequence id in metadata and notifies the WAL corresponding to the regionserver. WAL maintains a mapping table for regions and their corresponding flushed sequence id’s. When the HBase cluster restarts, the hMaster will distribute flushed sequence id’s per region to the recovery threads splitting the WAL, so that they can skip the edits which have already been persisted in HFiles.
This is particularly important for clusters which frequently spin up and down, a feature of Platform-as-a-Service solutions like HDInsight.
EBS is attached to the AWS compute node as a fully-functional filesystem (similar to an attached SSD on an on-premise node), and Impala makes use of several filesystem features to deliver higher throughput and lower latency. These features include:
- HDFS short-circuit reads to bypass HDFS and read files directly from the filesystem
- OS buffer cache to read frequently accessed files directly from the cache instead of fetching it again
- Fixed-cost file renames through metadata operations
In contrast, S3 is an object store that is accessed over the network. However, with S3, throughput is better than simple network-attached storage because of its dedicated, high-performance networks. In Cloudera’s internal benchmark testing (detailed below), on an r3.2xlarge, we saw a consistent throughput of about 100MB/s. Furthermore, in S3, there is currently no equivalent to HDFS short-circuit reads. Move/rename operations for data stored in S3 is a copy followed by a delete, while a file move on HDFS is a metadata operation—which is usually problematic for ETL workloads, as they create large number of small files that are typically moved.
It looks like EBS is a solid choice for many workloads.
Azure Automation is a cloud service in Microsoft Azure which let you schedule execution of PowerShell cmdlets and PowerShell workflows. Azure Automation uses the concept of runbooks to execute a set of repeatable, repetitive tasks via PowerShell. Consistency in execution, reduction of errors, and of course saving time, are all key objectives – which makes DBAs and system admins happy, eh?
This is a higher-level discussion including some good tips on the product.
Let’s summarize the situation. We have 2 query plans in the Query Store and the most recent one is also the current plan in the plan cache that is being used for all new executions of this procedure. But that plan is bad for all but .1% of the values we may pass in to the procedure. The previous plan in the Query Store is a much better plan overall and that is the one we want to ensure is used regardless of the value passed in. As such we go ahead and force the plan using the provided tools or TSQL which sets the is_forced_plan to 1 for the 1st plan in sys.query_store_plan. As a simplified explanation this action invokes a recompile and the current plan (which was bad) is replaced with a new plan that is based on the one we forced. That new plan now becomes the current one in the cache and is now the one in the Query Store that all new statistics are tied to as well.
Most people would think that if they forced a particular plan that was in the Query Store and it was marked as forced we would in fact be using that very same plan identified by the plan_id and query_plan_hash which is tied to the plan we forced. Keep in mind that if there were problems with the recompile such as it was missing an index that was there when the original plan was created we would get an error which would be listed in the force failure columns and a different plan would obviously need to be used. Errors aside most of the time when we force a plan it gets recompiled and we end up with the same plan as that which we forced. If that plan is the same as the original one we forced it will have the same query_plan_hash and thus the same plan_id. All future executions will now use that plan and all statistics will be tied to it as well. This is exactly what we would expect once we forced a plan in the Query Store.
If you’re looking at using Query Store, definitely read this post.
I’ve used Glenn’s DMV scripts for years but always found them tedious to execute because there are about 70 individual scripts that either query instance or retrieve database information. Glenn did make it easier for you by creating Excel templates to save the information in.
There are separate scripts for each version of SQL Server that updated every month. Glenn only updates the versions for 2012 to 2016 with new features. The scripts are very well documented and even contain hints about how to solve some issues.
Click through for more information on how to install this Powershell module.
I decided to put a list of the migration methods that can be useful for migrating to Azure SQLDatabase. By all means it is not complete and if you have any suggestions to expand it – do not be shy.
The current list of the ways that I am considering is here:
SQL Server Management Studio (SSMS)
BACPAC + SSMS/Portal/Powershell
SQL Azure Migration Wizard (SAMW)
SQL Server Data Tools (Visual Studio) + BCP/SSIS
Azure Data Factory
Read on for the details on each method.
COUNT(*) needs to return the exact number of rows. EXISTS only needs to answer a question like:
“Are there any rows at all?”
In other words, EXISTS can short-circuit after having found the first matching row. If your client code (e.g. written in Java or in PL/SQL, or any other client language) needs to know something like:
“Did actors called “Wahlberg” play in any films at all?”
Lukas shows how it works in Oracle and Postgres; the result is still basically the same for SQL Server.
Trustworthy database setting is an easy solution for some problems in the server, especially cross database access. However, this setting has also a security problem in some scenarios.
For more on this setting, Erland Sommarskog’s essay on permissions granting has one of the best explanations I’ve read.