Columnstore Replication

Niko Neugebauer notes that certain columnstore index types are now supported for replication in SQL Server 2016:

At this point you can notice, that from our 3 tables there are only 2 are available for the transactional replication, with the disk-based table with a Clustered Columnstore Index is not being available for the replication.
This means that there are no improvements since SQL Server 2014 for the DWH/BI scenarios in this direction and this is definitely sad.
Well, we can always go a different direction, like in the case of Availability Groups in SQL Server 2014, where readable secondaries are supported only for the Nonclustered Columnstore Indexes. In SQL Server 2016 we can use Nonclustered Columnstore Index even on all columns if needed and get the principle improvements for the Batch Execution Mode.

Notice here that even though we can select the InMemory tables with Clustered Columnstore, there are a couple of additional important settings that needs to be configured to make things function. So clicking through the GUI Wizard will not set things correctly up by default.

The short answer, Niko states, is that you can only replicate non-clustered columnstore indexes at this time.  I can see some use for replicating clustered columnstore tables (warehouse scale-out scenarios, perhaps), but it  wasn’t at the top of my columnstore improvement list.

Parallel Maintenance Tasks

Jonathan Kehayias shows how to parallelize Ola’s maintenance tasks using Service Broker:

With that setup, now we need to build an activation stored procedure to process the messages from the queue and perform the operations generated by Ola’s scripts.  At first I thought that this would be pretty straightforward, but it actually required a fair amount of additional logic to prevent poison messages and queue deactivation from occurring when multiple online index rebuilds were attempted on the same table.  When this occurs the engine raises Error 1912:

Could not proceed with index DDL operation on %S_MSG ‘%.*ls’ because it conflicts with another concurrent operation that is already in progress on the object. The concurrent operation could be an online index operation on the same object or another concurrent operation that moves index pages like DBCC SHRINKFILE.

which causes the XACT_STATE() of the operation to become –1, which only allows the transaction processing the command to be rolled back, leading to a poison message within Service Broker.  So to mitigate against that I had to add in a check against sys.dm_tran_locks for any ALTER_INDEX command to identify if a conflicting lock is currently being held and if it is then we just requeue the message and increment the retry count held within the message so that we can incrementally back-off attempting to run the command each time it encounters a conflicting lock using a WAITFOR.  Since Ola’s scripts log the command type, database name, object name, and command to be executed to the CommandLog table in the master database, all we need to queue in our message is the ID of the command to be processed by Service Broker, and we can get the rest of the pertinent information from the CommandLog table directly.  As a part of each tasks execution in our activation stored procedure we will also update the StartTime and EndTime of the operation for historical trending of execution durations.

This is a clever use of Service Broker, and I’m glad Jonathan shared his code.  Also check out his Pluralsight course for another good use of Service Broker.

Project Parameters In Biml

Kevin Feasel

2016-05-31

Biml

Catherine Wilhelmsen explains why you cannot create project parameters using Biml:

SSIS project parameters do not work the same way as SSIS packages. All project parameters are stored as XML elements in a single XML document for the entire project called Project.params. This is the core reason why packages have a good overwrite story while parameters have a poor overwrite story.

It should be obvious that BimlExpress can’t just overwrite your Project.params file. Of course, BimlExpress would be creating the parameters you specified in your BimlScripts, but it would also be overwriting any parameters you might have created manually. If you are a Biml purist, you might not care about this, because you would be fine with creating all of your project parameters through Biml. Unfortunately, most Biml users are not Biml purists – and even fewer development teams are Biml purists.

It’s a bit of a tricky situation for those cases in which you do want a pure Biml solution, and I’m glad Catherine got down to the bottom of it.

SQL Licenses On Azure

Kenneth Nielsen notes that you can now bring your own SQL Server licenses to Azure marketplace images:

A few days ago, we announced that Microsoft Enterprise customers is now allowed to bring their own SQL Licenses to Azure VMs. This means that if a customer already have a SQL License, this license can be used on SQL Server VM images from Marketplace.

This means that they do no longer need to build their own VM, but instead can just provision a server from the marketplace and use the existing license.

I like this, but I do wonder what percentage of people will use marketplace-created VMs instead of customizing their own builds.

Codd’s Twelve Rules

E.F. Codd’s 12 rules (well, thirteen):

Twelve rules are cited below as part of a test to determine whether a product that is claimed to be fully relational is actually so. Use of the term “fully relational” in this report is slightly more stringent than in my Turing paper (written in 1981). This is partly because vendors in their ads and manuals have translated the term “minimally relational” to “fully relational” and partly because in this report, we are dealing with relational DBMS and not relational systems in general, which would include mere query-reporting systems.

However, the 12 rules tend to explain why full support of the relational model is in the users’ interest. No new requirements are added to the relational model. A grading scheme is later defined and used to measure the degree of fidelity to the relational model.

This particular article seems less important thirty years later, but it was vital in the early days of relational systems to understanding what, precisely, a relational database management system ought to do and—just as importantly—what it ought not do.  It wasn’t enough to slap SQL on top of a hierarchical database platform and call it relational.

Resilient Distributed Datasets

Kevin Feasel

2016-05-30

Spark

Spark is built around the concept of Resilient Distributed Datasets.  If you have not read Matei Zaharia, et al’s paper on the topic, I highly recommend it:

Spark exposes RDDs through a language-integrated API similar to DryadLINQ [31] and FlumeJava [8], where each dataset is represented as an object and transformations are invoked using methods on these objects.

Programmers start by defining one or more RDDs through transformations on data in stable storage (e.g., map and filter). They can then use these RDDs in actions, which are operations that return a value to the application or export data to a storage system. Examples of actions include count (which returns the number of elements in the dataset), collect (which returns the elements themselves), and save (which outputs the dataset to a storage system). Like DryadLINQ, Spark computes RDDs lazily the first time they are used in an action, so that it can pipeline transformations.

In addition, programmers can call a persist method to indicate which RDDs they want to reuse in future operations. Spark keeps persistent RDDs in memory by default, but it can spill them to disk if there is not enough RAM. Users can also request other persistence strategies, such as storing the RDD only on disk or replicating it across machines, through flags to persist. Finally, users can set a persistence priority on each RDD to specify which in-memory data should spill to disk first.

The link also has a video of their initial presentation at NSDI.  Check it out.

Giving Permissions Through Stored Procedures

Erland Sommarskog has a fantastic article on the right (and wrong!) ways of doing stored procedure security:

Before I go on to the main body of this text, I would like to make a short digression about security in general.

Security is often in conflict with other interests in the programming trade. You have users screaming for a solution, and they want it now. At this point, they don’t really care about security, they just want to get their business done. But if you give them a solution that has a hole, and that hole is later exploited, you are the one that will be hung. So as a programmer you always need to have security in mind, and make sure that you play your part right

One common mistake in security is to think “we have this firewall/encryption/whatever, so we are safe”. I like to think of security of something that consists of a number of defence lines. Anyone who has worked with computer systems knows that there are a lot of changes in them, both in their configuration and in the program code. Your initial design may be sound and safe, but as the system evolves, there might suddenly be a security hole and a serious vulnerability in your system.

By having multiple lines of defence you can reduce the risk for this to happen. If a hole is opened, you can reduce the impact of what is possible to do through that hole. An integral part of this strategy is to never grant more permissions than is absolutely necessary. Exactly what this means in this context is something I shall return to.

This is a must-read for anyone interested in rights management in SQL Server.

Happy Memorial Day

Kevin Feasel

2016-05-30

Meta

Curated SQL is on semi-holiday for Memorial Day.  Instead of posting new links to items of interest in the wide world of data, I want to point out a couple longer works that I normally would not be able to talk about given the site’s motif.  If you’re in the office on a slow day, here are a few items that will get you through.

Recursive CTEs

Kevin Feasel

2016-05-27

Syntax

Steve Jones gives a simple recursive CTE example and an important lesson on compounding:

Recursion is an interesting computer science technique that stumps lots of people. When I was learning programming, it seemed that recursion (in Pascal) and pointers (in C), were the weed out topics.

However, they aren’t that bad, and with CTEs, we can write recursion in T-SQL. I won’t cover where this might be used in this post, though I will give you a simple CTE to view.

There are two parts you need: the anchor and the recursive member. These are connected with a UNION ALL. There can be multiple items, but we’ll keep things simple.

You can play query golf and find a way to remove the recursion, but it’s good to know how to create a recursive CTE.  It’s also good to know that you typically do not want recursion in a database process…

SystemThread

Ewald Cress looks at SystemThread:

SystemThread, a class within sqldk.dll, can be considered to be at the root of SQLOS’s scheduling capabilities. While it doesn’t expose much obviously exciting functionality, it encapsulates a lot of the state that is necessary to give a thread a sense of self in SQLOS, serving as the beacon for any code to find its way to an associated SQLOS scheduler etc. I won’t go into much of the SQLOS object hierarchy here, but suffice it to say that everything becomes derivable by knowing one’s SystemThread. As such, this class jumps the gap between a Windows thread and the object-oriented SQLOS.

The simplest way to conceptualise a thread in SQL Server is to think of a spid or connection busy executing a simple query, old skool sysprocesses style. It’s not hip, but it’s close enough to the truth to be useful. This conflates a few things that are separate entities, but it is a good starting point for teasing things apart as needed

This is another dive into internals; prepare your thinking caps.

Categories

May 2018
MTWTFSS
« Apr  
 123456
78910111213
14151617181920
21222324252627
28293031