Kevin Feasel



Steph Locke notes that the R Consortium has agreed to support satRdays:

I’m very pleased to say that the R Consortium agreed to the support the satRday project!

The idea kicked off in November and I was over the moon with the response from the community, then we garnered support before submitting to the Consortium and I must have looped the moon a few times as we had more than 500 responses. Now the R Consortium are supporting us and we can turn all that enthusiasm into action.

This is great.  I’m looking forward to this taking off and being a nice complement to SQL Saturdays in cities.


Kevin Feasel


R, Security

Steph Locke has created an R package to query Troy Hunt’s Have I Been Pwned? site:

The answer in life to the inevitable question of “How can I do that in R?” should be “There’s a package for that”. So when I wanted to query (HIBP) to check whether a bunch of emails had been involved in data breaches and there wasn’t an R package for HIBP, it meant that the responsibility for making one landed on my shoulders. Now, you can see if your accounts are at risk with the R package for, HIBPwned.

This is a nice confluence of two fun topics, so of course I like it.

Getting Started With Hadoop

Kevin Feasel



Ginger Grant has some pointers on getting started with the Hortonworks Data Platform sandbox:

Previously, spinning up a virtual machine meant purchasing software. No more, as there is now an open source application. In the example shown here, the Linux operating system will be installed, you can put any operating system you want on your virtual machine, provided of course you have a license for it. If you don’t feel comfortable installing non-released versions of code like SQL Server 2016, on your pc, a virtual machine is a great way to test it out. You will need to provide your own operating system, but there are trial versions you can use for limited periods of time as well. The open source virtual machine Oracle VM Virtual Box is the only open source version of a virtual machine software. You can download it here. This software is needed prior to installing the Hortonworks Sandbox. Obviously Hortonworks is not the only version of Hadoop available, Cloudera has a Hadoop VM too, which you can download as well. Personally I am not a use fan of the Cloudera Manager, which is why I prefer Hortonworks, but either will work with polybase.

I’m personally a fan of VMware Player for VMs, but either will work well for the task.

Remember Partition Alignment

Kendra Little shows that truncating partitions in SQL Server 2016 requires all indexes be aligned to the partition:

If you have a non-aligned index on the table, you’ll see an error like this:

Msg 3756, Level 16, State 1, Line 1

TRUNCATE TABLE statement failed. Index ‘ix_FirstNameByBirthDate_pt_BirthYear_FirstNameId_nonaligned’ is not partitioned, but table ‘FirstNameByBirthDate_pt’ uses partition function ‘pf_fnbd’. Index and table must use an equivalent partition function.

This isn’t a bug, and it makes total sense from a logical point of view. “Non-aligned” indexes are not partitioned like the base table is– by definition they are either partitioned differently, or not partitioned at all. The chunk of data that you’re trying to truncate isn’t all in an easily identifiable partition that can be quickly marked as “data non grata”. There’s just no way to do a simple truncate when the data’s scattered all around.

Ideally, all indexes on a partitioned table would be partition-aligned, as it makes maintenance a lot easier.  That’s not always possible, though, so keep this in mind.

New In-Memory OLTP Features

Jos de Bruijn points out new In-Memory OLTP features introduced since CTP3:

Large object (LOB) types varchar(max), nvarchar(max) and varbinary(max) are now supported with memory-optimized tables and table types, as well as natively compiled T-SQL modules, and the size limitations mirror that of disk-based tables (2GB limit on LOB values). In addition, you can have a memory-optimized table with a row size > 8060 bytes, even when no column in the table uses a LOB type. There is no run-time limitation on the size of rows or the data in individual columns; this is part of the table definition. Of course, all data does need to fit in memory.

I’m very pleased about this—now I can go back and turn some of my bigger table types into memory-optimized table types.

Using The DAC

Andrea Allred shows us how to use the DAC:

The DAC, what is it? It is the Dedicated Administrator Console. Basically it is a way to connect to your SQL Server when all the other connections are tied up. But it takes a little bit of pre-planning so that you can use it when things go bad with your SQL Server.  Let’s enable it so you can test using it and know that it is there in the future.

This is your “get out of jail free” card when the instance is completely unresponsive.  A small amount of memory is dedicated to the DAC so that even if anything else is locked up, you have a chance to fix the problem short of a reboot.

Generating Test Data

Erik Darling points to a simple site designed to generate test data:

A website I really like to help generate data is over here. It’s really easy to set up data the way you want, and you get a lot of neat customization options for a wide variety of data types.

There are also tools (both free and paid) which help generate SQL data, but this site includes other output formats as well.

Setting Up Azure SQL Database With Powershell

Mike Fal creates and does basic configuration of an Azure SQL Database instance using Powershell:

From a logical standpoint, working with Azure SQL databases is not very different from setting up a SQL Server instance and database in your environment. There are 3 main components we will need to deploy:

  • A server

  • A database

  • A firewall (server or database)

This is the second part in his series.

NUMA With Few Cores

Denny Cherry asks and answers the question of how many NUMA nodes we should use on a server with a large amount of RAM but relatively few cores:

For this example, let’s assume that we have a physical server with 512 Gigs of RAM and two physical NUMA nodes (and two CPU sockets). We have a VM running in that machine which has a low CPU requirement, but a large working set. Because of this we have 4 cores and 360 Gigs of RAM presented to the VM.

The answer is not trivial, making this an interesting question.

Database In Recovery

James Anderson had a database which would drop into In Recovery mode a few times throughout the day:

The database in question wasn’t stuck in recovery, it would slip in and out of the status throughout the day. Normally, I would only ever expect to see a database in recovery during a restore or after a service restart. Once recovery is complete, I would not expect to see the database slip into ‘in recovery’ again. I especially wouldn’t expect a database to keep slipping in and out of recovery.

The answer is a true head-slapper.  Whose head, I’ll leave up to you…


June 2018
« May