satRdays

Kevin Feasel

2016-03-25

R

Steph Locke notes that the R Consortium has agreed to support satRdays:

I’m very pleased to say that the R Consortium agreed to the support the satRday project!

The idea kicked off in November and I was over the moon with the response from the community, then we garnered support before submitting to the Consortium and I must have looped the moon a few times as we had more than 500 responses. Now the R Consortium are supporting us and we can turn all that enthusiasm into action.

This is great.  I’m looking forward to this taking off and being a nice complement to SQL Saturdays in cities.

HIBPwned

Kevin Feasel

2016-03-25

R, Security

Steph Locke has created an R package to query Troy Hunt’s Have I Been Pwned? site:

The answer in life to the inevitable question of “How can I do that in R?” should be “There’s a package for that”. So when I wanted to query HaveIBeenPwned.com (HIBP) to check whether a bunch of emails had been involved in data breaches and there wasn’t an R package for HIBP, it meant that the responsibility for making one landed on my shoulders. Now, you can see if your accounts are at risk with the R package for HaveIBeenPwned.com, HIBPwned.

This is a nice confluence of two fun topics, so of course I like it.

Getting Started With Hadoop

Kevin Feasel

2016-03-25

Hadoop

Ginger Grant has some pointers on getting started with the Hortonworks Data Platform sandbox:

Previously, spinning up a virtual machine meant purchasing software. No more, as there is now an open source application. In the example shown here, the Linux operating system will be installed, you can put any operating system you want on your virtual machine, provided of course you have a license for it. If you don’t feel comfortable installing non-released versions of code like SQL Server 2016, on your pc, a virtual machine is a great way to test it out. You will need to provide your own operating system, but there are trial versions you can use for limited periods of time as well. The open source virtual machine Oracle VM Virtual Box is the only open source version of a virtual machine software. You can download it here. This software is needed prior to installing the Hortonworks Sandbox. Obviously Hortonworks is not the only version of Hadoop available, Cloudera has a Hadoop VM too, which you can download as well. Personally I am not a use fan of the Cloudera Manager, which is why I prefer Hortonworks, but either will work with polybase.

I’m personally a fan of VMware Player for VMs, but either will work well for the task.

Remember Partition Alignment

Kendra Little shows that truncating partitions in SQL Server 2016 requires all indexes be aligned to the partition:

If you have a non-aligned index on the table, you’ll see an error like this:

Msg 3756, Level 16, State 1, Line 1

TRUNCATE TABLE statement failed. Index ‘ix_FirstNameByBirthDate_pt_BirthYear_FirstNameId_nonaligned’ is not partitioned, but table ‘FirstNameByBirthDate_pt’ uses partition function ‘pf_fnbd’. Index and table must use an equivalent partition function.

This isn’t a bug, and it makes total sense from a logical point of view. “Non-aligned” indexes are not partitioned like the base table is– by definition they are either partitioned differently, or not partitioned at all. The chunk of data that you’re trying to truncate isn’t all in an easily identifiable partition that can be quickly marked as “data non grata”. There’s just no way to do a simple truncate when the data’s scattered all around.

Ideally, all indexes on a partitioned table would be partition-aligned, as it makes maintenance a lot easier.  That’s not always possible, though, so keep this in mind.

New In-Memory OLTP Features

Jos de Bruijn points out new In-Memory OLTP features introduced since CTP3:

Large object (LOB) types varchar(max), nvarchar(max) and varbinary(max) are now supported with memory-optimized tables and table types, as well as natively compiled T-SQL modules, and the size limitations mirror that of disk-based tables (2GB limit on LOB values). In addition, you can have a memory-optimized table with a row size > 8060 bytes, even when no column in the table uses a LOB type. There is no run-time limitation on the size of rows or the data in individual columns; this is part of the table definition. Of course, all data does need to fit in memory.

I’m very pleased about this—now I can go back and turn some of my bigger table types into memory-optimized table types.

Using The DAC

Andrea Allred shows us how to use the DAC:

The DAC, what is it? It is the Dedicated Administrator Console. Basically it is a way to connect to your SQL Server when all the other connections are tied up. But it takes a little bit of pre-planning so that you can use it when things go bad with your SQL Server.  Let’s enable it so you can test using it and know that it is there in the future.

This is your “get out of jail free” card when the instance is completely unresponsive.  A small amount of memory is dedicated to the DAC so that even if anything else is locked up, you have a chance to fix the problem short of a reboot.

Generating Test Data

Erik Darling points to a simple site designed to generate test data:

A website I really like to help generate data is over here. It’s really easy to set up data the way you want, and you get a lot of neat customization options for a wide variety of data types.

There are also tools (both free and paid) which help generate SQL data, but this site includes other output formats as well.

Categories

March 2016
MTWTFSS
« Feb Apr »
 123456
78910111213
14151617181920
21222324252627
28293031