Press "Enter" to skip to content

Author: Kevin Feasel

Getting Started With Hadoop

Ginger Grant has some pointers on getting started with the Hortonworks Data Platform sandbox:

Previously, spinning up a virtual machine meant purchasing software. No more, as there is now an open source application. In the example shown here, the Linux operating system will be installed; you can put any operating system you want on your virtual machine, provided of course you have a license for it. If you don’t feel comfortable installing non-released versions of code like SQL Server 2016 on your PC, a virtual machine is a great way to test it out. You will need to provide your own operating system, but there are trial versions you can use for limited periods of time as well. Oracle VM VirtualBox is an open source virtual machine application. You can download it here. This software is needed prior to installing the Hortonworks Sandbox. Obviously Hortonworks is not the only version of Hadoop available; Cloudera has a Hadoop VM too, which you can download as well. Personally I am not a huge fan of the Cloudera Manager, which is why I prefer Hortonworks, but either will work with PolyBase.

I’m personally a fan of VMware Player for VMs, but either will work well for the task.


Remember Partition Alignment

Kendra Little shows that truncating partitions in SQL Server 2016 requires all indexes be aligned to the partition:

If you have a non-aligned index on the table, you’ll see an error like this:

Msg 3756, Level 16, State 1, Line 1

TRUNCATE TABLE statement failed. Index ‘ix_FirstNameByBirthDate_pt_BirthYear_FirstNameId_nonaligned’ is not partitioned, but table ‘FirstNameByBirthDate_pt’ uses partition function ‘pf_fnbd’. Index and table must use an equivalent partition function.

This isn’t a bug, and it makes total sense from a logical point of view. “Non-aligned” indexes are not partitioned like the base table is; by definition, they are either partitioned differently or not partitioned at all. The chunk of data that you’re trying to truncate isn’t all in an easily identifiable partition that can be quickly marked as “data non grata”. There’s just no way to do a simple truncate when the data’s scattered all around.

Ideally, all indexes on a partitioned table would be partition-aligned, as it makes maintenance a lot easier.  That’s not always possible, though, so keep this in mind.
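For reference, the SQL Server 2016 syntax looks something like the following sketch. The table name is borrowed from Kendra’s error message and the partition numbers are hypothetical; the new part is the WITH (PARTITIONS …) clause.

-- Truncate only partitions 1 and 2 of a partitioned table.
-- This only succeeds if every index on the table is partition-aligned.
TRUNCATE TABLE dbo.FirstNameByBirthDate_pt
WITH (PARTITIONS (1, 2));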


New In-Memory OLTP Features

Jos de Bruijn points out new In-Memory OLTP features introduced since CTP3:

Large object (LOB) types varchar(max), nvarchar(max) and varbinary(max) are now supported with memory-optimized tables and table types, as well as natively compiled T-SQL modules, and the size limitations mirror those of disk-based tables (2GB limit on LOB values). In addition, you can have a memory-optimized table with a row size > 8060 bytes, even when no column in the table uses a LOB type. There is no run-time limitation on the size of rows or the data in individual columns; this is part of the table definition. Of course, all data does need to fit in memory.

I’m very pleased about this—now I can go back and turn some of my bigger table types into memory-optimized table types.
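As a rough sketch of what that unlocks (the type name and columns here are mine, not Jos’s), a memory-optimized table type can now carry an nvarchar(max) column:

-- Minimal sketch: a memory-optimized table type with a LOB column.
-- Requires SQL Server 2016 and a database with a MEMORY_OPTIMIZED_DATA filegroup.
CREATE TYPE dbo.BigStringList AS TABLE
(
    Id int NOT NULL PRIMARY KEY NONCLUSTERED,
    Payload nvarchar(max) NOT NULL
)
WITH (MEMORY_OPTIMIZED = ON);

-- Usage, e.g. as a table variable or table-valued parameter:
DECLARE @strings dbo.BigStringList;
INSERT @strings (Id, Payload) VALUES (1, N'a row well past 8060 bytes is now fine');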


Using The DAC

Andrea Allred shows us how to use the DAC:

The DAC, what is it? It is the Dedicated Administrator Connection. Basically it is a way to connect to your SQL Server when all the other connections are tied up. But it takes a little bit of pre-planning so that you can use it when things go bad with your SQL Server.  Let’s enable it so you can test using it and know that it is there in the future.

This is your “get out of jail free” card when the instance is completely unresponsive.  A small amount of memory is dedicated to the DAC so that even if everything else is locked up, you have a chance to fix the problem short of a reboot.
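If you want to set it up ahead of time, the core steps are roughly these. By default the DAC only listens for local connections, so the sp_configure step is only needed if you want to reach it from another machine.

-- Allow DAC connections from another machine (local connections work by default).
EXEC sp_configure 'remote admin connections', 1;
RECONFIGURE;

-- Then connect with the ADMIN: prefix, e.g. from sqlcmd:
--   sqlcmd -S ADMIN:YourServerName -E
-- Only one DAC connection is allowed at a time, and you must be a sysadmin.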


NUMA With Few Cores

Denny Cherry asks and answers the question of how many NUMA nodes we should use on a server with a large amount of RAM but relatively few cores:

For this example, let’s assume that we have a physical server with 512 Gigs of RAM and two physical NUMA nodes (and two CPU sockets). We have a VM running in that machine which has a low CPU requirement, but a large working set. Because of this we have 4 cores and 360 Gigs of RAM presented to the VM.

The answer is not trivial, making this an interesting question.
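If you want to see how SQL Server itself carved things up on a given box, a read-only peek at sys.dm_os_nodes shows the schedulers and memory node behind each NUMA node:

-- Show SQL Server's view of its NUMA nodes (excluding the hidden DAC node).
SELECT node_id,
       node_state_desc,
       online_scheduler_count,
       memory_node_id
FROM sys.dm_os_nodes
WHERE node_state_desc <> N'ONLINE DAC';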


Database In Recovery

James Anderson had a database which would drop into In Recovery mode a few times throughout the day:

The database in question wasn’t stuck in recovery, it would slip in and out of the status throughout the day. Normally, I would only ever expect to see a database in recovery during a restore or after a service restart. Once recovery is complete, I would not expect to see the database slip into ‘in recovery’ again. I especially wouldn’t expect a database to keep slipping in and out of recovery.

The answer is a true head-slapper.  Whose head, I’ll leave up to you…
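If you want to catch a database doing this yourself, a simple poll of sys.databases shows the current state; anything other than ONLINE deserves a look:

SELECT name, state_desc
FROM sys.databases
WHERE state_desc <> N'ONLINE';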


Optimizing OR Clauses

Daniel Hutmacher looks at different ways of optimizing queries with multiple conditionals and different parameters:

The SQL Server query optimizer can find interesting ways to tackle seemingly simple operations that can be hard to optimize. Consider the following query on a table with two indexes, one on (a), the other on (b):

SELECT a, b
FROM #data
WHERE a <= 10 OR b <= 10000;

The basic problem is that we would really want to use both indexes in a single query.

We get to see a few different versions of the query as well as the execution plans which result.
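One classic rewrite, which may or may not be where Daniel ends up, splits the OR into two seek-friendly queries and glues them together:

-- Each branch can seek its own index; the second branch excludes rows the
-- first already returned. This assumes column a is NOT NULL; if it is
-- nullable, use UNION (which deduplicates) instead of UNION ALL.
SELECT a, b FROM #data WHERE a <= 10
UNION ALL
SELECT a, b FROM #data WHERE b <= 10000 AND a > 10;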


Using The OUTPUT Clause

Steve Jones shows how to use the OUTPUT clause with an INSERT statement:

I got asked a question about the OUTPUT clause recently and realized I didn’t remember the syntax. I’ve rarely used this, so I had to look it up and thought this would be a good basic post.

The idea with OUTPUT is that the data from the inserted and deleted tables can be output from the INSERT statement, outside of your triggers. This is the same data, but you can access it in the insert.

The format is

INSERT xxx OUTPUT yyyy INTO @zzz VALUES (or SELECT) mmmm

If I had one thing I could change about OUTPUT, I’d like to be able to output directly into variables for those cases in which I know I’m only going to get one result (or maybe I only care about one arbitrary result in a set).
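To make the skeleton concrete, here is a minimal runnable sketch; the dbo.People table is hypothetical:

-- Hypothetical target table: dbo.People (Id int IDENTITY, Name varchar(50)).
DECLARE @inserted TABLE (Id int, Name varchar(50));

INSERT dbo.People (Name)
OUTPUT inserted.Id, inserted.Name INTO @inserted (Id, Name)
VALUES ('Ada'), ('Grace');

-- The captured rows, including the IDENTITY values SQL Server just assigned:
SELECT Id, Name FROM @inserted;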
