Press "Enter" to skip to content

Author: Kevin Feasel

Looking At Columnstore In vNext

Dmitry Pilugin looks at non-clustered columnstore indexes in vNext:

Note: Notice that I created a varchar(max) column in the table; that was done deliberately, because I wanted to test whether LOB columns are supported in vNext. Unfortunately, when I tried to create a nonclustered Columnstore index, I got an error:
The statement failed. Column ‘Note’ has a data type that cannot participate in a columnstore index. Omit column ‘Note’.

However, if you create a clustered Columnstore, you will succeed! You will, though, get the following warning in CTP 1.2:

Warning: Using Lob types (NVARCHAR(MAX), VARCHAR(MAX), and VARBINARY(MAX)) with Clustered Columnstore Index is in public preview. Do not use it on production data without backup during public preview period.

I think it is great that we can now have LOB varchar columns in Columnstore, because that was a blocker for some scenarios; at least I have seen some of those.

There are a few interesting findings here, so check it out.
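
To make the scenario concrete, here is a minimal sketch (the table and column names are illustrative, not taken from Dmitry’s post): the nonclustered columnstore index fails because of the varchar(max) column, while the clustered columnstore index builds, with the public-preview warning, in CTP 1.2.

CREATE TABLE dbo.Orders
(
    OrderID   int          NOT NULL,
    OrderDate datetime2    NOT NULL,
    Note      varchar(max) NULL
);
GO
-- Fails: LOB columns cannot participate in a nonclustered columnstore index.
-- CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_Orders ON dbo.Orders (OrderID, OrderDate, Note);

-- Succeeds in vNext CTP 1.2, with the warning that LOB support is in public preview.
CREATE CLUSTERED COLUMNSTORE INDEX CCI_Orders ON dbo.Orders;
GO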

Comments closed

Functions To Know By Heart

Kevin Hill has a top ten functions list for database administrators:

3. DBCC OPENTRAN – log file filling up? Processes blocked and timing out? See if some doofus (hopefully not you) left a transaction open and went to lunch. If you are lucky, you can kill it. Unlucky, and you find an undistributed replicated transaction.

4. GETDATE() – Not sure why, but this winds up in probably 80% of the scripts I write.  Not surprisingly, it often brings along its little brother, DATEDIFF.

I’d go with GETUTCDATE() or SYSUTCDATETIME() over GETDATE(); that way, everything gets stored as UTC and you spend less time worrying about time zone or Daylight Saving Time issues.
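
For a quick side-by-side (the results depend on the server’s time zone setting):

SELECT
    GETDATE()        AS LocalDateTime,   -- server-local time, datetime precision
    GETUTCDATE()     AS UtcDateTime,     -- UTC, datetime precision
    SYSUTCDATETIME() AS UtcDateTime2;    -- UTC, datetime2(7) precision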

1 Comment

Finding Candidate Keys

Daniel Hutmacher explains ways to find candidate keys:

Let’s assume we have a temp table heap called #table, with 9 columns and no indexes at all. Some columns are integers, one is a datetime, and a few are numeric. As I’m writing this post, my test setup has about 14.4 million rows.

In the real world, when you’re investigating a table for primary key candidates, there are a few things you’ll be looking for that are beyond the scope of this post. For instance, it’s a fair assumption that a numeric or float column is not going to be part of a primary key, varchar columns are less probable candidates than integer columns, and so on. Other factors you would take into consideration are naming conventions; column names ending with “ID” and/or columns that you can tell are foreign keys would also probably be good candidates.

It’s useful to think about all of the candidate keys, as getting to Boyce-Codd Normal Form or fourth/fifth normal form involves dealing with every potential primary key, not just the one you selected. Daniel’s post gives you several different methods of searching existing data; combine that with domain knowledge and a bit of logic, and you have a pretty decent start at finding candidate keys.
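
As a minimal illustration of the brute-force end of that spectrum (this is a sketch, not necessarily one of the methods from Daniel’s post, and the table and column names are hypothetical): a column whose distinct count matches the row count, and which contains no NULLs, is a candidate key.

-- If DistinctOrderIDs equals TotalRows (and OrderID has no NULLs), OrderID is a candidate key.
SELECT
    COUNT(*)                AS TotalRows,
    COUNT(DISTINCT OrderID) AS DistinctOrderIDs
FROM dbo.Orders;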

Comments closed

Memory-Optimized Hash Indexes And Bucket Counts

Jack Li relates a story about how it’s important to think through bucket count for memory-optimized hash indexes:

I was working with a customer to troubleshoot memory-optimized table issues.  In this scenario, our customer uses a memory-optimized table variable.  He put 1 million rows of data into the table variable and then processed it.  Based on what he said, I tried to come up with a repro to see if I could duplicate the problem.  While troubleshooting that issue, I ran into another issue where I couldn’t even insert 1 million rows into a memory-optimized table variable.

My rule of thumb is 30-50% more buckets than expected rows.  That way, I can handle some level of unexpected growth while keeping the chance of a hash collision and a really slow linked-list scan as low as possible.  The official guidance says performance should be “acceptable” with up to 5x rows per bucket, but my experience has been that “acceptable” is a generous term at that point.
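
For reference, the bucket count is fixed when the type is declared, so following that rule of thumb for a table variable expected to hold a million rows would look something like this (a minimal sketch; the type and column names are hypothetical):

CREATE TYPE dbo.OrderIdList AS TABLE
(
    OrderID int NOT NULL,
    INDEX ix_OrderID HASH (OrderID) WITH (BUCKET_COUNT = 1500000)  -- ~50% headroom over 1 million expected rows
)
WITH (MEMORY_OPTIMIZED = ON);
GO
DECLARE @Orders dbo.OrderIdList;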

Comments closed

SET NOEXEC ON

Andy Kelly shows how to prevent part of a script from running:


If we were to hit F5 (or however you execute your T-SQL statements in SSMS) without highlighting any statement(s), they would all be executed, one batch after the other. Even if one batch were to fail, or we had a THROW in that batch, it would fail at that point, but execution would continue immediately after the next GO until the end. This is where SET NOEXEC ON comes into play. If I add that at the beginning of the script, none of the succeeding code would be executed. The statements would only be compiled and not actually run. It would look like this:

SET NOEXEC ON;

PRINT 'Got Here 1';
GO
PRINT 'Got Here 2';
GO
PRINT 'Got Here 3';
GO
PRINT 'Got Here 4';
GO

This is a useful “accidental F5” protection:  you can put it at the top of your long script to keep from running the whole thing at once.
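
One caveat worth remembering (not shown in the excerpt above): statements are still parsed while NOEXEC is on, so syntax errors are reported, and you have to turn it back off for anything after that point to run again.

SET NOEXEC OFF;  -- re-enable execution for the statements that follow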

Comments closed

Parsing Gigantic JSON Text

Jovan Popovic has created a 4.35 GB JSON array:

SQL Server 2016 and Azure SQL Database enable you to parse JSON text and transform it into tabular format. In this post, you will see that JSON functions can handle very large JSON text – up to 4 GB.

First, I would need a very large JSON document. I’m using the TPCH database, so I will export the contents of the lineitem table to a file. JSON can be exported using the bcp.exe program:

My first draft read “Jovan Popovic has created a monster.”  I might go back to that one.  On the plus side, the operation took a lot less time than I had expected, though I’d have to imagine that his SQL Express instance had some decent specs.
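
For anyone who hasn’t tried the JSON functions yet, the parsing side of this operation looks roughly like the following (a minimal sketch with a couple of in-line rows; the column list follows the TPCH lineitem table, and the exact bcp export and query in Jovan’s post may differ):

DECLARE @json nvarchar(max) = N'[
  { "L_ORDERKEY": 1, "L_QUANTITY": 17.00, "L_SHIPDATE": "1996-03-13" },
  { "L_ORDERKEY": 1, "L_QUANTITY": 36.00, "L_SHIPDATE": "1996-04-12" }
]';

-- Shred the JSON array into rows and strongly typed columns.
SELECT li.*
FROM OPENJSON(@json)
    WITH (
        L_ORDERKEY int            '$.L_ORDERKEY',
        L_QUANTITY decimal(15, 2) '$.L_QUANTITY',
        L_SHIPDATE date           '$.L_SHIPDATE'
    ) AS li;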

Comments closed

Export-DMVInformation Updates

Sander Stad has made changes to his Export-DMVInformation PowerShell module:

Last Friday I had the chance to show the Export-DMVInformation module to the Dutch PowerShell user group. After the presentation I got a couple of suggestions and wanted to put them in place in the module.

Changes:

  1. Possibility to execute the module using the pipeline.

  2. Get all the databases in one statement by assigning the value “ALL” to the database parameter.

  3. Replaced messages with verbose output.

Read on for more information, including where you can get the module and its Export-DMVInformation cmdlet.

Comments closed

T-SQL Tuesday 87 Roundup

Matt Gordon has the roundup for T-SQL Tuesday 87:

Finally, as a first time host, I was obviously hoping that this topic would garner some responses, but you never know until you hit that post button whether you’ve selected something of interest to the community or not. Thankfully, this month’s topic picked up views from over 20 countries and over 20 blog responses. The list (with a brief post-by-post commentary from me) is below. Happy reading and thanks again for reading/writing/participating!

Read on for the full list of respondents.

Comments closed

Accepting Risk

Daniel Hutmacher argues that modern companies have reached an inefficient risk equilibrium:

Which brings us to the matter of getting stuff done. Imagine if everything you do has to be approved by a stakeholder and a manager, every line of code you write is peer-reviewed, then tested in a dev test environment, then in an acceptance test environment (which should both contain reasonably fresh, yet scrambled copies of the production data), then approved for deployment by the stakeholder (who ideally should also take time to verify the results), and finally deployed to production by two other people, under a four-eyes principle where no single person can perform any change in production alone. Sprinkle this with a bunch of project meetings, all while leaving a long and winding trail of tickets and documentation.

This is how most development cycles look. Except, you know, the test environments are rarely fresh, the tests aren’t really that thorough, and the peer-review could probably be called a peer-glance at best.

A lot of this depends upon the industry and the likelihood that an outage will cause direct physical harm to people.  I’m personally ambivalent about where the right risk acceptance point is, but Daniel makes a good argument on the “accept more risk” side.

Comments closed