
Author: Kevin Feasel

Functions To Know By Heart

Kevin Hill has a top ten functions list for database administrators:

3. DBCC Opentran – log file filling up?  Processes blocked and timing out? See if some doofus (hopefully not you) left a transaction open and went to lunch.  If you are lucky you can kill it.   Unlucky and you find an undistributed replicated transaction.

4. Getdate() – Not sure why, but this winds up in probably 80% of the scripts I write. Not surprisingly, it often brings along its little brother, DateDiff.
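
Item 3, in particular, is worth committing to muscle memory. A minimal sketch of that workflow (the database name and session id below are made up for illustration):

USE SomeDatabase;  -- hypothetical database name
DBCC OPENTRAN;     -- reports the oldest active transaction, including its session id
-- If that session really is an abandoned transaction, you can end it:
-- KILL 57;        -- 57 is a placeholder; use the session id that DBCC OPENTRAN reports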

I’d go with GETUTCDATE() or SYSUTCDATETIME() over GETDATE(); that way, everything gets stored as UTC and you spend less time worrying about time zone or Daylight Savings Time issues.
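
As a quick illustration of the difference between the three:

SELECT GETDATE()        AS ServerLocalTime,   -- server's local time, subject to time zone and DST
       GETUTCDATE()     AS UtcTime,           -- UTC, returned as DATETIME
       SYSUTCDATETIME() AS UtcTimePrecise;    -- UTC, returned as DATETIME2(7)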


Finding Candidate Keys

Daniel Hutmacher explains ways to find candidate keys:

Let’s assume we have a temp table heap called #table, with 9 columns and no indexes at all. Some columns are integers, one is a datetime and a few are numeric. As I’m writing this post, my test setup has about 14.4 million rows.

In the real world, when you’re investigating a table for primary key candidates, there are a few things you’ll be looking for that are beyond the scope of this post. For instance, it’s a fair assumption that a numeric or float column is not going to be part of a primary key, varchar columns are less probable candidates than integer columns, and so on. Other factors you would take into consideration are naming conventions; column names ending with “ID” and/or columns that you can tell are foreign keys would also probably be good candidates.

It’s useful to think of all the candidate keys, as getting to Boyce-Codd Normal Form or 4th/5th NF involves dealing with all potential primary keys, not just the one you selected.  Daniel’s post gives you several different methods of searching existing data; combine that with domain knowledge and a bit of logic and you have a pretty decent start at finding candidate keys.
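
As a supplement to Daniel's methods, the simplest brute-force check for any suspected candidate key is a duplicate search; if it returns no rows, the combination is unique across the existing data (the column names here are placeholders):

SELECT Col1, Col2, COUNT(*) AS DuplicateCount
FROM #table
GROUP BY Col1, Col2
HAVING COUNT(*) > 1;   -- any rows returned mean Col1 + Col2 cannot be a key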


Memory-Optimized Hash Indexes And Bucket Counts

Jack Li relates a story about how it’s important to think through bucket count for memory-optimized hash indexes:

I was working with a customer to troubleshoot memory optimized table issues. In this scenario, our customer uses a memory optimized table variable. He put 1 million rows of data into the table variable and then processed it. Based on what he said, I tried to come up with a repro to see if I could duplicate the problem. While troubleshooting that issue, I ran into another issue where I couldn’t even insert 1 million rows into a memory optimized table variable.

My rule of thumb is 30-50% more buckets than expected rows. That way I can handle some level of unexpected growth while keeping the chance of a hash collision and a really slow linked list scan as low as possible. The official guidance says performance should be “acceptable” with up to 5x rows per bucket, but my experience has been that “acceptable” is a generous term at that point.
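
As an illustration, a memory-optimized table variable type sized for roughly a million expected rows might look like the sketch below (in a database already configured for In-Memory OLTP; the type and column names are invented, and the bucket count follows the rule of thumb above rather than anything official):

CREATE TYPE dbo.OrderKeyList AS TABLE
(
    OrderID BIGINT NOT NULL,
    INDEX ix_OrderID HASH (OrderID) WITH (BUCKET_COUNT = 1500000)  -- ~50% above the expected 1M rows
)
WITH (MEMORY_OPTIMIZED = ON);
GO

DECLARE @Orders dbo.OrderKeyList;   -- the memory-optimized table variable itself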


SET NOEXEC ON

Andy Kelly shows how to prevent part of a script from running:


If we were to hit F5 (or however you execute your TSQL statements in SSMS) without highlighting any statement(s) they would all be executed, one batch after the other. Even if one batch were to fail or we had a THROW in that batch it would fail at that point but execution would continue immediately after the next GO until the end. This is where SET NOEXEC ON comes into play. If I add that at the beginning of the script all succeeding code would not be executed. The statements would only be compiled and not actually run. It would look like this:

SET NOEXEC ON;

PRINT 'Got Here 1';
GO
PRINT 'Got Here 2';
GO
PRINT 'Got Here 3';
GO
PRINT 'Got Here 4';
GO

This is a useful “accidental F5” protection:  you can put it at the top of your long script to keep from running the whole thing at once.
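
One related detail: SET NOEXEC OFF is still honored while NOEXEC is ON, so you can flip execution back on partway through a script. A minimal sketch:

SET NOEXEC ON;
GO
PRINT 'Skipped: compiled but not executed.';
GO
SET NOEXEC OFF;   -- this statement still runs and re-enables execution
GO
PRINT 'Printed: execution resumes from here.';
GO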


Parsing Gigantic JSON Text

Jovan Popovic has created a 4.35 GB JSON array:

SQL Server 2016 and Azure SQL Database enable you to parse JSON text and transform it into tabular format. In this post, you might see that JSON functions can handle very large JSON text – up to 4GB.

First, I would need a very large JSON document. I’m using the TPCH database, so I will export the content of the lineitem table to a file. JSON can be exported using the bcp.exe program:

My first draft read “Jovan Popovic has created a monster.”  I might go back to that one.  On the plus side, the operation took a lot less time than I had expected, though I’d have to imagine that his SQL Express instance had some decent specs.
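
If you want to play along at a smaller scale, the general read-it-back pattern looks something like this; the file path and column mappings are made up for illustration and are not Jovan's exact query:

SELECT li.*
FROM OPENROWSET(BULK 'C:\temp\lineitems.json', SINGLE_CLOB) AS src
CROSS APPLY OPENJSON(src.BulkColumn)
    WITH (
        OrderKey BIGINT         '$.L_ORDERKEY',
        Quantity DECIMAL(15, 2) '$.L_QUANTITY',
        ShipDate DATE           '$.L_SHIPDATE'
    ) AS li;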


Export-DMVInformation Updates

Sander Stad has made changes to his Export-DMVInformation Powershell module:

Last Friday I had the chance to show the Export-DMVInformation module to the Dutch Powershell user group. After the presentation I got a couple of suggestions and wanted to put them into place in the module.

Changes:

  1. Possibility to execute the module using the pipeline

  2. Get all the databases in one statement by assigning the value “ALL” to the database parameter.

  3. Replaced messages with verbose

Read on for more information, including where you can get the module and its Export-DMVInformation cmdlet.


T-SQL Tuesday 87 Roundup

Matt Gordon has the roundup for T-SQL Tuesday 87:

Finally, as a first time host, I was obviously hoping that this topic would garner some responses, but you never know until you hit that post button whether you’ve selected something of interest to the community or not. Thankfully, this month’s topic picked up views from over 20 countries and over 20 blog responses. The list (with a brief post-by-post commentary from me) is below. Happy reading and thanks again for reading/writing/participating!

Read on for the full list of respondents.


Accepting Risk

Daniel Hutmacher argues that modern companies have reached an inefficient risk equilibrium:

Which brings us to the matter of getting stuff done. Imagine if everything you do has to be approved by a stakeholder and a manager, every line of code you write is peer-reviewed, then tested in a dev test environment, then in an acceptance test environment (which should both contain reasonably fresh, yet scrambled copies of the production data), then approved for deployment by the stakeholder (who ideally should also take time to verify the results), and finally deployed to production by two other people, under a four-eyes principle where no single person can perform any change in production alone. Sprinkle this with a bunch of project meetings, all while leaving a long and winding trail of tickets and documentation.

This is how most development cycles look. Except, you know, the test environments are rarely fresh, the tests aren’t really that thorough, and the peer-review could probably be called a peer-glance at best.

A lot of this depends upon the industry and the likelihood that an outage will cause direct physical harm to people.  I’m personally ambivalent about where the right risk acceptance point is, but Daniel makes a good argument on the “accept more risk” side.


Time Series Aggregation

Steph Locke answers an important question related to time series:

Additive or multiplicative?

It’s important to understand the difference between a multiplicative time series and an additive one before we go any further.

There are three components to a time series:

  trend: how things are overall changing
  seasonality: how things change within a given period, e.g. a year, month, week, day
  error/residual/irregular: activity not explained by the trend or the seasonal value

How these three components interact determines the difference between a multiplicative and an additive time series.
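
As shorthand (my phrasing, not Steph's): in an additive series the observed value breaks down as observed = trend + seasonality + error, while in a multiplicative series it breaks down as observed = trend × seasonality × error. Taking the logarithm of a multiplicative series turns it into an additive one, which is a common modeling trick.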

Click through to learn how to spot an additive time series versus a multiplicative one. There is a good bit of very important detail here.
