Press "Enter" to skip to content

Author: Kevin Feasel

The Downside Of Oversizing

Aaron Bertrand shows why you might not want to oversize VARCHAR columns by too much:

Now, whether you go by the old standard or the new one, you do have to support the possibility that someone will use all the characters allowed. Which means you have to use 254 or 320 characters. But what I’ve seen people do is not bother researching the standard at all, and just assume that they need to support 1,000 characters, 4,000 characters, or even beyond.

So let’s take a look at what happens when we have tables with an e-mail address column of varying size, but storing the exact same data:

This is a good argument against automatically using VARCHAR(8000) (much less MAX) when creating columns.

Comments closed

Checking Backup Encryption Size Differences

Tracy Boggiano has a script to check whether your backup file sizes are larger or smaller when they’re encrypted:

I had a recent project to enable backup encryption on all our servers.  Then question from the storage team came up will this required additional space.  Well by then I had already enabled in all our test servers so I wrote a query that would compare the average size of backups before encryption to after encryption.  Keep in mind we do keep only two weeks of history in our backup tables so this is a fair comparison.  If you don’t have maintenance tasks to clean up your backup history then you should have backup_start_time to the where clauses to get more accurate numbers and setup a maintenance tasks to keep your msdb backup history in check.

Unfortunately, Tracy leaves us in suspense regarding whether they did increase.

Comments closed

Duplicate Key Error On DBCC CLONEDATABASE

Erin Stellato shows an edge case when you have a rather old database you’re trying to clone:

If you’ve been using DBCC CLONEDATABASE at all, you might have run into a cannot insert duplicate key error (or something similar) when trying to clone a database:

Database cloning for ‘YourDatabase’ has started with target as ‘COPY_YourDatabase’.
Msg 2601, Level 14, State 1, Line 1
Cannot insert duplicate key row in object ‘sys.sysschobjs’ with unique index ‘clst’. The duplicate key value is (1977058079).

If you do some searching, you’ll probably end up at this Connect item: DBCC DATABASECLONE fails on sys.sysowners.

The Connect item states that the problem exists because of user objects in model.  That’s not the case here.

I’m working with a database created in SQL Server 2000…now running on SQL Server 2016.

This isn’t very likely to pop up for most places (I hope!), but it’s good to know.

Comments closed

Finding Clustered Columnstore Index Candidates

Sunil Agarwal has a script that helps you find potential clustered columnstore index candidates:

Most of us understand that clustered columnstore index can typically provide 10x data compression and can speed up query performance up to 100x. While this sounds all so good, the question is how do I know which tables in my database could potentially benefit from CCI? For a traditional DW scenario with star schema, the FACT table is an obvious choice to consider. However, many workloads including DW have grown organically and it is not trivial to identify tables that could benefit from CCI. So the question is how can I quickly identify a subset of tables suitable for CCI in my workload?

Interestingly, the answer lies in leveraging the DMVs that collect data access patterns in each of the tables. The following DMV query provides a first order approximation to identify list of tables suitable for CCI. It queries the HEAP or the rowstore Clustered index using DMV sys.dm_db_index_operational_stats to identify the access pattern on the base rowstore table to identify tables that meet the criteria listed in the comments below:

Read on for the script, which has a sensible set of criteria.

Comments closed

The Most Useless SQL Server Feature

Adam Machanic put out a poll on Twitter, asking for the most useless SQL Server feature:

It was at this point that I realized just how many candidates there are for “most useless” things lying around the product. So I decided to create my own tweet. I asked for the most useless feature, anytime between version 7.0 (which I would call the beginning of SQL Server’s “modern era”) and now. I received quite a few suggestions, and so I have decided to catalog them here—along with a bit of personal commentary on each one.

The list that follows is mostly unordered and culled straight from what I received on Twitter. Hopefully I haven’t missed anything due to Twitter’s weird threading and “priority” mechanisms. And please let me know in the comments if your favorite useless feature is missing, or you’d like to add a comment and/or argument about one of these. Perhaps we can find some way to turn these dark and ugly corners into things of beauty? Well, we shall see…

I almost completely agree with Adam’s opinions on this long list.  I’d emphasize, though, that In-Memory OLTP is by no means useless.

1 Comment

Power BI Premium Released

Dustin Ryan reports that Power BI Premium is now generally available:

Microsoft stated that Power BI Premium would be GA late Q2 of 2017, which could mean nothing else aside from June. Well, today is the day that Power BI Premium, along with Power BI Report Server, are generally available.

Lucky for me, I have a user on a tenant with Power BI Premium capacity available for me to use. When you first log into your tenant after Power BI Premium capacity has been purchased, you are presented with the following welcome screen which includes a link to learn more about Premium capacity.

The Power BI team continues to be busy.

Comments closed

When Snapshots Begin

Kendra Little explains when a transaction really begins when you are in the snapshot isolation level:

  • 00.000 – Session A sets its isolation level to snapshot

  • 00.001 – Session A explicitly begins a transaction with BEGIN TRAN

  • 00.002 – Session A starts a WAITFOR command for 15 seconds

  • 10.000 – Before the WAITFOR completes, Session B inserts rows into dbo.Table

  • 15.001 – Session A starts a SELECT from dbo.Table, which returns the rows that Session B inserted

This seems wrong, because many of us commonly say things like, “in Snapshot Isolation level, all statements see data consistent with the beginning of the transaction.”

But in this case, Session B inserted the rows after Session A began its transaction using Snapshot Isolation level. So why did Session A see those rows?

Kendra explains the nuance well, so read the whole thing.

Comments closed

Security And Zookeeper

Michael Han describes a few methods you can use to tighten up (or rather, introduce) security in ZooKeeper:

Four Letter Words (acronym as 4lw) is a very popular feature of the Apache ZooKeeper project. In a nutshell, 4lw is a set of commands that you can use to interact with a ZooKeeper ensemble through a shell interface. Because it’s simple and easy to use, lots of ZooKeeper monitoring solutions are built on top of 4lw.

The simplicity of 4lw comes at a cost: the design did not originally consider security, there is no built in support for authentication and access control. Any user that has access to the ZooKeeper client port can send commands to the ensemble. The 4lw commands are read only commands: no actions can be performed. However, they can be computing intensive, and sending too many of them would effectively create a DOS attack that prevents the ensemble’s normal operation.

Read on for details.

Comments closed

Moving Docker Containers

Andrew Pruski shows how to change the default location for Docker containers on Windows:

There’s a switch that you can use when starting up the docker service that will allow you to specify the container/image backend. That switch is -g

Now, I’ve gone the route of not altering the existing service but creating a new one with the -g switch. Mainly because I’m testing and like rollback options but also because I found it easier to do it this way.

Read the whole thing.

Comments closed

Flattening JSON With Purrr

Steph Locke shows how to use purrr to write functional style code in R:

And… et voila! A multi-language dataset with the language identified and the sentiment scored using purrr for easier to read code.

Using purrr with APIs makes code nicer and more elegant as it really helps interact with hierarchies from JSON objects. I feel much better about this code now!

Purrr is something I really want to dig into for reasons just like this.

Comments closed