Press "Enter" to skip to content

Author: Kevin Feasel

Finding Clustered Columnstore Index Candidates

Sunil Agarwal has a script that helps you find potential clustered columnstore index candidates:

Most of us understand that clustered columnstore index can typically provide 10x data compression and can speed up query performance up to 100x. While this sounds all so good, the question is how do I know which tables in my database could potentially benefit from CCI? For a traditional DW scenario with star schema, the FACT table is an obvious choice to consider. However, many workloads including DW have grown organically and it is not trivial to identify tables that could benefit from CCI. So the question is how can I quickly identify a subset of tables suitable for CCI in my workload?

Interestingly, the answer lies in leveraging the DMVs that collect data access patterns in each of the tables. The following DMV query provides a first order approximation to identify list of tables suitable for CCI. It queries the HEAP or the rowstore Clustered index using DMV sys.dm_db_index_operational_stats to identify the access pattern on the base rowstore table to identify tables that meet the criteria listed in the comments below:

Read on for the script, which has a sensible set of criteria.

Comments closed

The Most Useless SQL Server Feature

Adam Machanic put out a poll on Twitter, asking for the most useless SQL Server feature:

It was at this point that I realized just how many candidates there are for “most useless” things lying around the product. So I decided to create my own tweet. I asked for the most useless feature, anytime between version 7.0 (which I would call the beginning of SQL Server’s “modern era”) and now. I received quite a few suggestions, and so I have decided to catalog them here—along with a bit of personal commentary on each one.

The list that follows is mostly unordered and culled straight from what I received on Twitter. Hopefully I haven’t missed anything due to Twitter’s weird threading and “priority” mechanisms. And please let me know in the comments if your favorite useless feature is missing, or you’d like to add a comment and/or argument about one of these. Perhaps we can find some way to turn these dark and ugly corners into things of beauty? Well, we shall see…

I almost completely agree with Adam’s opinions on this long list.  I’d emphasize, though, that In-Memory OLTP is by no means useless.

1 Comment

Power BI Premium Released

Dustin Ryan reports that Power BI Premium is now generally available:

Microsoft stated that Power BI Premium would be GA late Q2 of 2017, which could mean nothing else aside from June. Well, today is the day that Power BI Premium, along with Power BI Report Server, are generally available.

Lucky for me, I have a user on a tenant with Power BI Premium capacity available for me to use. When you first log into your tenant after Power BI Premium capacity has been purchased, you are presented with the following welcome screen which includes a link to learn more about Premium capacity.

The Power BI team continues to be busy.

Comments closed

When Snapshots Begin

Kendra Little explains when a transaction really begins when you are in the snapshot isolation level:

  • 00.000 – Session A sets its isolation level to snapshot

  • 00.001 – Session A explicitly begins a transaction with BEGIN TRAN

  • 00.002 – Session A starts a WAITFOR command for 15 seconds

  • 10.000 – Before the WAITFOR completes, Session B inserts rows into dbo.Table

  • 15.001 – Session A starts a SELECT from dbo.Table, which returns the rows that Session B inserted

This seems wrong, because many of us commonly say things like, “in Snapshot Isolation level, all statements see data consistent with the beginning of the transaction.”

But in this case, Session B inserted the rows after Session A began its transaction using Snapshot Isolation level. So why did Session A see those rows?

Kendra explains the nuance well, so read the whole thing.

Comments closed

Security And Zookeeper

Michael Han describes a few methods you can use to tighten up (or rather, introduce) security in ZooKeeper:

Four Letter Words (acronym as 4lw) is a very popular feature of the Apache ZooKeeper project. In a nutshell, 4lw is a set of commands that you can use to interact with a ZooKeeper ensemble through a shell interface. Because it’s simple and easy to use, lots of ZooKeeper monitoring solutions are built on top of 4lw.

The simplicity of 4lw comes at a cost: the design did not originally consider security, there is no built in support for authentication and access control. Any user that has access to the ZooKeeper client port can send commands to the ensemble. The 4lw commands are read only commands: no actions can be performed. However, they can be computing intensive, and sending too many of them would effectively create a DOS attack that prevents the ensemble’s normal operation.

Read on for details.

Comments closed

Moving Docker Containers

Andrew Pruski shows how to change the default location for Docker containers on Windows:

There’s a switch that you can use when starting up the docker service that will allow you to specify the container/image backend. That switch is -g

Now, I’ve gone the route of not altering the existing service but creating a new one with the -g switch. Mainly because I’m testing and like rollback options but also because I found it easier to do it this way.

Read the whole thing.

Comments closed

Flattening JSON With Purrr

Steph Locke shows how to use purrr to write functional style code in R:

And… et voila! A multi-language dataset with the language identified and the sentiment scored using purrr for easier to read code.

Using purrr with APIs makes code nicer and more elegant as it really helps interact with hierarchies from JSON objects. I feel much better about this code now!

Purrr is something I really want to dig into for reasons just like this.

Comments closed

Cross-Platform Powershell Remoting

Anthony Nocentino shows how to enter Powershell sessions using OpenSSH-basted remoting:

Nothing special here, simple syntax, but the seasoned PowerShell remoting pro will notice that we’re using a new parameter here -HostName. Normally on Windows PowerShell you have the -ComputerName parameter. Now, I don’t know exactly why this is different, but perhaps the PowerShell team needed a way to differentiate between OpenSSH and WinRM based remoting. Further, Enter-PSSession now has a new parameter -SSHTransport which at the moment doesn’t seem to do much since remoting cmdlets currently use OpenSSH by default. But if you read the code comments here, it looks like WinRM will be the default and we can use this switch parameter to specify SSH as the transport.

Once we execute this command, you’ll have a command prompt to the system that passed as a parameter to -HostName. The prompt below indicates you’re on a remote system by putting the server name you’re connected to in square brackets then your normal PowerShell prompt. That’s it, you now have a remote shell. Time to get some work done on that server, eh? Want to get out of the session, just type exit.

It’s interesting to see how well Microsoft is integrating Linux support into Powershell (and vice versa, but that’s a different post).

Comments closed

Using OStress

Nikhilesh Patel explains how to use OStress to generate artificial database loads for stress testing:

OStress is a Microsoft tool comes with RML utilities package and it uses to stress SQL Server. This is especially useful when you want to troubleshoot SQL Server while SQL Server is under heavy load.

It is a free tool for SQL Server developers and DBAs. It is designed to assist with performance stress testing of T-SQL queries and routines. The tool automatically collects metrics to help you determine whether your queries will perform under load, and what kind of resource strain they put on a server. In short, it also allows putting a serious load on your database.

OStress isn’t the easiest thing in the world to set up, but it works well.

Comments closed

Change Detection With Hashes

Nigel Meakins shows how to use HashBytes to roll your own change detection:

So this all sounds very promising as a way of tracking changes to our Data Warehouse data, for purposes such as extracting deltas, inserts and updates to Type I and II dimensions and so forth. It doesn’t have any show-stopping overhead for the hashing operations for the sizes of data typically encountered and storage isn’t going to be an issue. It is native to T-SQL so we can rerun our hash value generation in the engine where our data resides rather than having to push through SSIS or some other tool to generate this for us. Algorithms are universal and as such will give us the same values wherever used for the same bytes of input. Let’s go back to the basic idea for a minute and consider how we implement this.

This is particularly useful in cases where you have metadata columns you don’t much care about (e.g., last modified time).  I do recommend using CONCAT or CONCAT_WS (if you’re on SQL Server 2017) to do string concatenation, though; it’d remove the need for util.CastAsNVarchar and possibly more.

Comments closed