Author: Kevin Feasel

Filtered Statistics

William Wolf shows us the value of filtered statistics:

Wolf only had 700 complaints, but 166,900 records were estimated for return. He ends up looking much worse than he really is.

So, what is happening is that there are 3 possible employee results for complaints. It is rather simple. The CE is taking the total number of records (500,701) and dividing by 3, assuming that all 3 will have roughly the same number of records. We see that along with the estimated number of records being the same, the execution plan operators are the same. For such a variation in the number of records, there must be a better way.

I rarely create filtered statistics, in part because I don’t have a good idea of exactly which values people will use when searching.  But one slight change to Wolf’s scenario might help:  creating one filtered statistic where name = Sunshine and another where name <> Sunshine (or name is null).  That might help a case where there’s extreme skew with one value and the rest are much closer to uniformly distributed.
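
For illustration, a pair of filtered statistics along those lines might look like the T-SQL below.  The table and column names are invented for the example, and because filtered statistics predicates follow similar rules to filtered indexes, the IS NULL case would likely need its own statistic rather than an OR.

-- One statistic for the heavily skewed value...
CREATE STATISTICS st_Complaints_Sunshine
    ON dbo.Complaints (EmployeeName)
    WHERE EmployeeName = 'Sunshine';

-- ...and one for the remaining, more uniformly distributed values.
CREATE STATISTICS st_Complaints_NotSunshine
    ON dbo.Complaints (EmployeeName)
    WHERE EmployeeName <> 'Sunshine';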

Temporal Memory-Optimized Tables

Ned Otter describes how hybrid disk + memory-optimized temporal tables differ from on-disk temporal tables:

As changes are made to rows in the temporal memory-optimized table, before being transferred to the history table on disk, they are first migrated to an internal memory-optimized staging table. That means when you query the “history table”, you could be retrieving rows from both the on-disk history table, and internal staging table. Because no custom indexing was possible on the internal staging table, there could be performance implications when executing queries against historical data. Microsoft addressed these potential performance issues in SQL 2016 SP1 (detailed in this CAT blog post).

The internal staging table only gets flushed to the on-disk history table when it reaches 8% of the size of the temporal table. Given the current capacities of Windows Server 2016 (24TB memory), it’s now possible to have very large memory-optimized tables. 8% of one of those large memory-optimized tables could be quite large, which will affect query performance, if predicates don’t match available indexes.

Read on for some sobering thoughts on the topic.
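
For reference, declaring a system-versioned, memory-optimized table looks roughly like this minimal sketch (the table and column names are invented for the example):

CREATE TABLE dbo.OrderStatus
(
    OrderId    INT IDENTITY(1,1) NOT NULL PRIMARY KEY NONCLUSTERED,
    StatusCode TINYINT NOT NULL,
    ValidFrom  DATETIME2 GENERATED ALWAYS AS ROW START NOT NULL,
    ValidTo    DATETIME2 GENERATED ALWAYS AS ROW END NOT NULL,
    PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)
)
WITH
(
    MEMORY_OPTIMIZED = ON,
    DURABILITY = SCHEMA_AND_DATA,
    SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.OrderStatusHistory)
);

-- A historical query like this one may pull rows from both the internal
-- memory-optimized staging table and the on-disk history table.
SELECT OrderId, StatusCode, ValidFrom, ValidTo
FROM dbo.OrderStatus FOR SYSTEM_TIME ALL
WHERE OrderId = 42;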

Automating Installation Of SQL Server

Nate Johnson has a script he uses to automate installation of SQL Server on a new server:

We can then use this file in the ConfigurationFile argument of setup.exe from the SQL Server install media.  To put a little more color on that: the .ini file is really just a collection of command-line arguments to setup.exe​; you could also list them all out in-line, but that would be tedious and silly.  Here’s a couple major selling points of creating your own config file:

  1. Slipstream updates (SP’s, CU’s), instead of having it go out to MSFT update servers (or *aghast* sticking with the original RTM bits, you heathen you!)

  2. Specify drive letters / default file locations: sure, this may be considered old-hat if you’re running super slick storage, but I still find it makes management a bit easier if I know where my MDFs, LDFs, TempDB, & backups will always be.

  3. Take advantage of 2016’s better TempDB setup options (# files, size & growth)

Read the whole thing.
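
As a rough sketch of what such a configuration file can contain, covering the three points above (the parameter values and paths here are illustrative, not Nate’s actual settings):

; ConfigurationFile.ini -- passed via setup.exe /ConfigurationFile=ConfigurationFile.ini
[OPTIONS]
ACTION="Install"
FEATURES=SQLENGINE
INSTANCENAME="MSSQLSERVER"

; 1) Slipstream service packs / cumulative updates from a local folder
UpdateEnabled="True"
UpdateSource="D:\SQLUpdates"

; 2) Predictable default locations for data, log, and backup files
SQLUSERDBDIR="E:\SQLData"
SQLUSERDBLOGDIR="F:\SQLLogs"
SQLBACKUPDIR="G:\SQLBackups"

; 3) SQL Server 2016 tempdb setup options (file count, size, and growth in MB)
SQLTEMPDBFILECOUNT="4"
SQLTEMPDBFILESIZE="1024"
SQLTEMPDBFILEGROWTH="512"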

Unit Testing Kafka Streams

Anuj Saxena shows us how to build mocks for streams in Kafka Streams:

Here, we are using Kafka Streams in our applications. We are done with the implementation but again, the most important thing left is testing. This blog is about how to test the application we have created. For this, I’ll be taking the sample app I created in my previous blog for both the high-level DSL and the low-level Processor API.

Traditionally, we test our Kafka application with an integration test, for which we need to create a ZooKeeper instance and a real Kafka broker. After that, we need a mock producer and mock consumer for our app to produce the inputs and receive the outputs. That makes it a big hassle just to test our app. For real scenarios and for the actual integration test, this is needed without a doubt.

Click through for an example.

Sympathy For The Part-Timer

John Mount wants us to think about part-time users:

The second point I think is particularly interesting. It means:

An R user who does not consider themselves an expert programmer could be maintaining code that they understand, but could not be expected to create from scratch.

Or:

Let’s have some sympathy for the part-time R user.

This is the point we will emphasize in our new example.

Read on for a particular example.  I think this is good advice to generalize:  write your code to make it as easy as possible for “part-time” users.  This applies to custom code you write as well: unless you are constantly in a particular part of the code base, you’ll forget the details later and have the same problems that a part-timer would have working in a different language.

Dynamic Date Dimensions In Power BI

Ginger Grant shows how to build a dynamic date dimension in Power BI using either DAX or M:

Recently I needed to create a date dimension for a Power BI model, as there was not one in the source database. There are two different ways that I could do this: using DAX from the Modeling Tab within the Data View, or using M via the Query Editor window.  As a general rule, when it is possible, data manipulation should be done in M, as it offers a greater level of compression.  In this case, though, I am using a function in DAX, which is not the same as creating a calculated column.

Read on to see code examples for each method, as well as Ginger’s analysis.
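
For a flavor of the DAX route, here is a generic sketch (not Ginger’s code) of a calculated date table built from the model’s own date range with CALENDARAUTO; the added columns are just examples:

Date Dimension =
ADDCOLUMNS (
    CALENDARAUTO (),
    "Year", YEAR ( [Date] ),
    "Month Number", MONTH ( [Date] ),
    "Month Name", FORMAT ( [Date], "mmmm" ),
    "Day of Week", FORMAT ( [Date], "dddd" )
)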

More Annoying MSDTC Problems

Jeff Mlakar walks through some advanced MSDTC troubleshooting:

Sometimes when a VM is cloned from a template (VMware or Hyper-V) the CID can be duplicated between the two machines. Evidence of this can be seen in the Event Viewer (Event ID 4101). This means the two MSDTC services will not be able to communicate with each other.

A possible error message is:

The Microsoft Distributed Transaction Coordinator (MS DTC) has cancelled the distributed transaction

The MSDTC feature of the Windows operating system requires unique CID values to ensure that MSDTC functionality between computers works correctly. Disk duplicate images of Windows installations must have unique CID values or MSDTC functionality may be impaired. This can occur when using virtual hard disks to deploy an operating system to a virtual machine.

This burned me in the past.  Jeff has several scenarios that he walks us through, so if you’re using the Distributed Transaction Coordinator, definitely check this out.

Limiting Color Usage On Dashboard Charts

Jesse Gorter explains why you shouldn’t overwhelm your dashboard chart users with colors:

In this example we use a signal color for the past too. Do you notice how the usage of green distracts from the current week, which is red? This suggests we are doing great overall, even though at this time we are not doing so great. It is up to you to decide what you want to communicate. If you are a sports team showing the rank during the season, only the current position would be important. In sales, having 30 weeks of outstanding sales above the target and the current week selling slightly under, it would make sense to show the signal color for the past.

Not to mention that limiting colors makes it easier for people with color vision deficiency (CVD) to read your report, something the red-green scheme does not do well.

Left Versus Right Joins

Denis Gobo doesn’t like RIGHT JOIN:

Do you use RIGHT JOINs? I myself rarely use a RIGHT JOIN; I think in the last 17 years or so I have only used a RIGHT JOIN once or twice. I think that RIGHT JOINs confuse people who are new to databases. Everything that you can do with a RIGHT JOIN, you can also do with a LEFT JOIN; you just have to flip the query around.

So why did I use a RIGHT JOIN then?

Don’t be lazy; switch out those right joins.  The trick is that for every RIGHT JOIN statement, there is an equivalent statement which does not use RIGHT JOIN.  The percentage of the time that you might benefit from RIGHT JOIN is so low that the fixed costs of mentally processing what’s going on tend to overwhelm the slight benefit of that style of join.
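
As a quick, made-up illustration of that equivalence, these two queries return the same result (every customer, with order details where they exist); the second simply flips the table order:

-- Written with a RIGHT JOIN.
SELECT c.CustomerName, o.OrderDate
FROM dbo.Orders AS o
RIGHT JOIN dbo.Customers AS c
    ON o.CustomerID = c.CustomerID;

-- The same query rewritten with a LEFT JOIN.
SELECT c.CustomerName, o.OrderDate
FROM dbo.Customers AS c
LEFT JOIN dbo.Orders AS o
    ON o.CustomerID = c.CustomerID;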

Going Back From The Cloud

Arun Sirpal notes that you can take a cloud database back to on-premises:

The Challenge: I am going to write about a way to move from Azure SQL Database (Platform as a service) back to a local SQL Server. I did encounter errors on the way but more importantly I have written how to avoid/solve them.

Another key point: I made sure that there were no connections to the database when doing the below, as I didn’t want in-flight data movement whilst doing it. If you can’t do this, then you should create a copy of the database and work from that.

It’s not a trivial operation, but Arun does walk us through the steps.
