Press "Enter" to skip to content

Author: Kevin Feasel

A Recommendation Against Using TOTALYTD

Marco Russo recommends against using the TOTALYTD function in DAX:

You have been warned. I don’t like the code above. If you want to apply filters, use CALCULATE. Maybe the code written using CALCULATE is verbose and boring, but it’s easier to read. If you are looking for an enjoyable reading for your free time, you can choose a novel rather than playing with an obscure DAX syntax.

So, why writing a blog post about this? The reason is that TOTALYTD has the ability to accept a scalar value as an argument instead of (or after) the filter. This is the same additional optional parameter accepted by DATESYTD in case the fiscal year does not end on December 31st. The problem is that the third argument of TOTALYTD is a filter if you use a predicate or a table expression, and it’s a marker for end of year in case it’s a scalar value. Confusing enough? Ok, let me present some example.

Read the whole thing.

Comments closed

Storing Wait Stats In tempdb

Max Vernon has a script which loads a bunch of wait stats definitions and then collects wait stat details:

Performance troubleshooting should begin with capturing wait stats so we can understand where SQL Server is busy. The script below captures wait stats into a table in tempdb; the script should be ran via a SQL Server Agent job or some other scheduling mechanism.

I like the definitions that Max provides.  My only recommendation would be to store this data someplace a bit more permanent than tempdb.

1 Comment

Shuffling Data And Zipping Results In T-SQL

Phil Factor continues his series on pseudonymization:

The problems come with uncommon values. If you are pseudonymizing a medical database that is required for research purposes on people with potentially embarrassing diseases, and it appears on the dark web, anyone with a rare or unusual surname or first-name comes up on the list, so the shuffle doesn’t help the privacy of Fortescue Ceresole, or whatever his name may be.

If you are spoofing data entirely, you don’t necessarily have this problem because your constructed value will have no relationship to the original value. If it comes from a list of common names or if you randomly create a name ‘Thomas’, it will have no relationship to the original names in the database as long as you did things correctly and shuffle the list. Although a Markov string can produce an identical name that is uncommon, it can be eliminated from the list by an outer join with the original data.

After you shuffle data, you ‘zip’ it. Zipping lists is something you come across in procedural programming, and Linq has a good example. A .net array has an order, and all you are doing is to join by the order of the element in the list. If you randomize that order, you get a shuffle.

Read on for an example using the AdventureWorks Person.Person table.

Comments closed

Improving Power Query Performance By Reducing Variables

Reza Rad shows us an extreme case of variable fatigue in Power Query:

This is a sample Power Query file, which in that I do a very simple transformation. The transformation is adding one to the existing number. However, in this sample, we are doing it over thousands of steps! one step at a time, we are adding thousands to a number. The main reason to do it this way is to show you what is the performance you get when you have too many variables (or let’s say steps) in Power Query.

Reza reduces the number of steps and turns a 15-minute operation into one which finishes in under a second.

Comments closed

Tips For SQL Server On Linux + Docker

Kellyn Pot’vin-Gorman has a set of tips for working with SQL Server on Linux using Docker:

Easy Download

Docker is available for Mac and Windows with a simple installation by the defaults.

  1. Download the correct installation for your OS type.
  2. Run the installer and keep all the defaults, choosing Linux containers, not Windows containers
  3. Reboot Windows workstations- Done.

Incredibly Simple MSSQL Container Install

Microsoft has done a great job of creating a very small, (maybe a bit too small, but we’ll get into that later…) image that can be used to create a running Linux container with SQL Server.  This grants to student a great opportunity to simulate much of what it would be like to work on a real Linux server.

Click through to read the whole thing.  And Kellyn’s not kidding about the image missing basic packages.

Comments closed

Using Uncertainty For Model Interpretation

Yoel Zeldes and Inbar Naor explain how uncertainty can help you understand your models better:

One prominent example is that of high risk applications. Let’s say you’re building a model that helps doctors decide on the preferred treatment for patients. In this case we should not only care about the accuracy of the model, but also about how certain the model is of its prediction. If the uncertainty is too high, the doctor should to take this into account.

Self-driving cars are another interesting example. When the model is uncertain if there is a pedestrian on the road we could use this information to slow the car down or trigger an alert so the driver can take charge.

Uncertainty can also help us with out of data examples. If the model wasn’t trained using examples similar to the sample at hand it might be better if it’s able to say “sorry, I don’t know”. This could have prevented the embarrassing mistake Google photos had when they misclassified African Americans as gorillas. Mistakes like that sometimes happen due to an insufficiently diverse training set.

The last usage of uncertainty, which is the purpose of this post, is as a tool for practitioners to debug their model. We’ll dive into this in a moment, but first, let’s talk about different types of uncertainty.

Interesting argument.

Comments closed

Tic-Tac-Toe In T-SQL

Riley Major continues his series on tic-tac-toe:

We could give it some smarts. For example, we know that in our game, we can only choose X or O, so we could put a data constraint on the Play column. And we know that in our game, you can’t play on the same space more than once, so we could put a unique constraint on the combination of GameID and Position. You could even be really creative and enforce our game’s alternating player moves by putting a data constraint on the Player column such that it equaled 1 when (Turn Modulo 2) equaled 1 and 2 otherwise. (Really it wouldn’t need to be a data column at that point, just a calculated column.)

But imposing those restrictions robs our data structure from its raison d’être. It’s no longer a general purpose game play storage system; it only works for our game.

Instead of saddling the data storage itself with all of those rules, we could enforce all of the game mechanics through our data interpretation and manipulation logic. When we saved a game move, we could make sure that an X or O was played and it could check to see whether the specified square was already used. When we analyzed a game to determine a win, all of the criteria could be housed in that consuming query. But this flexible design isn’t done inflicting its complexity on us.

Riley covers a number of T-SQL features in the process of this post.

Comments closed

Writing ssisUnit Tests With C#

Bartosz Ratajczyk shows us how to create ssisUnit tests in MSTest with C#:

In the post about using MSTest framework to execute ssisUnit tests, I used parts of the ssisUnit API model. If you want, you can write all your tests using this model, and this post will guide you through the first steps. I will show you how to write one of the previously prepared XML tests using C# and (again) MSTest.

Why MSTest? Because I don’t want to write some application that will contain all the tests I want to run, display if they pass or not. When I write the MSTest tests, I can run them using the Test Explorer in VS, using a command line, or in TFS.

UIs are great for learning how to do things and for one-off actions, but writing code scales much better in terms of time.

Comments closed

Last Observation Carried Forward In T-SQL

Pawan Khowal shows one example of implementing Last Observation Carried Forward in T-SQL:

A very close friend given this to me. In this puzzle you have to fill the price of SKU & Color Id for missing months. Note that SKU & Color Id should be considered as a business unit. So you have to set the previous value available to the missing month. Please check out the sample input and the expected output. In this solution I have not considered any performance considerations.

Included is one solution, though there are others.

Comments closed

Configuring tempdb

Jeff Mlakar looks at some basic guidelines for tempdb and shows how to configure this database:

The basic guidelines are:

  • Each tempdb data file should be the same initial size

  • Autogrowth to tempdb files should be an explicit value in MB instead of a percentage. Choose a reasonable value based on the workload. Ex. 64MB, 128MB, 256MB, etc.

  • The number of tempdb files should be 1 per logical processor core up to 8. At that point the performance should be monitored and if more tempdb files are needed they should be added in sets of 4.

  • Ideally the tempdb files are sized up to the max they will need and never have to autogrow.

  • Use trace flags 1117 and 1118 for versions of SQL Server < 2016. In SQL Server 2016 these trace flags are defaults.

    • Trace flag 1117: when a file in the filegroup meets the autogrow threshold, all files in the filegroup grow together

    • Trace flag 1118: Removes most single page allocations on the server, reducing contention on the SGAM page. TLDR; no more mixed extents – use the whole page.

There are some good pieces of advice here, and Jeff includes a great example of a terrible setup.

Comments closed