2018-08-13 – Curated SQL

One prominent example is that of high risk applications. Let’s say you’re building a model that helps doctors decide on the preferred treatment for patients. In this case we should not only care about the accuracy of the model, but also about how certain the model is of its prediction. If the uncertainty is too high, the doctor should to take this into account.

Self-driving cars are another interesting example. When the model is uncertain if there is a pedestrian on the road we could use this information to slow the car down or trigger an alert so the driver can take charge.

Uncertainty can also help us with out of data examples. If the model wasn’t trained using examples similar to the sample at hand it might be better if it’s able to say “sorry, I don’t know”. This could have prevented the embarrassing mistake Google photos had when they misclassified African Americans as gorillas. Mistakes like that sometimes happen due to an insufficiently diverse training set.

The last usage of uncertainty, which is the purpose of this post, is as a tool for practitioners to debug their model. We’ll dive into this in a moment, but first, let’s talk about different types of uncertainty.

Interesting argument.

Comments closed

Tic-Tac-Toe In T-SQL

Published 2018-08-13 by Kevin Feasel

Riley Major continues his series on tic-tac-toe:

We could give it some smarts. For example, we know that in our game, we can only choose X or O, so we could put a data constraint on the Play column. And we know that in our game, you can’t play on the same space more than once, so we could put a unique constraint on the combination of GameID and Position. You could even be really creative and enforce our game’s alternating player moves by putting a data constraint on the Player column such that it equaled 1 when (Turn Modulo 2) equaled 1 and 2 otherwise. (Really it wouldn’t need to be a data column at that point, just a calculated column.)

But imposing those restrictions robs our data structure from its raison d’être. It’s no longer a general purpose game play storage system; it only works for our game.

With modern SQL Server, you could have this cake and eat some of it too with filtered indexes and more complex data constraints. If you denormalized a bit and copied your GameTypecolumn from your Games table, you could enforce your single-use positions by creating a unique index on GameID and Position, but when filtered by a particular GameType (type = Tic-Tac-Toe). Similarly, your data constraints on Play and Player could be tailored to each GameType. This will make your table definition complex.

Instead of saddling the data storage itself with all of those rules, we could enforce all of the game mechanics through our data interpretation and manipulation logic. When we saved a game move, we could make sure that an X or O was played and it could check to see whether the specified square was already used. When we analyzed a game to determine a win, all of the criteria could be housed in that consuming query. But this flexible design isn’t done inflicting its complexity on us.

Riley covers a number of T-SQL features in the process of this post.

Comments closed

Writing ssisUnit Tests With C#

Published 2018-08-13 by Kevin Feasel

Bartosz Ratajczyk shows us how to create ssisUnit tests in MSTest with C#:

In the post about using MSTest framework to execute ssisUnit tests, I used parts of the ssisUnit API model. If you want, you can write all your tests using this model, and this post will guide you through the first steps. I will show you how to write one of the previously prepared XML tests using C# and (again) MSTest.

Why MSTest? Because I don’t want to write some application that will contain all the tests I want to run, display if they pass or not. When I write the MSTest tests, I can run them using the Test Explorer in VS, using a command line, or in TFS.

UIs are great for learning how to do things and for one-off actions, but writing code scales much better in terms of time.

Comments closed

Last Observation Carried Forward In T-SQL

Published 2018-08-13 by Kevin Feasel

Pawan Khowal shows one example of implementing Last Observation Carried Forward in T-SQL:

A very close friend given this to me. In this puzzle you have to fill the price of SKU & Color Id for missing months. Note that SKU & Color Id should be considered as a business unit. So you have to set the previous value available to the missing month. Please check out the sample input and the expected output. In this solution I have not considered any performance considerations.

Included is one solution, though there are others.

Comments closed

Configuring tempdb

Published 2018-08-13 by Kevin Feasel

Jeff Mlakar looks at some basic guidelines for tempdb and shows how to configure this database:

The basic guidelines are:

Each tempdb data file should be the same initial size
Autogrowth to tempdb files should be an explicit value in MB instead of a percentage. Choose a reasonable value based on the workload. Ex. 64MB, 128MB, 256MB, etc.
The number of tempdb files should be 1 per logical processor core up to 8. At that point the performance should be monitored and if more tempdb files are needed they should be added in sets of 4.
Ideally the tempdb files are sized up to the max they will need and never have to autogrow.
Use trace flags 1117 and 1118 for versions of SQL Server < 2016. In SQL Server 2016 these trace flags are defaults.
- Trace flag 1117: when a file in the filegroup meets the autogrow threshold, all files in the filegroup grow together
- Trace flag 1118: Removes most single page allocations on the server, reducing contention on the SGAM page. TLDR; no more mixed extents – use the whole page.

There are some good pieces of advice here, and Jeff includes a great example of a terrible setup.

Comments closed

Making A Readable Presentation Template

Published 2018-08-13 by Kevin Feasel

Meagan Longoria has some advice for presentation templates:

The title text is 36pt Segoe UI Light, the subheading text is 24pt Segoe UI, and the speaker info text is 14 pt Segoe UI.

Those font sizes alone make it very hard to read from the back of even the smaller rooms at the conference.

In addition to being too small, the gray text for the speaker info doesn’t have enough contrast from the white background. We want to get a contrast ratio of at least 4.5:1 (but 7:1 would be better). The contrast ratio for these colors is 4.0.

While sans serif fonts are generally thought to be easier to read in presentations, it’s better to use fonts with a stroke width that is not too thin – not necessarily wider characters, but thicker lines that make up each letter. So Segoe UI Light would not be my first choice for a title font, but Segoe UI or Segoe UI Bold might be ok.

Also, the red used on the right half of the slide is VERY bright for an element that is purely decorative, to the point that it might be distracting for some people. And the reason we need to squish our title into two lines of too-small text is because that giant red shape takes up half the page. What is more important: a “pretty” red shape to make our slide look snazzy or being able to clearly read the title of the presentation?

There’s a lot along these lines, and it’s great food for thought. Meagan includes a set of recommendations at the end, so be sure to catch those.

Comments closed

Day: August 13, 2018

Using Uncertainty For Model Interpretation

Tic-Tac-Toe In T-SQL

Writing ssisUnit Tests With C#

Last Observation Carried Forward In T-SQL

Configuring tempdb

Making A Readable Presentation Template