Press "Enter" to skip to content

Curated SQL Posts

Durable Keys in Type 2 Dimensions

Martin Schoombee takes us through the idea of durable keys:

Also called an immutable or persisted key (I like durable better), a durable key is nothing more than a surrogate key (i.e. integer value or nonsensical number) used to identify a dimension member (company, employee, etc.) uniquely in a type-2 dimension. Confusing enough? It’s easier to explain with an example…

When I read Martin’s post, I kind of got it but said to myself, “How would I run this type of query more efficiently?” The thing that wasn’t clicking came from another article on the topic: you add the durable key to the fact as well as the current key. That way, you can join back to the Company dimension on CompanyKey if you want to get the company data as of the fact date, or you can join on DurableCompanyKey (and CurrentRecord = 1) to get the latest company data regardless of the fact date. Now that this is clear, I like the strategy a lot.

1 Comment

dbachecks Against Azure SQL Databases

Jess Pomfret takes us through running dbachecks on an Azure SQL Database:

Last week I gave a presentation at Data South West on dbachecks and dbatools. One of the questions I got was whether you could run dbachecks against Azure SQL Databases, to which I had no idea. I always try to be prepared for potential questions that might come up, but I had only been thinking about on-premises environments and hadn’t even considered the cloud.  The benefit is this gives me a great topic for a blog post.

Click through for the answer.

Comments closed

Parallel Inserts into Temp Tables

Erik Darling explains the pre-conditions for parallel insertion into temporary tables:

If you have a workload that uses #temp tables to stage intermediate results, and you probably do because you’re smart, it might be worth taking advantage of being able to insert into the #temp table in parallel.

Remember that you can’t insert into @table variables in parallel, unless you’re extra sneaky. Don’t start.

If your code is already using the SELECT ... INTO #some_table pattern, you’re probably already getting parallel inserts. But if you’re following the INSERT ... SELECT ... pattern, you’re probably not, and, well, that could be holding you back.

There are enough pre-conditions that this becomes a decision rather than an automatic. Especially if you’re dealing with temp tables with indexes and want to take advantage of temp table reuse, which I believe precludes changing the structure of the table (including adding indexes) after creation.

Comments closed

Understanding Decision Trees

Ram Tavva walks us through the algorithm to design decision trees:

A decision tree is made up of several nodes:

1.Root Node: A Root Node represents the entire data and the starting point of the tree. From the above example the
First Node where we are checking the first condition, whether the movie belongs to Hollywood or not that is the
Rood node from which the entire tree grows
2.Leaf Node: A Leaf Node is the end node of the tree, which can’t split into further nodes.
From the above example ‘watch movie’ and ‘Don’t watch ‘are leaf nodes.
3.Parent/Child Nodes: A Node that splits into a further node will be the parent node for the successor nodes. The
nodes which are obtained from the previous node will be child nodes for the above node.

Read on for an example of implementation in R.

Comments closed

The Effects of Undersampling and Oversampling on Predicted Probability

Bryan Shalloway has an interesting article for us:

In classification problems, under and over sampling techniques shift the distribution of predicted probabilities towards the minority class. If your problem requires accurate probabilities you will need to adjust your predictions in some way during post-processing (or at another step) to account for this.

Bryan has a clear example showing this problem in action.

Comments closed

Measuring Powershell Profile Load Times

Jeffrey Hicks shows us how to keep track of lost time:

Here’s the “issue” that often arises. Someone will mention that PowerShell, and this includes PowerShell Core, takes too long to load. In fact, PowerShell now shows you how long it took to load. Almost always, the issue is something profile related. Sometimes a command is taking too long to run, or maybe the profile needs a little cleanup. I know my PS7 load times were high until I cleaned up a few items and re-structured some of the commands.

To make this easier, I put together a simple script that you can run in Windows PowerShell, or PowerShell (even cross-platform) that will run your profile scripts and report how long they take to complete.

Click through for more details, as well as a script to test how quickly your Powershell profiles load.

Comments closed

JSON Basics with SQL Server

Steve Jones takes us through querying straightforward JSON data in SQL Server:

Recently I saw Jason Horner do a presentation on JSON at a user group meeting. I’ve lightly looked at JSON in some detail, and I decided to experiment with this.

All in all, I’ve been pretty happy with the syntax for JSON manipulation in T-SQL. I’m not the biggest user of JSON around, but when I’ve needed to slice or build JSON, even when I needed to build it in a certain way to emulate an old application, it has worked for me.

Comments closed

Reducing CTE Re-Scans with APPLY

Daniel Hutmacher shows another good use of the APPLY operator:

You can tell by the plan why this is an inefficient query: the SQL expression in the common table expression is executed once for every time that it’s referenced in the code.

Better living through CROSS APPLY

You could store the results of the CTE in a temp table, but where’s the fun in that? Instead, why not use the CTE once, and then return four rows for each row that the CTE spits out? That’s exactly what CROSS APPLY does.

Read the whole thing and appreciate that much more all the nice things you can do with APPLY.

Comments closed