Press "Enter" to skip to content

Day: December 2, 2020

Stochastic Processes in R

David Robinson takes us through simulation of a random walk in R:

What’s fun about this problem is that it’s an example of a random walk: a stochastic process made up of a sequence of random steps (in this case, left or right). What makes this a fun variation is that it’s a random walk in a circle- passing 5 to the left is the same as passing 15 to the right. I wasn’t previously familiar with a random walk in a circle, so I approached it through simulation to learn about its properties.

Click through for a simulation. Or 50,000 of them.

Comments closed

Durable Keys in Type 2 Dimensions

Martin Schoombee takes us through the idea of durable keys:

Also called an immutable or persisted key (I like durable better), a durable key is nothing more than a surrogate key (i.e. integer value or nonsensical number) used to identify a dimension member (company, employee, etc.) uniquely in a type-2 dimension. Confusing enough? It’s easier to explain with an example…

When I read Martin’s post, I kind of got it but said to myself, “How would I run this type of query more efficiently?” The thing that wasn’t clicking came from another article on the topic: you add the durable key to the fact as well as the current key. That way, you can join back to the Company dimension on CompanyKey if you want to get the company data as of the fact date, or you can join on DurableCompanyKey (and CurrentRecord = 1) to get the latest company data regardless of the fact date. Now that this is clear, I like the strategy a lot.

1 Comment

dbachecks Against Azure SQL Databases

Jess Pomfret takes us through running dbachecks on an Azure SQL Database:

Last week I gave a presentation at Data South West on dbachecks and dbatools. One of the questions I got was whether you could run dbachecks against Azure SQL Databases, to which I had no idea. I always try to be prepared for potential questions that might come up, but I had only been thinking about on-premises environments and hadn’t even considered the cloud.  The benefit is this gives me a great topic for a blog post.

Click through for the answer.

Comments closed

Parallel Inserts into Temp Tables

Erik Darling explains the pre-conditions for parallel insertion into temporary tables:

If you have a workload that uses #temp tables to stage intermediate results, and you probably do because you’re smart, it might be worth taking advantage of being able to insert into the #temp table in parallel.

Remember that you can’t insert into @table variables in parallel, unless you’re extra sneaky. Don’t start.

If your code is already using the SELECT ... INTO #some_table pattern, you’re probably already getting parallel inserts. But if you’re following the INSERT ... SELECT ... pattern, you’re probably not, and, well, that could be holding you back.

There are enough pre-conditions that this becomes a decision rather than an automatic. Especially if you’re dealing with temp tables with indexes and want to take advantage of temp table reuse, which I believe precludes changing the structure of the table (including adding indexes) after creation.

Comments closed