Decomposing Power BI Desktop Files

Reza Rad wants to see exactly where the M scripts in a Power BI Desktop file are stored:

Talking about Power Query; DataMashup file is all you need. It includes everything from the structure of queries, tables, parameters, list, to the actual M scripts behind the scene. You can Fetch all of these information from this single file. Let’s look at the structure of this file. If you open this file with a text editor. you will see some binary things first (which are related to the zipped nature of this file), and also some XML information. Yes, this is a zipped file. Let’s start with unzipping it into a folder. I’ve done that with 7-zip application.

This is an interesting peek under the covers of a PBIX file.

Memory Grants Affecting Columnstore Load

Denzil Ribeiro explains how memory grant pressure can determine whether a columnstore bulk insert skips the deltastore or not:

We found that only at the beginning of the run, there was contention on memory grants (RESOURCE_SEMAPHORE waits), for a short period of time. After that and later into the process, we could see some latch contention on regular data pages, which we didn’t expect as each thread was supposed to insert into its own row group. You would also see this same data by querying sys.dm_exec_requests live, if you caught it within the first minute of execution, as displayed below.

This is useful in case you run into the issue, but also useful as a case study on effective troubleshooting.

Waits: External But Not Preemptive

Ewald Cress has some thoughts on external wait types:

Previously I dug into preemptive waits in SQLOS, and to be honest, I equated “preemptive” with “external”. For the most part the two go hand in hand after all.

To recap, a preemptive wait isn’t necessarily a wait at all. What happens is that a worker needs to run some code that can’t be trusted to play by cooperative scheduling rules. And rather than put the SQLOS scheduler (and all its sibling workers) at the mercy of that code, the worker detaches itself from the scheduler and cedes control to a sibling runnable worker.

Read the whole thing.

More On Adaptive Joins

Erik Darling has a couple more posts on adaptive joins in SQL Server 2017.  First, he wonders what happens when you add scalar functions to the mix:

See, this isn’t SARGable either (and no, SCHEMABINDING doesn’t change this). When a predicate isn’t SARGable, you take away an index seek as an access choice. You don’t see too many Nested Loops with an index scan on the other end, do you?


So there you go. It’s not the function itself that bops our Adaptive Join on the head, but the lack of SARGability.

Erik also looks at using APPLY and EXISTS to make sure you can use adaptive joins with that dynamic duo:

I do have to point out that Cross Apply used to only be implemented as a Nested Loops Join. I learned that many years ago from one of the best articles written about Cross Apply by Paul White. That changed recently — it’s possible to see it implemented with a Hash Join in at least 2016. I’ve seen it crop up in Cross Apply queries without a TOP operator.

The latter results are a bit surprising.

Disabling Nested Loop Join Optimization

Dmitry Pilugin explains the differences between trace flag 2340 and the DISABLE_OPTIMIZED_NESTED_LOOP query hint:

This optimization provides a great boost with a sufficient number of rows. You can read more about its test results in the blog OPTIMIZED Nested Loops Joins, created by Craig Freedman, an optimizer developer.

However, if the actual number of rows is less than the expected one, then CPU additional costs to build this sort may hide its benefits, increase CPU consumption and reduce its performance.

Read the whole thing.  I think the likelihood of using either this hint or the trace flag is near nil, but crazy things do come up.

Adaptive Join Internals

Dmitry Pilugin digs deep into how adaptive joins work in SQL Server 2017:

We have three types of physical join algorithms in SQL Server: hash, nested loops and merge. Adaptive join allows SQL Server automatically choose an actual physical algorithm on the fly between the first two – hash (HM) and nested loops (NL).

NL has two join strategiesnaive nested loops join (inner loop scans the whole inner table or index) and index nested loops join (index on the join column of the inner table is used to find necessary rows and then those rows are applied to the outer row, also called Nested Loops Apply). Typically, the second one performs very well if you have rather small input on the outer side and indexed rather big input on the inner side.

HM is more universal and uses hash algorithms to match rows, so no indexes are necessary. You may refer to my blog post Hash Join Execution Internals for more details.

Adaptive Join starts execution as a Hash Join. It consumes all the input of the build phase and looks at the adaptive join threshold, if the number of rows is more or equal this threshold it will continue as a hash join. However, if the number of rows is less than this threshold, it will switch to a NL.

If you want to get a better understanding of how adaptive joins works, Dmitry’s post is a great start.


Ewald Cress talks about SOS_UnfairMutexPair:

The focal point of the mutex’s state – in fact one might say the mutex itself – is the single Spinlock bit within the 32-bit lock member. Anybody who finds it zero, and manages to set it to one atomically, becomes the owner.

Additionally, if you express an interest in acquiring the lock, you need to increment the WaiterCount, whether or not you managed to achieve ownership at the first try. Finally, to release the lock, atomically set the spinlock to zero and decrement the WaiterCount; if the resultant WaiterCount is nonzero, wake up all waiters.

Now one hallmark of a light-footed synchronisation object is that it is very cheap to acquire in the non-contended case, and this class checks that box. If not owned, taking ownership (the method SOS_UnfairMutexPair::AcquirePair()) requires just a handful of instructions, and no looping. The synchronisation doesn’t get in the way until it is needed.

However, if the lock is currently owned, we enter a more complicated world within the SOS_UnfairMutexPair::LongWait() method.

I love the statement that “This is not a very British class at all.”  Read the whole thing.

Default Column Storage

Paul Randal explains how default column values are stored:

And selecting the initial 10 rows can be demonstrated to return the 3rd column using the initial default set in step 3. (It makes no difference if any rows are added between steps 3 and 4.)

This means that there *must* be two default values stored when a new column is added: one for the set of already-existing rows that don’t have the new column and one for any new rows. Initially these two default values will be the same, but the one for new rows can change (e.g. in steps 4 and 5 above) with breaking the old rows. This works because after the new column is added (step 3 above), it’s impossible to add any more rows that *don’t* have the new column.

And this is exactly how it works. Let’s investigate!

In typical Paul Randal fashion, this is both a look at internals and an interesting explanation.

Secret Mysteries Of SQL Server 2017

Erik Darling goes spelunking for new and modified internal objects in SQL Server 2017:


What have we here?


Of particular interest are last_sql_handle, query_hash, and query_plan_hash. It appears that we’ll finally be able to easily tie missing index requests to their queries, without doing a lot of painful XML processing. I had planned on adding something like this, but couldn’t find a good fit between 1) adding XML processing to sp_BlitzIndex, or adding more DMV queries and rather unpleasant XML processing to sp_BlitzCache. This will make implementing it far easier, assuming it works the way it looks like it will work.

Erik has three examples of interesting additions in CTP 2.0.

Trying Out Interleaved Execution

Dmitry Pilugin has a test of interleaved execution of a multi-statement table-valued function:

First of all, note that Estimated Number of Rows is 10 000 now, which is correct and equals Actual Number of Rows. Due to the correct estimate the optimizer decided that there are enough rows to benefit from a partial (local/global) aggregation and introduced a partial Hash aggregate before the join.

If you take a Profiler, enable events SP:StmtStarting, SP:StmtCompleted, SQL:StmtStarting, SQL:StmtCompleted and run the query without and with a TF, you’ll see what does it actually mean “interleaved” in terms of the execution sequence.

During the regular execution the query starts executing, then the function is executed, the query continues execution and finishes the execution. We see the following event sequence:

This is a very thorough post, but if you have multi-statement TVFs, you absolutely want to read it.


June 2017
« May