Category: Internals

Diving Into Spark’s Cost-Based Optimizer

Ron Hu, et al, explain how Spark’s cost-based optimizer works:

At its core, Spark’s Catalyst optimizer is a general library for representing query plans as trees and sequentially applying a number of optimization rules to manipulate them. A majority of these optimization rules are based on heuristics, i.e., they only account for a query’s structure and ignore the properties of the data being processed, which severely limits their applicability. Let us demonstrate this with a simple example. Consider a query shown below that filters a table t1 of size 500GB and joins the output with another table t2 of size 20GB. Spark implements this query using a hash join by choosing the smaller join relation as the build side (to build a hash table) and the larger relation as the probe side. Given that t2 is smaller than t1, Apache Spark 2.1 would choose the right side as the build side without factoring in the effect of the filter operator (which in this case filters out the majority of t1’s records). Choosing the incorrect side as the build side often forces the system to give up on a fast hash join and turn to sort-merge join due to memory constraints.
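
To make the build-side problem concrete, here is a minimal Python sketch of the two decision styles. It is not Spark's code: the `Relation` class, both functions, and the 1% selectivity are assumptions made up for illustration. Only the 500GB and 20GB sizes come from the excerpt.

```python
from dataclasses import dataclass


@dataclass
class Relation:
    name: str
    size_gb: float            # raw table size
    selectivity: float = 1.0  # assumed fraction of rows surviving the filter

    @property
    def estimated_size_gb(self) -> float:
        # Estimated size of the relation after its filter is applied.
        return self.size_gb * self.selectivity


def build_side_by_raw_size(left: Relation, right: Relation) -> Relation:
    """Heuristic choice: ignore filters and pick the smaller base table."""
    return left if left.size_gb <= right.size_gb else right


def build_side_by_estimate(left: Relation, right: Relation) -> Relation:
    """Cost-based choice: pick the side with the smaller estimated output."""
    return left if left.estimated_size_gb <= right.estimated_size_gb else right


# Sizes are from the excerpt; the 0.01 selectivity is a hypothetical value.
t1 = Relation("t1", size_gb=500, selectivity=0.01)
t2 = Relation("t2", size_gb=20)

print(build_side_by_raw_size(t1, t2).name)  # t2
print(build_side_by_estimate(t1, t2).name)  # t1
```

With a filter that selective, the estimate-based choice flips the build side to the filtered t1, which is the situation the excerpt describes.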

Click through for a very interesting look at this query optimizer.

Comments closed

Searching In Windbg

Ewald Cress shows us how to search for a four-byte pattern in the Windows debugger:

Cracking open Windbg on 2016 SP1 with the s command to look for byte patterns yielded nothing of value. Maybe something has changed with conventions or indirection? Nope, no joy in 2014 either.

In the end, it took the extremely brave step of RTFM, in this case the Windbg online help, to realise where I was going wrong. I was searching for a four-byte pattern by searching for doublewords. Sounds reasonable on the face of it, but what I had missed was that this specifically required the doublewords to be doubleword-aligned, i.e. starting on an address divisible by four. My method only had a 25% chance of working, so it’s sheer luck I ever got good results with it.

Changing to a byte search for four consecutive bytes gave me the non-aligned semantics my taste buds craved, and the results came pouring in.
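
The alignment trap is easy to reproduce outside the debugger. The sketch below is plain Python, not Windbg: `find_unaligned`, `find_aligned`, and the sample buffer are hypothetical names and data, meant only to show why an aligned search can miss a pattern that a byte-by-byte search finds.

```python
from typing import List


def find_unaligned(buf: bytes, pattern: bytes) -> List[int]:
    """Return every offset where pattern occurs, at any byte alignment."""
    hits = []
    start = buf.find(pattern)
    while start != -1:
        hits.append(start)
        start = buf.find(pattern, start + 1)
    return hits


def find_aligned(buf: bytes, pattern: bytes, base: int = 0) -> List[int]:
    """Keep only hits whose address (base + offset) is a multiple of len(pattern)."""
    width = len(pattern)
    return [off for off in find_unaligned(buf, pattern) if (base + off) % width == 0]


pattern = bytes.fromhex("deadbeef")
# Hypothetical memory: one copy at offset 4 (aligned), one at offset 13 (not aligned).
memory = bytes(4) + pattern + bytes(5) + pattern + bytes(3)

print(find_unaligned(memory, pattern))  # [4, 13]
print(find_aligned(memory, pattern))    # [4]
```

Only one of the four possible starting offsets modulo four is aligned, which is where the 25% figure in the excerpt comes from.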

This is in the context of gathering information on an uncommon wait type related to columnstore indexes.


R Services Packet Captures

Niels Berglund continues his R Services internals series:

In Figure 15, I set the filter to be tcp.srcport==50755, and then I applied the filter by clicking the arrow. To start using this:

  • Clear the Process Monitor display, and make sure you are capturing events.
  • Start Wireshark capturing (Ctrl+E). If you get a question whether you want to save the captured packets, just click “Continue without Saving”.
  • Execute the code in Code Snippet 3.

The Process Monitor output looks almost the same as in Figure 9, whereas the Wireshark output looks like so:

Niels also includes a recap to help people who haven’t been following along get up to speed.


Investigating The OS Workers DMV

Ewald Cress continues his DMV internals series:

wait_started_ms_ticks is set in SOS_Task::PreWait(), i.e. just before actually suspending, and again cleared in SOS_Task::PostWait(). For more about the choreography of suspending, see here.

wait_resumed_ms_ticks is set in SOS_Scheduler::PrepareWorkerForResume(), itself called by the mysteriously named but highly popular SOS_Scheduler::ResumeNoCuzz().

start_quantum is set for the Resuming and InstantResuming case within SOS_Scheduler::TaskTransition(), called by SOS_Scheduler::Switch() as the worker is woken up after a wait.
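
For readers who want to picture when each column changes, here is a toy Python model. It is emphatically not SQLOS code: the `ToyWorker` class, its method names, the fake clock, and the order in which the last two calls happen are all assumptions. It only mirrors the set and clear behaviour that Ewald describes.

```python
from dataclasses import dataclass
from itertools import count
from typing import Callable


@dataclass
class ToyWorker:
    clock: Callable[[], int]       # hypothetical tick source
    wait_started_ms_ticks: int = 0
    wait_resumed_ms_ticks: int = 0
    start_quantum: int = 0

    def pre_wait(self) -> None:
        # Just before suspending: record when the wait began.
        self.wait_started_ms_ticks = self.clock()

    def prepare_for_resume(self) -> None:
        # The worker is being made runnable again.
        self.wait_resumed_ms_ticks = self.clock()

    def task_transition_resuming(self) -> None:
        # The worker is woken up after the wait: a new quantum starts.
        self.start_quantum = self.clock()

    def post_wait(self) -> None:
        # The wait is over: clear the start marker.
        self.wait_started_ms_ticks = 0


# Fake clock producing 100, 110, 120, ... so the output is deterministic.
ticks = count(start=100, step=10)
worker = ToyWorker(clock=lambda: next(ticks))

worker.pre_wait()
worker.prepare_for_resume()
worker.task_transition_resuming()
worker.post_wait()

print(worker.wait_started_ms_ticks, worker.wait_resumed_ms_ticks, worker.start_quantum)
# 0 110 120
```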

Ewald intends this post as an extension of the official documentation, so it’s best to read that documentation in conjunction with this post.


R Services Internal Communication Mechanisms

Niels Berglund continues his R Services internals series:

When browsing for the symbols, you can use this command: x /1 *!TCP*. By using the option /1 you’ll only see the names, and no addresses. On my machine that gives me quite a lot, but there are two entries that catch my eye: sqllang!Tcp::AcceptConnection and sqllang!Tcp::Close. So let us set breakpoints at those two symbols, and see what happens when we execute our code.

The result when executing the code is that we initially break at sqllang!Tcp::AcceptConnection, followed somewhat later by a break at sqllang!Tcp::Close. Cool, this seems to work – let us set some more breakpoints and try to figure out the flow of events.

The first half recapitulates his previous findings, and then he incorporates new information in the second half.


OS Threads DMV

Ewald Cress moves up the internals stack a little further and looks at a DMV:

Broadly speaking, a DMV presents just another iterator that can be plugged into a query plan. The execution engine calls GetRow() repeatedly until it reaches the end, and the iterator emits rows. The only unusual thing is that the ultimate source of the data may have nothing to do with the storage engine.
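
The iterator idea can be sketched in a few lines of Python. This is an analogy rather than anything from SQL Server: `ThreadListIterator`, `get_row`, and `run_plan` are hypothetical names, and the rows come from Python's own thread list simply because that is a data source with no storage engine behind it.

```python
import threading
from typing import List, Optional, Tuple

Row = Tuple[int, str]


class ThreadListIterator:
    """Toy row source: rows come from the interpreter's thread list, not from storage."""

    def __init__(self) -> None:
        # Snapshot (thread id, thread name) pairs once, up front.
        self._rows: List[Row] = [(t.ident or 0, t.name) for t in threading.enumerate()]
        self._pos = 0

    def get_row(self) -> Optional[Row]:
        """Return the next row, or None once the end has been reached."""
        if self._pos >= len(self._rows):
            return None
        row = self._rows[self._pos]
        self._pos += 1
        return row


def run_plan(source: ThreadListIterator) -> List[Row]:
    """Stand-in for the execution engine: pull rows until the iterator is exhausted."""
    rows = []
    while (row := source.get_row()) is not None:
        rows.append(row)
    return rows


rows = run_plan(ThreadListIterator())
print(rows)
```

The caller never needs to know where the rows originate, which is the point the excerpt makes about DMVs.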

Now if you asked me to guess where in the world we’d find a list of all threads to iterate over, I would have expected that we’d start with the NodeManager, iterating over all SOS_Nodes, and then for each of them iterating over its collection of associated SystemThreads. After all, we have a guaranteed 1:1 correspondence between threads and SystemThreads, and I figured that all SystemThreads enlist themselves into a parent SOS_Node upon creation. No-brainer, right?

Turns out that this guess would have been completely wrong, and the reason it would have been a broken implementation will become apparent when we look at the started_by_sqlservr column.

Definitely read the whole thing, but you may need to top off your caffeinated beverage of choice first. Also, I’m pretty sure Ewald promised to hammer one out once per week; that’s how I read it, at least…


Context Switches In SQL Server

Ewald Cress continues his journey to the center of the SQLOS:

The SQLOS scheduler exists in the cracks between user tasks. As we’re well aware, in order for scheduling to happen at all, it is necessary for tasks to run scheduler-friendly code every now and again. In practice this means either calling methods which have the side effect of checking your quantum mileage and yielding if needed, or explicitly yielding yourself when the guilt gets too much.

Now from the viewpoint of the user task, the experience of yielding is no different than the experience of calling any long-running CPU-intensive function: You call a function and it eventually returns. The real difference is that the CPU burned between the call and its return was spent on one or more other threads, while the current thread went lifeless for a bit. But you don’t know that, because you were asleep at the time!
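
Cooperative scheduling of this kind can be imitated with Python generators. The sketch below is only an analogy for the behaviour described above, not a model of the SQLOS scheduler: `task`, `run`, and the round-robin policy are assumptions, and there is no quantum accounting at all.

```python
from collections import deque
from typing import Deque, Generator, List

log: List[str] = []


def task(name: str, steps: int) -> Generator[None, None, None]:
    """A task that does a slice of work, then yields voluntarily."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield  # from the task's viewpoint, a call that eventually returns


def run(tasks: List[Generator[None, None, None]]) -> None:
    """Round-robin scheduler that only regains control when a task yields."""
    runnable: Deque[Generator[None, None, None]] = deque(tasks)
    while runnable:
        current = runnable.popleft()
        try:
            next(current)             # run the task until its next yield
            runnable.append(current)  # then send it to the back of the queue
        except StopIteration:
            pass                      # the task has finished


run([task("A", 2), task("B", 3)])
print(log)  # ['A:0', 'B:0', 'A:1', 'B:1', 'B:2']
```

Between a task's `yield` and its resumption, other tasks have run, yet the task itself just sees the statement after `yield` execute next.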

Definitely read the whole thing.


R Services Internals

Niels Berglund has an excellent series on R Services internals. Here’s the latest post:

This post is the ninth post about Microsoft SQL Server R Services, and the eighth post that drills down into the internals of how it works.

So far in this series we have been looking at what happens in SQL Server as well as the launchpad service when we execute sp_execute_external_script, and we still have no real “clue” as to where the R engine comes into play.

Well, hopefully that will change (at least a little bit) with this post, as we here will look at what happens when we leave the launchpad service.

This series is like candy to me. It’s the best write-up I’ve seen so far about what’s really happening when you run SQL Server R Services.


SQLOS Task Origins

Ewald Cress explains how SQLOS tasks come into being:

The above system tasks wear their purpose on their sleeves, because the function pointer in the SOS_Task::Param is, well, to the point. The tasks that run user queries are more abstract, because the I/O completion port listener can’t be bothered with understanding much beyond the rudiments of reading network packets – it certainly can’t be mucking about with fluent TDS skills, SQL parsing and compilation, or permission checks.

So what it does is to enqueue a task that speaks TDS well, pointing it to a bunch of bytes, and sending it on its way. Here is an example of such a dispatch, which shows the “input” side of the WorkDispatcher:

Read the whole thing.


Decomposing Power BI Desktop Files

Reza Rad wants to see exactly where the M scripts in a Power BI Desktop file are stored:

When it comes to Power Query, the DataMashup file is all you need. It includes everything from the structure of queries, tables, parameters, and lists to the actual M scripts behind the scenes. You can fetch all of this information from this single file. Let’s look at the structure of this file. If you open this file with a text editor, you will see some binary content first (which is related to the zipped nature of this file), and also some XML information. Yes, this is a zipped file. Let’s start with unzipping it into a folder. I’ve done that with the 7-Zip application.
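
Reza uses 7-Zip, but the same kind of peek can be scripted. Here is a rough Python sketch using the standard zipfile module. It assumes the file you point it at is an ordinary ZIP archive that zipfile can read, which has not been checked here against a real DataMashup file. The helper names and the in-memory stand-in archive (including its entry names) are made up for the example.

```python
import io
import zipfile
from typing import BinaryIO, List, Union


def list_entries(container: Union[str, BinaryIO]) -> List[str]:
    """List the entry names inside a ZIP container."""
    with zipfile.ZipFile(container) as archive:
        return archive.namelist()


def read_entry(container: Union[str, BinaryIO], name: str) -> bytes:
    """Return the raw bytes of one entry in the container."""
    with zipfile.ZipFile(container) as archive:
        return archive.read(name)


# Stand-in archive built in memory so the sketch runs without a real file.
fake_container = io.BytesIO()
with zipfile.ZipFile(fake_container, "w") as archive:
    archive.writestr("DataMashup", b"placeholder bytes")
    archive.writestr("Version", b"1.0")

print(list_entries(fake_container))
print(read_entry(fake_container, "DataMashup"))
```

To try it on a real file, pass a path string instead of the in-memory buffer.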

This is an interesting peek under the covers of a PBIX file.
