# Day: May 5, 2020

You are locked in the dungeon of a faraway castle with three fellow prisoners (i.e., there are four prisoners in total), each in a separate cell with no means of communication. But it just so happens that all of you are logicians (of course)….

Each prisoner will be given a fair coin, which can either be fairly flipped one time or returned to the guards without being flipped. If all flipped coins come up heads, you will all be set free! But if any of the flipped coins comes up tails, or if no one chooses to flip a coin, you will all be doomed to spend the rest of your lives in the castle’s dungeon.

The only tools you and your fellow prisoners have to aid you are random number generators, which will give each prisoner a random number, uniformly and independently chosen between zero and one.

What are your chances of being released?

I’ll solve this with tidy simulation in R, in particular using one of my favorite functions, tidyr’s `crossing()`. In an appendix, I’ll show how to get a closed form solution for N = 4.

I’ve also posted a 30-minute screencast of how I first approached the simulation and visualization.

Click through for the solution and explanation.

My solutions use a numbers table. You’re free to try replacing that aspect of them with an inlined version like Jeff Moden uses in his string splitter, but I found the numbers table approach faster. Granted, it’s also less portable, but that’s a trade-off I’m willing to make.

What I don’t like about either solution is that I have to re-assemble the string using XML PATH. If you’ve got another way to do that, I’m all ears. I know 2017 has STRING_AGG, but that didn’t turn out much better, and it wouldn’t be usable in other supported versions.

I agree that the best route here is just to suck it up and use CLR, but if you’re going to live in a T-SQL-only world, this is probably the best method available.

Just announced is Query Acceleration for Azure Data Lake Storage Gen2 (ADLS) as well as Blob Storage. This is a new capability for ADLS that enables applications and analytics frameworks to dramatically optimize data processing by retrieving only the data that they require to perform a given operation from storage. This reduces the time and processing power that is required to query stored data.

For example, if an application will execute a SELECT statement that filters columns and rows from a csv file, instead of all pulling the entire csv file over the network into the application and then filtering the data, it will instead do the filtering at the time the data is read from the disk, so that only the filtered data is transferred over the network to the application. So if you have a csv file with 50 columns and 1 million rows, but the filters limit the data to 5 columns and 1000 rows, then only the 5 columns and 1000 rows will be retrieved from the disk and sent over the network to the application.

Click through to learn more, including current libraries which support this and information on the additional cost. I’d really like to see PolyBase support this, as it would alleviate one of the problems with using Blob Storage + PolyBase: the need to pull all of that data down to your SQL Server instance before doing any filtering.

To find the culprit, we tracked down an old informant named sp_who2.  At first he had no comment, but we knew he had some information that could help us, so we kept after him.  Finally, he grew tired of us and gave us something just to make us go away. He said we needed to talk to his associate sysproccesses.  At first this guy was hard to find, but we located him sitting at a table.  He wasn’t really doing anything, just sitting there. So we approached him and sat down to see if we can get what we are looking for.  He also had no comment and referred us to his lawyer, sys.dm_exec_sessions.  After some searching, we were able to catch up with him at the DMV and he was not in a good mood.  He gave up some information, just not exactly what we are looking for.

Dave shows how we can figure out who created a specific temp table (a global temp table, in this case), the query that account used to create the temp table, and the time the temp table was created.

Given that we can specify 5 different parameters (USER MEMBERNAME, ROLE MEMBERNAME, WLM_LABEL, WLM_CONTEXT, START_TIME/END_TIME) – there must be a prioritisation mechanism in order to decide which condition gets selected. This mechanism is called Parameter Weighting in Azure Synapse and it assigns the following weight to each of those parameters:
USER = 64
ROLE = 32
WLM_LABEL = 16
WLM_CONTEXT = 8
START_TIME/END_TIME = 4
meaning that if the Workload Classifier fits into the timeframe START_TIME/END_TIME, WLM_LABEL & ROLE – it will receive 52 points = 4 + 16 + 32,
while a different Workload Classifier that fits into WLM_CONTEXT & USER will get 72 points = 8 + 64, thus will prevail and will be selected over the first Workload Classifier.

Azure Synapse Analytics (including when it was known as SQL Data Warehouse) has had some resource governor-related things I’ve wanted in the box product for a while, including labels (which are better than using application name).

So I’m doing another 31 days of posts, hopefully consecutive days but given current pandemics and sporadic working hours it may not be.

This is the index of all the posts, I’ll update it as each one is posted.

Click through for the index page, which will contain links to all of the post. So far, Stuart has 4 up.

I will be blogging on a few things I learn on my journey to SSIS expertise. This is the first one. This came about as a result of wanting to delete a few packages from ssis catalog. We do not use these packages any more, and I wanted to clean them out from the project which resides on a few servers. I looked into a few ways of doing it.

Click through for three methods, including an in-depth discussion of the third (and least obstructive).