Press "Enter" to skip to content

Curated SQL Posts

Solving the Prisoner Coin Flipping Puzzle with R

David Robinson takes us through another problem-solving challenge:

You are locked in the dungeon of a faraway castle with three fellow prisoners (i.e., there are four prisoners in total), each in a separate cell with no means of communication. But it just so happens that all of you are logicians (of course)….

Each prisoner will be given a fair coin, which can either be fairly flipped one time or returned to the guards without being flipped. If all flipped coins come up heads, you will all be set free! But if any of the flipped coins comes up tails, or if no one chooses to flip a coin, you will all be doomed to spend the rest of your lives in the castle’s dungeon.

The only tools you and your fellow prisoners have to aid you are random number generators, which will give each prisoner a random number, uniformly and independently chosen between zero and one.

What are your chances of being released?

I’ll solve this with tidy simulation in R, in particular using one of my favorite functions, tidyr’s crossing(). In an appendix, I’ll show how to get a closed form solution for N = 4.

I’ve also posted a 30-minute screencast of how I first approached the simulation and visualization.

Click through for the solution and explanation.

Comments closed

Extracting Letters or Numbers from a String

Erik Darling has a T-SQL solution (using tally tables) for extracting letters or numbers from a string:

My solutions use a numbers table. You’re free to try replacing that aspect of them with an inlined version like Jeff Moden uses in his string splitter, but I found the numbers table approach faster. Granted, it’s also less portable, but that’s a trade-off I’m willing to make.

What I don’t like about either solution is that I have to re-assemble the string using XML PATH. If you’ve got another way to do that, I’m all ears. I know 2017 has STRING_AGG, but that didn’t turn out much better, and it wouldn’t be usable in other supported versions.

I agree that the best route here is just to suck it up and use CLR, but if you’re going to live in a T-SQL-only world, this is probably the best method available.

Comments closed

Query Acceleration for Blob Storage and Data Lake Gen2

James Serra takes us through Query Acceleration for Azure Blob Storage and Azure Data Lake Storage Gen2:

Just announced is Query Acceleration for Azure Data Lake Storage Gen2 (ADLS) as well as Blob Storage. This is a new capability for ADLS that enables applications and analytics frameworks to dramatically optimize data processing by retrieving only the data that they require to perform a given operation from storage. This reduces the time and processing power that is required to query stored data.

For example, if an application will execute a SELECT statement that filters columns and rows from a csv file, instead of all pulling the entire csv file over the network into the application and then filtering the data, it will instead do the filtering at the time the data is read from the disk, so that only the filtered data is transferred over the network to the application. So if you have a csv file with 50 columns and 1 million rows, but the filters limit the data to 5 columns and 1000 rows, then only the 5 columns and 1000 rows will be retrieved from the disk and sent over the network to the application.

Click through to learn more, including current libraries which support this and information on the additional cost. I’d really like to see PolyBase support this, as it would alleviate one of the problems with using Blob Storage + PolyBase: the need to pull all of that data down to your SQL Server instance before doing any filtering.

Comments closed

Queries Using tempdb: a Whodunit

Dave Bland shares a database detective story with us:

To find the culprit, we tracked down an old informant named sp_who2.  At first he had no comment, but we knew he had some information that could help us, so we kept after him.  Finally, he grew tired of us and gave us something just to make us go away. He said we needed to talk to his associate sysproccesses.  At first this guy was hard to find, but we located him sitting at a table.  He wasn’t really doing anything, just sitting there. So we approached him and sat down to see if we can get what we are looking for.  He also had no comment and referred us to his lawyer, sys.dm_exec_sessions.  After some searching, we were able to catch up with him at the DMV and he was not in a good mood.  He gave up some information, just not exactly what we are looking for.

Dave shows how we can figure out who created a specific temp table (a global temp table, in this case), the query that account used to create the temp table, and the time the temp table was created.

Comments closed

Workload Classification with Resource Governor in Azure Synapse Analytics

Niko Neugebauer keys in on an interesting addition to Azure Synapse Analytics:

Given that we can specify 5 different parameters (USER MEMBERNAME, ROLE MEMBERNAME, WLM_LABEL, WLM_CONTEXT, START_TIME/END_TIME) – there must be a prioritisation mechanism in order to decide which condition gets selected. This mechanism is called Parameter Weighting in Azure Synapse and it assigns the following weight to each of those parameters:
USER = 64
ROLE = 32
WLM_LABEL = 16
WLM_CONTEXT = 8
START_TIME/END_TIME = 4
meaning that if the Workload Classifier fits into the timeframe START_TIME/END_TIME, WLM_LABEL & ROLE – it will receive 52 points = 4 + 16 + 32,
while a different Workload Classifier that fits into WLM_CONTEXT & USER will get 72 points = 8 + 64, thus will prevail and will be selected over the first Workload Classifier.

Azure Synapse Analytics (including when it was known as SQL Data Warehouse) has had some resource governor-related things I’ve wanted in the box product for a while, including labels (which are better than using application name).

Comments closed

Deleting Packages from the SSIS Catalog

Mala Mahadevan performs important cleanup work:

I will be blogging on a few things I learn on my journey to SSIS expertise. This is the first one. This came about as a result of wanting to delete a few packages from ssis catalog. We do not use these packages any more, and I wanted to clean them out from the project which resides on a few servers. I looked into a few ways of doing it.

Click through for three methods, including an in-depth discussion of the third (and least obstructive).

Comments closed

IS DISTINCT FROM with Snowflake

Koen Verbeeck shows us a good operator in Snowflake:

This single expression both checks for the equality of its members, but also checks the nullability of both columns. Awesome. A good habit would be to use IS [NOT] DISTINCT FROM instead of every = or <> in every expression (join clauses, WHERE clauses etc.) and you’ll never get burned by those pesky NULLs again!

Koen has a link to a Microsoft feedback item to add this syntax to SQL Server. But that item’s been there for more than a decade, so I would not hold my breath waiting for it to show up.

Comments closed

The Limitations of Metadata-Only Updates

Eitan Blumin does not like the limitations of metadata-only column changes with SQL Server 2016:

This is an excellent mechanism on the one hand…

However, it’s completely useless when the column you want to change has a CLUSTERED INDEX defined on it (regardless of whether it’s also a PRIMARY KEY or not).

Such a scenario would especially be common with IDENTITY columns (which, ironically, is exactly the kind of examples that Paul presented in his post).

Click through to understand the scope of this limitation.

Comments closed