Press "Enter" to skip to content

Curated SQL Posts

Converting Odds to Probabilities with R

Jonas Christoffer Lindstrom has a new package:

Now you might think that converting decimal odds to probabilities should be easy, you can just use the definition above and take the inverse of the odds to recover the probability. But it is not that simple, since in practice using this simple formula will give you improper probabilities. They will not sum to 1, as they should, but be slightly larger. This gives the bookmakers an edge and the probabilities (which aren’t real probabilities) can not be considered fair, and so different methods for correcting this exists.

Read on to learn more about the problem and a few solutions. H/T R-Bloggers.

Comments closed

Multi-Armed Bandits

Alex Slivkins has a new book:

If you’ve ever been in a casino, you may have found yourself asking one very pertinent question: On which slot machine am I going to hit the jackpot? Standing in front of a bank of identical-looking machines, you have only instinct to go on. It isn’t until you start putting your money into these one-armed bandits, as they’re also known, that you get a sense of which are hot and which are not, and when you find one that’s paying out regularly, you might stick with it in hopes of winning big. Though seemingly specific to the Las Vegas Strip, this scenario boils down to an exploration-exploitation tradeoff: make a decision based on what you already know and miss out on a potentially bigger reward or spend time and resources continuing to gather information.

Read on for some info about the book. Near the end, Alex gives a link to where you can buy it, as well as where you can get a PDF copy for free.

Comments closed

Creating the Elastic Job Agent for Azure SQL Database

Kate Smith continues a series on Elastic Jobs in Azure SQL Database:

There is no way to create the Elastic Job Agent in T-SQL. I have already shown how to do this in PowerShell. To do this in the Azure Portal, go to Home, click the box that says “+ Create a Resource”, then search in the box for Elastic Job Agent. Select that, and then follow the steps in the portal to create the agent.

After creating the agent, Kate then shows how to set up credentials, target groups, and jobs.

Comments closed

SUMX and Duplicates

Koen Verbeeck takes us through some unexpected results with the SUMX() function:

Hold on, the result is different for the 29th than the 28th? This is what the title means with incorrect (or unexpected) results. Even though the data looks exactly the same in the table, the result are not.

It is not a bug in the DAX formula language. The problem resides with the duplicates in the table.

Read on to understand what this means and how you can change your code to prevent this issue.

Comments closed

Sort Keys and Join Types in Amazon Redshift

Derik Hammer takes us through query tuning a nasty job on Amazon Redshift:

My team built a process to load from a couple of base tables, in our Amazon Redshift enterprise data warehouse, into an other table which would act as a data mart entity. The data was rolled up and it included some derived fields. The SQL query had some complicity [complexity?, ed.] to it.

This process ran daily and was being killed by our operations team after running for 22 hours.

I stepped in to assist with performance tuning and discovered that join choices, such as INNER vs. OUTER joins have a big impact on whether Redshift can use its sort keys or not.

Click through for more details and what Derik ended up doing.

Comments closed

Snapshot Creation in Azure Data Studio

Dave Bland checks out an extension to Azure Data Studio to manage snapshots:

Like many Azure Data Studio extensions, DB Snapshot Creator is designed to bring functionality into ADS that is not present by default.  This extension was developed by Sean Price. As the name suggests, this extension can be used to easily create database snapshots.  Before going too deep into this extension, let’s take a quick moment to go over what a snapshot is.

Back in the day, I created a WPF tool for a company to manage snapshots for manual testing: take a snapshot, perform whatever destructive testing you needed to do, and revert back to a known good state. In a world with good CI/CD tooling and Docker containers, that’s not nearly as important anymore, but sometimes you just need to run a quick test, so I’m glad the functionality is still around.

Comments closed

Removing an Extra Transaction Log File

Jeff Iannucci shows how to remove an unwanted guest from your database:

True, there’s no advantage to having more than one log file, but sometimes that one file grows suddenly and fills up the drive in the middle of a transaction and you’re stuck with those dreaded “THE DATABASE IS DOWN!!!” tickets until that transaction finishes. So, in the heat of the moment, you hit the panic button and create ANOTHER log file on a different drive.

Then, minutes, hours, or even weeks later, you want to put the universe back in order by resizing the original log file and removing the extra one. But what if you find you can’t remove that extra one, no matter what you try to do?

This is a legitimate case. Hopefully you plan ahead and never hit it, but stuff happens.

Comments closed

Making a Better Pie Chart

Elizabeth Ricks tries the impossible:

A friend called me recently and started our conversation with: “I know you dislike pie charts, but…can you help me create one?” 

Spoiler alert: I don’t hate pie charts. They’ve received a bad rap over the years and with good reason—they are very commonly used when another chart type would be better suited. The appropriate use case for a pie chart is expressing a part-to-whole relationship. Their limitation is that it can be difficult to accurately judge the relative size of and compare the segments. Here are some related articles on our blog: the great pie debate and an updated post on pies

Elizabeth does put together the best possible case, but I’m still in favor of burning pie charts to the ground.

Comments closed

Hive: Shuffle Failed with Too Many Fetch Failures

Dmitry Tolpeko takes us through an ugly error:

On one of the clusters I noticed an increased rate of shuffle errors, and the restart of a job did not help, it still failed with the same error.

The error was as follows:

Error: Error while running task ( failure ) : org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$ShuffleError: error in shuffle in Fetcher at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal (Shuffle.java:301)

Caused by: java.io.IOException: Shuffle failed with too many fetch failures and insufficient progress!failureCounts=1, pendingInputs=1, fetcherHealthy=false, reducerProgressedEnough=true, reducerStalled=true

Click through to understand what this error means and what you can do about it.

Comments closed

Distinct Counts in Power Query

Reza Rad shows how you can get a distinct count in Power Query:

You can have a distinct count calculation in multiple places in Power BI, through DAX code, using the Visual’s aggregation on a field, or even in Power Query. If you are doing the distinct count in Power Query as part of a group by operation, however, the existing distinct count is for all columns in the table, not for a particular column. In this article, I’ll show you a method you can use to get the distinct count of a particular column through the Group By transformation in Power Query component of Power BI.

Click through to learn how.

Comments closed