Curated SQL Posts

Biml Notes

Bill Fellows is going through Biml Hero training, and he has some notes from day 1:

Topological sorting

This was an in-depth Extension method but as with any good recursive algorithm it was precious few lines of code. Why I care about it is twofold: execution dependencies and as I type that, I realize lineage tracing would also fall under this, and foreign key traversal. For the former, in my world, I find I have the best success when my SSIS packages are tightly focused on a task and I use a master/parent package to handle the coordination and scheduling of sub-package execution. One could use an extension method to discover all the packages that implement an Execute Package Task and then figure out the ordering of dependent tasks. That could save me some documentation headaches.

Sounds like a fun training.
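To make the package-ordering idea concrete, here is a minimal sketch of a recursive topological sort in Python. It is not the Biml extension method from the training; the package names and the dependency map are hypothetical and exist only for illustration.

```python
def topological_order(dependencies):
    """Return nodes so that each one appears after everything it depends on.

    `dependencies` maps a node to the collection of nodes that must run first.
    """
    ordered = []
    state = {}  # node -> "visiting" while on the call stack, "done" afterwards

    def visit(node):
        if state.get(node) == "done":
            return
        if state.get(node) == "visiting":
            raise ValueError(f"Circular dependency involving {node!r}")
        state[node] = "visiting"
        # Recurse into prerequisites first (sorted only to keep output stable).
        for dep in sorted(dependencies.get(node, ())):
            visit(dep)
        state[node] = "done"
        ordered.append(node)

    for node in sorted(dependencies):
        visit(node)
    return ordered


# Hypothetical child packages called from a master package.
packages = {
    "LoadFactSales": {"LoadDimCustomer", "LoadDimProduct"},
    "LoadDimCustomer": {"StageCustomer"},
    "LoadDimProduct": {"StageProduct"},
    "StageCustomer": set(),
    "StageProduct": set(),
}

order = topological_order(packages)
print(order)
```

The same function would work for foreign key traversal if the map went from each table to the tables it references.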

Understanding Bookmakers’ Odds Using R

Andrew Collier looks at odds, vigs, and other bookmaking concepts through the lens of the R programming language:

The house edge is 2.70%. On average a gambler would lose 2.7% of his stake per game. Of course, on any one game he would either win or lose, but this is the long term expectation. Another way of looking at this is to say that the Return To Player (RTP) is 97.3%, which means that on average a gambler would get back 97.3% of his stake on every game.
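The 2.70% figure is consistent with an even-money bet on a single-zero wheel, which is an assumption here since the excerpt does not name the game. Under that assumption, a quick sketch of the arithmetic in Python (the article itself uses R):

```python
from fractions import Fraction

POCKETS = 37          # assumption: single-zero wheel, pockets 0 through 36
WINNING_POCKETS = 18  # the even numbers 2..36; zero counts as a loss
PAYOUT = 1            # even-money bet: win 1 unit per unit staked

p_win = Fraction(WINNING_POCKETS, POCKETS)

# Expected profit per unit staked: win PAYOUT with p_win, lose 1 otherwise.
expected_profit = p_win * PAYOUT - (1 - p_win) * 1

house_edge = -expected_profit  # 1/37
rtp = 1 - house_edge           # 36/37

print(f"House edge: {float(house_edge):.2%}")
print(f"Return to player: {float(rtp):.2%}")
```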

Below are the results of a simulation of 100 gamblers betting on even numbers. Each starts with an initial capital of 100. The red line represents the average for the cohort. After 1000 games two gamblers have lost all of their money. Of the remaining 98 players, only 24 have made money while the rest have lost some portion of their initial capital.
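A simulation along these lines can be sketched in a few lines of Python. The single-zero wheel, the one-unit stake per game, and the seed are assumptions, so this will not reproduce the exact counts quoted above.

```python
import random


def simulate_gamblers(n_gamblers=100, n_games=1000, capital=100, stake=1, seed=0):
    """Return each gambler's final balance after betting on even numbers."""
    rng = random.Random(seed)  # seeded so reruns give the same cohort
    final_balances = []
    for _ in range(n_gamblers):
        balance = capital
        for _ in range(n_games):
            if balance < stake:
                break  # this gambler is out of money
            pocket = rng.randrange(37)  # assumption: pockets 0..36
            won = pocket != 0 and pocket % 2 == 0
            balance += stake if won else -stake
        final_balances.append(balance)
    return final_balances


results = simulate_gamblers()
ruined = sum(1 for b in results if b < 1)
ahead = sum(1 for b in results if b > 100)
average = sum(results) / len(results)

print(f"ruined: {ruined}, ahead: {ahead}, cohort average: {average:.1f}")
```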

This is a very interesting article if you’re interested in basic statistics.  13-year-old Onion article of note.

Azure SQL Database Versus SQL Server

Kenneth Fisher learns about differences between Microsoft’s Azure SQL Database and their on-premises (or IaaS) SQL Server:

T-SQL Differences in Azure SQL Database

I used to think this was the real difference between SQL Server and SQL Database. I was wrong. Really wrong. But it’s a good place to start. Now from what I can tell everything in Azure is a moving target. There are constant changes so it’s important to know where the documentation is. In this particular case here it is: Azure SQL Database Transact-SQL differences.

Check it out.  The differences are smaller than in the past, but I expect that there will always be some differences—particularly on the administration side—due to the nature of Azure SQL Database as a PaaS offering.

Power BI Calendar Visualization

Devin Knight continues his Power BI visualization series and looks at a custom calendar visual:

  • Allows you to visualize a data point on each date on the calendar.
    • The darker the color, the higher the value or density of values.
  • If you have multiple rows on the same date they are aggregated together.
  • The Calendar Visualization can be used for cross filtering, meaning you can select a square in the calendar and it will filter other visuals down to the date you picked.

This is an interesting visual.  It’s dense, but not difficult to understand.

Getting Pagination Wrong

Lukas Eder discusses common pagination issues:

If your data source is a SQL database, you might have implemented pagination by using LIMIT .. OFFSET, or OFFSET .. FETCH or some ROWNUM / ROW_NUMBER() filtering (see the jOOQ manual for some syntax comparisons across RDBMS). OFFSET is the right tool to jump to page 317, but remember, no one really wants to jump to that page, and besides, OFFSET just skips a fixed number of rows. If there are new rows in the system between the time page number 316 is displayed to a user and when the user skips to page number 317, the rows will shift, because the offsets will shift. No one wants that either, when they click on “next”.

Instead, you should be using what we refer to as “keyset pagination” (as opposed to “offset pagination”).

He also has a good explanation of the seek method.
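The row-shifting problem is easy to reproduce. Below is a small sketch using Python's built-in sqlite3 module; the posts table, its columns, and the page size are made up for illustration, and the queries are plain SQL rather than jOOQ.

```python
import sqlite3

PAGE_SIZE = 3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO posts (id, title) VALUES (?, ?)",
    [(i, f"post {i}") for i in range(1, 11)],
)


def offset_page(page_number):
    """Offset pagination: skip a fixed number of rows, newest first."""
    return conn.execute(
        "SELECT id FROM posts ORDER BY id DESC LIMIT ? OFFSET ?",
        (PAGE_SIZE, page_number * PAGE_SIZE),
    ).fetchall()


def keyset_page(last_seen_id=None):
    """Keyset pagination: continue from the last key the user actually saw."""
    if last_seen_id is None:
        return conn.execute(
            "SELECT id FROM posts ORDER BY id DESC LIMIT ?", (PAGE_SIZE,)
        ).fetchall()
    return conn.execute(
        "SELECT id FROM posts WHERE id < ? ORDER BY id DESC LIMIT ?",
        (last_seen_id, PAGE_SIZE),
    ).fetchall()


first_page = keyset_page()  # ids 10, 9, 8

# A new row arrives while the user is still reading the first page.
conn.execute("INSERT INTO posts (id, title) VALUES (11, 'post 11')")

next_by_offset = offset_page(1)                  # ids 8, 7, 6: id 8 repeats
next_by_keyset = keyset_page(first_page[-1][0])  # ids 7, 6, 5: no overlap

print(next_by_offset, next_by_keyset)
```

With a composite sort key, the WHERE clause would need to compare every sort column, not just id.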

I will throw in one jab at Oracle (because hey, it’s been a while since I’ve lobbed a bomb at Oracle on this blog):  it’d really suck to have a system where I legally wasn’t allowed to distribute relevant performance comparison benchmarks.  Fortunately, I tend to work on better data stacks.

Extended Events Audit

Steve Jones creates an audit with Extended Events:

The third part of the invitation was to write this. I covered what I did, and some of what I learned. I’ll add a bit more here.

I certainly was clumsy working with XE, and despite working my way through the course, I realize I have a lot of learning to do in order to become more familiar with how to use XE. While I got a basic session going, depending on when I started it and what I was experimenting with, I sometimes found myself with events that never went away, such as a commit or rollback with no corresponding opening transaction.

This T-SQL Tuesday was a bit broader in scope, so it has been interesting watching people respond.

Trapping HTTP Error Codes In Power BI

Chris Webb shows how to handle specific HTTP error codes when using the Web.Contents() function in M:

This thread on the Power Query forum suggests it’s something to do with lazy evaluation, but I haven’t been able to determine the situations when it does work and when it doesn’t.

Instead, it is possible to handle specific HTTP error codes using the ManualStatusHandling option in Web.Contents()

I guess this beats not being able to handle errors at all, but it seems like a fairly fragile solution if you next want to start handling the entire 500 class of response codes.

Unit Testing Of Spark Streaming

Felipe Fernandez shows how to unit test Spark Streaming:

Controlling the lifecycle of Spark can be cumbersome and tedious. Fortunately, Spark Testing Base project offers us Scala Traits that handle those low-level details for us. Streaming has an extra bit of complexity as we need to produce data for ingestion in a timely way. At the same time, Spark internal clock needs to tick in a controlled way if we want to test timed operations as sliding windows.

This is part one of a series.  I’m interested in seeing where this goes.

Securing Spark Shuffle

Cheng Xu uses Apache Commons Crypto to secure data when Spark shuffles off to disk:

The basic steps can be described as follows:

  1. When a Spark job starts, it will generate encryption keys and store them in the current user’s credentials, which are shared with all executors.

  2. When shuffle happens, the shuffle writer will first compress the plaintext if compression is enabled. Spark will use the randomly generated Initial Vector (IV) and keys obtained from the credentials to encrypt the plaintext by using CryptoOutputStream from Crypto.

  3. CryptoOutputStream will encrypt the shuffle data and write it to the disk as it arrives. The first 16 bytes of the encrypted output file are preserved to store the initial vector.

  4. For the read path, the first 16 bytes are used to initialize the IV, which is provided to CryptoInputStream along with the user’s credentials. The decrypted data is then provided to Spark’s shuffle mechanism for further processing.
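The file layout in those steps (compress, encrypt, and keep the IV in the first 16 bytes) can be sketched in Python. This is not Spark or Commons Crypto code: the function names are hypothetical, and the cipher is a toy SHA-256 keystream standing in for a real one, so treat it only as an illustration of the write and read paths and never as usable encryption.

```python
import hashlib
import os
import zlib

IV_LEN = 16  # the first 16 bytes of each block hold the IV


def _toy_keystream(key, iv, length):
    """Toy stand-in for a real stream cipher; NOT secure, illustration only."""
    stream = bytearray()
    counter = 0
    while len(stream) < length:
        stream += hashlib.sha256(key + iv + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(stream[:length])


def write_shuffle_block(key, plaintext):
    """Write path: compress, encrypt with a fresh random IV, prepend the IV."""
    compressed = zlib.compress(plaintext)
    iv = os.urandom(IV_LEN)
    stream = _toy_keystream(key, iv, len(compressed))
    ciphertext = bytes(c ^ s for c, s in zip(compressed, stream))
    return iv + ciphertext


def read_shuffle_block(key, block):
    """Read path: recover the IV from the first 16 bytes, decrypt, decompress."""
    iv, ciphertext = block[:IV_LEN], block[IV_LEN:]
    stream = _toy_keystream(key, iv, len(ciphertext))
    compressed = bytes(c ^ s for c, s in zip(ciphertext, stream))
    return zlib.decompress(compressed)


job_key = os.urandom(16)  # generated once per job and shared with executors
data = b"shuffle records " * 8
block = write_shuffle_block(job_key, data)
restored = read_shuffle_block(job_key, block)
print(restored == data)
```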

Once you have things optimized, the performance hit is surprisingly small.

AutoRestart SSAS Extended Events

Bill Anton looks at the AutoRestart option on Extended Events for Analysis Services:

So how do we handle the scenario where the server is rebooted?

  • Option 1: always remember to restart the trace after server reboots
  • Option 2: create a SQL Agent job to poll for the SSAS service status and start the xEvent trace if it’s not already running
  • Option 3: write a custom .NET watchdog service to poll for the SSAS service status and start the xEvents trace if it’s not already running

Those are the options I’ve used or seen used in the past… and to be sure, all of them have their drawbacks in reliability and/or complexity.

…which is why I was so excited when it was brought to my attention that there is an “AutoRestart” option for SSAS xEvents!

Do read the whole thing.
