
Curated SQL Posts

Drill-Down Tables in Cube.js

Artyom Keydunov shows off drill-down tables in Cube.js:

Since the release of drill down support in version 0.19.23, you can build interfaces to let users dive deeper into visualizations and data tables. The common use case for this feature is to let users click on a spike on the chart to find out what caused it, or to inspect a particular step of the funnel — who has converted and who has not.

In this blog post, I’ll show you how to define drill downs in the data schema and build an interface to let users explore the underlying chart’s data. If you’re just starting with Cube.js, I highly recommend beginning with this Cube.js 101 tutorial and then coming back here. Also, if you have any questions, don’t hesitate to ask them in our Slack community.

Click through for the demo, as well as links to the source code and an online example.


Comparing Koalas to PySpark

Tori Tompkins gives us an understanding of where Koalas fits in the Spark world:

One significant difference between Spark’s implementation of Dataframes and pandas is its immutability.

With Spark DataFrames, you are unable to make changes to the existing object; instead, you create a brand new DataFrame based on the old one. pandas DataFrames, however, allow you to edit the object in place. Koalas, whilst still using Spark DataFrames under the hood, has kept the mutable syntax of pandas.

It does this by introducing the concept of an ‘Internal Frame’. This holds the immutable Spark DataFrame and manages the mapping between Koalas column names and Spark column names. It also manages the mapping from Koalas index names to Spark column names, to replicate the index functionality of pandas (covered below). It acts as a bridge between Spark and Koalas by mimicking the pandas API with Spark. The Internal Frame replicates the mutable behaviour of pandas by creating new copies of itself while appearing to be mutable.

Read the whole thing.


Capturing the T-SQL of Prepared Statements

Grant Fritchey shows us how to use Extended Events to capture the T-SQL of a prepared statement:

Why would I be concerned with prepared statements? Wouldn’t sql_batch_completed and rpc_completed cover us? Well, no.

What happens when you use sp_prepare? What happens when you’re using an ORM tool that’s using prepared statements? You may see queries that look like this:

EXEC sp_execute 5, 48766;

What the heck code is that executing? Let’s find out.

Read on for the code. It’s not the type of thing you want running 24/7 against all of your prepared statements (certainly not on a busy system!), but very useful for troubleshooting when you don’t have easy access to that next level up in the application chain.
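Grant has the full session definition in his post, but as a rough sketch of the kind of Extended Events session involved (not necessarily his exact one; the session and file names here are made up), pairing rpc_completed with the statement-level sp_statement_completed event shows both the outer sp_execute call and the underlying statement text:

-- A sketch, not Grant's exact session. rpc_completed captures the outer
-- "EXEC sp_execute 5, 48766;" call; sp_statement_completed surfaces the
-- statement that the prepared handle actually runs.
CREATE EVENT SESSION PreparedStatementCapture ON SERVER
    ADD EVENT sqlserver.rpc_completed
        (ACTION (sqlserver.sql_text, sqlserver.session_id)),
    ADD EVENT sqlserver.sp_statement_completed
        (ACTION (sqlserver.sql_text, sqlserver.session_id))
    ADD TARGET package0.event_file
        (SET filename = N'PreparedStatementCapture.xel')
WITH (MAX_DISPATCH_LATENCY = 5 SECONDS);
GO

ALTER EVENT SESSION PreparedStatementCapture ON SERVER STATE = START;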


Comparing JSON Documents with SQL Server

Phil Factor has a new function for us to try out:

It is often necessary to compare objects, particularly when you are working on a routine that generates an object and you want to check that an improvement you’ve made actually returns the same result. An XML or JSON document is the nearest we get to an object in SQL Server. This is probably as near as you’d want.

Although this routine is designed to compare JSON documents, and takes them as input, it can be used for comparing any results.

The output style is borrowed from the diff command in Unix.
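Phil’s function does considerably more, but as a very rough illustration of the idea (this is not his routine, just a toy diff over two flat documents using OPENJSON), you can shred both documents and compare them key by key:

-- Toy example only: shred two flat JSON documents and diff them key by key.
DECLARE @left  nvarchar(max) = N'{"name":"widget","qty":5,"colour":"red"}';
DECLARE @right nvarchar(max) = N'{"name":"widget","qty":7,"size":"large"}';

SELECT
    COALESCE(l.[key], r.[key]) AS [key],
    l.[value] AS left_value,
    r.[value] AS right_value,
    CASE
        WHEN l.[key] IS NULL THEN '>'   -- only in the right-hand document
        WHEN r.[key] IS NULL THEN '<'   -- only in the left-hand document
        WHEN l.[value] <> r.[value] OR l.[type] <> r.[type] THEN '<>'
        ELSE '='
    END AS diff
FROM OPENJSON(@left) AS l
    FULL OUTER JOIN OPENJSON(@right) AS r
        ON l.[key] = r.[key];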


How MAXDOP Works

Paul White lets us in on a secret:

The nuance to the question is in the phrase “running concurrently”. There is a limit to the number of characters one can use in a Twitter poll. My intention was for people to focus on the number of threads that could have a status of RUNNING at any one moment.

There are various definitions for that status in different DMVs, but essentially RUNNING means actively executing on a processor — not waiting for its turn on a CPU or for some other resource to become available.

My expectation was that most people would think that MAXDOP 4 would limit a parallel query to using 4 CPU cores at any one moment in time. Indeed, that was the second most popular answer to the poll.

But read on to understand why 4 isn’t the correct answer.
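One rough way to see this for yourself (a sketch only; 55 stands in for whatever session_id is running the parallel query) is to kick off a long parallel query in one window and sample the task DMV from another, counting how many of its workers have a status of RUNNING at that instant:

-- Sketch: count the query's tasks by state at a single point in time.
-- Replace 55 with the session_id of the session running the parallel query.
SELECT
    t.session_id,
    t.task_state,          -- RUNNING, RUNNABLE, SUSPENDED, PENDING, ...
    COUNT(*) AS tasks
FROM sys.dm_os_tasks AS t
WHERE t.session_id = 55
GROUP BY t.session_id, t.task_state;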


Delaying Loops in Python

Jon Fletcher gives us three methods to delay a Python loop:

In this blog post, I will run through 3 different ways to execute a delayed Python loop. In these examples, we will aim to run the loop once every minute.
To show the delay, we will print out the current datetime using the datetime module.

I think I’d be a bit concerned about all three approaches potentially failing: option 1 can drift out of sync if the work inside the loop takes more than a second, option 2 might skip a minute entirely, and option 3 can also sometimes miss. But it’s interesting to think about.


Automating Columnstore Index Partition Rebuilds

Brett Powell has a procedure for us:

This post provides an example of a stored procedure which A) identifies the last two partitions of an Azure Synapse Analytics SQL pool table (which uses the default columnstore index) and B) rebuilds the index for those two partitions. Additionally, a sample PowerShell-based Azure Automation runbook is included for scheduling the execution of this procedure.

This post follows up on the previous post regarding a Power BI template for analyzing the health or quality of a columnstore index. For example, the template may help you find that the last one or two partitions, such as partition numbers 39 and 40 out of 40, have many open (uncompressed) and/or not-optimized rowgroups. The cause of these low-quality partitions could be that recent and ongoing data processing events (inserts, updates) are impacting them. Perhaps partitions 39 and 40 refer to the current and prior months, for example.

Read on for the link to the script, as well as details on how to use it.
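To give a flavour of the core idea, here is a simplified sketch written against the standard SQL Server catalog views rather than the Synapse SQL pool DMVs Brett uses, with dbo.FactSales standing in as a hypothetical partitioned table:

-- Sketch only: build and run ALTER INDEX ... REBUILD PARTITION statements
-- for the two highest-numbered partitions of a clustered columnstore index.
DECLARE @sql nvarchar(max) = N'';

SELECT @sql += N'ALTER INDEX ALL ON dbo.FactSales REBUILD PARTITION = '
             + CAST(p.partition_number AS nvarchar(10)) + N';' + CHAR(13)
FROM (
    SELECT TOP (2) partition_number
    FROM sys.partitions
    WHERE object_id = OBJECT_ID(N'dbo.FactSales')
      AND index_id = 1          -- the clustered columnstore index
    ORDER BY partition_number DESC
) AS p;

EXEC sys.sp_executesql @sql;    -- rebuilds only the last two partitions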


Traits in Scala

Sanjana Aggarwal explains the notion of Traits in Scala:

Traits are a fundamental unit of code reuse in Scala. A trait encapsulates method and field definitions, which can then be reused by mixing them into classes.

The two most important concepts about traits are:

– Widening from a thin interface to a rich interface
– Defining stackable modifications

There are some rather powerful things you can do with traits in Scala, though it’s important to be careful not to overdo it.


Free Data Sets

Chris Taylor puts on the Santa Claus getup:

This is just a short overview page of various dataset sources I’ve used in the past in my SQL Server, Azure, AWS and Power BI demos. Most, if not all, of these are free, as I don’t like paying for stuff, and there are some where you can generate your own data if you need something more specific. Just be careful: although the data is “randomly” generated, I have had issues where I pushed the files to GitHub and, as it happens, one of the randomly generated email addresses was that of an employee of a company I’d never heard of. Pure chance (I believe), but something to be mindful of if that is your intention for use.

Chris also has sub-links to Google and Kaggle data sets.
