Press "Enter" to skip to content

Day: February 16, 2021

Non-Equi Joins in R

David Selby walks us through non-trivial join scenarios in R:

Most joins are equi-joins, matching rows according to two columns having exactly equal values. These are easy to perfom in R using the base merge() function, the various join() functions in dplyr and the X[i] syntax of data.table.

But sometimes we need non-equi joins or θ-joins, where the matching condition is an interval or a set of inequalities. Other situations call for a rolling join, used to link records according to their proximity in a time sequence.

How do you perform non-equi joins and rolling joins in R?

Click through for the answer using dplyr, sqldf, and data.table. H/T R-bloggers

Comments closed

Synchronizing Metadata between Spark Tables and Serverless Pool

Charl Roux takes us through one back-end integration mechanism between tables in Azure Synapse Analytics Spark pools and serverless SQL pool:

Synapse provides an exciting feature which allows you to sync Spark database objects to Serverless pools and to query these objects without the Spark pool being active or running.  Synapse workspaces are accessed exclusively through an Azure AD Account and objects are created within this context in the Spark pool. In some scenarios I would like to share the data which I’ve created in my Spark database with other users for reporting or analysis purposes. This is possible with Serverless and in this article I will show you how to complete the required steps from creation of the object to successful execution. 

Click through for the demonstration.

Comments closed

Monitoring SSAS with Quest Spotlight

Slava Murygin has two questions and two answers:

This post is just answering two simple questions:

1. Can Quest Software’s Spotlight successfully monitor SQL Server Analysis Server?

2. If it can, what SSAS parameters, databases’ and cubes’ details it monitors and provides information about?

First, it’s good to see Slava back in the saddle again. Second, click through for those answers. Slava also promises to check out some other SSAS monitoring tools, so stay tuned.

Comments closed

Visualizing a Power BI Refresh

Phil Seamark has a dashboard which will help understand Power BI dataset refresh times:

Have you ever wondered why a Power BI dataset refresh was taking so long? And more specifically, how much time did the refresh spend on various sub-tasks that aren’t that visible to you via the web-portal?

This article shares a technique you can use to capture events fired during a Power BI refresh and use the results in a Power BI report visualise the results. Have to love the idea of using Power BI to optimise and improve Power BI. 

Click through to get the Power BI report and get step-by-step instructions on how to use it.

Comments closed

The Sequence Operator

Hugo Kornelis digs into the sequence operator:

The Sequence operator reads all rows from all its inputs, in order, and without modification. It returns all rows it reads from its last input only.

This operator is typically used in execution plans where some branches are needed to do preparation work (e.g. to invoke a Table-valued Function operator) and then other branches produce the final output, or multiple branches need to do work in sequence (such as in wide data modification plans). The operators on the non-last inputs often do not produce any output at all, but even if they do the rows returned are discarded. Only rows returned from the last input are returned to the parent of the Sequence operator.

Note that this is quite difference from sequences of numbers in SQL Server, which act similarly to identity integers.

Comments closed

Looping with WHILE(TRUE)

Eitan Blumin won’t be getting off this bus:

This is one of those things that I would have never expected to adopt, and yet here I am! Not just loving “infinite” loops, but also confident enough that I can convince you to feel the same!

It may sound very weird to most people at first, since it’s such an unorthodox way of using WHILE loops, and probably goes against what you consider to be coding best practices and standards.

I use this a fair amount as well for the reasons Eitan describes. The one thing to keep an eye out for, though, is that you are actually progressing toward an end goal. In other words, if you’re using WHILE(TRUE) to loop through batches of rows in a table by, say, minimum ID, make sure you increment that minimum ID or else you’ll be spinning your wheels for a while and not realize it. Not that I’ve ever done that before, of course…

Comments closed

Sorting Pre-Sorted Data

Daniel Hutmacher has an idea:

Whenever SQL Server needs to sort a data stream, it will use the Sort operator to reorder the rows of the stream. Sorting data is an expensive operation because it entails loading part or all of the data into memory and shifting that data back and forth a couple of times. The only time SQL Server doesn’t sort the data is when it already knows the data to be ordered correctly, like when it has already passed a Sort operator or it’s reading from an appropriately sorted index.

But what happens if the data is ordered correctly, but SQL Server doesn’t know about it? Let’s find out.

Click through for the answer.

Comments closed

Filtering by Cluster in Power BI

Joseph Yeates starts a two-parter:

This is a technique that I have used in reports that analyze a feature in a data set that is at the bottom of a hierarchy. For example, customer that belong to a larger customer segment or accounts that belong to a grouping. The report analyzes information for an individual customer or account; however, we want to bring in some comparisons for other customers or accounts that belong to the same segment, grouping or cluster.

Click through for the technique and stay tuned for part two.

Comments closed