Press "Enter" to skip to content

Curated SQL Posts

Synchronizing Metadata between Spark Tables and Serverless Pool

Charl Roux takes us through one back-end integration mechanism between tables in Azure Synapse Analytics Spark pools and serverless SQL pool:

Synapse provides an exciting feature which allows you to sync Spark database objects to Serverless pools and to query these objects without the Spark pool being active or running.  Synapse workspaces are accessed exclusively through an Azure AD Account and objects are created within this context in the Spark pool. In some scenarios I would like to share the data which I’ve created in my Spark database with other users for reporting or analysis purposes. This is possible with Serverless and in this article I will show you how to complete the required steps from creation of the object to successful execution. 

Click through for the demonstration.
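Once the metadata sync has happened, the Spark table shows up in the serverless SQL pool as just another database you can query with T-SQL. Here is a minimal sketch of what that looks like (the database and table names below are hypothetical, not the ones Charl uses):

```sql
-- Hypothetical names: a Spark database called salesdb with a table dbo.orders.
-- After the metadata has synced, the serverless SQL pool can query the table
-- even when the Spark pool is paused.
USE salesdb;
GO

SELECT TOP (10) *
FROM dbo.orders;
```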


Monitoring SSAS with Quest Spotlight

Slava Murygin has two questions and two answers:

This post is just answering two simple questions:

1. Can Quest Software’s Spotlight successfully monitor SQL Server Analysis Services?

2. If it can, which SSAS parameters, and which database and cube details, does it monitor and provide information about?

First, it’s good to see Slava back in the saddle again. Second, click through for those answers. Slava also promises to check out some other SSAS monitoring tools, so stay tuned.


Visualizing a Power BI Refresh

Phil Seamark has a dashboard which will help understand Power BI dataset refresh times:

Have you ever wondered why a Power BI dataset refresh was taking so long? And more specifically, how much time did the refresh spend on various sub-tasks that aren’t that visible to you via the web-portal?

This article shares a technique you can use to capture events fired during a Power BI refresh and use a Power BI report to visualise the results. You have to love the idea of using Power BI to optimise and improve Power BI.

Click through to get the Power BI report, along with step-by-step instructions on how to use it.


The Sequence Operator

Hugo Kornelis digs into the sequence operator:

The Sequence operator reads all rows from all its inputs, in order, and without modification. It returns all rows it reads from its last input only.

This operator is typically used in execution plans where some branches are needed to do preparation work (e.g. to invoke a Table-valued Function operator) and then other branches produce the final output, or multiple branches need to do work in sequence (such as in wide data modification plans). The operators on the non-last inputs often do not produce any output at all, but even if they do the rows returned are discarded. Only rows returned from the last input are returned to the parent of the Sequence operator.

Note that this is quite different from sequence objects in SQL Server, which act similarly to identity columns.
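For a quick illustration of that distinction, here is a sequence object in action: it just hands out monotonically increasing numbers, much as an identity column would (object names are made up):

```sql
-- A sequence object generates numbers on demand; it has nothing to do with
-- the Sequence operator that shows up in execution plans.
CREATE SEQUENCE dbo.OrderNumber
    AS int
    START WITH 1
    INCREMENT BY 1;
GO

-- Each call returns the next value: 1, then 2, and so on.
SELECT NEXT VALUE FOR dbo.OrderNumber AS FirstValue;
SELECT NEXT VALUE FOR dbo.OrderNumber AS SecondValue;
```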


Looping with WHILE(TRUE)

Eitan Blumin won’t be getting off this bus:

This is one of those things that I would have never expected to adopt, and yet here I am! Not just loving “infinite” loops, but also confident enough that I can convince you to feel the same!

It may sound very weird to most people at first, since it’s such an unorthodox way of using WHILE loops, and probably goes against what you consider to be coding best practices and standards.

I use this a fair amount as well for the reasons Eitan describes. The one thing to keep an eye out for, though, is that you are actually progressing toward an end goal. In other words, if you’re using WHILE(TRUE) to loop through batches of rows in a table by, say, minimum ID, make sure you increment that minimum ID or else you’ll be spinning your wheels for a while and not realize it. Not that I’ve ever done that before, of course…
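Here is a minimal sketch of that batching pattern with hypothetical table and column names (T-SQL has no boolean literal, so the "true" condition is written as 1 = 1):

```sql
-- Hypothetical example: mark rows in dbo.Orders as processed, 10,000 at a time,
-- walking the table by its positive integer key.
DECLARE @MinOrderId bigint = 0,
        @MaxOrderId bigint,
        @BatchSize  int    = 10000;

WHILE 1 = 1    -- the WHILE(TRUE) pattern
BEGIN
    -- Find the upper bound of the next batch.
    SELECT @MaxOrderId = MAX(OrderId)
    FROM
    (
        SELECT TOP (@BatchSize) OrderId
        FROM dbo.Orders
        WHERE OrderId > @MinOrderId
        ORDER BY OrderId
    ) AS batch;

    -- No rows left: the exit condition lives inside the loop.
    IF @MaxOrderId IS NULL
        BREAK;

    UPDATE dbo.Orders
    SET Processed = 1
    WHERE OrderId > @MinOrderId
      AND OrderId <= @MaxOrderId;

    -- The step you must not forget: advance the minimum ID,
    -- or the loop chews on the same batch forever.
    SET @MinOrderId = @MaxOrderId;
END;
```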


Sorting Pre-Sorted Data

Daniel Hutmacher has an idea:

Whenever SQL Server needs to sort a data stream, it will use the Sort operator to reorder the rows of the stream. Sorting data is an expensive operation because it entails loading part or all of the data into memory and shifting that data back and forth a couple of times. The only time SQL Server doesn’t sort the data is when it already knows the data to be ordered correctly, like when it has already passed a Sort operator or it’s reading from an appropriately sorted index.

But what happens if the data is ordered correctly, but SQL Server doesn’t know about it? Let’s find out.

Click through for the answer.
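If you want to see the behavior for yourself, here is a small sketch to run with the actual execution plan turned on (the table is invented purely for illustration):

```sql
-- Hypothetical table: the clustered index is on OrderId, so SQL Server knows
-- the data comes back in OrderId order, but it knows nothing about OrderDate.
CREATE TABLE dbo.Orders
(
    OrderId   int            NOT NULL,
    OrderDate datetime2(0)   NOT NULL,
    Amount    decimal(18, 2) NOT NULL,
    CONSTRAINT PK_Orders PRIMARY KEY CLUSTERED (OrderId)
);

-- No Sort operator: the clustered index already delivers rows in OrderId order.
SELECT OrderId, OrderDate, Amount
FROM dbo.Orders
ORDER BY OrderId;

-- A Sort operator appears: even if every row happens to arrive in OrderDate
-- order, SQL Server has no metadata saying so, which is exactly the scenario
-- Daniel explores.
SELECT OrderId, OrderDate, Amount
FROM dbo.Orders
ORDER BY OrderDate;
```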


Filtering by Cluster in Power BI

Joseph Yeates starts a two-parter:

This is a technique that I have used in reports that analyze a feature in a data set that is at the bottom of a hierarchy. For example, customers that belong to a larger customer segment or accounts that belong to a grouping. The report analyzes information for an individual customer or account; however, we want to bring in some comparisons for other customers or accounts that belong to the same segment, grouping or cluster.

Click through for the technique and stay tuned for part two.


Polychoric Correlation in Practice

Jack Davis explains the concept of polychoric correlation:

In polychoric correlation, we don’t need to know or specify where the boundary between “good” and “very good” is, just that it exists. The distribution of the ordinal responses, along with the assumption that the latent values follow a normal distribution, is enough that the polychor() function in the polycor R package can do that for us. In most practical cases, you don’t even need to know where the cutoffs are, but they are useful for demonstration that the method works.

Polychoric correlation estimates the correlation between such latent variables as if you actually knew what those values were. In the examples given, we start with the latent variables and use cutoffs to set them into bins, and then use polychoric on the artificially binned data. In any practical use case, the latent data would be invisible to you, and the cutoffs would be determined by whoever designed the survey.

Read on for a demonstration of the process in R.


Adding Data Labels in Excel

Elizabeth Ricks shows how to include data labels on a line chart in Excel:

Today’s post is a tactical one for folks creating visuals in Excel: how to embed labels for your data series in your graphs, instead of relying on default Excel legends.

To illustrate, let’s look at an example from storytelling with data: Let’s Practice!. The graph below shows demand and capacity (in project hours) over time.

It is significantly more complicated than you might first think.


Power Query Design the Right Way

Paul Turley continues a series:

Life is full of choices and trade-off decisions. Let’s say that you need to create a lookup table containing sales regions and this information doesn’t exist in the source database. You could easily create a new table in Power Query using the “Enter Data” feature, and just manually enter the sales regions. This would solve the immediate problem with very little effort, but how will the table be maintained if sales regions are added or change in the future? We could keep the list of values in an Excel file stored in SharePoint for a business owner to maintain when information changes. You could also go all the way back to the beginning of the process and load a dimension table in the source database. IT professionals might take the hard line and say that all data must flow through the data warehouse regardless of the effort and cost. Which of these is the right choice? …it depends on the scope of the reporting project, and the long-term ownership of the solution. Sometimes quick and convenient are OK, but particularly for tables that need to scale and handle larger data volumes in the future, the following guidelines are critical.

Read on for Paul’s recommended practices.
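If you do end up taking the "go back to the source database" route Paul mentions, the lookup could be as simple as a small dimension table along these lines (the names here are purely illustrative):

```sql
-- Purely illustrative: a tiny sales-region dimension maintained in the source
-- database instead of a hard-coded "Enter Data" table in Power Query.
CREATE TABLE dbo.DimSalesRegion
(
    SalesRegionKey  int          IDENTITY(1, 1) NOT NULL,
    SalesRegionName nvarchar(50) NOT NULL,
    CONSTRAINT PK_DimSalesRegion PRIMARY KEY CLUSTERED (SalesRegionKey)
);

INSERT INTO dbo.DimSalesRegion (SalesRegionName)
VALUES (N'North'), (N'South'), (N'East'), (N'West');
```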
