Press "Enter" to skip to content

Curated SQL Posts

Multi-Parent Hierarchies in Power BI

Imke Feldmann has another way to solve the multi-parent problem in Power BI:

If you have a parent-child hierarchy with multiple parents, my function will return a table in which the children with multiple parents still reside in different rows.

Due to this, the table cannot be connected directly to the FactTable, as NodeKey is not unique. The solution is to create a DimNode table that contains only the unique values from the NodeKeys, use it as a bridge between the two tables, and implement a bidirectional filter to the Nodes table.

Read on for the complete answer and to grab a copy of the PBIX file.
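If you want to see the bridge-table idea outside of Power BI, here is a minimal pandas sketch of the same shape (the NodeKey column name comes from the post; the data and the pandas rendering are my own invention):

```python
import pandas as pd

# Toy flattened hierarchy: node C sits under two parents, so NodeKey
# is not unique here (invented data, not from the post).
nodes = pd.DataFrame({
    "NodeKey":   ["A", "B", "C", "C"],
    "ParentKey": [None, "A", "A", "B"],
})
facts = pd.DataFrame({
    "NodeKey": ["B", "C", "C"],
    "Amount":  [100, 40, 60],
})

# DimNode: only the distinct NodeKeys, so it can act as a bridge with a
# one-to-many relationship to both the Nodes table and the FactTable.
dim_node = nodes[["NodeKey"]].drop_duplicates().reset_index(drop=True)

# Filtering through the bridge: select a node in DimNode and resolve rows on
# both sides (the bidirectional filter does this step for you in Power BI).
selected = dim_node[dim_node["NodeKey"] == "C"]
print(nodes.merge(selected, on="NodeKey"))  # both parent rows for C
print(facts.merge(selected, on="NodeKey"))  # the matching fact rows
```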


A Stored Procedure to Check for Agent Job Completion

Brian Hansen has a stored procedure which can help you synchronize those asynchronous SQL Agent job calls:

This is a stored procedure that I have found useful in a number of circumstances. It came into being because there are times that I need to run a job via T-SQL and wait for it to complete before continuing the script. The problem is that sp_start_job does just what it says: it starts the job and then immediately continues. There is no option to wait for it to finish.

I needed some way to pause script execution pending the completion (hopefully successfully) of the job.

One important note: this procedure uses the undocumented xp_sqlagent_enum_jobs system procedure. While it has been around for ages, it is unsupported. I get why that may bother some, but this procedure is the only way I know of to reliably determine the current run status of a job.

Read on to learn more about the procedure and grab a copy.
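For a feel of the polling pattern, here is a minimal sketch in Python. Note the hedge: it checks the documented msdb.dbo.sysjobactivity table rather than xp_sqlagent_enum_jobs, so it approximates the idea rather than porting Brian's procedure, and the connection string and job name are placeholders.

```python
import time
import pyodbc

# Placeholder connection string and job name.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=myserver;"
    "DATABASE=msdb;Trusted_Connection=yes;", autocommit=True
)
job_name = "NightlyETL"

cur = conn.cursor()
cur.execute("EXEC msdb.dbo.sp_start_job @job_name = ?", job_name)

# Poll the most recent activity row for this job until it has a stop time.
poll_sql = """
SELECT TOP (1) ja.stop_execution_date
FROM msdb.dbo.sysjobactivity AS ja
JOIN msdb.dbo.sysjobs AS j ON j.job_id = ja.job_id
WHERE j.name = ?
ORDER BY ja.start_execution_date DESC;
"""
while True:
    row = cur.execute(poll_sql, job_name).fetchone()
    if row and row[0] is not None:  # stop_execution_date set => job finished
        break
    time.sleep(5)                   # wait before checking again
print("Job finished.")
```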


Power BI Column Concatenation

Alexander Arvidsson shows how we can concatenate columns in Power BI using DAX:

Finally we can tackle the last hurdle – the column that shows both the current number of certifications and the requested goal. Had this been Excel it would have been dead easy – we just create a new cell that concatenates two other cells.

Then we copy the formula to all the rows. Easy. But this is not Excel. The “goal” part is simple – that’s just another column. The trick is to count all the other rows in the table with the same key. Let’s add the key column to the table so we see what we’re working with. “CompKey” is the concatenated key we created in the previous blog post. “Number of certs” is a count of the rows in the table, and because of row context it gets evaluated per key.

Read on for the solution.
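The per-key counting trick translates outside DAX as well. Here is a loose pandas analogue (CompKey and the column names come from the post; the data is invented), where the count is evaluated per key and then concatenated with the goal:

```python
import pandas as pd

# Invented example: one row per earned certification, keyed by CompKey.
certs = pd.DataFrame({
    "CompKey": ["AzureDBA", "AzureDBA", "AzureDBA", "PowerBI"],
    "Goal":    [5, 5, 5, 3],
})

# Like the row-context count in DAX: for each row, count all rows sharing its key.
certs["Number of certs"] = certs.groupby("CompKey")["CompKey"].transform("count")

# The concatenated display column: current count "of" the goal.
certs["Progress"] = (certs["Number of certs"].astype(str)
                     + " of " + certs["Goal"].astype(str))
print(certs.drop_duplicates("CompKey"))
```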


Azure Data Studio October 2019 Release

Alan Yu announces the October 2019 release of Azure Data Studio:

While fixing a bug involving copying rows and columns from the results grid, we ended up creating an innovative copy/paste experience with the results grid.

Typically, when you select arbitrary cells individually, a grid application like Excel will prevent you from pasting that selection. However, we’ve implemented a logical pattern that supports this type of pasting. It works by ignoring columns and rows that are not selected, and nulling columns and rows that were not included in the copy parameters.

I really like that copy-paste functionality. Time to go update…
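As I read the description, the pasting logic works something like this sketch (my own reconstruction in Python, not Azure Data Studio's actual implementation):

```python
# Cells the user copied, as (row, col) -> value (invented sample selection).
copied = {(0, 0): "a", (0, 2): "c", (2, 0): "g"}

# Rows and columns with no selected cell at all are dropped entirely
# (row 1 and column 1 here).
rows = sorted({r for r, _ in copied})
cols = sorted({c for _, c in copied})

# Cells inside the surviving grid that were not copied become None (nulled).
grid = [[copied.get((r, c)) for c in cols] for r in rows]
print(grid)  # [['a', 'c'], ['g', None]]
```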


Multiple Hypothesis Testing with R

Roland Stevenson shows how we can perform multiple hypothesis tests on data, as well as potential issues:

Both results show that evaluating two tests on the same family of data will lead to a ~10% chance that a researcher will claim a “significant” result if they look for either test to reject the null. Any claim that there is a maximum 5% false positive rate would be mistaken. As an exercise, verify that doing the same on \(m=4\) tests will lead to a ~18% chance!

A bad testing platform would be one that claims a maximum 5% false positive rate when any one of multiple tests on the same family of data show significance at the 5% level. Clearly, if a researcher is going to claim that the FWER is no more than \(\alpha\), then they must control for the FWER and carefully consider how individual tests reject the null.

This is worth taking some time to read carefully. H/T R-Bloggers
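The arithmetic behind those percentages is the family-wise error rate for \(m\) independent tests at level \(\alpha\): \(\text{FWER} = 1 - (1 - \alpha)^m\). A quick check in Python:

```python
def fwer(m, alpha=0.05):
    """Chance of at least one false positive across m independent tests."""
    return 1 - (1 - alpha) ** m

print(fwer(2))  # ~0.0975 -> the ~10% figure for two tests
print(fwer(4))  # ~0.1855 -> the ~18% figure for four tests
```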


Migrating Databricks Workspaces

Gerhard Brueckl has made DatabricksPS better:

I do not know what the problem was, and I did not have time to investigate; instead, I needed to come up with a proper solution in time. So I had a look at what needs to be done for a manual export. Basically, there are five types of content within a Databricks workspace:

– Workspace items (notebooks and folders)
– Clusters
– Jobs
– Secrets
– Security (users and groups)

For all of these, Databricks provides an appropriate REST API to manage them, including exports and imports. This was fantastic news for me, as I knew I could use my existing PowerShell module DatabricksPS to do all of this without having to reinvent the wheel.

I’ve used DatabricksPS and really like it for cases where I’d have to loop with the Databricks REST API—for example, when uploading files.
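If you're curious what those REST calls look like without the PowerShell wrapper, here is a minimal Python sketch against the documented Workspace API (the workspace URL, token, and notebook path are placeholders):

```python
import base64
import requests

HOST = "https://adb-1234567890123456.7.azuredatabricks.net"   # placeholder URL
HEADERS = {"Authorization": "Bearer <personal-access-token>"}  # placeholder token

# List the items directly under /Shared in the workspace.
resp = requests.get(f"{HOST}/api/2.0/workspace/list",
                    headers=HEADERS, params={"path": "/Shared"})
resp.raise_for_status()
for obj in resp.json().get("objects", []):
    print(obj["object_type"], obj["path"])

# Export a single notebook in SOURCE format; content comes back base64-encoded.
resp = requests.get(f"{HOST}/api/2.0/workspace/export",
                    headers=HEADERS,
                    params={"path": "/Shared/my-notebook", "format": "SOURCE"})
resp.raise_for_status()
print(base64.b64decode(resp.json()["content"]).decode("utf-8"))
```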


Backing Up Cosmos DB

Josh Smith takes us through backing up Cosmos DB yourself:

Unfortunately, if you are restricting access to your Cosmos DB service based on IP address (a reasonable security measure), then Data Factory won’t work as of this writing: Azure Data Factory doesn’t operate like a trusted Azure service, and it presents an IP address from somewhere in the data center where it is spun up. Thankfully, they are working on this. In the meantime, the next best thing is to use the Cosmos DB migration tool (scripts below) to dump the contents to a location where they can be retained as long as needed. Be aware that, in addition to the RU cost of returning the data, if you bring these backups out of the data center where the Cosmos DB lives, you’ll also incur egress charges on the data.

Having a plan for this kind of thing is important, even if you normally rely on service-provided automated backups.
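As one way to script such a dump yourself, here is a hedged sketch using the azure-cosmos Python SDK rather than the migration tool (the endpoint, key, and database/container names are placeholders); as Josh notes, reading everything out costs RUs, and moving the file out of the region costs egress:

```python
import json
from azure.cosmos import CosmosClient  # pip install azure-cosmos

# Placeholder connection details.
client = CosmosClient("https://myaccount.documents.azure.com:443/",
                      credential="<account-key>")
container = client.get_database_client("mydb").get_container_client("mycontainer")

# Stream every document out of the container and write it to a local JSON file.
items = list(container.read_all_items())
with open("cosmos-backup.json", "w") as f:
    json.dump(items, f, default=str)  # default=str guards against odd field types

print(f"Backed up {len(items)} documents.")
```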


Building an Azure Usage Report with Powershell

June Castillote shows us how we can use Powershell to get usage data from Azure for our subscriptions:

In the section above, it would be common for the command to return many thousands of objects, especially for long date ranges. To prevent overwhelming the API, the Get-UsageAggregates command only returns a maximum of 1,000 results. If you’ve saved the $usageData variable as covered in the previous section, you can confirm this by running $usageData.UsageAggregations.count.

What if there are more than 1000 results? You’re going to have to do a little more work.

Knowing how much you’re spending is critical in an OpEx world like Azure or AWS.
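The extra work June alludes to is continuation paging: ask for the next batch until the service stops handing back a token. Sketched generically in Python against a hypothetical paged endpoint (Get-UsageAggregates does the equivalent on the PowerShell side):

```python
import requests

def get_all_pages(url, headers):
    """Follow a continuation token until the service stops returning one
    (hypothetical endpoint shape; batch size capped server-side at 1,000)."""
    results, token = [], None
    while True:
        params = {"continuationToken": token} if token else {}
        resp = requests.get(url, headers=headers, params=params)
        resp.raise_for_status()
        page = resp.json()
        results.extend(page.get("value", []))
        token = page.get("continuationToken")  # absent/None on the last page
        if not token:
            return results
```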


Generating Anonymous Data

Daniel Hutmacher has a nice web API to generate fake customer data:

I’ve been working on a little gadget for a while now, and today I finally got around to completing it and so now I’ve published it for everyone to try out. It’s a web API (wait, wait, don’t go away – it’s for database people!) that creates a randomized list of names, addresses, etc.

In this post, I’ll show you how easy it is to use this service to anonymize a development or test database so you don’t have all that personally identifiable information floating around.

Read the whole thing and check out his service. Also, Daniel was the one who spurred me on to update the theme here to get rid of some problems, so you can thank him for that too.
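The consumption pattern for a service like this is straightforward; here is a hedged Python sketch in which the endpoint URL and response shape are hypothetical placeholders, not Daniel's actual API:

```python
import requests

# Hypothetical endpoint and response shape; see Daniel's post for the real API.
FAKE_PEOPLE_URL = "https://example.com/api/people"

resp = requests.get(FAKE_PEOPLE_URL, params={"count": 3})
resp.raise_for_status()

# Assumed shape: a JSON array of objects with "name" and "address" fields.
for person in resp.json():
    print(person["name"], "/", person["address"])
    # In practice you would pair each fake row with a real customer key and
    # UPDATE the dev/test database to overwrite the personally identifiable columns.
```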
