Press "Enter" to skip to content

Curated SQL Posts

Finally Blocks and Error Handling in Data Factory

Chen Hirsh doesn’t let failure get in the way of doing work:

Today I stumbled upon a weird behavior in Azure Data Factory (ADF) error handling.

ADF lets us add error handling in the flow control, In this example, I’m trying to copy some data, and if that fails go to on failure branch (red line). If the activity succeeded, go to on success branch (green line)

These work great (If you can call a failure great…).

Let’s take another step. What if I want to run another activity at the end, no matter if the copy succeeded or failed?

The behavior is a bit weird, as it doesn’t work quite the way you’d expect. Chen, however, shows us how to do it.

Comments closed

New Features in data.table

John MacKintosh takes us through what’s new in R’s data.table package:

The newest version of data.table has hit CRAN, and there are lots of great new features.

Among them, a %notin% function, a new let function that can be used instead of := ( I wasn’t too fussed about this originally but have tried it a few times today and I may well adopt it – although I do like that := really stands out in my code when assigning / updating variables).

Read on for the big changes. H/T R-Bloggers.

Comments closed

Exploring the Area under the ROC Curve

Aayush Srivastava takes us through one of the classics of classification:

In the realm of machine learning classification, model evaluation is an essential step to assess the performance and effectiveness of various algorithms. One widely-used tool for this purpose is the Area Under the Receiver Operating Characteristic Curve (AUC-ROC curve). In this blog, we will delve into the significance of the AUC-ROC curve, how it is calculated, and why it is an invaluable metric for evaluating classification models.

In this article, we will discuss the performance metrics used in the classification and also explore the implications of using two, namely AUC and ROC. Here is an overview of the important points that we will discuss in the article. 

The fun anecdote around ROC curves is that their name actually makes sense if you know the origin: it came out of the British army in World War II, where they tracked how their radar operators classified blips as German aircraft or noise (e.g., flocks of birds). The radar receiver operators had certain characteristics, where some were more effective at separating actual threats from noise, hence the Receiver Operating Characteristic curve.

Comments closed

Monitoring if an Azure Server Goes Offline

Paul Bergson builds an alert:

My miniature schnauzer, Raven, is a smart and lively dog who loves to hunt for rodents in the yard. She has a keen sense of smell and can detect the slightest movement of her prey. She barks loudly to alert me whenever she finds a potential target and chases after it with all her speed. However, the rodents are too cunning and often escape to a tree or a hole in the ground before she can catch them. She then returns to me with a disappointed look on her face, hoping for a treat or a pat on the head.

Azure Monitor is like Raven, but much more efficient and reliable. It can monitor your Azure servers and detect when they go offline in ~1 minute. It can also alert you via email, SMS, or webhook when something goes wrong, so you can take action to fix it. With Azure Monitor, you can stay on top of your server’s health and performance.

Read on to see how you can use Azure Monitor and build policies, with much less cleanup requirement than a dog.

Comments closed

Contrasting RDS and Parquet in R

Colin Gillespie contrasts two file formats:

The RDS format is a binary file format, native to R. It has been part of R for many years, and provides a convenient method for saving R objects, including data sets.

The obvious question is which file format should you use for storing tabular data? RDS or parquet? For this comparison, I’m interested in the following characteristics:

  • the time required to save the file;
  • the file size;
  • the time required to load the file.

Read on for the throw-down.

Comments closed

Counting Path Elements in T-SQL

Steven Sanderson switches over to T-SQL for a bit:

Yesterday I was working on a project that required me to create a SQL query to generate a table of accounts receivables pathways. I thought it would be interesting to share the SQL code I wrote for this task. The code is as follows:

Click through for the code. I was playing a bit of code golf in my mind, seeing how I might modify things. One big example would be to replace the STUFF() + FOR XML PATH combo with STRING_AGG(), assuming the SQL Server instance is 2017 or later, or if the database is in Azure SQL DB or SQL MI. The count of a specific character is an interesting approach, and this is the first time I’ve had to wonder why there isn’t a helper function in T-SQL to do that. But considering that this is the first time I’ve had to ask that question, maybe that’s an answer in itself.

Comments closed

Purging Lots of Backup History

David Wiseman needs to clear out a significant amount of backup history:

Recently, I encountered an issue running sp_delete_backuphistory on servers that hosted a large number of databases with frequent log backup & restore operations. The clean up task hadn’t been scheduled and the history tables had grown very large over several months. The msdb databases was also hosted on a volume with limited IOPs.

Attempting to run sp_delete_backuphistory under these conditions you will likely encounter these issues:

Click through for that list of issues, as well as a way of mitigating the problem. I’ve noticed this kind of pattern appears fairly often in Microsoft-provided cleanup procedures: the code works well until you reach a certain scale, at which point it falls over. It’d be great if the original sp_delete_backuphistory performed batch deletion from the get-go, but David shows us a way to get around the issue.

Comments closed

Function App Caching of Key Vault Secrets

Koen Verbeeck runs into an odd issue:

In the PowerShell function, this application setting is retrieved as an environment variable so it can authenticate with the Graph API using the app registation. So far so good, except that the secret of this app registration expires after 1 year (some time ago you could configure an expiration date for in the future, but it seems this isn’t possible anymore). The Azure Function started crashing with a 401 (Unauthorized) error.

Read on to see what Koen tried, what eventually fixed it, and a pair of updates to the post.

Comments closed