Press "Enter" to skip to content

Month: February 2024

Contrasting RDS and Parquet in R

Colin Gillespie contrasts two file formats:

The RDS format is a binary file format, native to R. It has been part of R for many years, and provides a convenient method for saving R objects, including data sets.

The obvious question is which file format should you use for storing tabular data? RDS or parquet? For this comparison, I’m interested in the following characteristics:

  • the time required to save the file;
  • the file size;
  • the time required to load the file.

Read on for the throw-down.

Comments closed

Counting Path Elements in T-SQL

Steven Sanderson switches over to T-SQL for a bit:

Yesterday I was working on a project that required me to create a SQL query to generate a table of accounts receivables pathways. I thought it would be interesting to share the SQL code I wrote for this task. The code is as follows:

Click through for the code. I was playing a bit of code golf in my mind, seeing how I might modify things. One big example would be to replace the STUFF() + FOR XML PATH combo with STRING_AGG(), assuming the SQL Server instance is 2017 or later, or if the database is in Azure SQL DB or SQL MI. The count of a specific character is an interesting approach, and this is the first time I’ve had to wonder why there isn’t a helper function in T-SQL to do that. But considering that this is the first time I’ve had to ask that question, maybe that’s an answer in itself.

Comments closed

Purging Lots of Backup History

David Wiseman needs to clear out a significant amount of backup history:

Recently, I encountered an issue running sp_delete_backuphistory on servers that hosted a large number of databases with frequent log backup & restore operations. The clean up task hadn’t been scheduled and the history tables had grown very large over several months. The msdb databases was also hosted on a volume with limited IOPs.

Attempting to run sp_delete_backuphistory under these conditions you will likely encounter these issues:

Click through for that list of issues, as well as a way of mitigating the problem. I’ve noticed this kind of pattern appears fairly often in Microsoft-provided cleanup procedures: the code works well until you reach a certain scale, at which point it falls over. It’d be great if the original sp_delete_backuphistory performed batch deletion from the get-go, but David shows us a way to get around the issue.

Comments closed

Function App Caching of Key Vault Secrets

Koen Verbeeck runs into an odd issue:

In the PowerShell function, this application setting is retrieved as an environment variable so it can authenticate with the Graph API using the app registation. So far so good, except that the secret of this app registration expires after 1 year (some time ago you could configure an expiration date for in the future, but it seems this isn’t possible anymore). The Azure Function started crashing with a 401 (Unauthorized) error.

Read on to see what Koen tried, what eventually fixed it, and a pair of updates to the post.

Comments closed

Contrasting Three Unique Identifiers in Postgres

Laetitia Avrot shares some advice:

This month’s PGSQLPhriday event is about UUId thanks to my post calling for a fight debate on the topic.

I will answer a question a friend developer asked me: “What is the best when we need a primary key? UUID, CUID, or TSID?”

I hadn’t heard of two of these, but Laetitia provides some links to learn more about them and then offers up some advice on whether to use any of them. And the advice sounds a lot like the advice for SQL Server.

Comments closed

Connect to Azure SQL Database via Azure Entra ID Service Principal

Jaime Garcia de Alba becomes the machine:

In this guide, I am going to outline the steps on how to connect to an Azure SQL database using Entra SPN with tools such as SSMS and PowerShell. This demo covers detailed steps for using an existing user when the token is received correctly. Additionally, the steps cover creating a new user from scratch in case there are issues with the existing user.

I’ve used service principals and managed identities in the past in application code, but it wasn’t until this post that I learned you could also use them directly to connect to an instance.

Comments closed

Performance Costs of using Calculated Columns in Power BI Composite Models

Chris Webb share a warning:

I don’t have anything against the use of calculated columns in Power BI semantic models in general but you do need to be careful using them with DirectQuery mode. In particular when you have a DirectQuery connection to another Power BI semantic model – also known as a composite model on a Power BI semantic model – it’s very easy to cause serious performance problems with calculated columns. Let’s see a simple example of why this is.

Read on for Chris’s example.

Comments closed

Row Re-Ordering in Shiny Apps

Stephane Laurent does a bit of work:

The ‘RowReorder’ extension of datatables is available in the DT package. This extension allows to reorder the rows of a DT table by dragging and dropping. However, if you enable this extension in a Shiny app for a table using the server-side processing (option server=TRUE in renderDT), that won’t work: each time the rows are reordered, they will jump back to their original locations.

Read on to see what you need to do in that case, as well as an example of how to do it. H/T R-Bloggers.

Comments closed

Where the Bayesian and Frequentist Approaches Meet

Sebastian Sauer bridges the gap:

However, a disadvantage of Bayes analysis, at least at its current state, is that it has higher technical and computational demands. For beginners in particular, this may present a substantial (entry) burden. Teaching statistics, I have found that students (and many colleagues) have had difficulties installing Stan (particularly the C++ compiler needed in order to run Stan); Stan is the probabilistic programming language which many front-end Bayes engines use such as brms in R.

Thus, the installation process being not so user-friendly, a burden is placed for beginners which may prevent using Bayes methods.

In that light, this post explores the numerical simarilities of Bayes regression models and Frequentis models. The idea is to use a Frequentist regression model as a proxi for a full Bayesian analysis. The value added is the quick computation and the simple technical setup.

Click through for the conditions where you’ll find very similar results, as well as a few examples of it in action.

Comments closed