Press "Enter" to skip to content

Day: September 28, 2023

Faceted Images in ggplot2

Steven Sanderson shows multiple plots on one image:

Data visualization is a crucial tool in the data scientist’s toolkit. It allows us to explore and communicate complex patterns and insights effectively. In the world of R programming, one of the most powerful and versatile packages for data visualization is ggplot2. Among its many features, ggplot2 offers the facet_grid() function, which enables you to create multiple plots arranged in a grid, making it easier to visualize different groups of data simultaneously.

In this blog post, we’ll dive into the fascinating world of facet_grid() using a practical example. We’ll generate some synthetic data, split it into multiple groups, and then use facet_grid() to create a visually appealing grid of plots.

Read on for the demo script. The text talks about facet_grid() and the demo is facet_wrap(). The two behave very similarly, though they have slightly different use cases.

Comments closed

Building a Retry Mechanism for sqlcmd in Bash

Jose Manuel Jurado Diaz won’t let failure get him down:


Efficiently managing temporary failures and timeouts is crucial in production environments when connecting to databases. In this article, we’ll explore how to implement a retry mechanism with sqlcmd in a Bash script, dynamically increasing timeouts with each failed attempt.

Problem Statement:

Operations can fail due to network issues, overloaded servers, or other temporary problems when interacting with databases. Implementing a retry mechanism helps address these temporary issues without manual intervention.

Read on for the solution script. You could also adapt this to Powershell fairly easily, I think, though if you do go down that road, I’d recommend taking a look at Polly and PsPolly.

Comments closed

Thoughts on Cost Threshold for Parallelism

Erik Darling has some thoughts:

First, I’m not suggesting that anyone should be using the default value for Cost Threshold For Parallelism. It’s old and moldy and not a good fit for most workloads functioning on modern hardware.

My apologies to Azure SQLDB users who can’t change this setting and leave it up to Microsoft to maybe manage it for them based on ???

Some people out there really like fiddling with settings in a usually ill-informed reaction to Some Script They Found On The Internet, without reading the fine print.

Erik’s thoughts are reasonable overall. My recommendation is to use Michael J. Swart’s technique for tuning cost threshold for parallelism as a starting point, as it gives you a basis for what the net effect of your changes are.

Comments closed