Category: R

Snapshot Testing in R

Jakub Sobolewski drills into a particular form of testing:

Snapshot testing is not about screenshots.

Most people meet it through UI regression tests: render a component, save a picture, fail the build when the picture changes. So the technique gets filed away as “the thing that compares images.” That is one use. But not the only one.

The mechanic underneath is general. Capture some output, save it to a file, and on every later run compare fresh output against the saved copy. The output can be a plot. It can also be console text, a log, a data frame, an error message, or a deeply nested list. Anything you can serialize, you can snapshot.

Read on to see how you can perform snapshot testing, using examples in R to demonstrate. H/T R-Bloggers.
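The capture, save, and compare loop described above is small enough to sketch in a few lines. This is a language-agnostic illustration in Python, not Jakub's code: the helper name check_snapshot, the summarize function, and the JSON-on-disk layout are all assumptions made for the example. In R, the testthat package offers expect_snapshot() for the same job.

```python
import json
import tempfile
from pathlib import Path


def check_snapshot(name, value, snapshot_dir):
    """Compare `value` with a stored snapshot, recording it on the first run.

    Hypothetical helper: returns "created", "match", or "mismatch".
    """
    snapshot_dir = Path(snapshot_dir)
    snapshot_dir.mkdir(parents=True, exist_ok=True)
    path = snapshot_dir / f"{name}.json"
    # Serialize deterministically so equal values always produce identical text.
    fresh = json.dumps(value, indent=2, sort_keys=True)
    if not path.exists():
        path.write_text(fresh, encoding="utf-8")
        return "created"
    saved = path.read_text(encoding="utf-8")
    return "match" if saved == fresh else "mismatch"


def summarize(values):
    """Illustrative function under test; its output is what gets snapshotted."""
    return {"n": len(values), "min": min(values), "max": max(values)}


with tempfile.TemporaryDirectory() as tmp:
    first = check_snapshot("summary", summarize([3, 1, 2]), tmp)      # no file yet
    second = check_snapshot("summary", summarize([2, 3, 1]), tmp)     # same summary
    third = check_snapshot("summary", summarize([3, 1, 2, 9]), tmp)   # output changed
    print(first, second, third)
```

Nothing here involves an image: the snapshot is a nested structure serialized to text, which is the point of the excerpt.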

Noise in CRAN Package Additions

Joseph Rickert shows a consequence of lowering the bar for application development:

If you are reading this post on R-bloggers, you will probably know that I have been publishing my selection of the “Top 40” new R packages on CRAN for quite some time. I did this first as part of my work at Revolution Analytics, then on R Views for RStudio and Posit, and now here on R Works. It used to take about a day’s worth of pleasurable work spread out over a month to select forty interesting packages. For a hundred or so packages, I could look at all of the package webpages, download and play with a small number of them. Now, the “Top 40” has become a real hamster-on-the-wheel project. The following plot shows my count of the number of new packages to make it to CRAN since I began publishing on R Works.

Click through to see what Joseph has laid out. What surprises me is that, historically, CRAN was pretty difficult to get a package into, and you typically needed to clear a certain number of quality gates. I suppose that has to have changed, given what Joseph notes about the lack of documentation in many of these new packages. But it could be that my understanding was wrong. H/T R-Bloggers.

The Ulam Prime Spiral

Tomaz Kastrun re-creates a classic:

In 1963, Stanislaw Ulam of Los Alamos was bored in a meeting, so he started doodling integers in a spiral and circled the primes. Diagonal lines appeared. He later showed it to Martin Gardner and, to Ulam's surprise, Gardner published his findings in Scientific American. We are still confused to this day.

Click through for a demonstration of this in action.
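For a rough idea of the construction, here is a minimal sketch in Python (Tomaz's post works in R, and this is not his code). The function names ulam_spiral and render are made up for the example, and the spiral direction (right, then up, then left, then down) is one common convention, assumed here.

```python
def is_prime(n):
    """Trial division; fine for the small integers in a demo grid."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def ulam_spiral(size):
    """Return a size x size grid (size must be odd) with 1 at the centre."""
    if size % 2 == 0:
        raise ValueError("size must be odd so the spiral has a centre cell")
    grid = [[0] * size for _ in range(size)]
    row = col = size // 2
    n = 1
    grid[row][col] = n
    moves = [(0, 1), (-1, 0), (0, -1), (1, 0)]  # right, up, left, down
    direction = 0
    step = 1
    while n < size * size:
        for _ in range(2):  # each run length is used for two directions
            d_row, d_col = moves[direction % 4]
            for _ in range(step):
                if n == size * size:
                    break
                row += d_row
                col += d_col
                n += 1
                grid[row][col] = n
            direction += 1
        step += 1
    return grid


def render(grid):
    """Mark primes with '#' and everything else with '.'."""
    return ["".join("#" if is_prime(v) else "." for v in line) for line in grid]


small = ulam_spiral(3)
print("\n".join(render(ulam_spiral(21))))
```

At 21 by 21 the diagonal streaks are already visible in the printed grid.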

Adding Patterns to ggplot2 Plots

Zhenguo Zhang adds some patterns:

Adding patterns to plots is a great way to improve accessibility (making plots colorblind-friendly) and to add an extra dimension of information. The ggpattern package provides a rich set of tools to achieve this in ggplot2.

I’m personally not the biggest fan of patterns. I see them as a point of necessity when dealing with grayscale circumstances, such as printing out a chart in an academic journal. But it’s very easy to overdo patterns and end up making a mess of the visual.

But one side note about color vision deficiency and plots: make sure that your plots are monochrome-friendly, because somebody will probably try to print out your chart or view it on a grayscale-only device. Or that reader might actually have monochromatic vision.

Probabilistic Time Series Cross-Validation in R

Thierry Moudiki checks an interval:

A previous post introduced the crossvalidation package for R. This time, the focus is on probabilistic forecasting: evaluating not just how accurate point forecasts are, but how well-calibrated prediction intervals are, using empirical coverage rates, Winkler scores, and crossvalidation.

Click through for the code and not much additional commentary. H/T R-Bloggers.
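Since the two metrics are named but not defined here, a small sketch may help. This is plain Python using the usual textbook definitions, not the crossvalidation package's API; the function names and the toy numbers are assumptions for illustration. Coverage is the share of actual values that fall inside their interval. The Winkler score is the interval width plus a penalty of 2/alpha times the miss distance whenever the actual value lands outside, so lower is better.

```python
def empirical_coverage(actual, lower, upper):
    """Fraction of observations that fall inside their prediction interval."""
    hits = sum(1 for y, lo, hi in zip(actual, lower, upper) if lo <= y <= hi)
    return hits / len(actual)


def winkler_score(actual, lower, upper, alpha):
    """Mean Winkler score for (1 - alpha) prediction intervals."""
    total = 0.0
    for y, lo, hi in zip(actual, lower, upper):
        score = hi - lo  # a wide interval costs something even when it covers
        if y < lo:
            score += (2.0 / alpha) * (lo - y)
        elif y > hi:
            score += (2.0 / alpha) * (y - hi)
        total += score
    return total / len(actual)


# Toy data (made up): four forecasts with nominal 80% intervals, so alpha = 0.2.
actual = [10, 12, 15, 9]
lower = [8, 9, 10, 10]
upper = [12, 13, 14, 14]

coverage = empirical_coverage(actual, lower, upper)
winkler = winkler_score(actual, lower, upper, alpha=0.2)
print(coverage, winkler)
```

Two of the four toy observations miss their intervals, so coverage sits well below the nominal 80%, and each miss adds its penalty on top of the interval width.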

Migrating testthat to testit

Yihui Xie explains how to switch test frameworks in R:

Back in 2013, I wrote about testing R packages when I first released testit. Thirteen years later, I still believe that unit testing should be nothing more than “tell me if something unexpected happened.” Recently I converted a large testthat test suite to testit, and I thought I’d share a practical guide for anyone considering the same move.

Click through for that guide.

Setting Function Parameters for Debugging in R

Jason Bryer has a function:

I tend to write a lot of functions that create specific graphics implemented with ggplot2. Although I try to pick graphic parameters (e.g. colors, text size, etc.) that are reasonable, I will typically define all relevant aesthetics as parameters to my function. As a result, my functions tend to have a lot of parameters. When I need to debug the function I need to have all those parameters set in the global environment which usually requires me highlighting each assignment and running it. This function automates this process.

Click through to see how it works. H/T R-Bloggers.
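The idea translates to other languages, too. As a rough analogue in Python (not Jason's R function), the sketch below reads a function's default arguments into a dictionary that a debugging session could work from. The names default_arguments and plot_scores are hypothetical, invented for this example.

```python
import inspect


def default_arguments(func):
    """Collect every parameter of `func` that declares a default value."""
    params = inspect.signature(func).parameters
    return {
        name: p.default
        for name, p in params.items()
        if p.default is not inspect.Parameter.empty
    }


def plot_scores(data, point_color="steelblue", text_size=12, show_legend=True):
    """Hypothetical plotting function with several aesthetic parameters."""
    return f"{len(data)} points in {point_color} at size {text_size}"


# Everything with a default is now available without assigning each by hand.
debug_env = default_arguments(plot_scores)
print(debug_env)
```

Parameters without a default, such as data here, are left out and still have to be supplied by hand.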

Comparing {targets} in R to dbt for Data Engineering

Jonathan Carroll compares two approaches:

Thinking of a real-world project I could take for a spin, I decided to build some ingestion for my personal finances. I’ve used Quickbooks previously which connects up to my bank and helps categorise personal and business (as a freelance contractor) expenses. I decided I’ll build my own ‘slowbooks’ processing workflow based on some manual exports (I don’t think my bank has an API).

Both of the approaches I’ll compare here build on the idea of a Makefile, which connects up commands to run based on dependencies and only runs what is needed; if none of the input dependencies of a step have changed, there’s no need to re-run that step. From what I understand, you could largely get away with just writing some Makefiles (or the newer implementation just (just.systems)) but these two approaches help to better structure how that’s constructed.

Read on for Jonathan’s discovery process and ultimate findings. H/T R-Bloggers.
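The "only run what is needed" idea from the excerpt can be sketched without either tool. The Python below is a toy illustration, not how {targets} or dbt are implemented: the Pipeline class, the step functions, and the sample rows are all assumptions. Each step's inputs are fingerprinted, and a step is skipped when its fingerprint matches the previous run.

```python
import hashlib
import json


class Pipeline:
    """Toy runner that re-executes a step only when its inputs change."""

    def __init__(self):
        self.cache = {}  # step name -> (input fingerprint, result)
        self.runs = []   # names of steps that actually executed, in order

    def run_step(self, name, func, *inputs):
        payload = json.dumps(inputs, sort_keys=True).encode("utf-8")
        fingerprint = hashlib.sha256(payload).hexdigest()
        cached = self.cache.get(name)
        if cached is not None and cached[0] == fingerprint:
            return cached[1]  # inputs unchanged: reuse the stored result
        result = func(*inputs)
        self.cache[name] = (fingerprint, result)
        self.runs.append(name)
        return result


def clean(rows):
    """Drop refunds and other non-positive amounts."""
    return [r for r in rows if r["amount"] > 0]


def total(rows):
    return sum(r["amount"] for r in rows)


def build(pipeline, raw):
    cleaned = pipeline.run_step("clean", clean, raw)
    return pipeline.run_step("total", total, cleaned)


pipeline = Pipeline()
raw = [{"amount": 12.5}, {"amount": -3.0}, {"amount": 7.5}]

first_total = build(pipeline, raw)                        # both steps run
second_total = build(pipeline, raw)                       # nothing re-runs
third_total = build(pipeline, raw + [{"amount": -1.0}])   # only "clean" re-runs
print(pipeline.runs)
```

In the third build the raw input changes, so the cleaning step runs again, but its output is identical to before and the downstream total is left alone.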

A Verbose Pipe Operator for dplyr Pipelines

Guillaume Pressiat shows off logrittr:

In SAS, every DATA step prints a log:

NOTE: There were 120000 observations read from WORK.SALES.
NOTE: 7153 observations were deleted.
NOTE: The data set WORK.RESULT has 112847 observations and 11 variables.

R’s dplyr pipelines are silent. logrittr fills that gap with %>=%, a drop-in pipe that logs row counts, column counts, added/dropped columns, and timing at every step, with no function masking.

Click through to see how logrittr helps. Back when I was using R heavily, I would have really enjoyed this package. H/T R-Bloggers.
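To make the per-step logging concrete, here is a small Python sketch of the same idea. It is not logrittr's implementation or API: run_logged, columns, and the sample data are invented for illustration, and timing is left out to keep the output deterministic.

```python
def columns(rows):
    """Column names, in first-seen order, across a list of dict rows."""
    seen = []
    for row in rows:
        for key in row:
            if key not in seen:
                seen.append(key)
    return seen


def run_logged(rows, steps):
    """Apply each (name, function) step in order, logging one line per step."""
    log = []
    for name, step in steps:
        before = columns(rows)
        n_before = len(rows)
        rows = step(rows)
        after = columns(rows)
        added = [c for c in after if c not in before]
        dropped = [c for c in before if c not in after]
        log.append(
            f"{name}: rows {n_before} -> {len(rows)}, added {added}, dropped {dropped}"
        )
    return rows, log


sales = [
    {"region": "N", "amount": 120},
    {"region": "S", "amount": -5},
    {"region": "N", "amount": 80},
]
steps = [
    ("filter", lambda rows: [r for r in rows if r["amount"] > 0]),
    ("mutate", lambda rows: [{**r, "amount_k": r["amount"] / 1000} for r in rows]),
    ("select", lambda rows: [{"region": r["region"], "amount_k": r["amount_k"]} for r in rows]),
]

result, log = run_logged(sales, steps)
print("\n".join(log))
```

Each step gets a line reporting how the row count moved and which columns appeared or disappeared, much like the SAS notes quoted above.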
