Press "Enter" to skip to content

Category: R

Trials and Tribulations of Maintaining CRAN Packages

John Mount explains the downside cost of CRAN being so useful:

If this automated email from a bulk sender bounces, goes to SPAM, or isn’t responded to quickly: your package will be archived or removed from CRAN. We’ve received these emails, and always acted on them quickly, out of fear.

The referred to check results are often not reproducible. For example, our most recent scare (that hasn’t yet triggered the email, and we have submitted a work-around before complaining here) was just “SUMMARY: processing the following file failed”, without details beyond the name of the failing file.

This is a tricky problem. On the one hand, as an end user of packages, I want packages playing nicely with each other. This is a lot better than Pip’s “Oh, sorry, you need version X but to install version X, it’ll break package Y as it needs < X” nightmare.

On the other hand, as a maintainer of a package, there’s a lot of added effort on a tight timeline for what is usually a volunteer effort.

I don’t have any CRAN packages I maintain and so I tend to be on the beneficiary side of things. But it’s important to keep those package maintainers in mind and one of the easiest ways to do that is to make explicit, reproducable bug reports. It may not make the deadlines more lax but at least that makes maintainers’ lives easier.

Comments closed

Visualizing High-Density Regions with R

The rOpenSci team covers the history of the gghdr package:

This was how being a newcomer to rOpenSci OzUnconf 2019 felt. It was incredible to be a part of such a diverse, welcoming and inclusive environment. I thought it would be fun to blog about how it all began, and the twists and turns we experienced along the way as we developed the gghdr package. The package provides tools for plotting highest density regions with ggplot2 and was inspired by the package hdrcde developed by Rob J Hyndman. The highest density region approach of summarizing a distribution is useful for analyzing multimodal distributions and can be composed of numerous disjoint subsets. For example, the histogram of the highway mileage (hwy) data from the mpg dataset (a) shows that cars with 6 cylinders (cyl) are bimodally distributed, which is reflected in the highest density region (HDR) boxplot (c) but not in the standard boxplot (b). Hence, we see that HDRs are useful in displaying multimodality in the distribution.

Read on for a short history of an interesting package.

Comments closed

Identifying R Functions and Packages in GitHub Gists

Bryan Shalloway looks at gists:

A problem I bumped into was that most of Chelsea’s gists don’t actually have .R or .Rmd extensions so my approach skipped most of her snippets. I wanted to parse my own gists but ran into a related problem that most of my github gist code snippets are saved as .md files1.

In this post I…

1. create a function to extract code chunks from simple .md files

2. parse the functions and packages in my code using funspotr.

Click through to see the code in action.

Comments closed

Making an R Box Plot from a Picture

Tomaz Kastrun builds a plot:

We create a raster image from a picture and calculating the ratio of the pixels on the scale of grayscale. The more the darker colour is represented in the pixels, the bigger the value. And this value is converted into the vector of values. And each vector is represneted as a violin boxplot.

Click through for an example.

Comments closed

Building Packages from Base R Files

John Nash and Arkajyoti Bhattacharjee package things up:

This article tries to explain an approach to developing alternative versions of functions which are in the distributed base of R. Our interest was in developing improvements to the nls() function and related features in R as part of a Google Summer of Code project for which Arkajyoti Bhattacharjee was the funded student. However, nls() has many tentacles involving a number of files and functions that may or may not be called as nls() is executed.

Part of the difficulty in carrying out such development of alternative versions is that one needs to be able to execute the new variants in parallel with the existing ones. A heavy-effort approach would be to have separate full sets of R code and build each system and run them separately. That is, we want to have two or more versions of R in the same computing system.

Read on for the process, some difficulties you might encounter along the way, and specific issues you might run into on Windows. H/T R-Bloggers.

Comments closed

Great Features in R 4.0.0

Colin Gillespie looks at a few new features in R version 4:

version 4.0.0 was released almost two years ago. The change in the major version, 3.x.y to 4.0.0, represented significant and potentially breaking changes. For an organisation to start using these new features, everyone in the company must have access to that version; otherwise code isn’t shareable. This naturally slows down adoption.

We moved our internal R projects to depend on version R 4.0.0 around twelve months ago – a few months after the release date. Over the last year we’ve also assisted a number of clients in making the move; typically with Shiny applications. This post aims to highlight some of the features we’ve found useful and also some of the potential pitfalls.

It’s crazy to think that it’s been so long

Comments closed

Identifying R Package and Function Use in GitHub Repos

Bryan Shalloway does a search:

TLDR: funspotr provides helpers for spotting the functions and packages in R and Rmarkdown files and associated github repositories. See Examples for catalogues of the functions/packages used in posts by Julia Silge, David Robinson, and others.

This is an interesting project. I’d imagine that with enough different code bases, you could develop a programming profile and possibly understand people’s strengths on a variety of characteristics like which functions they use, what they use given alternatives (e.g., “functional-friendly” map versus the *apply series versus loops), and how familiar they are with certain packages. I could see this being an advanced technique for learning what you should learn next: you obviously have familiarity with packages A, B, and C but it appears you don’t know about E or K and you might learn them to replace some of the work you’re doing with C.

Comments closed

Handling Categorical Data in R

The RSquared Academy blog has a two-parter on handling categorical data in R. Part 1 elaborates on kinds of categorical data and introduces a case study:

While we can rank the categories, we cannot assign a value to them. For example, in satisfaction ranking, we cannot say that like is twice as positive as dislike i.e. we are unable to say how much they differ from each other. While the order or rank of data is meaningful, the difference between two pieces of data cannot be measured/determined or are meaningless. Ordinal data provide information about relative comparisons, but not the magnitude of the differences.

Part 2 shows off ways to work with categorical data in tables:

In this section, we will explore the above ways of summarizing categorical data. We will also spend some time learning about tables as you will be using them extensively while working with categorical data. R has many packages for tabulating data and we list and explore all of them in the R scripts shared in the GitHub repository.

Click through for both guides. H/T R-Bloggers.

Comments closed