Press "Enter" to skip to content

Category: R

Measurement Units in R

K.H. Kim has an article on unit conversion in R:

Data mtcars has a column named mpgmpg means miles per gallon. ‘Mile’ and ‘gallon’ are units for length and volume. A mile is approximately 1.6 kilometers and a gallon is approximately 3.7 liters. Mile and gallon sound unfamiliar to people who live outside England or U.S.A. because international standard units for length and volume are meter and liter.

In this post, we will learn how to convert a unit to another unit, for instance, we will convert mpg to km/L, which is more comprehensible to people who use SI units

This is something that I’m surprised languages don’t do more of. F# has the concept of units of measure but these are compile-time—at runtime, you can still break the rules. And in most languages, type systems are so simplistic that it’s all just plain decimal math. H/T R-bloggers.

Comments closed

Apply Functions in R

Selina Cheng explains how the various apply() functions work:

Today I’m going to talk about a useful family of functions that allows you to repetitively perform a specified function (e.g., sum()mean()) across a vector, list, matrix, or data frame. For those of you familiar with ‘for’ loops, the apply() family often allows you to avoid constructing those and instead wrap the loop into one simple function.

I’m going to discuss the functions apply()lapply()sapply(), and tapply() in this blog post (as well as using the dplyr library for similar tasks). These functions all end in apply() because you apply the function you want across all the specified elements.

Read on to see how these functions work. H/T R-Bloggers.

Comments closed

Trials and Tribulations of Maintaining CRAN Packages

John Mount explains the downside cost of CRAN being so useful:

If this automated email from a bulk sender bounces, goes to SPAM, or isn’t responded to quickly: your package will be archived or removed from CRAN. We’ve received these emails, and always acted on them quickly, out of fear.

The referred to check results are often not reproducible. For example, our most recent scare (that hasn’t yet triggered the email, and we have submitted a work-around before complaining here) was just “SUMMARY: processing the following file failed”, without details beyond the name of the failing file.

This is a tricky problem. On the one hand, as an end user of packages, I want packages playing nicely with each other. This is a lot better than Pip’s “Oh, sorry, you need version X but to install version X, it’ll break package Y as it needs < X” nightmare.

On the other hand, as a maintainer of a package, there’s a lot of added effort on a tight timeline for what is usually a volunteer effort.

I don’t have any CRAN packages I maintain and so I tend to be on the beneficiary side of things. But it’s important to keep those package maintainers in mind and one of the easiest ways to do that is to make explicit, reproducable bug reports. It may not make the deadlines more lax but at least that makes maintainers’ lives easier.

Comments closed

Visualizing High-Density Regions with R

The rOpenSci team covers the history of the gghdr package:

This was how being a newcomer to rOpenSci OzUnconf 2019 felt. It was incredible to be a part of such a diverse, welcoming and inclusive environment. I thought it would be fun to blog about how it all began, and the twists and turns we experienced along the way as we developed the gghdr package. The package provides tools for plotting highest density regions with ggplot2 and was inspired by the package hdrcde developed by Rob J Hyndman. The highest density region approach of summarizing a distribution is useful for analyzing multimodal distributions and can be composed of numerous disjoint subsets. For example, the histogram of the highway mileage (hwy) data from the mpg dataset (a) shows that cars with 6 cylinders (cyl) are bimodally distributed, which is reflected in the highest density region (HDR) boxplot (c) but not in the standard boxplot (b). Hence, we see that HDRs are useful in displaying multimodality in the distribution.

Read on for a short history of an interesting package.

Comments closed

Identifying R Functions and Packages in GitHub Gists

Bryan Shalloway looks at gists:

A problem I bumped into was that most of Chelsea’s gists don’t actually have .R or .Rmd extensions so my approach skipped most of her snippets. I wanted to parse my own gists but ran into a related problem that most of my github gist code snippets are saved as .md files1.

In this post I…

1. create a function to extract code chunks from simple .md files

2. parse the functions and packages in my code using funspotr.

Click through to see the code in action.

Comments closed

Making an R Box Plot from a Picture

Tomaz Kastrun builds a plot:

We create a raster image from a picture and calculating the ratio of the pixels on the scale of grayscale. The more the darker colour is represented in the pixels, the bigger the value. And this value is converted into the vector of values. And each vector is represneted as a violin boxplot.

Click through for an example.

Comments closed

Building Packages from Base R Files

John Nash and Arkajyoti Bhattacharjee package things up:

This article tries to explain an approach to developing alternative versions of functions which are in the distributed base of R. Our interest was in developing improvements to the nls() function and related features in R as part of a Google Summer of Code project for which Arkajyoti Bhattacharjee was the funded student. However, nls() has many tentacles involving a number of files and functions that may or may not be called as nls() is executed.

Part of the difficulty in carrying out such development of alternative versions is that one needs to be able to execute the new variants in parallel with the existing ones. A heavy-effort approach would be to have separate full sets of R code and build each system and run them separately. That is, we want to have two or more versions of R in the same computing system.

Read on for the process, some difficulties you might encounter along the way, and specific issues you might run into on Windows. H/T R-Bloggers.

Comments closed

Great Features in R 4.0.0

Colin Gillespie looks at a few new features in R version 4:

version 4.0.0 was released almost two years ago. The change in the major version, 3.x.y to 4.0.0, represented significant and potentially breaking changes. For an organisation to start using these new features, everyone in the company must have access to that version; otherwise code isn’t shareable. This naturally slows down adoption.

We moved our internal R projects to depend on version R 4.0.0 around twelve months ago – a few months after the release date. Over the last year we’ve also assisted a number of clients in making the move; typically with Shiny applications. This post aims to highlight some of the features we’ve found useful and also some of the potential pitfalls.

It’s crazy to think that it’s been so long

Comments closed