R – Page 49 – Curated SQL

Visualizing Networks of R Library Usage

Published 2022-03-18 by Kevin Feasel

Bryan Shalloway has fun with network plots:

In previous posts and threads I’ve alluded to the potential utility of visualizing the relationships between parsed functions/packages and files as a network plot.
I added the function network_plot() to funspotr. In this post I’ll simply output the network plots of the parsed-out packages from the code collections discussed in the prior two posts:

Click through for interactive plots of what different people in the R community use.

Comments closed

Quoted and Unquoted Parameters in the Tidyverse

Published 2022-03-16 by Kevin Feasel

Sebastian Sauer shows two ways to dereference a parameter:

Using the tidyverse ecosystem, programming – instead of interactive use – may be something different or unusual and it may take some time to wrap your head around it.
In this post, I’ll show how to deal with a standard situation (using tidyvserse’ nonstandard evaluation). More precisely, there are two (complementary) situations we’ll address:

Read on for those techniques.

Comments closed

Checking R Function Inputs

Published 2022-03-15 by Kevin Feasel

Hugo Gruson, et al, share some techniques for ensuring function input matches your expectations:

Are you, like we were, tired of filling your functions with argument checking code that sometimes ends up being longer that the core of the function itself? Are you trying to find what is the most efficient approach to check inputs easily and without forgetting any edge cases? Read about our exploration into the various ways to check your function inputs in R in this blog post. And please share your own tips and discoveries in the comment section!

Read on for several techniques.

Comments closed

Measurement Units in R

Published 2022-03-15 by Kevin Feasel

K.H. Kim has an article on unit conversion in R:

Data mtcars has a column named mpg. mpg means miles per gallon. ‘Mile’ and ‘gallon’ are units for length and volume. A mile is approximately 1.6 kilometers and a gallon is approximately 3.7 liters. Mile and gallon sound unfamiliar to people who live outside England or U.S.A. because international standard units for length and volume are meter and liter.
In this post, we will learn how to convert a unit to another unit, for instance, we will convert mpg to km/L, which is more comprehensible to people who use SI units

This is something that I’m surprised languages don’t do more of. F# has the concept of units of measure but these are compile-time—at runtime, you can still break the rules. And in most languages, type systems are so simplistic that it’s all just plain decimal math. H/T R-bloggers.

Comments closed

Apply Functions in R

Published 2022-03-09 by Kevin Feasel

Selina Cheng explains how the various apply() functions work:

Today I’m going to talk about a useful family of functions that allows you to repetitively perform a specified function (e.g., sum(), mean()) across a vector, list, matrix, or data frame. For those of you familiar with ‘for’ loops, the apply() family often allows you to avoid constructing those and instead wrap the loop into one simple function.
I’m going to discuss the functions apply(), lapply(), sapply(), and tapply() in this blog post (as well as using the dplyr library for similar tasks). These functions all end in apply() because you apply the function you want across all the specified elements.

Read on to see how these functions work. H/T R-Bloggers.

Comments closed

Trials and Tribulations of Maintaining CRAN Packages

Published 2022-03-09 by Kevin Feasel

John Mount explains the downside cost of CRAN being so useful:

If this automated email from a bulk sender bounces, goes to SPAM, or isn’t responded to quickly: your package will be archived or removed from CRAN. We’ve received these emails, and always acted on them quickly, out of fear.
The referred to check results are often not reproducible. For example, our most recent scare (that hasn’t yet triggered the email, and we have submitted a work-around before complaining here) was just “SUMMARY: processing the following file failed”, without details beyond the name of the failing file.

This is a tricky problem. On the one hand, as an end user of packages, I want packages playing nicely with each other. This is a lot better than Pip’s “Oh, sorry, you need version X but to install version X, it’ll break package Y as it needs < X” nightmare.

On the other hand, as a maintainer of a package, there’s a lot of added effort on a tight timeline for what is usually a volunteer effort.

I don’t have any CRAN packages I maintain and so I tend to be on the beneficiary side of things. But it’s important to keep those package maintainers in mind and one of the easiest ways to do that is to make explicit, reproducable bug reports. It may not make the deadlines more lax but at least that makes maintainers’ lives easier.

Comments closed

The (Non)-Slowness of Loops in R

Published 2022-02-21 by Kevin Feasel

Michael Mayer notes that loops in R aren’t actually all that bad:

Since then, the R core team and the community has invested tons of time to improve R and also to make it faster. There are things like RCPP and parallel computing to speed up loops.
But what still relatively few R users know: loops are not that slow anymore. We want to demonstrate this using two examples.

Click through for the examples.

Comments closed

Visualizing High-Density Regions with R

Published 2022-02-16 by Kevin Feasel

The rOpenSci team covers the history of the gghdr package:

This was how being a newcomer to rOpenSci OzUnconf 2019 felt. It was incredible to be a part of such a diverse, welcoming and inclusive environment. I thought it would be fun to blog about how it all began, and the twists and turns we experienced along the way as we developed the gghdr package. The package provides tools for plotting highest density regions with ggplot2 and was inspired by the package hdrcde developed by Rob J Hyndman. The highest density region approach of summarizing a distribution is useful for analyzing multimodal distributions and can be composed of numerous disjoint subsets. For example, the histogram of the highway mileage (hwy) data from the mpg dataset (a) shows that cars with 6 cylinders (cyl) are bimodally distributed, which is reflected in the highest density region (HDR) boxplot (c) but not in the standard boxplot (b). Hence, we see that HDRs are useful in displaying multimodality in the distribution.

Read on for a short history of an interesting package.

Comments closed

Filtering Tips with dplyr

Published 2022-02-14 by Kevin Feasel

Sebastian Sauer shows off some dplyr functionality:

Programming with the tidyverse
… Is not exactly self-evident. It actually requires some head wrapping, at least in my experience. In this post, we are exploring some aspects on programming when filtering rows. Let’s see.

That said, click through for a few options on how to work with dplyr’s filter() function.

Comments closed

Identifying R Functions and Packages in GitHub Gists

Published 2022-02-10 by Kevin Feasel

Bryan Shalloway looks at gists:

A problem I bumped into was that most of Chelsea’s gists don’t actually have .R or .Rmd extensions so my approach skipped most of her snippets. I wanted to parse my own gists but ran into a related problem that most of my github gist code snippets are saved as .md files¹.
In this post I…
1. create a function to extract code chunks from simple .md files
2. parse the functions and packages in my code using funspotr.

Click through to see the code in action.

Comments closed

M	T	W	T	F	S	S
		1	2	3	4	5
6	7	8	9	10	11	12
13	14	15	16	17	18	19
20	21	22	23	24	25	26
27	28	29	30	31

Category: R