Finding an Unfair Coin with R

Sebastian Sauer works out a coin flip problem:

A stochastic problem, with application to financial theory. Some say it goes back to Warren Buffett. I owe it to my colleague Norman Markgraf, who pointed it out to me.

Assume there are two coins. One is fair, one is loaded. The loaded coin has a bias of 60-40. Now, the question is: How many coin flips do you need to be “sure enough” (say, 95%) that you found the loaded coin?

Let’s simulate the thing.

It took a few more flips than I had expected but the number is not outlandish.
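The sketch below uses one possible framing, which is an assumption and not taken from Sauer's post. Flip each coin n times, call the coin with more heads the loaded one, and break ties with a fair guess. Then find the smallest n at which that call is right at least 95% of the time. It computes the probability exactly with dbinom()/pbinom() and cross-checks it by simulation with rbinom(). The function names are hypothetical.

```r
# Probability of picking the loaded coin after flipping EACH coin n times,
# choosing the coin with more heads and guessing 50/50 on a tie.
p_correct_exact <- function(n, p_loaded = 0.6, p_fair = 0.5) {
  k <- 0:n
  px <- dbinom(k, n, p_loaded)                      # P(loaded coin shows k heads)
  p_greater <- sum(px * pbinom(k - 1, n, p_fair))   # P(loaded heads > fair heads)
  p_tie <- sum(px * dbinom(k, n, p_fair))           # P(both show the same count)
  p_greater + 0.5 * p_tie
}

# Monte Carlo estimate of the same quantity.
p_correct_sim <- function(n, reps = 10000, p_loaded = 0.6, p_fair = 0.5) {
  heads_loaded <- rbinom(reps, n, p_loaded)
  heads_fair <- rbinom(reps, n, p_fair)
  mean((heads_loaded > heads_fair) + 0.5 * (heads_loaded == heads_fair))
}

# Smallest number of flips per coin whose exact success probability reaches `target`.
min_flips <- function(target = 0.95, max_n = 1000) {
  for (n in seq_len(max_n)) {
    if (p_correct_exact(n) >= target) return(n)
  }
  NA_integer_
}

n_needed <- min_flips()
set.seed(2019)
c(flips_per_coin = n_needed,
  exact = p_correct_exact(n_needed),
  simulated = p_correct_sim(n_needed))
```

Under this framing each coin is flipped n times, so the total is 2n flips. A different setup, such as flipping a single coin and testing it against 50%, would give a different count.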

Standard and Non-Standard Evaluation in R

John Mount explains Standard Evaluation versus Non-Standard Evaluation in R:

In standard (or value-oriented) evaluation, the code you type in is taken to be variable names, functions, names, operators, and even numeric literal values. String values or literals need extra marks, such as quotes.

John walks us through several examples along the way. In the end, he comes down firmly on the side of Standard Evaluation over Non-Standard Evaluation.
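Here is a small base-R illustration of the distinction. It is our own example and does not come from John's post. subset() uses non-standard evaluation, indexing with [[ ]] uses a plain string, and quote()/eval() spell out what subset() does behind the scenes.

```r
df <- data.frame(x = 1:5, y = c(2, 4, 6, 8, 10))

# Non-standard evaluation: subset() captures the unevaluated expression `x > 2`
# and evaluates it with the columns of df in scope.
nse_rows <- subset(df, x > 2)

# Standard evaluation: the column name is an ordinary string value,
# so it can be stored in a variable or passed in as a function argument.
col <- "x"
se_rows <- df[df[[col]] > 2, ]

# The NSE mechanism made explicit: quote() delays evaluation and
# eval() runs the expression against the data frame.
expr <- quote(x > 2)
manual_rows <- df[eval(expr, df), ]
```

Because the standard-evaluation version takes the column name as a value, it can be wrapped in a reusable helper without any quoting tricks. That is one practical reason to prefer it in code that other code calls.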

Compacting R Libraries

Kevin Feasel



Dirk Eddelbuettel shows how you can save a lot of space by stripping excess information from R packages:

Back in August of 2017, we wrote two posts, #9: Compacting your Shared Libraries and #10: Compacting your Shared Libraries, After The Build, about “stripping” shared libraries. This involves removing auxiliary information (such as debug symbols and more) from the shared libraries, which can greatly reduce the installed size (on suitable platforms – it mostly matters where I work, i.e. on Linux).

The space savings for the tidyverse package are pretty good. H/T R-Bloggers.
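The stripping itself happens outside R. On Linux, binutils' `strip --strip-debug` is one common way to do it. To see how much there is to gain, the rough sketch below adds up a package's installed footprint and the part of it that sits in compiled shared libraries under `libs/`. The helper names are hypothetical. Running it before and after a stripped reinstall gives a quick comparison.

```r
# Total size in bytes of every file under a package's installation directory.
installed_size <- function(pkg) {
  dir <- system.file(package = pkg)
  if (!nzchar(dir)) stop("package not installed: ", pkg)
  files <- list.files(dir, recursive = TRUE, full.names = TRUE, all.files = TRUE)
  sum(file.info(files)$size, na.rm = TRUE)
}

# Size in bytes of the compiled shared libraries in the package's libs/ folder.
shared_lib_size <- function(pkg) {
  libs <- system.file("libs", package = pkg)
  if (!nzchar(libs)) return(0)  # pure-R package: no shared libraries to strip
  files <- list.files(libs, pattern = "\\.(so|dll|dylib)$",
                      recursive = TRUE, full.names = TRUE)
  sum(file.info(files)$size, na.rm = TRUE)
}

pkg <- "stats"
c(total_mb = installed_size(pkg) / 1024^2,
  shared_lib_mb = shared_lib_size(pkg) / 1024^2)
```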

Residual Analysis with R

Abhijit Telang shares a few techniques for doing post-regression residual analysis using R:

Naturally, I would expect my model to be unbiased, at least in intention, and hence any leftovers on either side of the regression line that did not make it on the line are expected to be random, i.e. without any particular pattern.

That is, I expect my residual error distributions to follow a bland, normal distribution.

In R, you can do this elegantly with just two lines of code:
1. Plot a histogram of residuals.
2. Add a Quantile-Quantile plot with a line that passes through the first and third quartiles.

There are several more techniques in here to analyze residuals, so check it out.
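The two quoted steps map onto base R graphics roughly as shown in this sketch. It uses the built-in mtcars data and a model picked purely for illustration, not the model from Abhijit's post. Strictly speaking, the Q-Q step takes two calls: qqnorm() draws the points and qqline() adds the reference line.

```r
# Fit a simple linear model on a built-in dataset (an illustrative choice).
fit <- lm(mpg ~ wt, data = mtcars)
res <- residuals(fit)

op <- par(mfrow = c(1, 2))   # two panels side by side
hist(res, main = "Histogram of residuals", xlab = "Residual")
qqnorm(res)                  # sample quantiles against normal quantiles
qqline(res)                  # default probs = c(0.25, 0.75): through the 1st and 3rd quartiles
par(op)                      # restore the previous graphics settings
```

If the residuals are roughly normal, the histogram should look bell-shaped and the Q-Q points should stay close to the line. Systematic curvature or heavy tails point to a pattern the model missed.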

Interactive ggplot Plots with plotly

Laura Ellis takes us through ggplotly:

As someone very interested in storytelling, ggplot2 is easily my data visualization tool of choice. It is like the Swiss Army knife for data visualization. One of my favorite features is the ability to pack a graph chock-full of dimensions. This ability is incredibly handy during the data exploration phases. However, sometimes I find myself wanting to look at trends without all the noise. Specifically, I often want to look at very dense scatterplots for outliers. ggplot2 is great at this, but when we’ve isolated the points we want to understand, we can’t easily examine all possible dimensions right in the static charts.

Enter plotly. The plotly package and ggplotly function do an excellent job of taking our high-quality ggplot2 graphs and making them interactive.

Read on for several quality, interactive visuals.

Goodbye, gather and spread; Hello pivot_long and pivot_wide

John Mount covers a change in tidyr that mirrors the pivot_to_rowrecs and unpivot_to_blocks functions he and Nina Zumel wrote for the cdata package:

If you want to work in the above way we suggest giving our cdata package a try. We named the functions pivot_to_rowrecs and unpivot_to_blocks. The idea was: by emphasizing the record structure one might eventually internalize what the transforms are doing. On the way to that we have a lot of documentation and tutorials.

This is your regular reminder that the Tidyverse is very useful, but it is not the entirety of R.
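In the spirit of that reminder, base R's stats::reshape() can do a similar long/wide round trip with no extra packages. This is a swapped-in technique for illustration only. It is not the cdata or tidyr API, and the toy data frame is made up.

```r
# Long ("block") form: one row per (id, key) pair.
long <- data.frame(id = c(1, 1, 2, 2),
                   key = c("a", "b", "a", "b"),
                   value = c(10, 20, 30, 40))

# Long -> wide ("row records"): one row per id, one column per key.
# The new columns are named by joining v.names and each key with ".",
# giving value.a and value.b.
wide <- reshape(long, idvar = "id", timevar = "key", direction = "wide")

# Wide -> long: reshape() stores what it needs to undo itself as an
# attribute, so direction = "long" alone reverses the transform.
back <- reshape(wide, direction = "long")
```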

Dependencies as Risks

John Mount makes the point that package dependencies are inherently a risk:

If your software or research depends on many complex and changing packages, you have no way to establish your work is correct. This is because to establish the correctness of your work, you would need to also establish the correctness of all of the dependencies. This is worse than having non-reproducible research, as your work may have in fact been wrong even the first time.

Few and low-complexity dependencies can also be wrong, but in that case it is at least possible to check things or to run down and fix issues.

There are some insightful comments on this post as well, so check those out. This is definitely an area with trade-offs, so it is important to reason through when to move in which direction.
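One rough way to gauge that exposure is to list a package's recursive hard dependencies with tools::package_dependencies(). The sketch passes installed.packages() as the database so it stays offline. To check against CRAN, you would pass available.packages() instead. The helper name is hypothetical, and a raw count is only a crude proxy for the complexity John describes.

```r
# Every package that `pkg` needs, directly or indirectly, via Depends/Imports/LinkingTo.
recursive_deps <- function(pkg, db = installed.packages()) {
  deps <- tools::package_dependencies(pkg, db = db,
                                      which = c("Depends", "Imports", "LinkingTo"),
                                      recursive = TRUE)
  sort(unique(deps[[pkg]]))
}

recursive_deps("stats")           # the base-R packages that stats builds on
length(recursive_deps("stats"))   # a crude size-of-exposure number
```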

Custom ggplot2 Fonts

Daniel Oehm shares two techniques for using custom fonts in your ggplot2 visuals:

ggplot – You can spot one from a mile away, which is great! And when you do it’s a silent fist bump. But sometimes you want more than the standard theme.

Fonts can breathe new life into your plots, helping to match the theme of your presentation, poster or report. This is always an afterthought for me, and I need to work out how to do it again each time, hence the post.

There are two main packages for managing fonts: extrafont and showtext.

Read on to see how to use each of these packages. H/T R-Bloggers.

Unit Testing R Code

John Mount points out that you don’t need special infrastructure to perform unit testing in R:

There seems to be a general (false) impression among non R-core developers that to run tests, R package developers need a test management system such as RUnit or testthat. And a further false impression that testthat is the only R test management system. This is in fact not true, as R itself has a capable testing facility in “R CMD check” (a command triggering R checks from outside of any given integrated development environment).

By a combination of skimming the R manuals and running a few experiments, I came up with a description of how R testing actually works. And I have adapted the available tools to fit my current preferred workflow. This may not be your preferred workflow, but I have my reasons and give them below.

Food for thought for any R developer.
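Here is a minimal sketch of the no-framework approach. R CMD check runs every .R file in a package's tests/ directory and reports a failure if any of them signals an error, so plain stopifnot() calls are enough for basic checks. If a matching .Rout.save file exists, check also compares the script's output against it and reports the differences. The function, the file name, and the package name `yourpkg` are hypothetical.

```r
# Hypothetical test script: tests/test_reverse_words.R in a package source tree.
# In a real package, reverse_words() would live in R/ and this file would begin
# with library(yourpkg) (hypothetical name). It is defined inline here so the
# sketch runs on its own.
reverse_words <- function(s) {
  paste(rev(strsplit(s, " ", fixed = TRUE)[[1]]), collapse = " ")
}

# A failed condition raises an error, which R CMD check reports as a failing test.
stopifnot(
  identical(reverse_words("a b c"), "c b a"),
  identical(reverse_words("single"), "single")
)
```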

R 3.5.3 Available

David Smith shares some info on R 3.5.3, released on Monday:

The R Core Team announced yesterday the release of R 3.5.3, and updated binaries for Windows and Linux are now available (with Mac sure to follow soon). This update fixes three minor bugs (to the functions writeLines, setClassUnion, and stopifnot), but you might want to upgrade just to avoid the “package built under R 3.5.3” warnings you might get for new CRAN packages in the future.

Click through for more info on this release, including where each R release gets its name.
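The "built under" warning comes from comparing the running R version with the version recorded in an installed package's Built field. The sketch below checks both from R with getRversion() and packageDescription(). The helper name and the 3.5.3 threshold are only for illustration.

```r
# TRUE if the running R is older than `min_version`.
older_than <- function(min_version = "3.5.3") {
  getRversion() < min_version
}

older_than()                                   # is an upgrade to 3.5.3 or later due?
packageDescription("stats", fields = "Built")  # R version this installed package was built with
```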
