Press "Enter" to skip to content

Category: R

Using the Pipe Operator in ggplot2

Tomaz Kastrun reduces the number of pipe-like operators:

Using pipe %>% or chaining commands is very intuitive for creating following set of commands for data preparation. Visualization library ggplot in this manner uses sign “+” (plus) to do all the chaining. What if we would have to replace with the pipe sign?

This is because ggplot was developed prior to magrittr took over the piping world in R, so there wasn’t a “normal” pipe. I had been hopeful that ggvis would take over, as it does use the %>% pipe, but that project has gone dormant.

Comments closed

Image Sizing in RMarkdown Documents

The Jumping Rivers team shares some insight on image creation:

In this series of posts we’ll consider the (simple?) task of generating and including figures for the web using R & {knitr}. Originally this was going to be a single post, but as the length increase, we’ve decided to separate it into a separate articles. The four posts we intend to cover are

– setting the image size (this post)
– selecting the image type, PNG vs JPEG vs SVG
– including non-generated files in a document
– setting global {knitr} options.

Read on for the first post in the series.

Comments closed

Plotting Multiple Plots in R using map and ggplot

Sebastian Sauer gives us a quick solution to plotting one graph per variable:

Say we have a data frame where we would like to plot each numeric variables’s distribution.

There are a number of good solutions outthere such as this one, or here, or here.

When I read this, my first thought was along the lines of, “Why not use facets or something like cowplot?” But then it clicked that this is per-variable plotting, whereas faceting requires you choose a variable and see the plots based on that variable’s distinct values..

Comments closed

Counting Open Lockers in R

Holger von Jouanne-Diedrich solves a riddle:

We are standing in front of 100 lockers arranged side by side, all of which are closed. One man has a bunch of keys with all 100 keys and will pass the lockers exactly a hundred times, opening or closing some of them.

On the first pass, he opens all the lockers. On the second pass, the man will go to every other locker and change its state. That means: If it is closed, it will be opened. If it is already open, it will be closed. In this case, he closes lockers 2, 4, 6… 98 and 100, because all doors were open before.

On the third pass, he changes the state of every third locker – that is, 3, 6, 9, … 96, 99. Closed doors are opened, open doors closed. In the fourth pass, every fourth locker is changed, at the fifth every fifth – and so on. At the last, the 100th, the man finally only changes the state of door number 100.

The question is: How many of the 100 compartments are open after the 100th pass?

Click through for one solution in R.

Comments closed

Sunflower Plots in R

Kenneth Tay takes a look at a sunflower plot:

sunflower plot is a type of scatterplot which tries to reduce overplotting. When there are multiple points that have the same (x, y) values, sunflower plots plot just one point there, but has little edges (or “petals”) coming out from the point to indicate how many points are really there.

My first thought on it is that it’s too busy and doesn’t do its job of portraying a mass of data points very well. When you have just a few observations, then yeah, it’s not too bad. But once you have any reasonable amount of density on the plot, it’s better to use jitter and transparency (as Kenneth points out). H/T R-bloggers

Comments closed

Reporting on Correlation Analysis in R

Petr Baranovskiy continues a series on correlation analysis using R:

This is the second part of the Correlation Analysis in R series. In this post, I will provide an overview of some of the packages and functions used to perform correlation analysis in R, and will then address reporting and visualizing correlations as text, tables, and correlation matrices in online and print publications.

Read the whole thing.

Comments closed

Determining a Good Test Set Size

John Mount thinks about test set size:

In this note we will answer “what is a good test set size?” three ways.

– The usual practical answer.
– A decision theory answer.
– A novel variational answer.

Each of these answers is a bit different, as they are solved in slightly different assumed contexts and optimizing different objectives. Knowing all 3 solutions gives us some perspective on the problem.

My rule of thumb is that I want it to be as small as possible while containing the highest likelihood of hitting all real-world scenarios enough times to provide a valid comparison. This conversely maximizes the size of the training data set, giving us the best chance of seeing the widest variety of scenarios we can during the formative phase.

And as usual, John goes way deeper than my rules of thumb. I like this post a lot.

Comments closed

Model Post-Processing with insight

The easystats team talks about the insight package in R:

We are talking about the insight package. It is what allows other packages, like easystats (parameterseffectsizeperformancereport, …) or ggstatsplotsjstats or modelsummary to be as powerful as they are, supporting tons of different R models. So why make you life hard when you can be like them, and rely on insight?

It is made for developers (and users) that do some postprocessing of different models (e.g., extracting stuff like parameters, values, data, names, specifications, predictions, priors, etc.), whether it is to nicely display their results or to do further computation.

Click through for an example of what it does and how it works. H/T R-bloggers

Comments closed

Using OAuth 2 in R Packages

Maelle Salmon explains how OAuth 2 works and also how you can use it in R packages:

When writing an R package wrapping an API using OAuth 2.0 you’ll need the user to grant access to an “app”, which will allow to create an access token and a refresh token. The access token will then often be passed to the API in a header when making requests, whilst the refresh token would be posted in a query string when the access token needs to be renewed.

Your problem is: how do I imitate a third-party app? Thankfully for you, in most cases the complexity can be handled by the httr package. For other cases, or if you want to e.g. only use curl, you will have to get creative. 

Read on for more detail.

Comments closed

AzureCosmosR

Hong Ooi takes us through an R library for working with Cosmos DB:

Among other features, Azure Cosmos DB is notable in that it supports multiple data models and APIs. When you create a new Cosmos DB account, you specify which API you want to use: SQL/core API, which lets you use a dialect of T-SQL to query and manage tables and documents; MongoDB; Azure table storage; Cassandra; or Gremlin (graph). AzureCosmosR provides a comprehensive interface to the SQL API, as well as bridges to the MongoDB and table storage APIs. On the Resource Manager side, AzureCosmosR extends the AzureRMR class framework to allow creating and managing Cosmos DB accounts.

AzureCosmosR is now available on CRAN. You can also install the development version from GitHub, with devtools::install_github("Azure/AzureCosmosR").

Hong provides examples for us using three of the Cosmos DB APIs, so check it out.

Comments closed