Category: R

Comparing Datasets in R

Published 2021-05-27 by Kevin Feasel

The folks at finnstats take us through a package to compare datasets in R:

How to find dataset differences in R, when the pieces of information are changing between datasets it’s a difficult task to identify the same.
Here we are going to discuss the daff package in R, daff package helps us to identify the differences and visualize them in a beautiful way.

Click through for the demonstration, including a video. H/T R-Bloggers

Comments closed

New Features in R 4.1.0

Published 2021-05-19 by Kevin Feasel

The Jumping Rivers folks have some good news for us:

The stability of the base packages is a great strength of the R ecosystem: both underpinning, and contrasting with, the rapid pace at which contributed packages (CRAN, BioConductor) evolve.
Imagine introducing a new feature into the R language. Even if problems arise with the usability of that feature, it would need to be maintained until at least the next major release, by which time thousands of developers and analysts may depend upon it. Unsurprisingly, the R maintainers are exceedingly cautious when introducing new syntax.
Similarly, you should employ caution when using new syntax in your own code. If you do use syntax that was introduced in R-4.1, be aware that your code will not run on versions of R that precede this. For example, this may prevent your new analysis scripts from running on your colleague’s computer, or prevent users from installing your new package.

Given how many third-party packages have regular breaking changes, I do wish more people would follow this advice.

Getting into the meat of things, I really like the F#-style pipe in R: |> makes a lot of intuitive sense, though I do wish they had included a placeholder element with the native pipe.

Comments closed

Plotting Correlation Analyses in R

Published 2021-05-17 by Kevin Feasel

Finnstats shows a few techniques for plotting correlation in R:

Correlation analysis, correlation is a term that is a measure of the strength of a relationship between two variables.
Pearson’s Product-Moment Correlation
One of the most common measures of correlation is Pearson’s product-moment correlation, which is commonly referred to simply as the correlation, or just the letter r.
Correlation shows the strength of a relationship between two variables and is expressed numerically by the correlation coefficient.

Click through for examples from several packages. H/T R-Bloggers.

Comments closed

Hot, Cool, and Large Numbers

Published 2021-05-11 by Kevin Feasel

Holger von Jouanne-Diedrich hits the casino:

The longest streak in roulette purportedly happened in 1943 in the US when the colour red won 32 consecutive times in a row! A quick calculation shows that the probability of this happening seems to be beyond crazy:
0.5^32[1] 2.328306e-10
So, what is going on here? For once streaks and clustering happen quite naturally in random sequences: if you got something like “red, black, red, black, red, black” and so on I would worry if there was any randomness involved at all (read more about this here: Learning Statistics: Randomness is a strange beast). The point is that any sequence that is defined beforehand is as probable as any other (see also my post last week: The Solution to my Viral Coin Tossing Poll). Yet streaks catch our eye, they stick out.

There’s one critical assumption in this post, which is that the game is fair, in that each event has an equal probability of happening. But as a Bayesian, if a roulette table hits red 32 times in a row, it certainly opens the door to the idea that maybe the odds on that table with that dealer aren’t quite equal between red and black.

Comments closed

Containerizing a Shiny App

Published 2021-05-10 by Kevin Feasel

Peter Solymos takes us through the process of running a Shiny app in a Docker container:

Docker provides isolation to applications. Images are immutable. Running multiple instances of the same image can serve many users at the same time. All these general advantages of containerized applications apply to Shiny apps too.
All the general advantages of containerized applications apply to Shiny apps. Docker provides isolation to applications. Images are immutable: once build it cannot be changes, and if the app is working, it will work the same in the future. Another important consideration is scaling. Shiny apps are single threaded, but running multiple instances of the same image can serve many users at the same time. Let’s dive into the details of how to achieve this.

Click through for a walkthrough. Containerizing these sorts of apps has been a boon for my team, as it lets us spin up appropriately-sized servers on the cheap. H/T R-Bloggers

Comments closed

Functional Data Analysis in R

Published 2021-05-06 by Kevin Feasel

Joseph Ricker gives us a gentle introduction to a not-so-gentle topic:

This plot might depict 80 measurements for a participant in a clinical trial where each data point represents the change in the level of some protein level. Or it could represent any series of longitudinal data where the measurements are take at irregular intervals. The curve looks like a time series with obvious correlations among the points, but there are not enough measurements to model the data with the usual time series methods. In a scenario like this, you might find Functional Data Analysis (FDA) to be a viable alternative to the usual multi-level, mixed model approach.
This post is meant to be a “gentle” introduction to doing FDA with R for someone who is totally new to the subject. I’ll show some “first steps” code, but most of the post will be about providing background and motivation for looking into FDA. I will also point out some of the available resources that a newcommer to FDA should find helpful.

Read on to learn more.

Comments closed

Table Design in R with mmtable2

Published 2021-05-05 by Kevin Feasel

Matt Dancho walks through a package to make tables look great in R:

I love ggplot2 for plotting. The grammar of graphics allows us to add elements to plots. Tables seem to be forgotten in terms of an intuitive grammar with tidy data philosophy – Until now. mmtable2 aims to be the ggplot2 for tables, leveraging the awesome GT table package.
The mmtable2 package aims to make it easy to create tables by:
1. Using a ggplot2-style syntax for using a grammar of table operations.
2. Extends the amazing GT table package.

Read on for the process and a demonstration.

Comments closed

Tips for Improving Code Performance in R

Published 2021-05-05 by Kevin Feasel

Mira Celine Klein continues a series on code performance in R:

This is the second part of our series about code performance in R. It contains a lot of approaches to reduce the time your code needs to run. It’s useful to know those ideas before starting to write new code, but it also helps to optimize existing code.
If you have already written some code you want to speed up, but don’t know which part of it is actually slow, I recommend you to read the first part of this series on profiling. That article also introduces the microbenchmark package which we are going to use to measure code performance in this article.
Let’s start with a seemingly obvious rule, which is however not always easy to follow.

Read on for some tips. H/T R-bloggers.

Comments closed

Writing SQL to Query R data.frames

Published 2021-05-03 by Kevin Feasel

Tomaz Kastrun tries out a package:

There are many R packages for querying SQL Databases. Recently, I was looking into sqldf p ackage | CRAN documentation.
There are so many great advantages (simple running SQL statements, creating, loading, deleteing data to data.frames, connectivity to many databases, support for SQL functions, data types and many many more) , but one that was really a major win was interactions with data frames and SQL Language.

Between sqldf and dbplyr, you get it both ways: treat a data.frame like a SQL table, or treat a SQL database like R data.frames.

Comments closed

Plotting XGBoost Trees with R

Published 2021-04-30 by Kevin Feasel

Andrew Treadway shows off a method to visualize the results of training an XGBoost model:

In this post, we’re going to cover how to plot XGBoost trees in R. XGBoost is a very popular machine learning algorithm, which is frequently used in Kaggle competitions and has many practical use cases.
Let’s start by loading the packages we’ll need. Note that plotting XGBoost trees requires the DiagrammeR package to be installed, so even if you have xgboost installed already, you’ll need to make sure you have DiagrammeR also.

Click through for the process. H/T R-Bloggers.

Comments closed

M	T	W	T	F	S	S
	1	2	3	4	5	6
7	8	9	10	11	12	13
14	15	16	17	18	19	20
21	22	23	24	25	26	27
28	29	30	31