Category: R

Simulating the Monty Hall Problem in R

Jason Bryer takes us through a classic introductory problem in Bayesian statistics:

I find that when teaching statistics (and probability) it is often helpful to simulate data first in order to get an understanding of the problem. The Monty Hall problem recently came up in a class, so I implemented a function to play the game.

The Monty Hall problem results from a game show, Let’s Make a Deal, hosted by Monty Hall. In this game, the player picks one of three doors. Behind one is a car; behind the other two are goats. After picking a door, the player is shown the contents of one of the other two doors, which, because the host knows the contents, is a goat. The question to the player: Do you switch your choice?

This is one of the biggest “aha!” moments in statistics: the answer is not intuitively obvious and is easy to get wrong, but once you understand why switching wins two-thirds of the time, it becomes much easier to reason about how probabilities change as new knowledge arrives. H/T R-Bloggers.
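Jason's post includes its own game-playing function; the sketch below is not that function. It is a minimal base R simulation, built around a hypothetical helper `play_monty_hall()`, that plays many rounds and compares the win rate for staying against switching.

```r
# Play one round of Monty Hall; returns TRUE if the player ends up with the car.
# Hypothetical helper for illustration, not the function from Jason's post.
play_monty_hall <- function(switch_door) {
  doors <- 1:3
  car <- sample(doors, 1)   # door hiding the car
  pick <- sample(doors, 1)  # player's first choice
  # The host opens a door that is neither the player's pick nor the car.
  goat_doors <- setdiff(doors, c(pick, car))
  # sample(x, 1) treats a length-one numeric x as 1:x, so index with sample.int().
  opened <- goat_doors[sample.int(length(goat_doors), 1)]
  if (switch_door) {
    # Move to the one door that is neither the original pick nor the opened door.
    pick <- setdiff(doors, c(pick, opened))
  }
  pick == car
}

set.seed(42)
n_games <- 10000
stay_wins <- mean(replicate(n_games, play_monty_hall(switch_door = FALSE)))
switch_wins <- mean(replicate(n_games, play_monty_hall(switch_door = TRUE)))
cat(sprintf("Stay: %.3f  Switch: %.3f\n", stay_wins, switch_wins))
```

Switching wins exactly when the first pick was wrong, which happens with probability 2/3, so the simulated switch rate should land near 0.667 and the stay rate near 0.333.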

Testing in R with testthat

Aida Gjoka writes a test:

Testing is an important step when developing code in R or any other language. If you are a Python user, you can consider reading our previous blogs on pytest. Writing tests helps us make sure that the code is working as expected. In the R ecosystem, the testthat package is one of the most widely used frameworks. In this blog we will explore some of the main properties of {testthat} highlighting some of the most useful functions with some examples.

Read on to see how it works. This isn’t a mocking library but rather an assertion-based testing library. And near the end, Aida includes an extra library that helps with plot testing.
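Here is a minimal sketch of the assertion style described above, assuming the testthat package is installed. The function under test, `celsius_to_fahrenheit()`, is a hypothetical example and not one taken from Aida's post.

```r
library(testthat)

# Hypothetical function under test (not from the original post).
celsius_to_fahrenheit <- function(celsius) {
  if (!is.numeric(celsius)) {
    stop("`celsius` must be numeric")
  }
  celsius * 9 / 5 + 32
}

test_that("known temperatures convert correctly", {
  expect_equal(celsius_to_fahrenheit(0), 32)
  expect_equal(celsius_to_fahrenheit(100), 212)
  expect_equal(celsius_to_fahrenheit(-40), -40)
  # Vectorised input should return one result per element.
  expect_length(celsius_to_fahrenheit(c(-40, 37)), 2)
})

test_that("non-numeric input is rejected", {
  # The second argument is a regular expression matched against the error message.
  expect_error(celsius_to_fahrenheit("hot"), "must be numeric")
})
```

In a package, files like this usually live under `tests/testthat/` and are run together with `testthat::test_dir()` or `devtools::test()`.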

Handling Missing Data in R

M. Fatih Tüzen fills in the gaps:

Data preprocessing is a cornerstone of any data analysis or machine learning pipeline. Raw data rarely comes in a form ready for direct analysis — it often requires cleaning, transformation, normalization, and careful handling of anomalies. Among these preprocessing tasks, dealing with missing data stands out as one of the most critical and unavoidable challenges.

Missing values appear in virtually every domain: surveys may have skipped questions, administrative registers might contain incomplete records, and clinical trials can suffer from dropout patients. Ignoring these gaps or handling them naively does not just reduce the amount of usable information; it can also introduce bias, decrease statistical power, and ultimately compromise the validity of conclusions. In other words, missing data is not just an inconvenience — it is a methodological problem that demands rigorous attention.

Quite often, we gloss over what to do with missing data when explaining or working through the data science process, in part because it’s a hard problem. This post digs into the specifics of the matter, taking us through eight separate methods. H/T R-Bloggers.
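The post covers eight methods; the base R sketch below shows only two of the simplest, listwise deletion and mean imputation, on R's built-in `airquality` dataset. It is an illustration of the general idea, not code from the post.

```r
# airquality is built into R; its Ozone and Solar.R columns contain NA values.
data(airquality)

# Count missing values in each column.
print(colSums(is.na(airquality)))

# Listwise deletion: keep only rows with no missing values at all.
complete_rows <- airquality[complete.cases(airquality), ]
cat("Rows before:", nrow(airquality), " after deletion:", nrow(complete_rows), "\n")

# Mean imputation: replace each missing Ozone value with the observed mean.
imputed <- airquality
ozone_mean <- mean(airquality$Ozone, na.rm = TRUE)
imputed$Ozone[is.na(imputed$Ozone)] <- ozone_mean

# The mean is unchanged, but the standard deviation shrinks because every
# imputed value sits exactly at the centre of the distribution.
cat("SD observed:", sd(airquality$Ozone, na.rm = TRUE),
    " SD after imputation:", sd(imputed$Ozone), "\n")
```

The shrinking spread is one concrete case of the bias the quoted passage warns about: deletion throws away rows, and naive imputation makes the data look less variable than it really is.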

Animated Maps in R with gganimate

Osheen MacOscar looks at a new version of an old package:

In this blog post, we are going to use data from the {gapminder} R package, along with global spatial boundaries from ‘opendatasoft’. We are going to plot the life expectancy of each country in the Americas and animate it to see the changes from 1957 to 2007.

The {gapminder} package we are using is from the Gapminder foundation, an independent educational non-profit fighting global misconceptions. They cover issues like global warming, plastic in the oceans and life satisfaction.

There are several common gotchas that Osheen takes us through before building an animated map of the western hemisphere.

Using Python in R in Excel

Adam Gladstone wraps up a series on an R add-in for Excel:

In the last post in this series I am going to look at calling Python from R. Even though Excel now provides a means of calling Python scripts directly, using the =PY() formula in a worksheet, there are still occasions when it is beneficial to call Python via R. For example, it turns out that importing yfinance produces a ‘module not found’ error using Excel’s =PY() function. According to the documentation, yfinance is not one of the open source libraries that the Excel Python secure distribution supports. To get around this issue, we can use the R package reticulate. This lets us load and run Python scripts from R. As we have seen in the previous parts of this series, the ExcelRAddIn allows us to run R scripts from an Excel worksheet. And putting these two together is quite simple.

I’m glad Adam mentioned this because my first question was going to be, why use this when Excel has Python capabilities built in? And that’s a reasonable answer.

Using R for Forecasting in Excel

Adam Gladstone continues a series on using R in Excel:

We have already seen how to obtain descriptive statistics in Part I and how to use lm() in Part II. In this part (Part III) of the series we will look at using R in Excel to perform forecasting and time series analysis.

In the previous two parts we have seen different ways to handle the output from R function calls, unpacking and massaging the data as required. In this part we are going to focus on setting up and interacting with a number of models in the ‘forecast’ package (fpp2).

Read on for the demo. This is getting into territory that is by no means trivial to do natively in Excel.
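The post works with the forecast package. To keep the sketch below free of extra dependencies, it swaps in base R's `stats::arima()` and `predict()`. It fits the classic seasonal "airline" model to the built-in `AirPassengers` series and produces a 12-month forecast with approximate 95% intervals. The model choice here is an illustrative assumption, not the set-up used in the post.

```r
# AirPassengers: monthly international airline passengers, 1949-1960 (built into R).
data(AirPassengers)

# Model the log series so the growing seasonal swings become roughly constant.
log_ap <- log(AirPassengers)

# Seasonal ARIMA(0,1,1)(0,1,1)[12], often called the "airline model".
fit <- arima(log_ap, order = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))

# Forecast 12 months ahead on the log scale, then back-transform.
fc <- predict(fit, n.ahead = 12)
point <- exp(fc$pred)                    # back-transformed point forecast (a median)
lower <- exp(fc$pred - 1.96 * fc$se)     # approximate 95% lower bound
upper <- exp(fc$pred + 1.96 * fc$se)     # approximate 95% upper bound
print(round(cbind(point, lower, upper)))
```

With the forecast package itself, the equivalent workflow would typically go through `forecast::auto.arima()` and `forecast::forecast()`, which handle model selection and interval calculation for you.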

Linear Regression with R in Excel

Adam Gladstone continues a series on working with R in Excel via the ExcelRAddIn component:

In the first part of this series, I looked at using R in Excel to obtain descriptive statistics. In this second part of the series I am going to look at using R in Excel to perform linear regression, specifically using the lm() function. lm() is a real workhorse function. It can be used to carry out both single and multiple regression and different types of analysis of variance. For this demonstration I will only focus on single and multiple regression.

The workbook for this part of the series is: “Part II – R in Excel – Linear Regression.xlsx”. As before, the ‘References’ worksheet lists links to external references. The ‘Libraries’ worksheet loads additional (non-default) packages. In this demonstration, I use the datarium and broom packages. The ‘Datasets’ worksheet contains the data referenced in the worksheets.

Click through to see how you can perform ordinary least squares regression, multiple linear regression, and even logistic regression in Excel with a bit of R code. H/T R-Bloggers.
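For readers who want to see the underlying R calls on their own, here is a minimal base R sketch of the three model types mentioned. It uses the built-in `mtcars` dataset rather than the datarium data from the workbook, and it leaves out the ExcelRAddIn wiring entirely.

```r
data(mtcars)

# Single regression: fuel economy (mpg) as a function of weight (wt, in 1,000 lb).
simple_fit <- lm(mpg ~ wt, data = mtcars)

# Multiple regression: add horsepower as a second predictor.
multiple_fit <- lm(mpg ~ wt + hp, data = mtcars)

# Logistic regression: probability of a manual transmission (am = 1) given weight.
logit_fit <- glm(am ~ wt, data = mtcars, family = binomial)

# Coefficient table (estimate, standard error, t value, p value).
print(summary(multiple_fit)$coefficients)

# Adding a predictor to an OLS model cannot lower R-squared.
cat("R-squared, simple:", summary(simple_fit)$r.squared,
    " multiple:", summary(multiple_fit)$r.squared, "\n")

# Predicted probability of a manual transmission for a 3,000 lb car.
p_manual <- predict(logit_fit, newdata = data.frame(wt = 3), type = "response")
print(p_manual)
```

The post also loads broom; its `tidy()` function turns a fitted model such as `multiple_fit` into a data frame, which is a convenient shape for results that need to land in worksheet cells.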

Using R for Descriptive Statistics in Excel

Adam Gladstone shows off an Excel add-in:

The purpose of this series of posts is to demonstrate some use-cases for R in Excel using the ExcelRAddIn component (disclaimer: I am the developer of this add-in: ExcelRAddIn). The fundamental rationale for the add-in is that it allows access to the extensive R ecosystem within an Excel worksheet. Excel provides many excellent facilities for data wrangling and analysis. However, for certain types of statistical data analysis, the built-in functions, even alongside the Analysis ToolPak, are not sufficient, and R provides superior facilities (for example, for performing LDA, PCA, forecasting and time series analysis to mention a few).

Click through for examples of how it all works. H/T R-Bloggers.
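The add-in's job is to run R from a worksheet, so the R side of a descriptive-statistics call might look something like the base R sketch below. It uses the built-in `mtcars` dataset and is not taken from the ExcelRAddIn workbook; how the add-in passes worksheet ranges into R is not shown here.

```r
data(mtcars)

# Min, quartiles, median and mean for a few columns at once.
print(summary(mtcars[, c("mpg", "hp", "wt")]))

# Individual descriptive statistics for a single variable.
mpg_values <- mtcars$mpg
mpg_stats <- c(
  n = length(mpg_values),
  mean = mean(mpg_values),
  median = median(mpg_values),
  sd = sd(mpg_values),
  iqr = IQR(mpg_values)
)
print(round(mpg_stats, 2))

# Principal component analysis on standardised variables, one of the
# analyses the quoted passage lists as beyond Excel's built-in tools.
pca <- prcomp(mtcars[, c("mpg", "hp", "wt", "disp")], scale. = TRUE)
print(summary(pca)$importance)
```

Setting `scale. = TRUE` puts every variable on unit variance before the decomposition, so columns measured in large units (such as `disp`) do not dominate the components.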
