Category: R

Animated Maps in R with gganimate

Osheen MacOscar looks at a new version of an old package:

In this blog post, we are going to use data from the {gapminder} R package, along with global spatial boundaries from ‘opendatasoft’. We are going to plot the life expectancy of each country in the Americas and animate it to see the changes from 1957 to 2007.

The {gapminder} package we are using is from the Gapminder Foundation, an independent educational non-profit fighting global misconceptions. They cover issues like global warming, plastic in the oceans, and life satisfaction.

There are several common gotchas that Osheen takes us through before building an animated map of the western hemisphere.
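
The skeleton of the approach looks roughly like this; this is a minimal sketch rather than Osheen's exact code, and it assumes the opendatasoft boundaries have already been read into an sf object called world_sf with a country name column called name:

library(gapminder)
library(dplyr)
library(sf)
library(ggplot2)
library(gganimate)

# Life expectancy for the Americas, one row per country per year
americas <- gapminder |>
  filter(continent == "Americas", year >= 1957)

# world_sf and its name column are assumptions standing in for the
# opendatasoft boundaries used in the post
map_data <- world_sf |>
  inner_join(americas, by = c("name" = "country"))

# Build a static choropleth, then let gganimate interpolate across years
p <- ggplot(map_data) +
  geom_sf(aes(fill = lifeExp)) +
  scale_fill_viridis_c(name = "Life expectancy") +
  labs(title = "Year: {as.integer(frame_time)}") +
  transition_time(year)

animate(p, nframes = 100, fps = 10)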

Using Python in R in Excel

Adam Gladstone wraps up a series on an R add-in for Excel:

In the last post in this series I am going to look at calling Python from R. Even though Excel now provides a means of calling Python scripts directly, using the =PY() formula in a worksheet, there are still occasions when it is beneficial to call Python via R. For example, it turns out that importing yfinance produces a ‘module not found’ error using Excel’s =PY() function. According to the documentation, yfinance is not one of the open source libraries that the Excel Python secure distribution supports. To get around this issue, we can use the R package reticulate. This lets us load and run Python scripts from R. As we have seen in the previous parts of this series, the ExcelRAddIn allows us to run R scripts from an Excel worksheet. And putting these two together is quite simple.

I’m glad Adam mentioned this because my first question was going to be, why use this when Excel has Python capabilities built in? And that’s a reasonable answer.
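
For anyone who hasn't paired reticulate with yfinance before, the R-to-Python piece is short; here is a minimal sketch, assuming yfinance is installed in whichever Python environment reticulate points at (the Excel add-in plumbing is a separate concern):

library(reticulate)

# Import the yfinance module from the active Python environment
yf <- import("yfinance")

# history() returns a pandas DataFrame, which reticulate converts
# to an R data frame automatically
msft   <- yf$Ticker("MSFT")
prices <- msft$history(period = "1mo")

head(prices)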

Using R for Forecasting in Excel

Adam Gladstone continues a series on using R in Excel:

We have already seen how to obtain descriptive statistics in Part I and how to use lm() in Part II. In this part (Part III) of the series we will look at using R in Excel to perform forecasting and time series analysis.

In the previous two parts we have seen different ways to handle the output from R function calls, unpacking and massaging the data as required. In this part we are going to focus on setting up and interacting with a number of models in the ‘forecast’ package (fpp2).

Read on for the demo. This is getting into territory that is by no means trivial to do natively in Excel.
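
If the forecast package is new to you, the R side of the workflow is compact; here is a rough sketch using the built-in AirPassengers series rather than Adam's workbook data:

library(forecast)  # also loaded when you attach fpp2

# Fit an ARIMA model to the monthly series and forecast two years ahead
fit <- auto.arima(AirPassengers)
fc  <- forecast(fit, h = 24)

summary(fc)   # point forecasts plus prediction intervals
autoplot(fc)  # quick plot of the series and the forecast fan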

Linear Regression with R in Excel

Adam Gladstone continues a series on working with R in Excel via the ExcelRAddIn component:

In the first part of this series, I looked at using R in Excel to obtain descriptive statistics. In this second part of the series I am going to look at using R in Excel to perform linear regression, specifically using the lm() function. lm() is a real workhorse function. It can be used to carry out both single and multiple regression and different types of analysis of variance. For this demonstration I will only focus on single and multiple regression.

The workbook for this part of the series is: “Part II – R in Excel – Linear Regression.xlsx”. As before, the ‘References’ worksheet lists links to external references. The ‘Libraries’ worksheet loads additional (non-default) packages. In this demonstration, I use the datarium and broom packages. The ‘Datasets’ worksheet contains the data referenced in the worksheets.

Click through to see how you can perform ordinary least squares regression, multiple linear regression, and even logistic regression in Excel with a bit of R code. H/T R-Bloggers.
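
The underlying R calls are the standard ones; here is a minimal sketch on the built-in mtcars data (Adam's workbook uses datasets from datarium instead):

library(broom)

# Single regression: miles per gallon modelled on weight
single <- lm(mpg ~ wt, data = mtcars)

# Multiple regression: weight plus horsepower
multiple <- lm(mpg ~ wt + hp, data = mtcars)

# Logistic regression: transmission type (0/1) modelled on weight
logistic <- glm(am ~ wt, family = binomial, data = mtcars)

# broom turns model objects into plain data frames, which is the kind
# of rectangular output that slots neatly into a worksheet
tidy(multiple)
glance(multiple)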

Using R for Descriptive Statistics in Excel

Adam Gladstone shows off an Excel add-in:

The purpose of this series of posts is to demonstrate some use-cases for R in Excel using the ExcelRAddIn component (disclaimer: I am the developer of this add-in: ExcelRAddIn). The fundamental rationale for the add-in is that it allows access to the extensive R ecosystem within an Excel worksheet. Excel provides many excellent facilities for data wrangling and analysis. However, for certain types of statistical data analysis, the built-in functions, even alongside the Analysis ToolPak, are not sufficient, and R provides superior facilities (for example, for performing LDA, PCA, forecasting, and time series analysis, to mention a few).

Click through for examples of how it all works. H/T R-Bloggers.
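
The R calls behind the descriptive statistics themselves are one-liners; a tiny sketch, using mtcars as a stand-in for worksheet data:

# Minimum, quartiles, median, mean, and maximum for every column
summary(mtcars)

# Column-by-column standard deviations
sapply(mtcars, sd)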

Testing Plumber APIs from R

Jakub Sobolewski builds some tests:

When building Plumber APIs in R, effective testing is crucial for ensuring reliability and maintainability.

This guide explores a proven pattern for testing your own Plumber APIs that maintains fast feedback loops while providing robust coverage of both business logic and API contracts.

Read on for some good advice on API testing in general, followed by its application in R. Jakub also has a link to a GitHub repo showing a concrete example of how this testing can work. H/T R-Bloggers.
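
The general shape of the pattern, keeping business logic in plain functions so that most tests never have to start a server, looks something like this (an illustrative sketch, not Jakub's exact code):

# R/logic.R -- business logic lives in ordinary, easily tested functions
add_numbers <- function(a, b) {
  as.numeric(a) + as.numeric(b)
}

# plumber.R -- the API layer is a thin wrapper that just parses and delegates
#* @get /sum
function(a, b) {
  add_numbers(a, b)
}

# tests/testthat/test-logic.R -- fast unit tests against the logic alone
library(testthat)

test_that("add_numbers sums its inputs", {
  expect_equal(add_numbers(1, 2), 3)
  expect_equal(add_numbers("1", "2"), 3)
})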

The Through-the-Door Problem in Credit Risk Modeling

Richard Vale takes us through a data challenge:

In credit risk modelling, you want to calculate the probability that a loan will default. Since different financial institutions gather different data and offer different products, there is no one-size-fits-all approach to doing this. Therefore, credit risk models are usually built using the institution’s own data. For example, if I’m building a credit risk model for XYZ Bank, I look at loans which XYZ Bank has previously granted, and try to estimate the probability that a future loan will default based on principal, tenor, the borrower’s credit rating, and so on.

For those who haven’t heard of the through-the-door problem before, this is a good moment to pause and think about what is wrong with this. Why does this process contain a huge pitfall?

Click through for the answer, as well as an example of the problem and one way to get around this. H/T R-Bloggers.

Spatial Cross-Validation in R

Jakub Nowosad wraps up a series:

This document provides an overview of two R packages, sperrorest and blockCV, that can be used for spatial cross-validation but are outside of standard machine learning frameworks like caret, tidymodels, or mlr3.

All of the examples below use the same dataset, which includes the temperature measurements in Spain, a set of covariates, and the spatial coordinates of the temperature measurements.

Click through for a pair of cross-validation packages, as well as a link to the rest of the series. H/T R-Bloggers.

Bioconductor in the Wake of ggplot2 4.0.0

Maria Doyle lays it out:

A major update to ggplot2 (version 4.0.0) is expected around mid-to-late July 2025. It brings a significant internal change, replacing most of the S3 backend with the newer S7 object system. While this improves long-term maintainability and extensibility, it may break Bioconductor packages that depend on ggplot2, especially those that customise how plots are built or styled. Packages that use ggplot2 for typical plotting tasks, such as creating plots with ggplot() and geom_*(), are unlikely to be affected.

Click through for notes, tips on what to do, and whether the code you’re using will break with ggplot2 4.0.0. H/T R-Bloggers.

Leaving out an Element in R

Kieran Healy shows how to get all but one element from a vector in R:

Sometimes we want to repeatedly do things with all but one row of a data frame, where we systematically drop each row in turn and do the thing. For example, jackknife cross-validation is a kind of very basic bootstrap technique. (It is computationally simpler than and predates the bootstrap.) Or in some areas “leave-one-out” summary statistics are often calculated as a quick robustness check. Sometimes we want to do this within groups, perhaps especially when the groups may be small.

Read on to see how, including several examples. H/T R-Bloggers.
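
The core idiom is negative indexing, extended into a loop for the leave-one-out case; a minimal sketch:

x <- c(10, 20, 30, 40, 50)

# Drop the third element
x[-3]
#> [1] 10 20 40 50

# Leave-one-out means: drop each element in turn and average what remains
sapply(seq_along(x), function(i) mean(x[-i]))
#> [1] 35.0 32.5 30.0 27.5 25.0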
