
Category: R

The Through-the-Door Problem in Credit Risk Modeling

Richard Vale takes us through a data challenge:

In credit risk modelling, you want to calculate the probability that a loan will default. Since different financial institutions gather different data and offer different products, there is no one-size-fits-all approach to doing this. Therefore, credit risk models are usually built using the institution’s own data. For example, if I’m building a credit risk model for XYZ Bank, I look at loans which XYZ Bank has previously granted, and try to estimate the probability that a future loan will default based on principal, tenor, the borrower’s credit rating, and so on.

For those who haven’t heard of the through-the-door problem before, this is a good moment to pause and think about what is wrong with this. Why does this process contain a huge pitfall?

Click through for the answer, as well as an example of the problem and one way to get around this. H/T R-Bloggers.
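
As a rough illustration of why training only on previously granted loans can mislead, here is a minimal simulation sketch in base R. Everything in it is an assumption made for illustration (the score variable, the default-probability formula, and the approval cutoff are hypothetical and not taken from the linked post). It compares the default rate among approved applicants with the rate among everyone who applied.

```r
set.seed(42)

n <- 20000
# Hypothetical applicant score: higher means a safer borrower
score <- rnorm(n)
# Assumed "true" default probability, falling as the score rises
p_default <- plogis(-2 - 1.5 * score)
defaulted <- rbinom(n, size = 1, prob = p_default)

# The bank only granted loans to applicants above a cutoff,
# so only these rows ever appear in its historical loan data
approved <- score > 0

rate_all <- mean(defaulted)
rate_approved <- mean(defaulted[approved])
rate_rejected <- mean(defaulted[!approved])

# Default rates for all applicants, approved applicants, and rejected applicants
round(c(all = rate_all, approved = rate_approved, rejected = rate_rejected), 3)
```

In this toy setup the approved group defaults less often than the applicant population as a whole, so a model fitted only to granted loans sees a filtered sample of the people who actually come through the door.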


Spatial Cross-Validation in R

Jakub Nowosad wraps up a series:

This document provides an overview of two R packages, sperrorest and blockCV, that can be used for spatial cross validation, but are outside of standard machine learning frameworks like caret, tidymodels, or mlr3.

All of the examples below use the same dataset, which includes the temperature measurements in Spain, a set of covariates, and the spatial coordinates of the temperature measurements.

Click through for a pair of cross-validation packages, as well as a link to the rest of the series. H/T R-Bloggers.
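
To make the idea of spatial cross-validation concrete, here is a minimal base R sketch. It does not use the sperrorest or blockCV APIs. Instead it assumes a simulated stand-in for the temperature data (the column names `x`, `y`, `elev`, and `temp` are hypothetical) and builds folds by k-means clustering of the coordinates, so that each held-out fold is a contiguous region rather than a random scatter of points.

```r
set.seed(1)

# Hypothetical stand-in for the temperature data: coordinates, one covariate, a response
n <- 200
dat <- data.frame(x = runif(n), y = runif(n), elev = runif(n, 0, 2000))
dat$temp <- 20 - 0.006 * dat$elev + 2 * dat$y + rnorm(n)

# Spatial folds: cluster the coordinates so that each fold is a contiguous region
k <- 5
dat$fold <- kmeans(dat[, c("x", "y")], centers = k)$cluster

# Hold out one region at a time and score predictions on it
rmse <- sapply(seq_len(k), function(i) {
  train <- dat[dat$fold != i, ]
  test <- dat[dat$fold == i, ]
  fit <- lm(temp ~ elev, data = train)
  sqrt(mean((test$temp - predict(fit, newdata = test))^2))
})

# One RMSE value per held-out region
rmse
```

Holding out whole regions tends to give a more honest error estimate than random folds when nearby observations are correlated, which is the motivation for dedicated packages like the two covered in the post.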


Bioconductor in the Wake of ggplot2 4.0.0

Maria Doyle lays it out:

A major update to ggplot2 (version 4.0.0) is expected around mid-to-late July 2025. It brings a significant internal change, replacing most of the S3 backend with the newer S7 object system. While this improves long-term maintainability and extensibility, it may break Bioconductor packages that depend on ggplot2, especially those that customise how plots are built or styled. Packages that use ggplot2 for typical plotting tasks, such as creating plots with ggplot() and geom_*(), are unlikely to be affected.

Click through for notes, tips on what to do, and whether the code you’re using will break with ggplot2 4.0.0. H/T R-Bloggers.


Leaving out an Element in R

Kieran Healy shows how to get all but one element from a vector in R:

Sometimes we want to repeatedly do things with all but one row of a data frame, where we systematically drop each row in turn and do the thing. For example, jackknife cross-validation is a kind of very basic bootstrap technique. (It is computationally simpler than and predates the bootstrap.) Or in some areas “leave-one-out” summary statistics are often calculated as a quick robustness check. Sometimes we want to do this within groups, perhaps especially when the groups may be small.

Read on to see how, including several examples. H/T R-Bloggers.
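
For a quick sense of the mechanics, here is a minimal base R sketch of leave-one-out means, both for a plain vector and within groups. The data and column names (`g`, `v`, `loo_mean`) are made up for illustration, and this is not necessarily the approach taken in the linked post.

```r
x <- c(2, 4, 6, 8)

# Negative indexing drops the i-th element: x[-i] is "all but i"
loo_means <- sapply(seq_along(x), function(i) mean(x[-i]))
loo_means
# 6.000000 5.333333 4.666667 4.000000

# Equivalent closed form for the mean, with no loop needed
(sum(x) - x) / (length(x) - 1)

# Within groups: ave() applies the function to each group's values
# and returns the results in the original row order
df <- data.frame(g = c("a", "a", "a", "b", "b"), v = c(1, 2, 3, 10, 20))
df$loo_mean <- ave(df$v, df$g, FUN = function(v) (sum(v) - v) / (length(v) - 1))
df
```

The closed form only works for statistics like the mean that can be updated by subtracting one value. For anything else, the explicit `x[-i]` loop is the general fallback. Note that a group with a single row has nothing left after the drop, so its leave-one-out mean is undefined.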


Testing Shiny Applications

Arthur Breant runs some tests:

You’ve created a fantastic mockup and your client is delighted. You’re ready to move to production with your application. But one question haunts you: how can you ensure that your application will remain stable and functional through modifications and evolutions?

The answer comes down to one word: testing.

Read on to learn how you can perform unit testing, integration testing, and end-to-end testing of Shiny applications in R. H/T R-Bloggers.
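
As a small taste of the unit-testing layer, here is a minimal sketch using the testthat package. The helper `summarise_sales()` is hypothetical and not from the linked post. The idea it illustrates is that logic pulled out of a Shiny server function into a plain R function can be tested without launching the app.

```r
library(testthat)

# Hypothetical helper that a Shiny server function might call;
# keeping it as a plain function makes it testable without running the app
summarise_sales <- function(df, region) {
  rows <- df[df$region == region, , drop = FALSE]
  list(n = nrow(rows), total = sum(rows$amount))
}

test_that("summarise_sales filters by region and totals amounts", {
  sales <- data.frame(
    region = c("north", "south", "north"),
    amount = c(100, 50, 25)
  )
  res <- summarise_sales(sales, "north")
  expect_equal(res$n, 2L)
  expect_equal(res$total, 125)
})
```

Tests of reactive behaviour and of the full app in a browser need additional tooling beyond this sketch.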


Self-Intersecting Quadrilaterals in R

Jerry Tuttle talks shapes:

A quadrilateral is a polygon having four sides, four angles, and four vertices. Being a polygon means that the figure is a closed shape: the last line segment connects back to the first one, effectively enclosing an area.

We usually think of quadrilaterals as squares, rectangles, parallelograms, trapezoids, rhombuses, or kites. (I was impressed that my four-year-old granddaughter knew the last one, although she called it a diamond!) It could also be irregularly shaped with no name.

However, a polygon may intersect itself. 

Click through for a demonstration of a self-intersecting quadrilateral, including the R code you can use to try it out yourself.
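
One way to check for self-intersection, sketched here in base R under the assumption that the quadrilateral is given as four vertices in drawing order, is to test whether either pair of non-adjacent edges crosses. The function names are hypothetical and this is not the code from the linked post.

```r
# Sign of the cross product (b - a) x (c - a): which side of line ab point c lies on
orient <- function(a, b, c) {
  sign((b[1] - a[1]) * (c[2] - a[2]) - (b[2] - a[2]) * (c[1] - a[1]))
}

# TRUE when segments pq and rs properly cross
# (touching endpoints and collinear overlaps are not counted)
segments_cross <- function(p, q, r, s) {
  orient(p, q, r) * orient(p, q, s) < 0 &&
    orient(r, s, p) * orient(r, s, q) < 0
}

# v is a 4 x 2 matrix of vertices in drawing order;
# only the two pairs of opposite edges can cross each other
is_self_intersecting <- function(v) {
  segments_cross(v[1, ], v[2, ], v[3, ], v[4, ]) ||
    segments_cross(v[2, ], v[3, ], v[4, ], v[1, ])
}

square <- rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1))
bowtie <- rbind(c(0, 0), c(1, 1), c(1, 0), c(0, 1))

is_self_intersecting(square)  # FALSE
is_self_intersecting(bowtie)  # TRUE
```

The bowtie uses the same four corner points as the square, only visited in a different order, which is enough to make two opposite edges cross.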


Building Custom PowerPoint Decks in R

Theo Roe tries out a package:

From a purely design perspective, Quarto’s standard PowerPoint output falls short. It is limited to seven layout options, with the most complex being “Two Content.” The {officer} R package offers a powerful alternative for those seeking full control and customisation.

Click through to see how it works, as well as a hit list of limitations you might run into along the way.


Preventing Injection Attacks in Shiny

Arthur Breant shares some advice:

Code injection is a common security vulnerability that involves injecting malicious code into a page or application. This code is then executed, creating the security breach. There are several ways to inject code into an application, and Shiny is unfortunately not immune to these risks.

Click through for a quick overview of the three most common types of injection attack. There’s nothing special about Shiny here—any system that executes code based on user input is potentially vulnerable to injection attacks—so it is good to keep these tips in mind. H/T R-Bloggers.
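
As a minimal illustration of the general principle, here is a base R sketch contrasting evaluation of raw user input with a lookup against a fixed allow-list. The input value, the `allowed` list, and `apply_choice()` are hypothetical, and this covers only one kind of injection, not necessarily the three discussed in the linked post.

```r
# Pretend this string arrived from a Shiny text input (hypothetical value)
user_input <- "mean"

# Risky pattern: evaluating the string runs whatever the user typed
# eval(parse(text = paste0(user_input, "(c(1, 2, 3))")))

# Safer pattern: look the choice up in a fixed list of allowed functions
allowed <- list(mean = mean, median = median, max = max)

apply_choice <- function(choice, values) {
  if (!is.character(choice) || length(choice) != 1 || !choice %in% names(allowed)) {
    stop("Unsupported summary: input rejected")
  }
  allowed[[choice]](values)
}

apply_choice(user_input, c(1, 2, 3))  # 2
# apply_choice("system('whoami')", 1) stops with an error instead of running the command
```

The user-supplied string is only ever used as a key into the list, never parsed or executed, so unexpected input fails closed.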


The Spurious Correlations R Package

Mauricio Vargas S. shows correlation:

The spuriouscorrelations package started as a fun project for one of my tutorials.

Here is a case of an interesting correlation: the number of people who drowned by falling into a pool and the number of films Nicolas Cage appeared in.

Click through for examples and how to use the package. If you’re interested in more of these, Tyler Vigen’s website has plenty, and he even wrote a book. H/T R-Bloggers.
