R – Page 2 – Curated SQL

Extending caret for Spatial Machine Learning

Published 2025-05-15 by Kevin Feasel

This document shows how to apply caret to spatial modelling, using the prediction of air temperature in Spain as an example. We use air temperature measurements, which are available only at specific locations in Spain, to create a spatially continuous map of air temperature. To do so, we train machine-learning models to learn the relationship between spatially continuous predictors and air temperature.

When using machine-learning methods with spatial data, we need to account for issues such as spatial autocorrelation, as well as extrapolation when predicting for regions that are far away from the training data. Several methods have been developed to deal with these issues. In this document, we show how to combine the machine-learning workflow of caret with packages designed for machine learning with spatial data. We use blockCV::cv_spatial() and CAST::knndm() for spatial cross-validation, and CAST::aoa() to mask areas of extrapolation. We use sf and terra for processing vector and raster data, respectively.
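To make the idea of spatial cross-validation concrete, here is a minimal base-R sketch that assigns whole spatial blocks, rather than individual points, to folds. It is not the algorithm used by blockCV::cv_spatial() or CAST::knndm(); the coordinates, the block size, and all variable names are assumptions made purely for illustration.

```r
set.seed(42)

# Hypothetical station coordinates (not the Spanish data from the post)
stations <- data.frame(x = runif(200, 0, 100), y = runif(200, 0, 100))

# Label each point with the square grid block it falls into
block_size <- 25
block_id <- paste(floor(stations$x / block_size),
                  floor(stations$y / block_size), sep = "_")

# Assign whole blocks (not single points) to k folds at random
k <- 4
blocks <- unique(block_id)
fold_of_block <- setNames(sample(rep_len(seq_len(k), length(blocks))), blocks)
stations$fold <- unname(fold_of_block[block_id])

# Row indices for training and hold-out sets, one list element per fold
index <- lapply(seq_len(k), function(f) which(stations$fold != f))
index_out <- lapply(seq_len(k), function(f) which(stations$fold == f))
```

Because nearby points share a block, they always end up on the same side of each split, which is what keeps spatially autocorrelated neighbours from leaking into the hold-out set. Lists of row indices shaped like these are what caret's trainControl() accepts through its index and indexOut arguments.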

Click through to see how it all works. H/T R-Bloggers.

Using Multiple Scales with ggplot2 and ggnewscale

Published 2025-05-12 by Kevin Feasel

Zhenguo Zhang resets the scale:

In one ggplot figure, you can normally use only one scale for each aesthetic mapping. For example, if you use scale_color_manual() to set the color scale for a layer, you cannot use another scale_color_manual() for another layer, or set the color scale more than once in the function aes(). However, you can use the new_scale_color() function from the ggnewscale package to add a new scale for the same aesthetic mapping in different layers.

In this post, I will showcase how to use the new_scale_color() function to add two different color scales in a ggplot figure. The first scale will be for a discrete variable (e.g., number of cylinders), and the second scale will be for a continuous variable (e.g., density level).
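As a rough sketch of the technique, the snippet below draws two point layers that each map color to a different variable, separated by new_scale_color(). It is not the code from the linked post: the mtcars data set, the use of horsepower as the continuous variable, and the chosen colors are assumptions for illustration.

```r
library(ggplot2)
library(ggnewscale)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  # First color scale: discrete, number of cylinders
  geom_point(aes(color = factor(cyl)), size = 3) +
  scale_color_manual(
    values = c("4" = "steelblue", "6" = "darkorange", "8" = "firebrick"),
    name = "Cylinders"
  ) +
  # Everything after this call gets a fresh color scale
  new_scale_color() +
  # Second color scale: continuous, horsepower (an assumed stand-in variable)
  geom_point(aes(color = hp), shape = 1, size = 5) +
  scale_color_viridis_c(name = "Horsepower")

p
```

The order matters here: each scale_color_*() call applies to the layers added before the next new_scale_color() call.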

Click through for the code and a demonstration.

Debugging R Code in Visual Studio Code

Published 2025-05-05 by Kevin Feasel

Yohann Mansiaux steps through the code:

We are going to see how to use these functions in VSCode, as well as introducing “breakpoints”. Breakpoints are markers that you can set in your code to pause execution at a specific line. This allows you to inspect the state of your code at that point and step through it line by line. They are very close to the browser() function. They can also be used in RStudio IDE, but I have to admit that I never used them.
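For readers who have not used browser() before, here is a small base-R sketch of what a pause point looks like in code. The function name and the debug flag are hypothetical; the guard on interactive() is only there so that the script still runs unattended.

```r
# Hypothetical helper: rescale a numeric vector to the range [0, 1]
rescale01 <- function(x, debug = FALSE) {
  # Pause here in an interactive session, much like a breakpoint on this line
  if (debug && interactive()) browser()
  rng <- range(x)
  (x - rng[1]) / (rng[2] - rng[1])
}

rescale01(c(2, 4, 6))
# returns 0.0 0.5 1.0
```

Calling rescale01(c(2, 4, 6), debug = TRUE) from an interactive console stops inside the function, where you can inspect x and step through the remaining lines.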

Read on to see how VSCode fills the need when it comes to debugging code. H/T R-Bloggers.

Model Diagnostics for Statistics vs Machine Learning

Published 2025-05-02 by Kevin Feasel

Christian Lorentzen talks diagnostics:

In this post, we show how different use cases require different model diagnostics. In short, we compare (statistical) inference and prediction.

As an example, we use a simple linear model for the Munich rent index dataset, which was kindly provided by the authors of Regression – Models, Methods and Applications 2nd ed. (2021). This dataset contains monthly rents in EUR (rent) for about 3000 apartments in Munich, Germany, from 1999.
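The contrast between the two mindsets can be sketched in a few lines of base R. This is not the analysis from the linked post: the built-in mtcars data stands in for the Munich rent data, and the model formula and split size are assumptions.

```r
set.seed(1)

# Inference view: fit on all data and look at coefficient uncertainty
fit_all <- lm(mpg ~ wt + hp, data = mtcars)
ci <- confint(fit_all)  # 95% confidence intervals for each coefficient
ci

# Prediction view: hold out some rows and measure out-of-sample error
test_rows <- sample(nrow(mtcars), 8)
train <- mtcars[-test_rows, ]
test <- mtcars[test_rows, ]

fit_train <- lm(mpg ~ wt + hp, data = train)
pred <- predict(fit_train, newdata = test)
rmse <- sqrt(mean((test$mpg - pred)^2))
rmse
```

For inference, the questions are about the coefficients and whether the model assumptions behind their intervals hold. For prediction, the question is how large the error is on data the model has not seen.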

Read on to learn more about this dataset and how the mindset differs if you’re thinking about inference versus prediction.

Breaking down the Limitations of R^2

Published 2025-05-02 by Kevin Feasel

M. Fatih Tüzen explains an important regression concept:

When building a statistical model, one of the first numbers analysts and data scientists often cite is the R², or coefficient of determination. It’s widely reported in research, academic theses, and industry reports — and yet, frequently misunderstood or misused.

Does a high R² mean your model is good? Is it enough to evaluate model performance? What about its adjusted or predictive counterparts?
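As a quick illustration of the three quantities mentioned, the following base-R sketch computes R², adjusted R², and a predictive R² based on the leave-one-out PRESS statistic. The mtcars data and the model formula are assumptions for illustration and do not come from the linked post.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)

y <- mtcars$mpg
n <- length(y)
p <- length(coef(fit)) - 1           # number of predictors, excluding intercept

sse <- sum(residuals(fit)^2)         # residual sum of squares
sst <- sum((y - mean(y))^2)          # total sum of squares

r2 <- 1 - sse / sst
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

# PRESS: sum of squared leave-one-out residuals, via the hat values
press <- sum((residuals(fit) / (1 - hatvalues(fit)))^2)
pred_r2 <- 1 - press / sst

c(r2 = r2, adj_r2 = adj_r2, pred_r2 = pred_r2)
```

The first two values match summary(fit)$r.squared and summary(fit)$adj.r.squared. The predictive version can never exceed the ordinary R², because each leave-one-out residual is at least as large in magnitude as the corresponding in-sample residual.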

Read on to learn the answers to each question. H/T R-Bloggers.

Calculating Grouped Counts and Weighted Means in R

Published 2025-05-01 by Kevin Feasel

Michael Mayer doesn’t want to wait:

From time to time, the following questions pop up:

How to calculate grouped counts and (weighted) means?

What are fast ways to do it in R?

This blog post presents a couple of approaches and then compares their speed with a naive benchmark.
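For reference, here is one way to get grouped counts and weighted means using only base R. The toy data frame and its column names are made up for illustration, and these are not necessarily the approaches benchmarked in the linked post.

```r
# Hypothetical toy data: group g, value x, weight w
df <- data.frame(
  g = c("a", "a", "b", "b", "b"),
  x = c(1, 3, 2, 4, 6),
  w = c(1, 3, 1, 1, 2)
)

# Grouped counts
counts <- table(df$g)
counts
# a: 2, b: 3

# Grouped weighted means, approach 1: split the data and apply weighted.mean()
wmeans_split <- sapply(split(df, df$g), function(d) weighted.mean(d$x, d$w))

# Approach 2: vectorised sums per group with rowsum()
wmeans_rowsum <- (rowsum(df$x * df$w, df$g) / rowsum(df$w, df$g))[, 1]

wmeans_split
# a: 2.5, b: 4.5
```

The rowsum() variant avoids calling an R function once per group, which tends to matter when there are many groups.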

Read on for multiple options and then a quick performance comparison.

Creating Error Bars in ggplot2

Published 2025-04-29 by Kevin Feasel

Zhenguo Zhang draws a chart:

Sometimes you may want to create a plot with the following features:

a point to indicate the mean of a group

error bars to indicate the standard deviation of the group

and each group may have subgroups, which are represented by different colors.

In this post, I will show you how to create such a plot using the ggplot2 package in R.
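A minimal sketch of such a plot follows. It is not the code from the linked post: the built-in ToothGrowth data, the column names, and the dodge width are assumptions for illustration.

```r
library(ggplot2)

# Mean and standard deviation of tooth length per dose (group) and supplement (subgroup)
means <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
sds <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = sd)
stats <- merge(means, sds, by = c("supp", "dose"), suffixes = c("_mean", "_sd"))

# Use the same dodge for points and error bars so subgroups sit side by side
dodge <- position_dodge(width = 0.4)

p <- ggplot(stats, aes(x = factor(dose), y = len_mean, color = supp)) +
  geom_point(position = dodge, size = 3) +
  geom_errorbar(
    aes(ymin = len_mean - len_sd, ymax = len_mean + len_sd),
    position = dodge, width = 0.2
  ) +
  labs(x = "Dose", y = "Mean tooth length (± 1 SD)", color = "Supplement")

p
```

Sharing one position_dodge() object between the two layers is what keeps each error bar centred on its point instead of overlapping the neighbouring subgroup.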

Read on for the demonstration, as well as fixing a common problem of overlapping data points. H/T R-Bloggers.

Function Generators versus Partial Application in R

Published 2025-04-25 by Kevin Feasel

Jonathan Carroll digs in:

The blog post (www.tidyverse.org) describing the latest updates to the tidyverse {scales} package neatly demonstrates the usage of the new functionality, but because the examples are written outside of actual plotting code, one feature stuck out to me in particular…
label_glue("The {x} penguin")(c("Gentoo", "Chinstrap", "Adelie"))
# The Gentoo penguin
# The Chinstrap penguin
# The Adelie penguin
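The double call in that snippet works because label_glue() returns a function, which is then called on the vector. The base-R sketch below contrasts that function-generator pattern with partial application. It uses sprintf() rather than the {scales} internals, and make_labeller() and partial() are hypothetical helpers written for this illustration.

```r
penguins <- c("Gentoo", "Chinstrap", "Adelie")

# Function generator: takes a template and returns a new labelling function
make_labeller <- function(template) {
  force(template)  # evaluate now so the closure captures the value
  function(x) sprintf(template, x)
}

make_labeller("The %s penguin")(penguins)
# "The Gentoo penguin" "The Chinstrap penguin" "The Adelie penguin"

# Partial application: pre-fill some arguments of an existing function
partial <- function(f, ...) {
  fixed <- list(...)
  function(...) do.call(f, c(fixed, list(...)))
}

penguin_label <- partial(sprintf, "The %s penguin")
penguin_label(penguins)
```

Both approaches end in a function of one argument. The difference is that the generator is purpose-built to return one, while partial application derives one from a general function by fixing arguments.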

Read on for a dive into what makes the actual invocation interesting. H/T R-Bloggers.

choroplethr 4.0.0 Now in CRAN

Published 2025-04-15 by Kevin Feasel

Ari Lamstein has an announcement:

With this version, I have transferred the maintenance of choroplethr to Zhaochen He, an economics professor at Christopher Newport University. Zhao addressed the issues that led to choroplethr being archived from CRAN in February. Please join me in thanking Zhao for his contribution!

Click through for the updates, as well as what Ari views as the current challenges for the project as he hands it over to Zhaochen He. H/T R-Bloggers.

Data Splitting and Cross-Validation in R

Published 2025-04-14 by Kevin Feasel

Nick Han has a pair of articles. The first covers data splitting and pre-processing:

Data preprocessing is a crucial step in any machine learning workflow. It ensures that your data is clean, consistent, and ready for modeling. In this blog post, we’ll walk through the process of splitting and preprocessing data in R, using the rsample package for data splitting and saving the results for future use.
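A minimal sketch of that kind of split with rsample might look like the following. The mtcars data, the 75/25 proportion, and the file name are assumptions for illustration rather than the choices made in the linked post.

```r
library(rsample)

set.seed(123)

# Split the data into training and testing portions
data_split <- initial_split(mtcars, prop = 0.75)
train_data <- training(data_split)
test_data <- testing(data_split)

# Save both pieces so later steps can reuse exactly the same split
out_file <- file.path(tempdir(), "mtcars_split.rds")
saveRDS(list(train = train_data, test = test_data), out_file)

# Reload later with readRDS()
restored <- readRDS(out_file)
```

Saving the split, instead of re-running it, means later modelling steps do not depend on the random seed being set the same way.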

H/T R-Bloggers for that one.

The second involves using cross-validation via the caret package in R:

Cross-validation is a resampling technique used to assess the performance and generalizability of machine learning models. It helps address issues like overfitting and ensures that the model’s performance is consistent across different subsets of the data. By splitting the data into multiple folds and repeating the process, cross-validation provides a robust estimate of model performance.
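As a small illustration of k-fold cross-validation with caret, the sketch below fits a linear model with five folds. The data set, formula, and number of folds are assumptions for illustration, not the setup from the linked post.

```r
library(caret)

set.seed(123)

# 5-fold cross-validation
ctrl <- trainControl(method = "cv", number = 5)

# Fit a linear model; caret refits it on each set of training folds
cv_fit <- train(mpg ~ wt + hp, data = mtcars, method = "lm", trControl = ctrl)

# Performance averaged over the held-out folds
cv_fit$results[, c("RMSE", "Rsquared", "MAE")]
```

The reported metrics are averages over the held-out folds, so they estimate performance on unseen data rather than fit to the training data.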

H/T R-Bloggers for that as well.

Category: R