Category: R

Creating Error Bars in ggplot2

Zhenguo Zhang draws a chart:

Sometimes you may want to create a plot with the following features:

  • a point to indicate the mean of a group
  • error bars to indicate the standard deviation of the group
  • and each group may have subgroups, which are represented by different colors.

In this post, I will show you how to create such a plot using the ggplot2 package in R.

Read on for the demonstration, as well as a fix for a common problem: overlapping data points. H/T R-Bloggers.
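
As a rough sketch of this kind of chart (not Zhenguo's exact code), the usual ggplot2 recipe is to summarise each group/subgroup to a mean and standard deviation, then draw geom_point() and geom_errorbar() with the same position_dodge() so that subgroups sit side by side instead of overlapping. The data frame, column names, and dodge width below are made up for illustration.

```r
library(ggplot2)

# Hypothetical data: groups A-C, each split into subgroups x and y
set.seed(42)
df <- data.frame(
  group    = rep(c("A", "B", "C"), each = 20),
  subgroup = rep(c("x", "y"), times = 30),
  value    = rnorm(60, mean = 10, sd = 2)
)

# One row per group/subgroup with its mean and standard deviation
means <- aggregate(value ~ group + subgroup, data = df, FUN = mean)
sds   <- aggregate(value ~ group + subgroup, data = df, FUN = sd)
stats <- merge(means, sds, by = c("group", "subgroup"),
               suffixes = c("_mean", "_sd"))

# Using the same dodge for points and error bars keeps each bar on its point
dodge <- position_dodge(width = 0.5)

p <- ggplot(stats, aes(x = group, y = value_mean, colour = subgroup)) +
  geom_point(position = dodge, size = 3) +
  geom_errorbar(aes(ymin = value_mean - value_sd, ymax = value_mean + value_sd),
                position = dodge, width = 0.2) +
  labs(y = "mean ± SD")
print(p)
```

Without the shared position_dodge(), the x and y subgroups are drawn at the same x position and their points and bars overlap.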

Function Generators versus Partial Application in R

Jonathan Carroll digs in:

The blog post (www.tidyverse.org) describing the latest updates to the tidyverse {scales} package neatly demonstrates the usage of the new functionality, but because the examples are written outside of actual plotting code, one feature stuck out to me in particular…

label_glue("The {x} penguin")(c("Gentoo", "Chinstrap", "Adelie"))
# The Gentoo penguin
# The Chinstrap penguin
# The Adelie penguin

Read on for a dive into what makes the actual invocation interesting. H/T R-Bloggers.
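
To make the distinction concrete, here is a small base-R sketch. It uses sprintf() in place of {glue} templating, and make_labeller() and partial_apply() are hypothetical helpers written for illustration; they are not how {scales} implements label_glue(). A function generator builds and returns a brand-new function, while partial application takes an existing function and pre-fills some of its arguments.

```r
# Function generator: returns a closure that remembers `template`
make_labeller <- function(template) {
  force(template)  # evaluate now rather than lazily at first call
  function(x) sprintf(template, x)
}

label_penguin <- make_labeller("The %s penguin")
label_penguin(c("Gentoo", "Chinstrap", "Adelie"))
# returns c("The Gentoo penguin", "The Chinstrap penguin", "The Adelie penguin")

# Partial application: pre-fill leading arguments of an existing function
# (hand-rolled for illustration; purrr::partial() is a fuller version)
partial_apply <- function(f, ...) {
  fixed <- list(...)
  function(...) do.call(f, c(fixed, list(...)))
}

label_penguin2 <- partial_apply(sprintf, "The %s penguin")
label_penguin2(c("Gentoo", "Chinstrap", "Adelie"))
```

Both calls produce the same labels; the difference is whether the second step creates a new function from scratch or reuses sprintf() with an argument already filled in.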

choroplethr 4.0.0 Now on CRAN

Ari Lamstein has an announcement:

With this version, I have transferred the maintenance of choroplethr to Zhaochen He, an economics professor at Christopher Newport University. Zhao addressed the issues that led to choroplethr being archived from CRAN in February. Please join me in thanking Zhao for his contribution!

Click through for the updates, as well as what Ari views as the current challenges for the project as he hands it over to Zhaochen He. H/T R-Bloggers.

Data Splitting and Cross-Validation in R

Nick Han has a pair of articles. The first covers data splitting and preprocessing:

Data preprocessing is a crucial step in any machine learning workflow. It ensures that your data is clean, consistent, and ready for modeling. In this blog post, we’ll walk through the process of splitting and preprocessing data in R, using the rsample package for data splitting and saving the results for future use.

H/T R-Bloggers for that one.
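
The core idea can be sketched in base R. This swaps in sample() for rsample's initial_split()/training()/testing(), and uses mtcars and a simple centre-and-scale step as stand-ins for whatever data and preprocessing the article uses. The key point is that preprocessing parameters are learned on the training set only and then reused on the test set, so no information leaks from the test data.

```r
set.seed(123)

# Hold out 20% of rows as a test set
n <- nrow(mtcars)
train_idx <- sample(seq_len(n), size = floor(0.8 * n))
train <- mtcars[train_idx, ]
test  <- mtcars[-train_idx, ]

# Learn centring/scaling parameters from the training set only
predictors <- setdiff(names(train), "mpg")
centres <- colMeans(train[predictors])
spreads <- vapply(train[predictors], sd, numeric(1))

# Apply the training-set parameters to both splits
train[predictors] <- as.data.frame(scale(train[predictors], center = centres, scale = spreads))
test[predictors]  <- as.data.frame(scale(test[predictors],  center = centres, scale = spreads))

# Save the processed splits for later sessions
out_dir <- tempdir()
saveRDS(train, file.path(out_dir, "train.rds"))
saveRDS(test,  file.path(out_dir, "test.rds"))
```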

The second involves using cross-validation via the caret package in R:

Cross-validation is a resampling technique used to assess the performance and generalizability of machine learning models. It helps address issues like overfitting and ensures that the model’s performance is consistent across different subsets of the data. By splitting the data into multiple folds and repeating the process, cross-validation provides a robust estimate of model performance.

H/T R-Bloggers for that as well.
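
For intuition about what caret automates (for example via trainControl(method = "cv", number = 5) passed to train()), here is a hand-rolled k-fold cross-validation in base R. The dataset, model formula, and k = 5 are illustrative choices, not taken from the article: each row is assigned to one fold, the model is fit on the other k - 1 folds, and the held-out fold is scored.

```r
set.seed(2025)

k <- 5
# Randomly assign each row to one of k folds of roughly equal size
folds <- sample(rep(seq_len(k), length.out = nrow(mtcars)))

rmse <- numeric(k)
for (i in seq_len(k)) {
  train_df <- mtcars[folds != i, ]
  valid_df <- mtcars[folds == i, ]
  fit  <- lm(mpg ~ wt + hp, data = train_df)   # fit on k - 1 folds
  pred <- predict(fit, newdata = valid_df)     # score the held-out fold
  rmse[i] <- sqrt(mean((valid_df$mpg - pred)^2))
}

rmse        # one error estimate per fold
mean(rmse)  # cross-validated RMSE
```

Averaging over folds is what makes the estimate more stable than a single train/test split; repeated CV simply reruns this loop with different fold assignments.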

What’s New in R 4.5.0

Russ Hyde checks out the changes:

R 4.5.0 (“How About a Twenty-Six”) was released on 11th April, 2025. Here we summarise some of the interesting changes that have been introduced. In previous blog posts we have discussed the new features introduced in R 4.4.0 and earlier versions (see the links at the end of this post).

The full changelog can be found at the r-release ‘NEWS’ page and if you want to keep up to date with developments in base R, have a look at the r-devel ‘NEWS’ page.

There are some nice bits of functionality on the list, so check it out.

Avoid aggregate in R on Wide Matrices

Ali Oghabian shares some hard-earned advice:

The aggregate function can be very useful in R, allowing one to run a function (e.g. mean) within groups of rows, in each column in a matrix/data-frame and organize the results in an easy-to-read table. However, the function takes a long time to run for very wide matrices and data frames, where the number of columns is large. In this post I demonstrate the issue and show a couple of nice solutions that, at least for the example, cut the run time down to 15% or less of the aggregate function's run time.

Click through for a demo. Granted, this is a matrix with 10,000 columns, so I’m not sure how this applies to narrower matrices. H/T R-Bloggers.
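
One base-R alternative (not necessarily one of the solutions in Ali's post) is rowsum(), which computes per-group column sums for a whole matrix at once; dividing by group sizes turns those sums into means. The matrix dimensions below are arbitrary, and timings will vary by machine, so none are claimed here.

```r
set.seed(1)
n_rows <- 300
n_cols <- 2000   # wide: many more columns than rows
m   <- matrix(rnorm(n_rows * n_cols), nrow = n_rows)
grp <- sample(c("a", "b", "c"), n_rows, replace = TRUE)

# Baseline: aggregate() processes the matrix column by column
t_agg <- system.time(
  agg <- aggregate(m, by = list(group = grp), FUN = mean)
)

# Alternative: per-group sums for every column, then divide by group sizes
t_rowsum <- system.time({
  sums  <- rowsum(m, grp)                          # rows sorted by group label
  sizes <- as.vector(table(grp)[rownames(sums)])   # group sizes in the same order
  group_means <- sums / sizes                      # recycles down each column
})

c(aggregate = t_agg[["elapsed"]], rowsum = t_rowsum[["elapsed"]])
```

This trick only works for statistics that can be built from sums (means, and with a second pass, variances); for arbitrary functions, a different approach is needed.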

Reactable Tables with Sparklines in Shiny Apps

Osheen MacOscar continues a series:

This is the third blog in a series about the {sparkline} R package for inline data visualisations. You can read the first one about getting started with the package here and the second one about embedding them in HTML tables with the {reactable} package here.

In this blog I am taking it a step further and demonstrating how to use our sparkline reactable table in a Shiny app. Thankfully {reactable} has some helpful functions that make this super easy! I will also look at using a dynamic traffic light image in a reactable table at the end.

Click through to see how it all works.

Converting a CSV to Parquet with DuckDB and Polars in R

Michael Mayer makes a swap:

In this recent post, we have used Polars and DuckDB to convert a large CSV file to Parquet in streaming mode, using Python.

Different people have contacted me and asked: “and in R?”

Simple answer: We have DuckDB, and we have different Polars bindings. Here, we are using {polars}, which is currently being overhauled into {neopolars}.

Click through for the comparison.
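
For the DuckDB side, a minimal sketch looks like the following, assuming the {DBI} and {duckdb} packages are installed. It writes a small stand-in CSV (mtcars) to a temporary file rather than using a large real file, and relies on DuckDB's read_csv_auto() and COPY ... TO ... (FORMAT PARQUET) SQL so the conversion happens inside DuckDB's engine. The Polars variant from the post is not shown here.

```r
library(DBI)
library(duckdb)

# Stand-in input: a small CSV in a temporary directory
csv_path     <- tempfile(fileext = ".csv")
parquet_path <- tempfile(fileext = ".parquet")
write.csv(mtcars, csv_path, row.names = FALSE)

con <- dbConnect(duckdb())

# DuckDB reads the CSV and writes Parquet directly, without loading it into R
dbExecute(con, sprintf(
  "COPY (SELECT * FROM read_csv_auto('%s')) TO '%s' (FORMAT PARQUET)",
  csv_path, parquet_path
))

# Read the Parquet file back to confirm the row count
n <- dbGetQuery(con, sprintf(
  "SELECT COUNT(*) AS n FROM read_parquet('%s')", parquet_path
))$n

dbDisconnect(con)
```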

Building a Package in R

Ken Koon Wong builds a package:

Made a hilariously redundant R package for simple OpenAI calls, but the real win was finally learning how to build an R package! Is it efficient? Absolutely not! Was it worth the time and experience? Yes! Will I do it again? Yes! Will it break? Yes!

Building packages in R is a bit of a challenge, but it's well worth it, even if the only consumers of your code are within your own company. H/T R-Bloggers.
