Category: R

choroplethr 4.0.0 Now in CRAN

Ari Lamstein has an announcement:

With this version, I have transferred the maintenance of choroplethr to Zhaochen He, an economics professor at Christopher Newport University. Zhao addressed the issues that led to choroplethr being archived from CRAN in February. Please join me in thanking Zhao for his contribution!

Click through for the updates, as well as what Ari views as the current challenges for the project as he hands it over to Zhaochen He. H/T R-Bloggers.

Data Splitting and Cross-Validation in R

Nick Han has a pair of articles. First up is one on data splitting and pre-processing:

Data preprocessing is a crucial step in any machine learning workflow. It ensures that your data is clean, consistent, and ready for modeling. In this blog post, we’ll walk through the process of splitting and preprocessing data in R, using the rsample package for data splitting and saving the results for future use.

H/T R-Bloggers for that one.
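
To make the split-and-save idea concrete, here is a minimal sketch in base R. It is not Nick's code: the post uses the rsample package, while this version assumes only base R, the built-in `mtcars` data set, an 80/20 split, and a hypothetical file name.

```r
# Base R sketch of a reproducible 80/20 train/test split (assumed proportions).
set.seed(42)                                   # make the random split repeatable
n         <- nrow(mtcars)
train_idx <- sample(n, size = floor(0.8 * n))  # row positions used for training

train_data <- mtcars[train_idx, ]
test_data  <- mtcars[-train_idx, ]             # everything not sampled for training

# Persist both pieces so later modelling scripts reuse the exact same split.
# The file name is hypothetical.
split_path <- file.path(tempdir(), "mtcars_split.rds")
saveRDS(list(train = train_data, test = test_data), split_path)

restored <- readRDS(split_path)
```

Saving the split once and reading it back keeps every downstream model comparison on identical rows, which is the point of storing the results for future use.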

The second involves using cross-validation via the caret package in R:

Cross-validation is a resampling technique used to assess the performance and generalizability of machine learning models. It helps address issues like overfitting and ensures that the model’s performance is consistent across different subsets of the data. By splitting the data into multiple folds and repeating the process, cross-validation provides a robust estimate of model performance.

H/T R-Bloggers for that as well.
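
For a sense of what happens under the hood, here is a hand-rolled k-fold sketch in base R. The post itself uses caret; this version is an assumption-laden illustration with `mtcars`, five folds, and a simple linear model chosen only for the example.

```r
# Manual 5-fold cross-validation of a linear model (illustrative choices).
set.seed(123)
k     <- 5
# Assign each row a fold label 1..k, then shuffle the labels.
folds <- sample(rep(seq_len(k), length.out = nrow(mtcars)))

rmse_per_fold <- vapply(seq_len(k), function(i) {
  train <- mtcars[folds != i, ]                # fit on the other k - 1 folds
  test  <- mtcars[folds == i, ]                # evaluate on the held-out fold
  fit   <- lm(mpg ~ wt + hp, data = train)
  pred  <- predict(fit, newdata = test)
  sqrt(mean((test$mpg - pred)^2))              # root mean squared error
}, numeric(1))

cv_rmse <- mean(rmse_per_fold)                 # average error across folds
```

Each row is held out exactly once, so `cv_rmse` reflects performance on data the model never saw during fitting.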

What’s New in R 4.5.0

Russ Hyde checks out the changes:

R 4.5.0 (“How About a Twenty-Six”) was released on 11th April, 2025. Here we summarise some of the interesting changes that have been introduced. In previous blog posts we have discussed the new features introduced in R 4.4.0 and earlier versions (see the links at the end of this post).

The full changelog can be found at the r-release ‘NEWS’ page and if you want to keep up to date with developments in base R, have a look at the r-devel ‘NEWS’ page.

There are some nice bits of functionality on the list, so check it out.

Avoid aggregate in R on Wide Matrices

Ali Oghabian shares some hard-earned advice:

The aggregate function can be very useful in R, allowing one to run a function (e.g. mean) within groups of rows, in each column of a matrix/data frame, and organize the results in an easy-to-read table. However, the function takes a long time to run for very wide matrices and data frames, where the number of columns is large. In this post I demonstrate the issue and show a couple of nice solutions that, at least for the example, cut the time down to 15% or even less of the run time of the aggregate function.

Click through for a demo. Granted, this is a matrix with 10,000 columns, so I’m not sure how this applies to narrower matrices. H/T R-Bloggers.
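
As a rough illustration of the kind of alternative involved, the sketch below computes group means on a wide matrix with base R's `rowsum()` and checks the result against `aggregate()`. This is one possible approach and not necessarily either of Ali's solutions; the matrix dimensions and group labels are made up for the example.

```r
# Group means on a wide matrix: aggregate() versus rowsum() (illustrative sizes).
set.seed(1)
n_rows <- 60
n_cols <- 200
m <- matrix(rnorm(n_rows * n_cols), nrow = n_rows)
g <- rep(c("a", "b", "c"), length.out = n_rows)   # group label for each row

# Reference result: aggregate() applies mean() to every column per group.
slow <- aggregate(as.data.frame(m), by = list(group = g), FUN = mean)

# Alternative: sum rows per group in one vectorised call, then divide by group size.
group_sums <- rowsum(m, group = g)
group_n    <- as.vector(table(g)[rownames(group_sums)])
fast       <- group_sums / group_n                # row i is divided by group_n[i]

same_result <- isTRUE(all.equal(unname(as.matrix(slow[, -1])), unname(fast)))
```

Wrapping each approach in `system.time()` on your own data is the simplest way to see whether the gap matters at your matrix width.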

Reactable Tables with Sparklines in Shiny Apps

Osheen MacOscar continues a series:

This is the third blog in a series about the {sparkline} R package for inline data visualisations. You can read the first one about getting started with the package here and the second one about embedding them in HTML tables with the {reactable} package here.

In this blog I am taking it a step further and demonstrating how to use our sparkline reactable table in a Shiny app. Thankfully {reactable} has some helpful functions that make this super easy! I will also look at using a dynamic traffic light image in a reactable table at the end.

Click through to see how it all works.

Converting a CSV to Parquet with DuckDB and Polars in R

Michael Mayer makes a swap:

In this recent post, we have used Polars and DuckDB to convert a large CSV file to Parquet in streaming mode – and Python.

Different people have contacted me and asked: “and in R?”

Simple answer: We have DuckDB, and we have different Polars bindings. Here, we are using {polars}, which is currently being overhauled into {neopolars}.

Click through for the comparison.
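
For the DuckDB half, a conversion can be sketched as a single SQL `COPY` statement issued from R. This is a hedged sketch and not Michael's code: it assumes the DBI and duckdb packages are installed (the block does nothing beyond writing the CSV if they are not), and the file names and the tiny `mtcars` input are placeholders.

```r
# Convert a CSV file to Parquet with DuckDB from R (placeholder paths and data).
csv_path     <- file.path(tempdir(), "example.csv")
parquet_path <- file.path(tempdir(), "example.parquet")
write.csv(mtcars, csv_path, row.names = FALSE)    # small stand-in for a large CSV

if (requireNamespace("DBI", quietly = TRUE) &&
    requireNamespace("duckdb", quietly = TRUE)) {
  con <- DBI::dbConnect(duckdb::duckdb())         # in-memory DuckDB instance

  # DuckDB reads the CSV and writes the Parquet file itself,
  # so the data never has to be loaded into an R data frame.
  DBI::dbExecute(con, sprintf(
    "COPY (SELECT * FROM read_csv_auto('%s')) TO '%s' (FORMAT PARQUET)",
    csv_path, parquet_path
  ))

  DBI::dbDisconnect(con, shutdown = TRUE)
}
```

Because the query runs inside DuckDB, R only sends the statement and waits for it to finish.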

Building a Package in R

Ken Koon Wong builds a package:

Made a hilariously redundant R package for simple OpenAI calls, but the real win was finally learning how to build an R package! Is it efficient? Absolutely not! Was it worth the time and experience? Yes! Will I do it again? Yes! Will it break? Yes!

Package management in R is a bit of a challenge, but well worth it, even if the only consumers of your code will be in your own company. H/T R-Bloggers.

Orchestrating Data Pipelines in R with maestro

Will Hipson moves some data:

If you look at data orchestration tools today you are bombarded with a dizzying array of software platforms that claim unsurpassed processing capability, AI-readiness, elegant UIs, etc. Apache Airflow is just one example of a popular orchestration platform that scales to meet virtually any orchestration need. And while these claims may be true, I argue it is rarely the case that these gargantuan platforms are needed in the first place. For most data engineers, you probably only need to process a moderate amount of data at a moderate time scale. Moreover, if you’re an R user, you don’t want to have to define your data pipelines using drag-and-drop tools or learn another programming language. Not only will this reduce cloud costs but also development time costs.

Click through to see why Will developed maestro and how it works. H/T R-Bloggers.

Building Sparklines in R Shiny

Osheen MacOscar needs a series of tiny visuals:

All of the plots from this package use the sparkline function, and we pass the type of chart we want as the type argument (default is line). The function will take a vector or list for the values argument; depending on the type of chart we are creating, this can be either data to plot or specifications for the plot.

Click through for a list of the sparkline-esque visuals you can create as well as how you can use the sparkline package and some examples of outputs.
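
The call pattern described in the excerpt can be sketched as follows. The values are invented for illustration, and the block assumes the sparkline package is installed; without it, only the data vector is created.

```r
# Same values, two chart types, via the type argument (illustrative data).
spark_values <- c(4, 7, 3, 9, 5, 8)

if (requireNamespace("sparkline", quietly = TRUE)) {
  line_spark <- sparkline::sparkline(spark_values)               # default type: line
  bar_spark  <- sparkline::sparkline(spark_values, type = "bar") # bar chart instead
}
```

Both objects are htmlwidgets, so they render in the RStudio viewer, an R Markdown document, or a Shiny UI.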
