
Category: R

Matrix Multiplication in R with DuckDB and SQLite

Karsten Weinert compares two databases:

On my laptop with 16 GB RAM, I would like to perform a matrix-vector multiplication with a sparse matrix of around 10 million columns and 2500 rows. The matrix has only about 2% non-zero entries, but these are still 500 million numbers plus the column/row information, too large to work with comfortably in-memory.

A while ago, I tried using sqlite for this task. It kind of worked, but was too slow to be useful. This weekend, I revisited the problem and tried using duckdb.

Read on for the results. I’ve heard enough positives about DuckDB over the past few weeks that it makes me want to try it out. H/T R-Bloggers.
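
To make the idea concrete, here is a minimal base R sketch of sparse matrix-vector multiplication expressed as a join plus a grouped sum, which is the shape a database version of this would likely take. The table layout (`i`, `j`, `x`) and the toy data are my own assumptions for illustration, not taken from Karsten's post, and the SQL in the comment is only the rough equivalent I would expect.

```r
# Sparse matrix in triplet form: one row per non-zero entry.
# Column names and values are hypothetical toy data.
m <- data.frame(
  i = c(1, 1, 2, 3),  # row index
  j = c(1, 3, 2, 3),  # column index
  x = c(2, 1, 4, 5)   # non-zero value
)

# The vector, keyed by the matrix column index it lines up with.
v <- data.frame(j = 1:3, v = c(10, 20, 30))

# Join on the column index, multiply, then sum within each matrix row.
# Rough SQL equivalent (assumed schema):
#   SELECT m.i, SUM(m.x * v.v) FROM m JOIN v ON m.j = v.j GROUP BY m.i
joined <- merge(m, v, by = "j")
joined$prod <- joined$x * joined$v
result <- aggregate(prod ~ i, data = joined, FUN = sum)
result <- result[order(result$i), ]

# Dense check, feasible only because the toy matrix is tiny.
dense <- matrix(0, nrow = 3, ncol = 3)
dense[cbind(m$i, m$j)] <- m$x
dense_result <- as.numeric(dense %*% v$v)
```

One thing to keep in mind with this layout: a matrix row with no non-zero entries simply does not appear in `result`, so you would need to fill in zeros afterward if you want a full-length output vector.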


tidyAML Now Available in CRAN

Steven Sanderson has a package make the big-time:

I’m excited to announce that the R package {tidyAML} is now officially available on CRAN! This package is designed to make it easy for users to perform automated machine learning (AutoML) using the tidymodels ecosystem. With a simple and intuitive interface, tidyAML allows users to quickly generate high-quality machine learning models without worrying about the underlying details.

Read on to learn more about this package, as well as the broader healthyverse series of packages.


Visualizing Moving Averages in R with healthyR.ts

Steven Sanderson shows off a useful R library:

Are you interested in visualizing time series data in a clear and concise way? The R package {healthyR.ts} provides a variety of tools for time series analysis and visualization, including the ts_ma_plot() function.

The ts_ma_plot() function is designed to help you quickly and easily create moving average plots for time series data. This function takes several arguments, including the data you want to visualize, the date column from your data, the value column from your data, and the frequency of the aggregation.

Read on to learn more about this plot and see an example of it in action.
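
If you want a feel for what a moving average does before trying the package, here is a minimal base R sketch. It uses `stats::filter()` rather than `ts_ma_plot()`, so it is a stand-in for the underlying calculation and not the package's own method; the series and the window size are made up.

```r
# Hypothetical series and window size, chosen only for illustration.
x <- c(1, 2, 3, 4, 5, 6)
k <- 3

# Trailing moving average: each value is the mean of the current point
# and the k - 1 points before it. sides = 1 means "look backward only".
ma <- as.numeric(stats::filter(x, rep(1 / k, k), sides = 1))

# The first k - 1 entries are NA because the window is not yet full.
```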


Making Star Maps in R

Benjamin Smith builds a map:

Continuing my explorations in developing custom map art, I decided to take a detour from developing the mapBliss package to explore another type of map which is very popular in the map-art space: star and constellation maps! This initially started out as an issue opened on the mapBliss GitHub. However, I quickly realized the framework required for making star maps is completely different from making regular maps for custom flight paths and road trips.

Read on to learn more about the problem and what libraries are available to help in R.


Cumulative Measurement Functions in R

Steven P. Sanderson provides an overview of an R package:

If you’re looking for an easy-to-use package to calculate cumulative statistics in R, you may want to check out the TidyDensity package. This package offers several functions to calculate cumulative measurements, including mean, median, standard deviation, variance, skewness, kurtosis, harmonic mean, and geometric mean.

Click through for a quick description of each function, as well as examples of each. H/T R-Bloggers.
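
For a sense of what "cumulative" means here, this is a minimal base R sketch of a few of those measurements. It deliberately does not use the TidyDensity functions, whose names and signatures are not shown in the excerpt; the variable names and the input vector are my own.

```r
# Hypothetical input vector.
x <- c(2, 4, 6, 8)
n <- seq_along(x)

# Cumulative mean: running total divided by the number of values so far.
cum_mean <- cumsum(x) / n

# Cumulative median: median of the first k values, for each k.
cum_median <- sapply(n, function(k) median(x[1:k]))

# Cumulative geometric mean: exponentiated running mean of the logs
# (requires strictly positive values).
cum_gmean <- exp(cumsum(log(x)) / n)

# Cumulative harmonic mean: count so far divided by the running sum
# of reciprocals (requires non-zero values).
cum_hmean <- n / cumsum(1 / x)
```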


Diagnosing VSCode + R Integration Problems

John MacKintosh troubleshoots an issue:

I recently updated my R installation, and then realised that I’d broken my VSCode/R setup in the process – I could not launch an R terminal either directly or via radian.

I have a repo where I’ve collated various blog posts relating to setting up VSCode for R, but that didn’t solve all my problems.

I did get it resolved eventually, and here’s how.

Read on for that resolution. H/T R-Bloggers.


Organizing R Code

Tomaz Kastrun tidies up:

Keeping your R code organised is not as straightforward as one might think. Just think about the libraries, variables, functions, and many more. All these objects can be defined and later rewritten, some might get obsolete during the process.

This process is proven to be even more crucial when you are part of a larger group of engineers, and scientists, who collaborate with you.

Click through for some organizational tips specific to R code.


The Importance of Re-Learning for Knowledge Updates

Neil Saunders thinks about learning:

Some years ago I read an article – I forget where – describing how our general knowledge often becomes frozen in time. Asked to name the tallest building in the world you confidently proclaim “the Sears Tower!”, because for most of your childhood that was the case – never mind that the record was surpassed long ago and it isn’t even called the Sears Tower anymore. From memory the example in the article was of a middle-aged speaker who constantly referred to a figure of 4 billion for the human population – again, because that’s what he learned in school and had never mentally updated.

Is this the case with programming too? Oh yes – as I learned today when performing the simplest of tasks: reading CSV files using R.

The specific task involved ways to read a list of CSV files in R, though the broader point of the post is how to keep that knowledge up to date. This is one reason why it can be useful to attend introductory-level sessions on topics you already know: there might be new things in recent versions of software which change the game. There are also times when you learn something en passant: in a talk (or blog post or video) about topic X, the author might casually use some technique or tool not related to the topic itself.
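
For reference, here is a minimal sketch of the long-standing base R way to read a list of CSV files into one data frame. This is the sort of "frozen" approach the post has in mind, not necessarily the newer one Neil lands on; the file names, columns, and the `source_file` column are invented for the example.

```r
# Create two small CSV files in a temporary directory (hypothetical data).
dir <- tempfile("csv_demo_")
dir.create(dir)
write.csv(data.frame(id = 1:2, value = c(10, 20)),
          file.path(dir, "a.csv"), row.names = FALSE)
write.csv(data.frame(id = 3:4, value = c(30, 40)),
          file.path(dir, "b.csv"), row.names = FALSE)

# Find every CSV in the directory.
files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)

# Read each file and record which file each row came from.
frames <- lapply(files, function(f) {
  df <- read.csv(f)
  df$source_file <- basename(f)
  df
})

# Stack the pieces; this assumes every file has the same columns.
combined <- do.call(rbind, frames)
```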


Thoughts on Code Commenting

Maelle Salmon has comments:

An important goal of good code is to be readable so that future contributors can build with and upon it as needed. Good commenting is part of the toolset for reaching that goal. In this post we shall first present principles of code commenting, and then a few tips.

I agree with the general thrust of Maelle’s argument. “What and how” types of comments are fine for pseudo-code that you write before beginning the real work, but they’re scaffolding and shouldn’t stick around when the code is done. Instead, focus on the “why.”

One thing I’d add is that I write (and like to see) more detailed comments in the most difficult sections of code. Yeah, if you can simplify the code, that’s better than adding a lengthy comment. But there’s always some bit of code which five people have tried to simplify over the years without success. Knowing what the business rules are, what you’ve unsuccessfully tried in the past, and why this is the best available option (as of the time of the last update) can help prevent developer six from tilting at windmills.
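
Here is a small sketch of what I mean by a "why" comment. The function, the rule, and the history in the comment are all invented for illustration; nothing here comes from Maelle's post.

```r
# Hypothetical example: the retry rule and its backstory are made up.
retry_delay <- function(attempt) {
  # Why: the (imaginary) upstream service rate-limits bursts. A fixed
  # delay was tried first and caused every client to retry at once.
  # Doubling the wait and capping it at 60 seconds was the simplest
  # change that held up, so please don't "simplify" this back.
  min(2^attempt, 60)
}

# A "what" comment here would just say "return 2^attempt, at most 60",
# which the code already tells you.
```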
