Category: R

Creating Prediction Intervals in R

Steven Sanderson builds a prediction interval:

Prediction intervals are a powerful tool for understanding the uncertainty of your predictions. They allow you to specify a range of values within which you are confident that the true value will fall. This can be useful for many tasks, such as setting realistic goals, making informed decisions, and communicating your findings to others.

In this blog post, we will show you how to create a prediction interval in R using the mtcars dataset. The mtcars dataset is a built-in dataset in R that contains information about fuel economy, weight, displacement, and other characteristics of 32 cars.

Click through to see an example based on linear regression.
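
As a rough illustration of the idea, here is a minimal sketch of a prediction interval from a linear regression on mtcars. The choice of wt as the predictor and the new weights are assumptions for the example, not necessarily what the linked post uses.

```r
# Fit a simple linear regression of fuel economy (mpg) on weight (wt).
# Using wt as the only predictor is an assumption for this sketch.
fit <- lm(mpg ~ wt, data = mtcars)

# Hypothetical new cars to predict for (wt is in units of 1000 lbs).
new_cars <- data.frame(wt = c(2.5, 3.0, 3.5))

# interval = "prediction" returns the point estimate (fit) plus the
# lower (lwr) and upper (upr) bounds of the prediction interval.
pred <- predict(fit, newdata = new_cars, interval = "prediction", level = 0.95)

print(cbind(new_cars, pred))
```

Swapping interval = "prediction" for interval = "confidence" gives the narrower interval for the mean response rather than for an individual new observation.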

Comparing Permutation SHAP and Kernel SHAP

Michael Mayer lays some groundwork:

SHAP is the predominant way to interpret black-box ML models, especially for tree-based models with the blazingly fast TreeSHAP algorithm.

For general models, two slower SHAP algorithms exist:

  1. Permutation SHAP (Štrumbelj and Kononenko, 2010)
  2. Kernel SHAP (Lundberg and Lee, 2017)

Read on to understand more about these two forms of SHAP, as well as how they compare in two datasets of differing levels of difficulty.

Simulating a Bivariate Normal Distribution in R

Steven Sanderson isn’t content with a univariate normal distribution:

Imagine two variables, like height and weight, that exhibit a joint distribution. The bivariate normal distribution captures the relationship between these variables, describing how their values tend to cluster around certain means and how they vary together. It’s like a two-dimensional bell curve, where the peak represents the most likely combination of values for both variables.

Click through to learn a bit more about bivariate normal distributions, including ways to plot one and show its density function.
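
For a sense of what such a simulation can look like, here is a minimal base R sketch that draws correlated height and weight values using a Cholesky factor. The means, variances, and covariance are made-up numbers, and the linked post may well use a different approach.

```r
set.seed(42)

n <- 5000
mu <- c(170, 70)                      # hypothetical means: height (cm), weight (kg)
sigma <- matrix(c(100,  60,
                   60, 144), nrow = 2) # hypothetical covariance (sd 10 and 12, correlation 0.5)

# chol() returns an upper-triangular R with t(R) %*% R equal to sigma.
R <- chol(sigma)

# Independent standard normal draws, one row per simulated observation.
z <- matrix(rnorm(n * 2), ncol = 2)

# Rows of z %*% R have covariance sigma; sweep() then adds the means column by column.
x <- sweep(z %*% R, 2, mu, "+")
colnames(x) <- c("height", "weight")

print(colMeans(x))
print(cor(x)[1, 2])
```

The sample means and correlation should land close to the values chosen for mu and sigma, with some simulation noise.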

Functions to Deal with Object Names in R

Maelle Salmon has a sticky note:

Interestingly the docs for setNames() sound as if it were created just for this use case!

“This is a convenience function that sets the names on an object and returns the object. It is most useful at the end of a function definition where one is creating the object to be returned and would prefer not to store it under a name just so the names can be assigned.”

For the opposite operation, removing the names of an object, we can use unname().

Read on for an overview of several of these functions. H/T R-Bloggers.
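
A quick sketch of the two functions mentioned in the excerpt, with made-up data:

```r
# A plain numeric vector with no names (illustrative values).
scores <- c(90, 85, 72)

# setNames() attaches names and returns the object in one step.
named <- setNames(scores, c("ann", "bob", "cy"))
print(named)

# Names allow lookup by label.
print(named[["bob"]])

# unname() is the opposite operation: it strips the names again.
plain <- unname(named)
print(names(plain))
```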

Plotting a Cumulative Distribution Function in R

Steven Sanderson builds a plot:

Before delving into the world of R programming, let’s first grasp the fundamental concept of a CDF. Imagine a group of students eagerly awaiting their exam results. The CDF for their scores would depict the probability of encountering a student with a score less than or equal to a specific value. For instance, if the CDF indicates a value of 0.7 at a score of 80, it implies that there’s a 70% chance of finding a student with a score of 80 or lower.

Read on to see how you can calculate this in a dataset and then plot the CDF.
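
To make the exam-score example concrete, here is a minimal sketch using base R's ecdf(). The scores are simulated, so the numbers are illustrative only and do not come from the linked post.

```r
set.seed(123)

# Simulated exam scores for 200 students (illustrative data only).
scores <- round(rnorm(200, mean = 75, sd = 10))

# ecdf() returns a function giving the share of observations <= its argument.
score_cdf <- ecdf(scores)

# Estimated probability of a score of 80 or lower.
p80 <- score_cdf(80)
print(p80)

# Plot the empirical CDF as a step function.
plot(score_cdf,
     main = "Empirical CDF of exam scores",
     xlab = "Score",
     ylab = "Proportion of students at or below score")
```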

Converting Data to Time Series in R with TidyDensity

Steven Sanderson shows off a new function:

If you’re an R enthusiast like me, you know that data manipulation is at the core of everything we do. The ability to transform your data swiftly and efficiently can make or break your data analysis projects. That’s why I’m thrilled to introduce a game-changing function in TidyDensity, my very own R library. Say hello to convert_to_ts()!

In the world of data analysis, time series data is like a treasure chest of insights waiting to be unlocked. Whether you’re tracking stock prices, monitoring patient data, or analyzing the temperature over the years, having your data in a time series format is a crucial step in the process. With convert_to_ts(), that process just got a whole lot easier.

Click through to see how it works and what you can do with it.

Print Debugging in R and Rust

Jonathan Carroll leaves breadcrumbs:

Print debugging has its place. Sure, it’s not always the best way to debug something, but it can often be the fastest. In this post I describe a useful way to do this in Rust and how we can get similar behaviour in R.

Read on to see how the dbg! macro works in Rust and how to emulate it in R, both by hand and using a library called icecream. H/T R-Bloggers.
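
As a rough idea of the "by hand" route, here is a small sketch of a dbg()-style helper in R. The function name and output format are hypothetical; this is neither the implementation from the linked post nor the icecream package's API.

```r
# Hypothetical helper loosely modelled on Rust's dbg! macro: it reports the
# expression and its value on stderr, then returns the value so the call can
# sit inside a larger expression.
dbg <- function(x) {
  expr <- paste(deparse(substitute(x)), collapse = " ")
  message("[dbg] ", expr, " = ", paste(format(x), collapse = " "))
  invisible(x)
}

# The wrapped expression still evaluates as usual.
total <- dbg(sum(1:10)) * 2
print(total)
```

Because the helper returns its argument, it can be dropped into the middle of an expression and removed later without restructuring the code.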

Reasons to Prefer apply() over For Loops in R

Hugo Gruson draws some comparisons:

The debate regarding the use of for loops versus the apply() function family (apply(), lapply(), vapply(), etc., along with their purrr counterparts: map(), map2(), map_lgl(), map_chr(), etc.), has been a longstanding one in the R community.

While you may occasionally hear that for loops are slower, this notion has already been debunked in other posts. When utilized correctly, a for loop can achieve performance on par with apply() functions.

However, there are still lesser-known reasons to prefer apply() functions over for loops, which we will explore in this post.

Read on for an important caveat, and then several reasons to prefer apply() (or purrr’s counterparts). H/T R-Bloggers.
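
For readers who want the two styles side by side, here is a minimal sketch computing the same result with a preallocated for loop and with vapply(). The data is made up, and the points it illustrates may or may not match the reasons given in the linked post.

```r
# Illustrative input: a named list of integer vectors.
values <- list(a = 1:3, b = 4:6, c = 7:9)

# for loop: preallocate the result, then fill it in by index.
means_loop <- numeric(length(values))
for (i in seq_along(values)) {
  means_loop[i] <- mean(values[[i]])
}

# vapply(): FUN.VALUE declares that each call must return one numeric value,
# so an unexpected result type or length raises an error.
means_vapply <- vapply(values, mean, FUN.VALUE = numeric(1))

print(means_loop)
print(means_vapply)
```

Note that vapply() carries the list's names over to the result, while the loop version needs that handled separately.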

From Probabilities to Odds

Bryan Shalloway explains how odds and probabilities intertwine:

However human understanding of odds predates our formal understanding of probability. You can find references to odds dating back to Shakespeare:

Knew that we ventured on such dangerous seas
That if we wrought out life ’twas ten to one;
– Shakespeare’s Henry IV, Part II, 1597

Yet, in most common settings, modern society has largely supplanted odds for probabilities. You can imagine if Shakespeare were writing today the line might end “’twas ten out of eleven.”

Read the whole thing.
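
The arithmetic behind "ten to one" versus "ten out of eleven" can be sketched with two small helpers. The function names are hypothetical and not taken from the linked post.

```r
# Hypothetical helpers converting between probability and odds in favor.
prob_to_odds <- function(p) p / (1 - p)
odds_to_prob <- function(o) o / (1 + o)

# "Ten to one" means odds of 10, which is a probability of 10 / 11.
p <- odds_to_prob(10)
print(p)

# Converting back recovers the original odds.
print(prob_to_odds(p))
```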
