Interpreting Kernel SHAP

Michael Mayer digs into Kernel SHAP:

In their 2017 paper on SHAP, Scott Lundberg and Su-In Lee presented Kernel SHAP, an algorithm to calculate SHAP values for any model with numeric predictions. Compared to Monte-Carlo sampling (e.g., as implemented in the R package “fastshap”), Kernel SHAP is much more efficient.

I had one problem with Kernel SHAP: I never really understood how it works!

Needless to say, Michael understands Kernel SHAP a lot better now, considering he has since given us a kernelshap package.
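To make the mechanics concrete, here is a minimal Python sketch of exact Kernel SHAP for a handful of features. It illustrates the idea from the Lundberg and Lee paper and is not the kernelshap package's code. Assumptions: predict (a hypothetical name) takes one feature row and returns a number, and "missing" features are filled in from a small background dataset. Every coalition S of the M features is scored by averaging predictions over the background rows, and the SHAP values solve a weighted least-squares problem with the Shapley kernel weight (M - 1) / (C(M, |S|) * |S| * (M - |S|)), constrained so they sum to the prediction minus the average background prediction.

```python
import itertools
import math

def _solve(matrix, rhs):
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):
        acc = aug[r][n] - sum(aug[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = acc / aug[r][r]
    return solution

def kernel_shap(predict, x, background):
    """Exact Kernel SHAP for one row x, enumerating every coalition (small M only)."""
    m = len(x)

    def value(coalition):
        # v(S): mean prediction with features in S taken from x and the
        # remaining features taken from each background row
        rows = [[x[j] if j in coalition else b[j] for j in range(m)] for b in background]
        return sum(predict(r) for r in rows) / len(rows)

    v_empty = value(set())
    delta = value(set(range(m))) - v_empty  # the SHAP values must sum to this
    if m == 1:
        return [delta]

    # Weighted normal equations for the first m-1 values; the last one is
    # eliminated through the constraint phi_last = delta - sum(others).
    k = m - 1
    ata = [[0.0] * k for _ in range(k)]
    atb = [0.0] * k
    for size in range(1, m):
        weight = (m - 1) / (math.comb(m, size) * size * (m - size))  # Shapley kernel
        for members in itertools.combinations(range(m), size):
            s = set(members)
            z_last = 1.0 if (m - 1) in s else 0.0
            row = [(1.0 if j in s else 0.0) - z_last for j in range(k)]
            target = value(s) - v_empty - z_last * delta
            for i in range(k):
                atb[i] += weight * row[i] * target
                for j in range(k):
                    ata[i][j] += weight * row[i] * row[j]
    phi = _solve(ata, atb)
    return phi + [delta - sum(phi)]

# a pure interaction: f(x) = x0 * x1, explained against one all-zero background row
shap = kernel_shap(lambda r: r[0] * r[1], x=[2.0, 3.0], background=[[0.0, 0.0]])
```

Here the interaction is split evenly and shap is [3.0, 3.0]. Enumerating all 2^M coalitions only works for small M; practical implementations sample coalitions once the number of features grows.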

Calculating the Hurst Exponent in R

Sang-Heon Lee does some analysis:

Pairs trading literature uses the Hurst exponent frequently since it gives a simple and intuitive indicator of the behavior of stock returns. Using S&P 500 returns, let’s learn how to estimate it manually with R code and then conveniently with an R package.

Click through for those two examples, as well as a more detailed explanation of the math driving this. H/T R-Bloggers.
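As a language-neutral illustration, here is a Python sketch of one common estimator, rescaled-range (R/S) analysis; it may not be the exact method used in the post. For each window size n, the series is split into chunks; each chunk's cumulative deviations from its mean give a range R, which is divided by the chunk's standard deviation S, and the slope of log(R/S) against log(n) estimates H. Roughly, H near 0.5 suggests uncorrelated returns, above 0.5 persistence, and below 0.5 mean reversion. The doubling window sizes starting at 16 are an arbitrary choice for this sketch.

```python
import itertools
import math
import random

def rescaled_range(chunk):
    """R/S statistic for one chunk, or None if the chunk is constant."""
    mean = sum(chunk) / len(chunk)
    running, deviations = 0.0, []
    for v in chunk:
        running += v - mean  # cumulative deviation from the chunk mean
        deviations.append(running)
    r = max(deviations) - min(deviations)
    s = math.sqrt(sum((v - mean) ** 2 for v in chunk) / len(chunk))
    return r / s if s > 0 else None

def hurst_rs(series, min_window=16):
    """Estimate H as the slope of log(mean R/S) against log(window size)."""
    n = len(series)
    log_sizes, log_rs = [], []
    size = min_window
    while size <= n // 2:  # window sizes double: 16, 32, 64, ...
        stats = [rescaled_range(series[i:i + size]) for i in range(0, n - size + 1, size)]
        stats = [v for v in stats if v is not None]
        if stats:
            log_sizes.append(math.log(size))
            log_rs.append(math.log(sum(stats) / len(stats)))
        size *= 2
    if len(log_sizes) < 2:
        raise ValueError("series too short for the chosen window sizes")
    # ordinary least-squares slope of log(R/S) on log(n)
    mx = sum(log_sizes) / len(log_sizes)
    my = sum(log_rs) / len(log_rs)
    num = sum((a - mx) * (b - my) for a, b in zip(log_sizes, log_rs))
    den = sum((a - mx) ** 2 for a in log_sizes)
    return num / den

rng = random.Random(42)
returns = [rng.gauss(0.0, 1.0) for _ in range(4096)]  # white-noise "returns"
prices = list(itertools.accumulate(returns))          # a random walk built from them
```

R/S is known to be biased upward for short windows, so white noise typically lands a little above 0.5 here, while applying it to the random-walk levels themselves pushes the estimate toward 1.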

Solving the Traveling Salesman Problem in R

Tomaz Kastrun gives us a solution to the Traveling Salesman Problem:

The Travelling Salesman Problem is an NP-complete problem and an old mathematical problem. For this useless function, we will look for the nearest city from the previous city (or starting point) and repeat until we visit all cities. The greedy solution is fairly simple but has one disadvantage: it might not give you the best path (optimal solution), and proving that the solution is correct is an additional issue.

As Tomaz notes, this is not guaranteed to be the best solution, just a solution. Considering that TSP is NP-hard, if Tomaz had an algorithm that efficiently found a globally optimal tour for any input, he certainly wouldn’t be calling it “useless-useful” but would instead be calling it “My prize-winning algorithm.”
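The nearest-neighbour idea is easy to sketch outside R. Below is a minimal Python version (not Tomaz's code), with hypothetical helper names, plus a brute-force solver to show why greedy is "a solution" rather than "the solution". Cities are assumed to be (x, y) points, tours are closed (they return to the start), and ties go to the lower city index.

```python
import itertools
import math

def tour_length(cities, tour):
    """Length of the closed tour, including the leg back to the starting city."""
    return sum(math.dist(cities[a], cities[b]) for a, b in zip(tour, tour[1:] + tour[:1]))

def nearest_neighbour_tour(cities, start=0):
    """Greedy heuristic: always travel to the closest city not yet visited."""
    unvisited = set(range(len(cities))) - {start}
    tour = [start]
    while unvisited:
        here = cities[tour[-1]]
        # ties are broken by the lower city index
        nxt = min(unvisited, key=lambda i: (math.dist(here, cities[i]), i))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour

def brute_force_tour(cities, start=0):
    """Exact answer by trying every ordering: O((n-1)!), so tiny inputs only."""
    others = [i for i in range(len(cities)) if i != start]
    best = min(itertools.permutations(others),
               key=lambda order: tour_length(cities, [start, *order]))
    return [start, *best]

cities = [(0, 0), (1, 0), (-2, 0), (4, 0)]
greedy = nearest_neighbour_tour(cities)
optimal = brute_force_tour(cities)
```

On these four points the greedy tour [0, 1, 2, 3] has length 14.0, while the best tour has length 12.0: the cheap first hop to (1, 0) forces an expensive detour later.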

Custom Infix Functions in R

Dominik Rafacz loves infix functions:

Custom infix functions are one of my favorite features in R. This article is my love letter to them. But first, a quick recap.

For those unfamiliar with the terminology, an infix function is a function fun which is called using infix notation, e.g., x fun y instead of fun(x, y). Base R also calls these functions infix operators, and I will use those terms and the name infixes interchangeably. Base R has many infix operators that we use very frequently, e.g., arithmetic or logical operators. We use them so often that we usually forget that they are functions, and that we can call them just like regular functions.

Infix functions are something I tend to forget about entirely when developing my own code, but they can be extremely useful, as Dominik shows. H/T R-Bloggers.
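The point that operators are ordinary functions carries over to other languages. Python cannot declare new infix operators the way R's %op% syntax allows, but a short Python sketch shows both ideas: the standard operator module exposes + as the function operator.add, and a hypothetical Infix wrapper (a well-known trick, not part of any library) overloads | so that a two-argument function reads as x |concat| y, roughly like a user-defined %+% in R.

```python
import operator

# Built-in infix operators are functions underneath: same result as 2 + 3.
five = operator.add(2, 3)

class Infix:
    """Hypothetical helper: lets a two-argument function be written as x |fn| y."""

    def __init__(self, fn):
        self.fn = fn

    def __ror__(self, left):
        # "left |fn": remember the left operand and wait for the right one
        return Infix(lambda right: self.fn(left, right))

    def __or__(self, right):
        # "(left |fn) | right": call the stored one-argument function
        return self.fn(right)

# string concatenation in the spirit of a hand-rolled %+% in R
concat = Infix(lambda a, b: f"{a}{b}")
greeting = "Hello, " |concat| "infix"
```

Because | is left-associative, "a" |concat| "b" |concat| "c" chains naturally and evaluates to "abc".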

What’s in a Name?

Benjamin Smith analyzes a name change:

Recently, RStudio announced its name change to Posit. For many, this name change was accepted with open arms, but for some, not so. Being the statistician that I am, I decided to post a poll on LinkedIn to see the sentiment of my network. After running the poll for a week, the results were in:

Read on for the responses as well as an analysis using RStan.

Hosting an App on RStudio Connect

Liam Kalita wraps up a series:

So far, we have seen how to create an app using ReactJS and a Plumber API. In part 3, we will show you how to host the application on RStudio Connect (RSC)!

When it comes to hosting the application on RSC, we will set the content URL for both the app and the API so that they are on the same domain and won’t run into cross-origin (CORS) issues.

Read the whole thing.

Recreating a Shiny App with Plumber and React

Liam Kalita continues a series:

We’ll assume you have a basic understanding of HTML and JavaScript, but you should be able to follow along with a basic programming background. Having a little knowledge of Linux shell commands would be beneficial for some of the terminal commands for generating directories, but you can also do most of it in VSCode using the user interface instead.

Let’s attempt an exercise in creating a small React+Plumber app; this will be very similar to a previous blog post recreating this tutorial {shiny} application using Python Flask.

Click through to see how to build the app. The final part of the series will show how to host the app.

Extracting Numbers from a Stacked Density Plot

Derek Jones digs into an image:

A month or so ago, I found a graph showing the percentage of PCs having a given range of memory installed, between March 2000 and April 2020, on a TechTalk page of PC Matic; it had the form of a stacked density plot. This kind of installed-memory data is rare, so how could I get the underlying values (a previous post covers extracting data from a heatmap)?

Read on for an interesting attempt at reverse-engineering the original numbers used to create an image. H/T R-Bloggers.
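In a stacked plot each band is drawn on top of the ones below it, so the edges you can see are cumulative totals. One plausible approach, not necessarily Derek's: read the pixel row of each band's upper edge at a given date, convert pixel rows to data values with a linear map calibrated from two known axis positions, then take successive differences. The Python sketch below assumes a linear y axis whose stack starts at 0; the helper names and calibration numbers are hypothetical.

```python
def pixel_to_value(pixel_y, calib):
    """Map an image row to a data value using two calibration points (pixel, value)."""
    (p0, v0), (p1, v1) = calib
    return v0 + (pixel_y - p0) * (v1 - v0) / (p1 - p0)

def band_values(boundary_pixels, calib, bottom=0.0):
    """Upper edges listed bottom band first; each band is its top minus the edge below."""
    tops = [pixel_to_value(p, calib) for p in boundary_pixels]
    values, below = [], bottom
    for top in tops:
        values.append(top - below)
        below = top
    return values

# y axis: pixel row 400 is 0%, pixel row 0 is 100% (image rows grow downwards)
calib = ((400, 0.0), (0, 100.0))
slice_values = band_values([300, 200, 20], calib)
```

For this vertical slice the three bands come out as 25, 25 and 45 percentage points; repeating the reading at every date rebuilds the underlying time series.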

Mapping Income vs Rent in Counties

Rick Pack updates a package to support a project:

I am happy to announce a contribution to the biscale package that makes printing shorter labels using SI prefixes (e.g., 1,000,003 => 1M and 1,324 => 1.3k) far easier. This makes printing the legend in an attractive way easier, although you can tell by the picture above that I still struggle with optimal uses of the cowplot package’s draw_plot(). I would love for the legend and map to be centered under the title.

The new si_levels argument for bi_class_breaks() takes a logical value (TRUE or FALSE) as a vector of length one or two; a single value is applied to both the X and Y variables. This mirrors the convenient functionality Prener built for dig_lab, the number-of-digits argument, as he requested in the GitHub issue I created for this addition. Note that si_levels rounds the input number, if appropriate, based on the digits indicated by dig_lab, which defaults to 3.

Click through to get access to the update, as well as to see some of the visuals Rick put together with it.
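The labeling idea itself is simple: round, pick the largest SI prefix that fits, and drop trailing zeros. Here is a minimal Python sketch under stated assumptions (it is not biscale's implementation): si_label is a hypothetical helper, and "digits" means significant digits. With digits=2 it reproduces the examples above (1,000,003 to 1M, 1,324 to 1.3k); how biscale combines si_levels with dig_lab may round differently.

```python
import math

# (threshold, prefix) pairs, largest first
SI_PREFIXES = [(1e12, "T"), (1e9, "G"), (1e6, "M"), (1e3, "k")]

def si_label(value, digits=3):
    """Format a number with an SI prefix, keeping `digits` significant digits."""
    if value == 0:
        return "0"
    # Round first so that e.g. 999,999 becomes "1M" rather than "1000k".
    rounded = float(f"{value:.{digits}g}")
    scaled, prefix = rounded, ""
    for factor, symbol in SI_PREFIXES:
        if abs(rounded) >= factor:
            scaled, prefix = rounded / factor, symbol
            break
    # Decimal places needed to show `digits` significant digits without an exponent.
    decimals = max(digits - (math.floor(math.log10(abs(scaled))) + 1), 0)
    text = f"{scaled:.{decimals}f}"
    if "." in text:
        text = text.rstrip("0").rstrip(".")  # drop trailing zeros: "1.0" -> "1"
    return text + prefix
```

Rounding before choosing the prefix is the design choice that matters: picking the prefix first would turn 999,999 into "1000k".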

Tips for the Tidyverse

Tomaz Kastrun shares some advice:

The tidyverse provides a handful of great functions for operating on multiple columns simultaneously. across() is a function that makes it easy to apply the same transformation to numerous columns inside summarise() and mutate().

across() takes two main arguments: a) a selection of columns and b) a function or list of functions to apply to the selected columns.

Check out eight tips for working with packages in R’s tidyverse.
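across() itself is R, but its shape (a column selection plus one or more functions) is easy to mimic. The Python sketch below uses hypothetical helpers, summarise_across and mutate_across, over a dict of column lists; the "{column}_{function}" output names imitate dplyr's default naming when a named list of functions is supplied, and everything else is an assumption for illustration.

```python
import statistics

def summarise_across(columns, selected, fns):
    """Hypothetical across()-style summary: one output per (column, function) pair."""
    # output names follow a "{column}_{function}" pattern
    return {
        f"{col}_{name}": fn(columns[col])
        for col in selected
        for name, fn in fns.items()
    }

def mutate_across(columns, selected, fn):
    """Hypothetical across()-style mutate: replace each selected column with fn applied to it."""
    out = dict(columns)  # unselected columns pass through unchanged
    for col in selected:
        out[col] = [fn(v) for v in columns[col]]
    return out

data = {"a": [1, 2, 3], "b": [4, 6, 8], "label": ["x", "y", "z"]}
summary = summarise_across(data, ["a", "b"], {"mean": statistics.mean, "max": max})
scaled = mutate_across(data, ["a", "b"], lambda v: v * 10)
```

Here summary holds a_mean, a_max, b_mean and b_max, and scaled multiplies a and b by ten while leaving label and the original data untouched.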

