Category: R

Sampling without Replacement and Unequal Probabilities

Peter Ellis finds interesting results with sampling in R:

A week ago I was surprised to read on Thomas Lumley’s Biased and Inefficient blog that when using R’s sample() function without replacement and with unequal probabilities of individual units being sampled:

“What R currently has is sequential sampling: if you give it a set of priorities w it will sample an element with probability proportional to w from the population, remove it from the population, then sample with probability proportional to w from the remaining elements, and so on. This is useful, but a lot of people don’t realise that the probability of element i being sampled is not proportional to w_i”

Read on for a demonstration. H/T R-Bloggers.
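
To get a feel for the effect, here is a minimal simulation sketch. The weights, sample size, and number of replications are my own illustrative choices, not anything from Peter’s or Thomas’s posts. It compares how often each unit lands in a sample of two against what you would expect if inclusion were proportional to w.

```r
set.seed(123)  # arbitrary seed, only for reproducibility

w <- c(1, 2, 3, 4)   # illustrative weights (an assumption, not from the post)
n_draws <- 2         # units drawn per sample, without replacement
n_sims <- 20000      # number of simulated samples

# Each column of `draws` holds the indices selected in one simulated sample
draws <- replicate(n_sims, sample(seq_along(w), size = n_draws, prob = w))

# Share of samples in which each unit appears
empirical <- tabulate(as.vector(draws), nbins = length(w)) / n_sims

# Inclusion probabilities if they really were proportional to w
proportional <- n_draws * w / sum(w)

print(round(rbind(empirical, proportional), 3))
```

With these weights, the heaviest unit should appear less often than the proportional figure and the lightest unit more often, which is the mismatch Thomas describes.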

Explaining a Causal Forest

Michael Mayer wants to suss out the effects of inputs into a causal forest model:

We use a causal forest [1] to model the treatment effect in a randomized controlled clinical trial. Then, we explain this black-box model with usual explainability tools. These will reveal segments where the treatment works better or worse, just like a forest plot, but multivariately.

Read on for the example, as well as several mechanisms you can use to gauge feature relevance.

Random Forest Missing Data Imputation using missRanger

Michael Mayer handles missing data:

{missRanger} is a multivariate imputation algorithm based on random forests, and a fast version of the original missForest algorithm of Stekhoven and Buehlmann (2012). Surprise, surprise: it uses {ranger} to fit random forests. Especially combined with predictive mean matching (PMM), the imputations are often quite realistic.

This looks like an interesting package. At first, I thought it was a way of generating predictions outside the boundaries of training data and had concerns—a classic point (limitation?) of random forest as an algorithm is that it will not even try to predict values outside the range of what it sees in training data, so if the largest label is 10 and the smallest is 0, you won’t see a prediction of 11 or 50, no matter how you scale the inputs.

Instead of doing that, missRanger looks like it’s filling in missing data using a clever approach. That’s quite useful for dealing with incomplete data, a really common problem whose good solutions tend to be complex enough that people typically ignore them in favor of simple but less useful solutions like dropping rows altogether.

Comparing grep() and grepl() in R

Steven Sanderson compares two functions:

Both grep() and grepl() are functions in R that help us search for patterns in text. Think of them as detectives looking for clues in a big pile of words!

  • grep(): This function is like a pointer. It tells you where it found the pattern you’re looking for.
  • grepl(): This one is more like a yes/no checker. It tells you if the pattern exists or not.

Read on for examples of each.
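
As a quick sketch of the difference, using a made-up vector of fruit names (my example, not Steven’s):

```r
fruits <- c("apple", "banana", "cherry", "pineapple")  # hypothetical data

# grep() returns the positions of the elements that match
match_positions <- grep("apple", fruits)
print(match_positions)   # 1 4

# With value = TRUE, grep() returns the matching elements themselves
match_values <- grep("apple", fruits, value = TRUE)
print(match_values)      # "apple" "pineapple"

# grepl() returns one TRUE/FALSE per element
match_flags <- grepl("apple", fruits)
print(match_flags)       # TRUE FALSE FALSE TRUE
```

Because grepl() always returns a logical vector the same length as its input, it tends to be the easier fit for filtering rows or building conditions.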

Searching for Multiple Patterns in R with grepl

Steven Sanderson looks for the pattern:

Hello, fellow useRs! Today, we’re going to expand on previous uses of the grepl() function where we looked for a single pattern and move on to a search for multiple patterns within strings. Whether you’re cleaning data or conducting text analysis, grepl() can be your go-to tool. Let’s break down the syntax, offer a practical example, and guide you on a path to proficiency.

Read on for all of that.
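
One common way to do this in base R is to join the patterns with the regex alternation operator. The sketch below uses made-up log lines and pattern names of my own, and may differ from the approach in Steven’s post.

```r
patterns <- c("error", "warning")   # hypothetical search terms
log_lines <- c(
  "error: disk full",
  "all good",
  "Warning: low memory",
  "warning, then error"
)

# Match ANY pattern: collapse the patterns into "error|warning"
any_pattern <- paste(patterns, collapse = "|")
has_any <- grepl(any_pattern, log_lines, ignore.case = TRUE)
print(has_any)   # TRUE FALSE TRUE TRUE

# Match ALL patterns: test each pattern separately, then combine with &
per_pattern <- lapply(patterns, function(p) grepl(p, log_lines, ignore.case = TRUE))
has_all <- Reduce(`&`, per_pattern)
print(has_all)   # FALSE FALSE FALSE TRUE
```

Note that the collapsed string is treated as a regular expression, so patterns containing characters like "." or "(" would need escaping first.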

Loops in R

Ben Johnston spins in circles:

Welcome back to my R for SEO series. We’re in the home stretch now, with part seven. Today, we’re going to be looking at different ways that we can run functions or commands over a series of elements using the various kinds of loops that exist in R.

If you’ve followed along so far, or you’ve tried some experimentation of your own, you’ve probably encountered loops and applys along the way. I know early on in my R journey, it very much seemed like pot luck as to which apply I should use, or whether a loop was easier, so hopefully today’s piece will start to clear that up for you a little.

I know that most programming courses cover these elements earlier, but for me, it really didn’t click until I’d learned more about the other areas we’ve covered in this series, so that’s why I’ve placed it here.

Read on for examples of for loops and while loops, as well as breaking conditions.

Ben also talks about loops versus using the apply() series of functions (or equivalent map() functions in the purrr library). I tend to lean heavily on using the mapping function approach when there are no side effects, and use for loops when there are. H/T R-Bloggers.
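
Here is a small sketch of the three styles side by side, using a made-up vector of URL paths (mine, not Ben’s). I stick to base R’s vapply() here; purrr’s map functions follow the same idea.

```r
paths <- c("/home", "/about", "/contact")   # hypothetical data

# for loop: pre-allocate the result, then fill it element by element
lengths_for <- integer(length(paths))
for (i in seq_along(paths)) {
  lengths_for[i] <- nchar(paths[i])
}

# apply-style: same result with no explicit index or side effects
lengths_apply <- vapply(paths, nchar, integer(1), USE.NAMES = FALSE)

# while loop with a breaking condition: add 1, 2, 3, ... until the total exceeds 10
total <- 0
step <- 0
while (TRUE) {
  step <- step + 1
  total <- total + step
  if (total > 10) break
}

print(lengths_for)     # 5 6 8
print(lengths_apply)   # 5 6 8
print(c(step, total))  # 5 15
```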

Analyzing the Game Wingspan

Dan Oehm builds a meta:

Wingspan is a great game even though I’ve only played it a few times. The mechanics are great, there are lots of bird variations, and a bunch of different strategies to try. There are 170 birds, and I’ve probably only seen 30 of them. So, true to form, I’ve dabbled in a bit of data analysis to get a view of all the different types of cards in the game.

Open source wins again since the {wingspan} R package exists. It contains the details of each bird in the core, European, Oceania, and swift start sets. I’ll only be using the core set for this analysis since that’s the only one I’m semi familiar with.

I haven’t played the game before, but Dan’s visuals drew me in. There’s also a regression analysis and discussion of the trade-off between in-game power versus victory points. H/T R-Bloggers.

String Concatenation of Vectors in R

Steven Sanderson glues together some vectors:

Welcome to another exciting R programming tutorial! Today, we will explore how to concatenate vectors of strings using different methods in R: base R, stringr, stringi, and glue. We’ll use a practical example involving a data frame with names, job titles, and salaries. By the end of this post, you’ll feel confident using these tools to manipulate and combine strings in your own projects. Let’s get started!

Read on to see how to do this in several ways.
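
As a base R sketch of the idea, here is a tiny data frame whose column names and values I made up for illustration (they are not the data from Steven’s post). The package-based versions are left out to keep this dependency-free.

```r
# Hypothetical staff table with names, job titles, and salaries
staff <- data.frame(
  name = c("Ana", "Ben"),
  title = c("Analyst", "Engineer"),
  salary = c(65000L, 82000L),
  stringsAsFactors = FALSE
)

# paste() works element by element across the columns
labels_paste <- paste(staff$name, staff$title, sep = " - ")
print(labels_paste)    # "Ana - Analyst" "Ben - Engineer"

# sprintf() gives finer control over the layout of each combined string
labels_sprintf <- sprintf("%s (%s): %d", staff$name, staff$title, staff$salary)
print(labels_sprintf)  # "Ana (Analyst): 65000" "Ben (Engineer): 82000"
```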

String Concatenation in R

Steven Sanderson smooshes strings together:

String concatenation is a fundamental operation in data manipulation and cleaning. If you are working in R, mastering string concatenation will significantly enhance your data processing capabilities. This blog post will cover different ways to concatenate strings using base R and the stringr, stringi, and glue packages. Let’s go!

Read on for examples using paste(), paste0(), str_c(), stri_c(), and glue().
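
For the base R pair, the main thing to remember is the difference between sep and collapse. A short sketch with values of my own choosing:

```r
# paste() joins its arguments with a space by default
greeting <- paste("Hello", "world")
print(greeting)    # "Hello world"

# paste0() joins with no separator at all
file_name <- paste0("data_", 2024, ".csv")
print(file_name)   # "data_2024.csv"

# collapse folds a whole vector down into a single string
letters_joined <- paste(c("a", "b", "c"), collapse = ", ")
print(letters_joined)  # "a, b, c"
```

In short, sep controls what goes between the arguments, while collapse controls what goes between the elements of the result.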

Counting Character Occurrences in R

Steven Sanderson counts the ways:

Counting the occurrences of a specific character within a string is a common task in data processing and text manipulation. Whether you’re working with base R or leveraging the power of packages like stringr or stringi, R provides efficient ways to accomplish this. In this post, we’ll explore how to do this using three different methods.

Read on for three separate examples.
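
For a taste of the base R side, here are three ways I know of to count a single character, using made-up words. These are not necessarily the three methods in Steven’s post, which also covers stringr and stringi.

```r
words <- c("banana", "cherry", "avocado")  # hypothetical data
target <- "a"                              # single character to count

# 1. Find every match, then count the matches per string
count_regmatches <- lengths(regmatches(words, gregexpr(target, words, fixed = TRUE)))

# 2. Compare string length before and after deleting the character
#    (this shortcut assumes `target` is exactly one character long)
count_nchar <- nchar(words) - nchar(gsub(target, "", words, fixed = TRUE))

# 3. Split each string into characters and count the ones equal to the target
count_split <- vapply(
  strsplit(words, ""),
  function(chars) sum(chars == target),
  integer(1)
)

print(count_regmatches)  # 3 0 2
print(count_nchar)       # 3 0 2
print(count_split)       # 3 0 2
```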
