R – Page 8 – Curated SQL

Random Forest Missing Data Imputation using missRanger

Published 2024-08-26 by Kevin Feasel

{missRanger} is a multivariate imputation algorithm based on random forests, and a fast version of the original missForest algorithm of Stekhoven and Buehlmann (2012). Surprise, surprise: it uses {ranger} to fit random forests. Especially combined with predictive mean matching (PMM), the imputations are often quite realistic.

This looks like an interesting package. At first, I thought it was a way of generating predictions outside the boundaries of training data, and that raised concerns. A classic point (limitation?) of random forest as an algorithm is that it will not even try to predict values outside the range of what it sees in training data: if the largest label is 10 and the smallest is 0, you won’t see a prediction of 11 or 50, no matter how you scale the inputs.

Instead of doing that, missRanger looks like it’s filling in missing data using a clever approach. That’s quite useful for dealing with incomplete data, a really common problem whose good solutions tend to be complex enough that people typically ignore them in favor of simple but less useful solutions like dropping rows altogether.
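
For a rough idea of what that looks like in practice, here is a minimal sketch. It is not taken from the package announcement. It assumes the missRanger package is installed (the code skips itself otherwise), and the choice of the built-in iris data, 10% missingness, 3 PMM donors, and 100 trees is purely illustrative.

```r
# Sketch only: runs when the missRanger package is available.
if (requireNamespace("missRanger", quietly = TRUE)) {
  # Blank out roughly 10% of the values in each column of iris.
  iris_na <- missRanger::generateNA(iris, p = 0.1, seed = 1)

  # Impute every column from all the others. pmm.k = 3 enables
  # predictive mean matching, so imputed values are drawn from
  # values actually observed in the data.
  iris_imputed <- missRanger::missRanger(
    iris_na,
    formula = . ~ .,
    pmm.k = 3,
    num.trees = 100,
    seed = 1,
    verbose = 0
  )

  # Count the missing values left after imputation.
  remaining_na <- sum(is.na(iris_imputed))
}
```

Because predictive mean matching picks from observed values, the imputations stay inside the range of the data, which fits the random forest behavior described above.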

Comparing grep() and grepl() in R

Published 2024-08-21 by Kevin Feasel

Steven Sanderson compares two functions:

Both grep() and grepl() are functions in R that help us search for patterns in text. Think of them as detectives looking for clues in a big pile of words!

grep(): This function is like a pointer. It tells you where it found the pattern you’re looking for.

grepl(): This one is more like a yes/no checker. It tells you if the pattern exists or not.

Read on for examples of each.
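
As a quick illustration of the difference (my own base R sketch, not code from the linked post; the `fruits` vector is made up):

```r
fruits <- c("apple", "banana", "cherry", "pineapple")

# grep() returns the positions of the elements that match.
match_positions <- grep("apple", fruits)

# With value = TRUE, grep() returns the matching elements themselves.
match_values <- grep("apple", fruits, value = TRUE)

# grepl() returns one TRUE/FALSE per element of the input.
match_flags <- grepl("apple", fruits)

print(match_positions)  # 1 4
print(match_values)     # "apple" "pineapple"
print(match_flags)      # TRUE FALSE FALSE TRUE
```

The logical vector from grepl() is the same length as the input, which makes it convenient for filtering rows in a data frame.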

Searching for Multiple Patterns in R with grepl

Published 2024-08-20 by Kevin Feasel

Steven Sanderson looks for the pattern:

Hello, fellow useRs! Today, we’re going to expand on previous uses of the grepl() function where we looked for a single pattern and move on to a search for multiple patterns within strings. Whether you’re cleaning data or conducting text analysis, grepl can be your go-to tool. Let’s break down the syntax, offer a practical example, and guide you on a path to proficiency.

Read on for all of that.
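
One common way to do this in base R, sketched here with made-up data rather than the post's own example, is to join the patterns with the regex alternation operator for an "any of these" search, or to combine one grepl() call per pattern for an "all of these" search:

```r
log_lines <- c(
  "disk error on sda",
  "backup complete",
  "warning: high load",
  "error after warning"
)
patterns <- c("error", "warning")

# Any pattern: collapse the patterns into a single regex, "error|warning".
any_pattern <- grepl(paste(patterns, collapse = "|"), log_lines)

# All patterns: run grepl() once per pattern, then AND the results together.
all_patterns <- Reduce(`&`, lapply(patterns, grepl, x = log_lines))

print(any_pattern)   # TRUE FALSE TRUE TRUE
print(all_patterns)  # FALSE FALSE FALSE TRUE
```

The alternation approach treats each pattern as a regular expression, so patterns containing special characters would need escaping first.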

Loops in R

Published 2024-08-20 by Kevin Feasel

Ben Johnston spins in circles:

Welcome back to my R for SEO series. We’re in the home stretch now, with part seven. Today, we’re going to be looking at different ways that we can run functions or commands over a series of elements using the various kinds of loops that exist in R.

If you’ve followed along so far, or you’ve tried some experimentation of your own, you’ve probably encountered loops and applys along the way. I know early on in my R journey, it very much seemed like pot luck as to which apply I should use, or whether a loop was easier, so hopefully today’s piece will start to clear that up for you a little.

I know that most programming courses cover these elements earlier, but for me, it really didn’t click until I’d learned more about the other areas we’ve covered in this series, so that’s why I’ve placed it here.

Read on for examples of For loops and While loops, as well as breaking conditions.

Ben also talks about loops versus using the apply() series of functions (or equivalent map() functions in the purrr library). I tend to lean heavily on using the mapping function approach when there are no side effects, and use for loops when there are. H/T R-Bloggers.
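
A small base R sketch of that split (mine, not Ben's; the values are arbitrary): the same computation written as a for loop and as a vapply() call, followed by a while loop with a breaking condition.

```r
values <- c(2, 4, 6)

# for loop: preallocate the result, then fill it in one element at a time.
squares_loop <- numeric(length(values))
for (i in seq_along(values)) {
  squares_loop[i] <- values[i]^2
}

# vapply(): the same computation as a mapping with no side effects.
# numeric(1) declares that each call returns a single number.
squares_apply <- vapply(values, function(v) v^2, numeric(1))

# while loop with a breaking condition: add 1, 2, 3, ... until the
# running total exceeds 10.
total <- 0
n <- 0
while (TRUE) {
  n <- n + 1
  total <- total + n
  if (total > 10) break
}

print(squares_loop)   # 4 16 36
print(squares_apply)  # 4 16 36
print(c(n, total))    # 5 15
```

The purrr equivalent of the vapply() line would be a map_dbl() call, which needs the purrr package and so is left out of this sketch.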

Analyzing the Game Wingspan

Published 2024-08-19 by Kevin Feasel

Dan Oehm builds a meta:

Wingspan is a great game even though I’ve only played it a few times. The mechanics are great, there are lots of bird variations, and a bunch of different strategies to try. There are 170 birds, and I’ve probably only seen 30 of them. So, true to form, I’ve dabbled in a bit of data analysis to get a view of all the different types of cards in the game.

Open source wins again since the {wingspan} R package exists. It contains the details of each bird in the core, European, Oceania, and swift start sets. I’ll only be using the core set for this analysis since that’s the only one I’m semi-familiar with.

Having not played the game before, I was drawn in by Dan’s visuals. There’s also a regression analysis and discussion of the trade-off between in-game power versus victory points. H/T R-Bloggers.

String Concatenation of Vectors in R

Published 2024-08-15 by Kevin Feasel

Steven Sanderson glues together some vectors:

Welcome to another exciting R programming tutorial! Today, we will explore how to concatenate vectors of strings using different methods in R: base R, stringr, stringi, and glue. We’ll use a practical example involving a data frame with names, job titles, and salaries. By the end of this post, you’ll feel confident using these tools to manipulate and combine strings in your own projects. Let’s get started!

Read on to see how to do this in several ways.
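
As a base R sketch of the idea (the `staff` data frame below is invented for illustration and is not the one from the post), paste0() and sprintf() both work element by element across whole columns:

```r
# Hypothetical data: names, job titles, and salaries.
staff <- data.frame(
  name = c("Ada", "Grace"),
  title = c("Engineer", "Admiral"),
  salary = c(120000, 150000),
  stringsAsFactors = FALSE
)

# paste0() is vectorised: one output string per row, no separator added.
staff$label <- paste0(staff$name, " (", staff$title, ")")

# sprintf() is also vectorised; %.0f prints the salary without decimals.
staff$summary <- sprintf("%s earns $%.0f", staff$name, staff$salary)

print(staff$label)    # "Ada (Engineer)" "Grace (Admiral)"
print(staff$summary)  # "Ada earns $120000" "Grace earns $150000"
```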

String Concatenation in R

Published 2024-08-14 by Kevin Feasel

Steven Sanderson smooshes strings together:

String concatenation is a fundamental operation in data manipulation and cleaning. If you are working in R, mastering string concatenation will significantly enhance your data processing capabilities. This blog post will cover different ways to concatenate strings using base R and the stringr, stringi, and glue packages. Let’s go!

Read on for examples using paste(), paste0(), str_c(), stri_c(), and glue().
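
For the two base R functions, here is a short sketch of my own (not from the linked post) showing how `sep` and `collapse` differ. The package functions str_c(), stri_c(), and glue() need stringr, stringi, and glue installed, so they are not shown here.

```r
first <- "Curated"
second <- "SQL"

# paste() joins its arguments with a space by default.
spaced <- paste(first, second)

# paste0() joins with no separator at all.
joined <- paste0(first, second)

# sep controls what goes between the arguments.
dashed <- paste(first, second, sep = "-")

# collapse folds the elements of one vector into a single string.
listed <- paste(c("a", "b", "c"), collapse = ", ")

print(spaced)  # "Curated SQL"
print(joined)  # "CuratedSQL"
print(dashed)  # "Curated-SQL"
print(listed)  # "a, b, c"
```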

Counting Character Occurrences in R

Published 2024-08-12 by Kevin Feasel

Steven Sanderson counts the ways:

Counting the occurrences of a specific character within a string is a common task in data processing and text manipulation. Whether you’re working with base R or leveraging the power of packages like stringr or stringi, R provides efficient ways to accomplish this. In this post, we’ll explore how to do this using three different methods.

Read on for three separate examples.
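
For the base R route, two approaches come to mind. This is a sketch with made-up input and may not match the post's exact code:

```r
words <- c("banana", "cherry", "kiwi")

# Approach 1: gregexpr() locates every match, regmatches() extracts them,
# and lengths() counts the matches for each string.
count_a <- lengths(regmatches(words, gregexpr("a", words, fixed = TRUE)))

# Approach 2: delete the character and measure how much shorter
# each string becomes.
count_a_alt <- nchar(words) - nchar(gsub("a", "", words, fixed = TRUE))

print(count_a)      # 3 0 0
print(count_a_alt)  # 3 0 0
```

Using `fixed = TRUE` treats the search character literally, which avoids surprises when counting characters such as `.` that have a special meaning in regular expressions.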

Finding String Patterns in R

Published 2024-08-09 by Kevin Feasel

Steven Sanderson goes looking for patterns:

Welcome to another exciting blog post where we walk into the world of R programming. Today, we’re going to explore how to check if a string contains specific characters using three different approaches: base R, stringr, and stringi. Whether you’re a beginner or an experienced R user, this guide should be of some use and provide you with some practical examples.

Read on for those three examples.

Systematic Sampling in R

Published 2024-08-06 by Kevin Feasel

Steven Sanderson continues a series on sampling:

In this post, we will explore systematic sampling in R using base R functions. Systematic sampling is a technique where you select every k-th element from a list or dataset. This method is straightforward and useful when you want a representative sample without the complexity of more advanced sampling techniques.

Let’s dive into an example to understand how it works.

In very technical circles, this is also known as the “eenie-meenie-meiney-moe technique” and is very similar to the “duck-duck-goose” algorithm, though that has an additional stochastic input.
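
Joking aside, a minimal base R sketch of systematic sampling might look like the following. It is not the post's code, and the population of 1 to 100 and the interval of 10 are arbitrary choices for illustration.

```r
population <- 1:100
k <- 10  # sampling interval: take every 10th element

# Pick a random starting position within the first k elements.
set.seed(42)
start <- sample(seq_len(k), 1)

# Step through the population from the start, k positions at a time.
idx <- seq(from = start, to = length(population), by = k)
systematic_sample <- population[idx]

print(systematic_sample)
```

The random start is the only stochastic part. After that, every selected position is fixed by the interval.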

Category: R