Category: R

Analyzing the Game Wingspan

Published 2024-08-19 by Kevin Feasel

Wingspan is a great game even though I’ve only played it a few times. The mechanics are great, there are lots of bird varitions, and a bunch of different strategies to try. There are 170 birds, and I’ve probably only seen 30 of them. So, true to form, I’ve dabbled in a bit of data analysis to get a view of all the different types of cards in the game.

Open source wins again since the {wingspan} R package exists. It contains the details of each bird in the core, European, Oceania, and swift start sets. I’ll only be using the core set for this analysis since that’s the only one I’m semi familiar with.

Having not played the game before, Dan’s visuals drew me in. There’s also a regression analysis and discussion of the trade-off between in-game power versus victory points. H/T R-Bloggers.

Comments closed

String Concatenation of Vectors in R

Published 2024-08-15 by Kevin Feasel

Steven Sanderson glues together some vectors:

Welcome to another exciting R programming tutorial! Today, we will explore how to concatenate vectors of strings using different methods in R: base R, stringr, stringi, and glue. We’ll use a practical example involving a data frame with names, job titles, and salaries. By the end of this post, you’ll feel confident using these tools to manipulate and combine strings in your own projects. Let’s get started!

Read on to see how to do this in several ways.

Comments closed

String Concatenation in R

Published 2024-08-14 by Kevin Feasel

Steven Sanderson smooshes strings together:

String concatenation is a fundamental operation in data manipulation and cleaning. If you are working in R, mastering string concatenation will significantly enhance your data processing capabilities. This blog post will cover different ways to concatenate strings using base R, the stringr, stringi, and glue packages. Let’s go!

Read on for examples using paste(), paste0(), str_c(), stri_c(), and glue().

Comments closed

Counting Character Occurrences in R

Published 2024-08-12 by Kevin Feasel

Steven Sanderson counts the ways:

Counting the occurrences of a specific character within a string is a common task in data processing and text manipulation. Whether you’re working with base R or leveraging the power of packages like stringr or stringi, R provides efficient ways to accomplish this. In this post, we’ll explore how to do this using three different methods.

Read on for three separate examples.

Comments closed

Finding String Patterns in R

Published 2024-08-09 by Kevin Feasel

Steven Sanderson goes looking for patterns:

Welcome to another exciting blog post where we walk into the world of R programming. Today, we’re going to explore how to check if a string contains specific characters using three different approaches: base R, stringr, and stringi. Whether you’re a beginner or an experienced R user, this guide will should be of some use and provide you with some practical examples.

Read on for those three examples.

Comments closed

Systematic Sampling in R

Published 2024-08-06 by Kevin Feasel

Steven Sanderson continues a series on sampling:

In this post, we will explore systematic sampling in R using base R functions. Systematic sampling is a technique where you select every (k^{th}) element from a list or dataset. This method is straightforward and useful when you want a representative sample without the complexity of more advanced sampling techniques.

Let’s dive into an example to understand how it works.

In very technical circles, this is also known as the “eenie-meenie-meiney-moe technique” and is very similar to the “duck-duck-goose” algorithm, though that has an additional stochastic input.

Comments closed

Cluster Sampling in R

Published 2024-08-05 by Kevin Feasel

Steven Sanderson shows us one sampling technique:

Cluster sampling is a useful technique when dealing with large datasets spread across different groups or clusters. It involves dividing the population into clusters, randomly selecting some clusters, and then sampling all or some members from these selected clusters. This method can save time and resources compared to simple random sampling.

In this post, we’ll walk through how to perform cluster sampling in R. We’ll use a sample dataset and break down the code step-by-step. By the end, you’ll have a clear understanding of how to implement cluster sampling in your projects.

Read on for the scenario and sample code.

Comments closed

Stratified Sampling in R

Published 2024-08-02 by Kevin Feasel

Steven Sanderson builds a sample:

Stratified sampling is a technique used to ensure that different subgroups (strata) within a population are represented in a sample. This method is particularly useful when certain strata are underrepresented in a simple random sample. In this post, we’ll explore how to perform stratified sampling in R using both base R and the dplyr package. We’ll walk through examples and explain the code, so you can try these techniques on your own data.

Click through to see how.

Comments closed

Working with grep in R

Published 2024-07-26 by Kevin Feasel

Steven Sanderson performs a pattern match:

In R, finding patterns in text is a common task, and one of the most powerful functions to do this is grep(). This function is used to search for patterns in strings, allowing you to locate elements that match a specific pattern. Today, we’ll explore how to use wildcard characters with grep() to enhance your string searching capabilities. Let’s dive in!

Read on to learn more about how to use the grep() function.

Comments closed

Changing Distributions and Simpson’s Paradox

Published 2024-07-25 by Kevin Feasel

Jerry Tuttle describes a paradox:

So you spent hours, or maybe days, cranking out thousands of numbers, you submit it to your boss just at the deadline, your boss quickly peruses your exhibit of numbers, points to a single number and says, “This number doesn’t look right.” Bosses have an uncanny ability to do this.

Your boss is pointing to something like this: Your company sells property insurance on both personal and commercial properties. The average personal property premium increased 10% in 2024. The average commercial property premium increased 10% in 2024. But you say the combined average property premium decreased 3% in 2024. You realize that negative 3% does not look right.

Although the blog post doesn’t explicitly mention Simpson’s paradox, I’d argue that this is a good example of the idea. H/T R-Bloggers.

Comments closed