Press "Enter" to skip to content

Category: R

Working with grep in R

Steven Sanderson performs a pattern match:

In R, finding patterns in text is a common task, and one of the most powerful functions to do this is grep(). This function is used to search for patterns in strings, allowing you to locate elements that match a specific pattern. Today, we’ll explore how to use wildcard characters with grep() to enhance your string searching capabilities. Let’s dive in!

Read on to learn more about how to use the grep() function.

Comments closed

Changing Distributions and Simpson’s Paradox

Jerry Tuttle describes a paradox:

So you spent hours, or maybe days, cranking out thousands of numbers, you submit it to your boss just at the deadline, your boss quickly peruses your exhibit of numbers, points to a single number and says, “This number doesn’t look right.” Bosses have an uncanny ability to do this.

      Your boss is pointing to something like this: Your company sells property insurance on both personal and commercial properties. The average personal property premium increased 10% in 2024. The average commercial property premium increased 10% in 2024. But you say the combined average property premium decreased 3% in 2024. You realize that negative 3% does not look right.

Although the blog post doesn’t explicitly mention Simpson’s paradox, I’d argue that this is a good example of the idea. H/T R-Bloggers.

Comments closed

Plotting Individual Values and Means of Multiple Groups in R

Ali Oghabian builds a graph:

In this post I show how groupScatterPlot(), function of the rnatoolbox R package can be used for plotting the individual values in several groups together with their mean (or other statistics). I think this is a useful function for plotting grouped data when some groups (or all groups) have few data points ! You may be wondering why to include such function in the rnatoolbox package ?! Well ! I happen to use it quit a bit for plotting expression values of different groups of genes/transcripts in a sample or expression levels of a specific gene/transcript in several sample groups.

Click through for the sample code and output. H/T R-Bloggers.

Comments closed

Finding Multiple Substrings in R

Steven Sanderson is looking for two things:

Hello, fellow R programmers! Today, we’re looking at a practical topic that often comes up when dealing with text data: how to check if a string contains multiple substrings. We’ll cover how to do this in base R, as well as using the stringr and stringi packages. Each approach has its own advantages, so let’s explore them together.

Read on for three separate examples.

Comments closed

Radar Love

Jerry Tuttle talks radar charts:

I was looking for an opportunity to practice with radar charts and I came across an article on five-tool baseball players, so this seemed like a perfect application for this kind of chart.

      A radar chart is an alternative to a column chart to display three or more quantitative variables. The chart graphs the values in a circular manner around a center point.

I have an unhealthy love for radar charts in the right circumstances, and this love came from the way you did scouting in earlier versions of Madden NFL games, using the radar chart to estimate traits. The only problem was, the charts turned out to be a lie: they didn’t really correlate to player talents, but that was something I learned years and years later and probably explains why I’m so bitter all the time. H/T R-Bloggers.

Comments closed

Using the fast_regression() Method in tidyAML

Steven Sanderson says, It’s my regression and I want it NOW:

If you’ve ever faced the daunting task of setting up multiple regression models in R, you’ll appreciate the convenience and efficiency that tidyAML brings to the table. Today, we’re diving into one of its standout functions: fast_regression(). This function is designed to streamline the regression modeling process, allowing you to quickly create and evaluate a variety of model specifications with minimal code.

Read on to see how the function works.

Comments closed

Extracting the End of a String in R

Steven Sanderson just wants the conclusion:

Hey useR’s! Today, we’re going to discuss a neat trick: extracting substrings starting from the end of a string. We’ll cover how to achieve this using base R, stringr, and stringi. By the end of this post, you’ll have several tools in your R toolbox for string manipulation. Let’s get started!

Read on to see how you can do it in three separate libraries.

Comments closed

Generating a Schedule in R

Tomaz Kastrun builds timetables:

Each meeting slot is represented as block (lasts arbitrary number of hours, mostly form 1 to 4). For conducting every block required are: pair of departmetnsroomtime-slot. It is also know in advance which groups attend which class and all rooms are the same size.

Input data all departments names, room names and time-slots.
Output data are rooms and timeslots for pair of departments in a time-schedule.

Click through for the code and explanation.

Comments closed

Transferring Linear Model Coefficients

Nina Zumel performs a swap:

A quick glance through the scikit-learn documentation on linear models, or the CRAN task view on Mixed, Multilevel, and Hierarchical Models in R reveals a number of different procedures for fitting models with linear structure. Each of these procedures meet different needs and constraints, and some of them can be computationally intensive to compute. But in the end, they all have the same underlying structure: outcome is modelled as a linear combination of input features.

But the existence of so many different algorithms, and their associated software, can obscure the fact that just because two models were fit differently, they don’t have to be run differently. The fitting implementation and the deployment implementation can be distinct. In this note, we’ll talk about transferring the coefficients of a linear model to a fresh model, without a full retraining.

I had a similar problem about 18 months ago, though much easier than the one Nina describes, as I did have access to the original data and simply needed to build a linear regression in Python that matched exactly the one they developed in R. Turns out that’s not as easy to do as you might think: the different languages have different default assumptions that make the results similar but not the same, and piecing all of this together took a bit of sleuthing.

Comments closed

A/B Testing with Survival Analysis in R

Iyar Lin combines two great flavors:

Usually when running an A/B test analysts assign users randomly to variants over time and measure conversion rate as the ratio between the number of conversions and the number of users in each variant. Users who just entered the test and those who are in the test for 2 weeks get the same weight.

This can be enough for cases where a conversion either happens or not within a short time frame after assignment to a variant (e.g. Finishing an on-boarding flow).

There are however many instances where conversions are spread over a longer time frame. One example would be first order after visiting a site landing page. Such conversions may happen within minutes, but a large churn could also happen within days after the first visit.

Read on for the scenario, as well as a simulation. I will note that, in the digital marketing industry, there’s usually a hard cap on number of days where you’re able to attribute a conversion to some action for exactly the reason Iyar mentions. H/T R-Bloggers.

Comments closed