R – Page 15 – Curated SQL

Plotting Individual Values and Means of Multiple Groups in R

Published 2024-07-25 by Kevin Feasel

In this post I show how groupScatterPlot(), a function of the rnatoolbox R package, can be used for plotting the individual values in several groups together with their mean (or other statistics). I think this is a useful function for plotting grouped data when some groups (or all groups) have few data points! You may be wondering why such a function is included in the rnatoolbox package. Well, I happen to use it quite a bit for plotting expression values of different groups of genes/transcripts in a sample or expression levels of a specific gene/transcript in several sample groups.

Click through for the sample code and output. H/T R-Bloggers.

Finding Multiple Substrings in R

Published 2024-07-24 by Kevin Feasel

Steven Sanderson is looking for two things:

Hello, fellow R programmers! Today, we’re looking at a practical topic that often comes up when dealing with text data: how to check if a string contains multiple substrings. We’ll cover how to do this in base R, as well as using the stringr and stringi packages. Each approach has its own advantages, so let’s explore them together.

Read on for three separate examples.
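
As a rough illustration of the base R route, here is a minimal sketch that checks each string against every substring with grepl() and then combines the results. The sample strings and the names `text`, `needles`, `contains_all`, and `contains_any` are made up for this example and are not taken from Steven's post.

```r
# Hypothetical example data, not from the linked post
text <- c("the quick brown fox", "a lazy dog", "quick dog")
needles <- c("quick", "fox")

# One logical vector per substring; fixed = TRUE matches literal text,
# so regex metacharacters in a needle are not interpreted
hits <- lapply(needles, function(n) grepl(n, text, fixed = TRUE))

# Combine element-wise: every substring present vs. at least one present
contains_all <- Reduce(`&`, hits)
contains_any <- Reduce(`|`, hits)

print(data.frame(text, contains_all, contains_any))
```

Reduce() is used here instead of building a matrix so the same code still works when `text` holds a single string.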

Radar Love

Published 2024-07-22 by Kevin Feasel

Jerry Tuttle talks radar charts:

I was looking for an opportunity to practice with radar charts and I came across an article on five-tool baseball players, so this seemed like a perfect application for this kind of chart.

A radar chart is an alternative to a column chart to display three or more quantitative variables. The chart graphs the values in a circular manner around a center point.

I have an unhealthy love for radar charts in the right circumstances, and this love came from the way you did scouting in earlier versions of Madden NFL games, using the radar chart to estimate traits. The only problem was, the charts turned out to be a lie: they didn’t really correlate to player talents, but that was something I learned years and years later and probably explains why I’m so bitter all the time. H/T R-Bloggers.

Using the fast_regression() Method in tidyAML

Published 2024-07-19 by Kevin Feasel

Steven Sanderson says, It’s my regression and I want it NOW:

If you’ve ever faced the daunting task of setting up multiple regression models in R, you’ll appreciate the convenience and efficiency that tidyAML brings to the table. Today, we’re diving into one of its standout functions: fast_regression(). This function is designed to streamline the regression modeling process, allowing you to quickly create and evaluate a variety of model specifications with minimal code.

Read on to see how the function works.

Extracting the End of a String in R

Published 2024-07-17 by Kevin Feasel

Steven Sanderson just wants the conclusion:

Hey useR’s! Today, we’re going to discuss a neat trick: extracting substrings starting from the end of a string. We’ll cover how to achieve this using base R, stringr, and stringi. By the end of this post, you’ll have several tools in your R toolbox for string manipulation. Let’s get started!

Read on to see how you can do it in three separate libraries.
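
For a quick feel of the base R version, a minimal sketch using substr() and nchar() might look like the following. The input vector and the names `x`, `n`, and `last_n` are assumptions for illustration only.

```r
# Hypothetical inputs, chosen to include strings shorter than n
x <- c("curated", "sql", "R")
n <- 3  # number of trailing characters to keep

# Start n - 1 characters before the end; substr() treats a start
# position below 1 as 1, so short strings are returned whole
last_n <- substr(x, nchar(x) - n + 1, nchar(x))

print(last_n)
```

Because substr() and nchar() are both vectorized, the same line handles a whole column of strings at once.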

Generating a Schedule in R

Published 2024-07-15 by Kevin Feasel

Tomaz Kastrun builds timetables:

Each meeting slot is represented as a block (lasting an arbitrary number of hours, mostly from 1 to 4). Conducting every block requires: a pair of departments, a room, and a time slot. It is also known in advance which groups attend which class, and all rooms are the same size.

Input data are all department names, room names, and time slots.
Output data are rooms and time slots for each pair of departments in a time schedule.

Click through for the code and explanation.

Transferring Linear Model Coefficients

Published 2024-07-15 by Kevin Feasel

Nina Zumel performs a swap:

A quick glance through the scikit-learn documentation on linear models, or the CRAN task view on Mixed, Multilevel, and Hierarchical Models in R, reveals a number of different procedures for fitting models with linear structure. Each of these procedures meets different needs and constraints, and some of them can be computationally intensive. But in the end, they all have the same underlying structure: outcome is modelled as a linear combination of input features.

But the existence of so many different algorithms, and their associated software, can obscure the fact that just because two models were fit differently, they don’t have to be run differently. The fitting implementation and the deployment implementation can be distinct. In this note, we’ll talk about transferring the coefficients of a linear model to a fresh model, without a full retraining.

I had a similar problem about 18 months ago, though much easier than the one Nina describes, as I did have access to the original data and simply needed to build a linear regression in Python that matched exactly the one they developed in R. Turns out that’s not as easy to do as you might think: the different languages have different default assumptions that make the results similar but not the same, and piecing all of this together took a bit of sleuthing.
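
To make the "fit one way, run another way" idea concrete, here is a minimal base R sketch. It is not Nina's procedure: lm() on the built-in mtcars data merely stands in for whatever fitting routine produced the coefficients, and `predict_linear` is a hypothetical helper written for this example.

```r
# Fitting side: lm() is a stand-in for any procedure that yields
# an intercept plus one coefficient per input feature
fit <- lm(mpg ~ wt + hp, data = mtcars)
beta <- coef(fit)  # named vector: (Intercept), wt, hp

# Deployment side: no model object needed, only the coefficients.
# Assumes beta[1] is the intercept and the remaining names match
# numeric columns in newdata.
predict_linear <- function(newdata, beta) {
  X <- cbind(1, as.matrix(newdata[, names(beta)[-1], drop = FALSE]))
  drop(X %*% beta)
}

manual <- predict_linear(mtcars, beta)
builtin <- predict(fit, newdata = mtcars)

# The two sets of predictions should agree up to floating-point error
print(all.equal(unname(manual), unname(builtin)))
```

Looking columns up by coefficient name, rather than by position, guards against the features arriving in a different order at scoring time.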

A/B Testing with Survival Analysis in R

Published 2024-07-12 by Kevin Feasel

Iyar Lin combines two great flavors:

Usually when running an A/B test analysts assign users randomly to variants over time and measure conversion rate as the ratio between the number of conversions and the number of users in each variant. Users who just entered the test and those who are in the test for 2 weeks get the same weight.

This can be enough for cases where a conversion either happens or not within a short time frame after assignment to a variant (e.g., finishing an on-boarding flow).

There are however many instances where conversions are spread over a longer time frame. One example would be first order after visiting a site landing page. Such conversions may happen within minutes, but a large churn could also happen within days after the first visit.

Read on for the scenario, as well as a simulation. I will note that, in the digital marketing industry, there’s usually a hard cap on number of days where you’re able to attribute a conversion to some action for exactly the reason Iyar mentions. H/T R-Bloggers.
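
To see why equal weighting can mislead, here is a small base R sketch comparing the naive conversion rate with a hand-rolled Kaplan-Meier estimate, one standard survival-analysis tool. Whether Iyar's post uses exactly this estimator is an assumption on my part, and the five-user dataset is invented for illustration.

```r
# Hypothetical toy data: days until conversion, or until we stopped observing
time  <- c(1, 2, 2, 3, 5)
event <- c(1, 0, 1, 1, 0)  # 1 = converted, 0 = not yet converted (censored)

# Naive rate: users who simply have not been observed long enough
# are counted as non-converters
naive_rate <- mean(event)

# Kaplan-Meier: at each conversion time, the unconverted share shrinks
# by d / n, where d = conversions at that time and n = users still at risk
event_times <- sort(unique(time[event == 1]))
d <- sapply(event_times, function(tt) sum(time == tt & event == 1))
n <- sapply(event_times, function(tt) sum(time >= tt))
survival <- cumprod(1 - d / n)

# Estimated share converted by each conversion time
km_conversion <- 1 - survival

print(data.frame(day = event_times, at_risk = n, converted = d, km_conversion))
print(naive_rate)
```

In this toy data the naive rate is 3 out of 5, while the Kaplan-Meier estimate by day 3 is higher because the user who left observation on day 2 no longer drags the denominator down.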

Random Walks in R with TidyDensity

Published 2024-07-11 by Kevin Feasel

Steven Sanderson goes for a walk:

A random walk is a mathematical object that describes a path consisting of a succession of random steps. It’s a cornerstone concept in fields like physics, economics, and biology. In finance, for example, the random walk hypothesis suggests that stock market prices evolve according to a random walk and thus cannot be predicted.

Read on to see how you can generate a dataset matching a random walk, as well as a comparison of techniques for generating them.
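
For reference, the simplest possible random walk takes only a couple of lines of base R. This sketch does not use TidyDensity; the seed, the step count, and the names `steps` and `walk` are arbitrary choices for illustration.

```r
set.seed(123)  # arbitrary seed so the walk is reproducible

n_steps <- 100

# Each step is +1 or -1 with equal probability
steps <- sample(c(-1, 1), size = n_steps, replace = TRUE)

# The walk is the running total of the steps, starting at 0
walk <- c(0, cumsum(steps))

plot(walk, type = "l", xlab = "Step", ylab = "Position")
```

Swapping sample() for rnorm(n_steps) would give a Gaussian random walk with the same cumsum() structure.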

Extracting Strings before a Space using R

Published 2024-07-10 by Kevin Feasel

Steven Sanderson grabs a name:

Hello, R users! Today, we’ll dive into a common text manipulation task: extracting strings before a space. This is a handy trick for dealing with names, addresses, or any text data where you need to isolate the first part of a string.

We’ll explore three approaches: using base R, stringr, and stringi. Each method offers its unique advantages, so you can choose the one that fits your style best.

Click through for the three examples. I will note that if you’re actually using this code to split names, well, names tend to be a lot trickier than we give them credit for. Keep in mind that people can have multi-part names (“Debbie Mae” or “van den Berg”), so unless you know the data all follows a specific pattern, don’t assume the data follows a specific pattern.
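
Here is a minimal base R sketch of the idea, using sub() with a regular expression. The sample names are invented, and they are picked to show the caveat about multi-part names in action.

```r
# Hypothetical names, including multi-part ones
x <- c("Debbie Mae Smith", "Cher", "Jan van den Berg")

# Delete the first space and everything after it;
# strings without a space are returned unchanged
first_part <- sub(" .*", "", x)

print(first_part)
```

Note that "Debbie Mae" comes back as just "Debbie", which is the technically correct answer to "everything before the first space" and quite possibly the wrong answer to "what is this person's first name".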

Category: R