Press "Enter" to skip to content

Category: R

Radar Love

Jerry Tuttle talks radar charts:

I was looking for an opportunity to practice with radar charts and I came across an article on five-tool baseball players, so this seemed like a perfect application for this kind of chart.

      A radar chart is an alternative to a column chart to display three or more quantitative variables. The chart graphs the values in a circular manner around a center point.

I have an unhealthy love for radar charts in the right circumstances, and this love came from the way you did scouting in earlier versions of Madden NFL games, using the radar chart to estimate traits. The only problem was, the charts turned out to be a lie: they didn’t really correlate to player talents, but that was something I learned years and years later and probably explains why I’m so bitter all the time. H/T R-Bloggers.

Comments closed

Using the fast_regression() Method in tidyAML

Steven Sanderson says, It’s my regression and I want it NOW:

If you’ve ever faced the daunting task of setting up multiple regression models in R, you’ll appreciate the convenience and efficiency that tidyAML brings to the table. Today, we’re diving into one of its standout functions: fast_regression(). This function is designed to streamline the regression modeling process, allowing you to quickly create and evaluate a variety of model specifications with minimal code.

Read on to see how the function works.

Comments closed

Extracting the End of a String in R

Steven Sanderson just wants the conclusion:

Hey useR’s! Today, we’re going to discuss a neat trick: extracting substrings starting from the end of a string. We’ll cover how to achieve this using base R, stringr, and stringi. By the end of this post, you’ll have several tools in your R toolbox for string manipulation. Let’s get started!

Read on to see how you can do it in three separate libraries.

Comments closed

Generating a Schedule in R

Tomaz Kastrun builds timetables:

Each meeting slot is represented as block (lasts arbitrary number of hours, mostly form 1 to 4). For conducting every block required are: pair of departmetnsroomtime-slot. It is also know in advance which groups attend which class and all rooms are the same size.

Input data all departments names, room names and time-slots.
Output data are rooms and timeslots for pair of departments in a time-schedule.

Click through for the code and explanation.

Comments closed

Transferring Linear Model Coefficients

Nina Zumel performs a swap:

A quick glance through the scikit-learn documentation on linear models, or the CRAN task view on Mixed, Multilevel, and Hierarchical Models in R reveals a number of different procedures for fitting models with linear structure. Each of these procedures meet different needs and constraints, and some of them can be computationally intensive to compute. But in the end, they all have the same underlying structure: outcome is modelled as a linear combination of input features.

But the existence of so many different algorithms, and their associated software, can obscure the fact that just because two models were fit differently, they don’t have to be run differently. The fitting implementation and the deployment implementation can be distinct. In this note, we’ll talk about transferring the coefficients of a linear model to a fresh model, without a full retraining.

I had a similar problem about 18 months ago, though much easier than the one Nina describes, as I did have access to the original data and simply needed to build a linear regression in Python that matched exactly the one they developed in R. Turns out that’s not as easy to do as you might think: the different languages have different default assumptions that make the results similar but not the same, and piecing all of this together took a bit of sleuthing.

Comments closed

A/B Testing with Survival Analysis in R

Iyar Lin combines two great flavors:

Usually when running an A/B test analysts assign users randomly to variants over time and measure conversion rate as the ratio between the number of conversions and the number of users in each variant. Users who just entered the test and those who are in the test for 2 weeks get the same weight.

This can be enough for cases where a conversion either happens or not within a short time frame after assignment to a variant (e.g. Finishing an on-boarding flow).

There are however many instances where conversions are spread over a longer time frame. One example would be first order after visiting a site landing page. Such conversions may happen within minutes, but a large churn could also happen within days after the first visit.

Read on for the scenario, as well as a simulation. I will note that, in the digital marketing industry, there’s usually a hard cap on number of days where you’re able to attribute a conversion to some action for exactly the reason Iyar mentions. H/T R-Bloggers.

Comments closed

Random Walks in R with TidyDensity

Steven Sanderson goes for a walk:

A random walk is a mathematical object that describes a path consisting of a succession of random steps. It’s a cornerstone concept in fields like physics, economics, and biology. In finance, for example, the random walk hypothesis suggests that stock market prices evolve according to a random walk and thus cannot be predicted.

Read on to see how you can generate a dataset matching a random walk, as well as a comparison of techniques for generating them.

Comments closed

Extracting Strings before a Space using R

Steven Sanderson grabs a name:

Hello, R users! Today, we’ll dive into a common text manipulation task: extracting strings before a space. This is a handy trick for dealing with names, addresses, or any text data where you need to isolate the first part of a string.

We’ll explore three approaches: using base R, stringr, and stringi. Each method offers its unique advantages, so you can choose the one that fits your style best.

Click through for the three examples. I will note that if you’re actually using this code to split names, well, names tend to be a lot trickier than we give them credit for. Keep in mind that people can have multi-part names (“Debbie Mae” or “van den Berg”), so unless you know the data all follows a specific pattern, don’t assume the data follows a specific pattern.

Comments closed

Automating R Scripts via taskscheduleR

Steven Sanderson builds a Windows task:

Today, let’s dive into a nifty R package called taskscheduleR that can automate running your R scripts. Whether you need to execute a task every hour or just once a day, taskscheduleR has you covered. This package leverages the Windows Task Scheduler, making it a breeze to schedule and automate repetitive tasks directly from R. Let’s walk through a couple of examples from my new book, “Extending Excel with Python and R”.

Click through for those examples. Also check out Steven’s new book, that came out at the end of April.

Comments closed

Creating a Dragon Curve in R

Tomaz Kastrun adds dragons to the edge of the map:

The algorithm is a fractal curve of Hausdorff dimension 2. One starts with one segment. In each iteration the number of segments is doubled by taking each segment as the diagonal of a square and replacing it by half the square (90 degrees). Alternating and doing the left and right function / direction to complement in order to get the shape.

Clickt hrough for the sample code and how the plot looks.

Comments closed