Press "Enter" to skip to content

Category: R

Converting Data to Time Series in R with TidyDensity

Steven Sanderson shows off a new function:

If you’re an R enthusiast like me, you know that data manipulation is at the core of everything we do. The ability to transform your data swiftly and efficiently can make or break your data analysis projects. That’s why I’m thrilled to introduce a game-changing function in TidyDensity, my very own R library. Say hello to convert_to_ts()!

In the world of data analysis, time series data is like a treasure chest of insights waiting to be unlocked. Whether you’re tracking stock prices, monitoring patient data, or analyzing the temperature over the years, having your data in a time series format is a crucial step in the process. With convert_to_ts(), that process just got a whole lot easier.

Click through to see how it works and what you can do with it.

Comments closed

Print Debugging in R and Rust

Jonathan Carroll leaves breadcrumbs:

Print debugging has its place. Sure, it’s not always the best way to debug something, but it can often be the fastest. In this post I describe a useful way to do this in Rust and how we can get similar behaviour in R.

Read on to see how the dbg! macro works in Rust and how to emulate it in R, both by hand and using a library called icecream. H/T R-Bloggers.

Comments closed

Reasons to Prefer apply() over For Loops in R

Hugo Gruson draws some comparisons:

The debate regarding the use of for loops versus the apply() function family (apply()lapply()vapply(), etc., along with their purrr counterparts: map()map2()map_lgl()map_chr(), etc.), has been a longstanding one in the R community.

While you may occasionally hear that for loops are slower, this notion has already been debunked in other posts. When utilized correctly, a for loop can achieve performance on par with apply() functions.

However, there are still lesser-known reasons to prefer apply() functions over for loops, which we will explore in this post.

Read on for an important caveat, and then several reasons to prefer apply() (or purrr’s counterparts). H/T R-Bloggers.

Comments closed

From Probabilities to Odds

Bryan Shalloway explains how odds and probabilities intertwine:

However human understanding of odds predates our formal understanding of probability. You can find references to odds dating back to Shakespeare:

Knew that we ventured on such dangerous seas
That if we wrought out life ’twas ten to one;
– Shakespeare’s Henry IV, Part II, 1597

Yet, in most common settings, modern society has largely supplanted odds for probabilities. You can imagine if Shakespeare were writing today the line might end “’twas ten out of eleven.

Read the whole thing.

Comments closed

Fitting Distributions to Datasets in R

Steven Sanderson tests a distribution fit:

There are two main ways to fit a gamma distribution to a dataset in R:

  1. Maximum likelihood estimation (MLE): This method estimates the parameters of the gamma distribution that are most likely to have produced the observed data.
  2. Method of moments: This method estimates the parameters of the gamma distribution by equating the sample mean and variance to the theoretical mean and variance of the gamma distribution.

Click through to see which technique Steven uses and an example of how it all works.

Comments closed

Triangular Distributions and R

Steven Sanderson unleashes the power of the triangle:

The Triangular Distribution is a continuous probability distribution with a triangular shape, hence the name. It is defined by three parameters: minmax, and mode. These parameters determine the range of values the distribution can take and the most likely value within that range. In mathematical terms, the probability density function (PDF) of the Triangular Distribution is given by:

Read on to see the definition, as well as how you can use the four functions around the Triangular Distribution.

By the way, the best-known case of the Triangular Distribution is combining the results of two fair dice, which gives us a peak at the number 7 (1/6 of the time) for a pair of fair, six-sided dice and moving symmetrically down from there, so p(6) = p(8), p(5) = p(9), and so on.

Comments closed

Building a Multinomial Distribution in R

Steven Sanderson isn’t satisfied with the binomial:

The multinomial distribution is a probability distribution that describes the probability of obtaining a specific number of counts for k different outcomes, when each outcome has a fixed probability of occurring.

In R, we can use the rmultinom() function to simulate random samples from a multinomial distribution, and the dmultinom() function to calculate the probability of a specific outcome.

Click through to see how you can build a multinomial distribution and what the difference is between rmultinom() and dmultinom().

Comments closed

Uniform Random Number Generation in R

Steven Sanderson digs into the uniform distribution:

Randomness is an essential part of many statistical and machine learning tasks. In R, there are a number of functions that can be used to generate random numbers, but the runif() function is the most commonly used.

Something mildly embarrassing for me is that it took me a while to figure out why they call the command runif(). That’s because, at first, I didn’t pronounce it r unif but rather run if.

In reality, *unif() means “uniform distribution” and r stands for “random number.” There are several other functions based on the uniform distribution and Steven looks at those as well in this post.

Comments closed

An Analysis of Goal Line Runs out of Shotgun

I decided to test a common narrative:

A common theme among Buffalo Bills fans is the idea that the Bills run too many plays out of shotgun near the opposing team’s goal line, and this is hampering their ability to score points. Instead, these fans argue, they should run from under center, either a direct handoff or a quarterback sneak. If you were to press fans on this, I believe you’d also hear that the Bills are unique, or at least uniquely bad, at running such plays.

I’m going to use the nflfastR package to analyze play-by-play data and see just how well this bit of fan wisdom holds up.

Spoiler alert: it doesn’t.

Comments closed