R – Page 22 – Curated SQL

Explaining Odds Ratios

Published 2023-12-19 by Kevin Feasel

Steven Sanderson explains the concept of an odds ratio:

Imagine a loan officer flipping a coin to decide whether to approve your loan. Odds ratios tell you how much more likely one factor (like your income) makes the “heads” (approval) side appear compared to another (like your student status).

In logistic regression, odds ratios compare the odds of an event (loan default, in our case) for two groups defined by a specific variable. They’re like multipliers: greater than 1 means something increases the chances of default, while less than 1 means it decreases them.

As for why, we use odds ratios because it’s hard to track and interpret changes in probabilities directly, at least when you’re thinking small numbers.

Comments closed

Fighting Heteroskedasticity in Regression Problems

Published 2023-12-18 by Kevin Feasel

Steven Sanderson deals with my favorite failure of BLUE (mainly because I love the name):

Tired of your least-squares regression model giving wonky results because some data points shout louder than others? Meet Weighted Least Squares (WLS), the superhero of regression, ready to tackle unequal variance (heteroscedasticity) and give your model the justice it deserves! Today, we’ll dive into the world of WLS in R, using base functions for maximum transparency. Buckle up, data warriors!

Read on to see how Weighted Least Squares helps in data analysis when you have heteroskedasticity.

Comments closed

Analyzing Shiny App Startup Times

Published 2023-12-13 by Kevin Feasel

Osheen MacOscar wants to know how long it takes to start up that Shiny app:

In the last blog I spoke about using Google Lighthouse to test the speed of web pages. I wanted to build upon that and use Lighthouse to test some Shiny apps.

To get a feel for Shiny’s performance in a Lighthouse analysis, I needed a lot of shiny apps that I could test and create a dataset from, so I used the entries to the 2021 Shiny app contest, which is a competition where people enter Shiny apps to be judged on technical merit and artistic achievement. I used the 2021 apps as there has unfortunately not been a competition since. A full list of the submissions can be found on the Posit Community website.

Read on to see what you can do with Lighthouse, as well as a few pain points around it.

Comments closed

Testing the Reproducibility of Random Numbers in STAN

Published 2023-12-13 by Kevin Feasel

Sebastian Sauer performs some tests:

Bayes models (using MCMC) build on drawing random numbers. By their very nature, random numbers are random. Unless they are not. As you may know, the random number fuctions in computers are purely deterministic.

However, in practice, some inpredictable behavior may still show up. The reason being simply that two computational environment must – in theory – being exactly identical in order to reproduce the same results. At least identical in every bit that influence random number generation.

Click through for the testbed and code.

Comments closed

Stepwise and Piecewise Regression in R

Published 2023-12-12 by Kevin Feasel

Steven Sanderson takes us through two regression techniques. First up is stepwise regression:

Stepwise regression is a powerful technique used to build predictive models by iteratively adding or removing variables based on statistical criteria. In R, this can be achieved using functions like step() or manually with forward and backward selection.

Piecewise regression follows:

Piecewise regression is a powerful technique that allows us to model distinct segments of a dataset with different linear relationships. It’s like fitting multiple straight lines to capture the nuances of different regions in your data. So, grab your virtual lab coat, and let’s get started.

Read on for explanations of both techniques, as well as some visuals and potential pitfalls you might run into along the way.

Comments closed

Spline Regression

Published 2023-12-08 by Kevin Feasel

Steven Sanderson performs a different form of regression:

Spline regression is particularly useful when the relationship between the independent and dependent variables is not adequately captured by a linear model. It involves fitting a piecewise continuous curve (spline) to the data. Let’s dive into the process using R.

Click through and you’ll be reticulating splines just like it’s Sim City 2000.

Comments closed

Predicting Forecast Errors of Ensemble Regression Models

Published 2023-12-04 by Kevin Feasel

Peter Laurinec builds a model to test a model:

In the last blog post about Multistep forecasting losses, I showed the usage of the fantastic method adam from the smooth R package on household electricity consumption data, and compared it with benchmarks.

Since I computed predictions from 10 methods/models for a long period of time, it would be nice to create some ensemble models for precise prediction for our household consumption data. For that purpose, it would be great to predict for example future errors of these methods. It is used in some known ensemble methods, which are not direct about stacking. Predicting errors can be beneficial for prediction weighting or for predicting the rank of methods (i.e. best one prediction). For the sake of learning something new, I will try multivariate regression models, so learning from multiple targets at once. At least, it has the benefit of simplicity, that we need only one model for all base prediction models.

Click through for Peter’s process. H/T R-Bloggers.

Comments closed

Quantile Regression in R

Published 2023-12-01 by Kevin Feasel

Steven Sanderson performs quantile regression:

Quantile regression is a robust statistical method that goes beyond traditional linear regression by allowing us to model the relationship between variables at different quantiles of the response distribution. In this blog post, we’ll explore how to perform quantile regression in R using the quantreg library.

If you need to hone up on your quantile regression knowledge, Wikipedia is usually good for statistics and here’s an academic paper from Roger Koenker and Kevin Hallock on the topic.

Comments closed

Benchmarking Memory Usage in Shiny Apps

Published 2023-12-01 by Kevin Feasel

Ryszard Szymanski tracks memory utilization:

R/Shiny allows you to prototype a working web application quickly and easily. However, with increasing amounts of data, your app may become slow and, in extreme cases, crash due to insufficient memory.

When the worst-case scenario happens, we need to figure out a way to lower the memory usage of our app to avoid those crashes.

A crucial part of optimization efforts is benchmarking how much memory our app is consuming. This allows us to check if the changes we made to the app are indeed moving us in the right direction.

In this step-by-step guide, we will describe how to do that based on an example application.

Read on for the process and how you can diagnose memory problems. H/T R-Bloggers.

Comments closed

Robust Regression in R

Published 2023-11-30 by Kevin Feasel

Steven Sanderson performs robust regression:

If you’re familiar with linear regression in R, you’ve probably encountered the traditional lm() function. While this is a powerful tool, it might not be the best choice when dealing with outliers or influential observations. In such cases, robust regression comes to the rescue, and in R, the rlm() function from the MASS package is a valuable resource. In this blog post, we’ll delve into the step-by-step process of performing robust regression in R, using a dataset to illustrate the differences between the base R lm model and the robust rlm model.

The short version of rlm() versus lm() is that Ordinary Least Squares (the form of linear regression we use with lm()) is quite susceptible to outliers. Meanwhile, rlm() uses a technique known as M-estimation, which ends up weighting outlier points different from inliers, making it less susceptible to a small number of outliers wrecking the chart.

Comments closed

Category: R