Press "Enter" to skip to content

Category: R

Predicting Forecast Errors of Ensemble Regression Models

Peter Laurinec builds a model to test a model:

In the last blog post about Multistep forecasting losses, I showed the use of the fantastic adam method from the smooth R package on household electricity consumption data and compared it with benchmarks.

Since I computed predictions from 10 methods/models over a long period of time, it would be nice to create some ensemble models for more precise prediction of our household consumption data. For that purpose, it would be great to predict, for example, the future errors of these methods. This idea is used in some well-known ensemble methods that are not straightforward stacking. Predicting errors can be useful for weighting predictions or for predicting the rank of methods (i.e., picking the best one). For the sake of learning something new, I will try multivariate regression models, i.e., learning from multiple targets at once. At the very least, this has the benefit of simplicity: we need only one model for all base prediction models.

Click through for Peter’s process. H/T R-Bloggers.
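
To make the multi-target idea concrete, here's a quick sketch (mine, not Peter's code) using base R's lm(), which accepts a matrix response via cbind(); the error columns and lagged features are all invented for illustration:

set.seed(42)
hist_errors <- data.frame(
  err_ets    = rnorm(200),
  err_arima  = rnorm(200),
  err_snaive = rnorm(200),
  lag_error  = rnorm(200),  # e.g., yesterday's error
  lag_load   = rnorm(200)   # e.g., yesterday's mean consumption
)

# One model, several targets: lm() accepts a matrix response via cbind()
fit <- lm(cbind(err_ets, err_arima, err_snaive) ~ lag_error + lag_load,
          data = hist_errors)

# Predicted per-method errors for the next period could drive ensemble
# weights or pick the expected-best method
predict(fit, newdata = data.frame(lag_error = 0.1, lag_load = -0.3))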


Quantile Regression in R

Steven Sanderson performs quantile regression:

Quantile regression is a robust statistical method that goes beyond traditional linear regression by allowing us to model the relationship between variables at different quantiles of the response distribution. In this blog post, we’ll explore how to perform quantile regression in R using the quantreg library.

If you need to brush up on your quantile regression knowledge, Wikipedia is usually good for statistics topics, and there's an academic paper from Roger Koenker and Kevin Hallock on the subject.
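
As a quick taste of the quantreg workflow, here's a minimal sketch against mtcars rather than Steven's data:

library(quantreg)

# Conditional median (tau = 0.5) and the 10th/90th percentiles
fit_median <- rq(mpg ~ wt, tau = 0.5, data = mtcars)
fit_tails  <- rq(mpg ~ wt, tau = c(0.1, 0.9), data = mtcars)

summary(fit_median)
coef(fit_tails)   # one column of coefficients per quantile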


Benchmarking Memory Usage in Shiny Apps

Ryszard Szymanski tracks memory utilization:

R/Shiny allows you to prototype a working web application quickly and easily. However, with increasing amounts of data, your app may become slow and, in extreme cases, crash due to insufficient memory.

When the worst-case scenario happens, we need to figure out a way to lower the memory usage of our app to avoid those crashes.

A crucial part of optimization efforts is benchmarking how much memory our app is consuming. This allows us to check if the changes we made to the app are indeed moving us in the right direction.

In this step-by-step guide, we will describe how to do that based on an example application.

Read on for the process and how you can diagnose memory problems. H/T R-Bloggers.
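
If you want a starting point before reading, here's one rough pattern for spot-checking memory from inside an app, using lobstr::mem_used() and a stand-in data load. It's a sketch, not the article's benchmark harness:

library(shiny)
library(lobstr)  # for mem_used()

ui <- fluidPage(
  actionButton("load", "Load data"),
  verbatimTextOutput("mem")
)

server <- function(input, output, session) {
  mem_log <- reactiveVal("press the button to measure")

  observeEvent(input$load, {
    before <- mem_used()
    big <- data.frame(x = rnorm(5e6))   # stand-in for reading real data
    after <- mem_used()
    mem_log(sprintf("before: %.1f MB, after: %.1f MB",
                    as.numeric(before) / 1e6, as.numeric(after) / 1e6))
  })

  output$mem <- renderText(mem_log())
}

shinyApp(ui, server)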


Robust Regression in R

Steven Sanderson performs robust regression:

If you’re familiar with linear regression in R, you’ve probably encountered the traditional lm() function. While this is a powerful tool, it might not be the best choice when dealing with outliers or influential observations. In such cases, robust regression comes to the rescue, and in R, the rlm() function from the MASS package is a valuable resource. In this blog post, we’ll delve into the step-by-step process of performing robust regression in R, using a dataset to illustrate the differences between the base R lm model and the robust rlm model.

The short version of rlm() versus lm() is that Ordinary Least Squares (the form of linear regression we use with lm()) is quite susceptible to outliers. Meanwhile, rlm() uses a technique known as M-estimation, which weights outlier points differently from inliers, making it less susceptible to a small number of outliers wrecking the fit.
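
Here's a minimal sketch of that contrast on simulated data with a few planted outliers (my example, not Steven's):

library(MASS)

set.seed(123)
x <- 1:50
y <- 2 * x + rnorm(50, sd = 5)
y[c(5, 25, 45)] <- y[c(5, 25, 45)] + 100   # plant a few extreme outliers

fit_ols    <- lm(y ~ x)
fit_robust <- rlm(y ~ x)   # M-estimation (Huber weights by default)

coef(fit_ols)      # slope and intercept get pulled toward the outliers
coef(fit_robust)   # stays much closer to the true slope of 2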


The theme() Function in ggplot2

Jack Kennedy shows off a function:

The theme() function in {ggplot2} is awesome. Although it’s only one function, it gives you so much control over your final plot. theme() allows us to generate a consistent, in-house style for our graphics, modify the text within our plots and more. Getting comfortable with theme() will really take your {ggplot2} skills up a notch.

Theming visuals can have an outsized impact on how easy the output is to understand, so understanding how theme() works is important. Also, if your company has specific theming or marketing standards, you can usually build them with the theme() function and then save that theme for reuse later.
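
As a quick illustration of the "save it for reuse" idea, here's a sketch of a reusable house theme; the specific settings are placeholders, not anyone's actual brand standards:

library(ggplot2)

theme_house <- theme_minimal() +
  theme(
    plot.title       = element_text(face = "bold", size = 14),
    panel.grid.minor = element_blank(),
    legend.position  = "bottom"
  )

ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  labs(title = "Weight vs. fuel economy") +
  theme_house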


Plotting the Effects of Noise on R^2

Tomaz Kastrun messes with R^2:

So, an R-squared of 0.59 might show how well the data fit the model (hence goodness of fit), and it also explains about 59% of the variation in our dependent variable.

Given this logic, we prefer our regression models to have a high R-squared. Yes? Right! So, as a simple test, what happens when we add random noise to a function?

I like Tomaz’s scenario here and think he does a good job demonstrating the outcome. I do, however, struggle with the characterization of “making R^2 useless.” When the error term approaches an enormous value relative to the regressable components, that R^2 is telling you that something else is dominating the relationship between the independent variables and the dependent variable. And this is correct: that error term does dominate. I suppose the problem here is philosophical: we call it an error term, but what it signifies is “information we don’t understand about the relationship between these variables.” Yes, in this toy example, it was randomly generated noise. But in a real dataset, it’s not random; it’s inexplicable, at least given the information you know at the time and the mechanisms you use to analyze the relationship.
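
For a feel of the shape of the experiment, here's a quick sketch (not Tomaz's code) showing R^2 collapsing as the noise standard deviation grows around a perfectly linear relationship:

set.seed(2023)
x <- seq(0, 10, length.out = 200)

r2_at_noise <- function(sd_noise) {
  y <- 2 * x + 1 + rnorm(length(x), sd = sd_noise)
  summary(lm(y ~ x))$r.squared
}

noise_sd <- c(0.1, 1, 5, 10, 50)
data.frame(noise_sd, r_squared = sapply(noise_sd, r2_at_noise))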


Power Regression in R

Steven Sanderson’s power level is over 9000:

In the realm of statistics, power regression stands out as a versatile tool for exploring the relationship between two variables, where one variable is modeled as a power of the other. This type of regression is particularly useful when there’s an inherent nonlinear relationship between the variables, often characterized by an exponential or inverse relationship.

Read on to learn more about the definition of power regression and how to perform it in R using a technique called “swole linear regression.” Or at least that’s what I think the technique should be called. Which is probably why I’m not in charge of naming things.
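
For reference, the classic way to fit a power model y = a * x^b is an ordinary linear regression on the log-log scale; here's a sketch on simulated data (not Steven's example):

set.seed(7)
x <- runif(100, 1, 10)
y <- 3 * x^1.5 * exp(rnorm(100, sd = 0.1))  # true model: y = 3 * x^1.5

fit <- lm(log(y) ~ log(x))    # power model is linear on the log-log scale
a_hat <- exp(coef(fit)[[1]])  # intercept on the log scale is log(a)
b_hat <- coef(fit)[[2]]       # slope is the exponent b
c(a = a_hat, b = b_hat)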


Operating on Time Series Data in R

Dario Radečić understands that time is a flat circle:

If there’s one type of data no company has a shortage of, it has to be time series data. Yet, many beginner and intermediate R developers struggle to wrap their heads around basic R time series concepts, such as manipulating datetime values, visualizing data over time, and handling missing date values.

Lucky for you, that will all be a thing of the past in a couple of minutes. This article brings you a basic introduction to the world of R time series analysis. We’ll cover many concepts, from the key characteristics of time series datasets to loading such data in R, visualizing it, and even performing basic operations such as smoothing the curve and adding a trendline.

We have a lot of work to do, so let’s jump straight in!

Click through for a high-level overview. H/T R-Bloggers.
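
As a tiny preview of the basics, here's a sketch on simulated daily data (not the article's dataset) covering a moving-average smooth and a linear trendline:

library(ggplot2)

set.seed(1)
df <- data.frame(
  date  = seq(as.Date("2023-01-01"), by = "day", length.out = 365),
  value = cumsum(rnorm(365))
)

# 7-day centered moving average to smooth the series
df$smooth <- as.numeric(stats::filter(df$value, rep(1 / 7, 7), sides = 2))

ggplot(df, aes(date, value)) +
  geom_line(alpha = 0.4) +                                     # raw series
  geom_line(aes(y = smooth), na.rm = TRUE, linewidth = 1) +    # smoothed curve
  geom_smooth(method = "lm", se = FALSE, linetype = "dashed")  # linear trendline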


Local Regression (LOESS) in R

Steven Sanderson takes us through a powerful regression technique:

LOESS, which stands for LOcal regrESSion, is a versatile and powerful technique for fitting a curve to a set of data points. Unlike traditional linear regression, LOESS adapts to the local behavior of the data, making it perfect for capturing intricate patterns in noisy datasets.

Click through for examples. LOESS works best with quadratic data, as in Steven’s last example image. The downside to it as a technique is that you can find spurious movement that may seem interesting but is just following the noise.
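
For a minimal example using the base R function, here's a sketch on simulated data; note how the span argument controls how closely the fit chases the noise:

set.seed(99)
x <- seq(0, 2 * pi, length.out = 100)
y <- sin(x) + rnorm(100, sd = 0.3)

fit <- loess(y ~ x, span = 0.5)  # smaller span follows the noise more closely

plot(x, y, pch = 16, col = "grey60")
lines(x, predict(fit), lwd = 2)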
