Category: R

Local Regression (LOESS) in R

Steven Sanderson takes us through a powerful regression technique:

LOESS, which stands for LOcal regrESSion, is a versatile and powerful technique for fitting a curve to a set of data points. Unlike traditional linear regression, LOESS adapts to the local behavior of the data, making it perfect for capturing intricate patterns in noisy datasets.

Click through for examples. LOESS works best with quadratic data, like in Steven’s last example image. The downside of the technique is that it can pick up spurious movement that looks interesting but is really just following the noise.
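
To make the "local" part concrete, here is a minimal Python sketch of the idea, not the code from Steven’s post. For each point it takes the nearest fraction `frac` of the data, weights those neighbours with the tricube kernel, and fits a weighted straight line. The function name and the `frac` and `degree` parameters are illustrative assumptions; `frac` plays roughly the role of the `span` argument of R’s `loess()`.

```python
import numpy as np

def loess(x, y, frac=0.25, degree=1):
    """Minimal LOESS sketch: a tricube-weighted local polynomial fit at every x[i]."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    # Neighbourhood size; keep enough points for the fit after the zero-weight edge point
    k = min(n, max(degree + 2, int(np.ceil(frac * n))))
    fitted = np.empty(n)
    for i in range(n):
        dist = np.abs(x - x[i])
        idx = np.argsort(dist)[:k]                 # k nearest neighbours of x[i]
        h = dist[idx].max()                        # local bandwidth
        u = dist[idx] / h if h > 0 else np.zeros(k)
        w = (1 - u**3) ** 3                        # tricube weights, 0 at the edge
        # Weighted least squares: centre x on x[i] and scale rows by sqrt(w)
        X = np.vander(x[idx] - x[i], degree + 1, increasing=True)
        sw = np.sqrt(w)
        beta, *_ = np.linalg.lstsq(X * sw[:, None], y[idx] * sw, rcond=None)
        fitted[i] = beta[0]                        # intercept = fitted value at x[i]
    return fitted
```

A small `frac` tracks the data closely, which is exactly where the spurious wiggles come from; a larger one smooths more aggressively at the cost of flattening real curvature.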


Exponential Regression in R

Steven Sanderson understands the power of compound interest:

Before we jump into the code, let’s quickly grasp the concept of exponential regression. In simple terms, it’s a statistical method used to model relationships where the rate of change of a variable is proportional to its current state. Think of scenarios like population growth, viral spread, or even financial investments.

Read on to see how you can perform a regression in this kind of scenario.
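
When the rate of change is proportional to the current value, the curve has the form y = a·e^(bx), so taking logs gives the straight line log(y) = log(a) + bx. A common approach, sketched below in Python and not necessarily what the linked post does, is ordinary least squares on log(y); in R that corresponds to `lm(log(y) ~ x)`. The function name and example data are hypothetical.

```python
import numpy as np

def fit_exponential(x, y):
    """Fit y = a * exp(b * x) by least squares on log(y); y must be positive."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if np.any(y <= 0):
        raise ValueError("log-linear fit requires strictly positive y")
    b, log_a = np.polyfit(x, np.log(y), 1)   # polyfit returns slope first, then intercept
    return np.exp(log_a), b

# Example: starting value 100, continuous growth rate 0.05 per period
x = np.arange(20)
y = 100 * np.exp(0.05 * x)
a, b = fit_exponential(x, y)
```

Because this fit minimizes error on the log scale, it effectively weights relative rather than absolute errors; R’s `nls()` can fit the curve on the original scale if that distinction matters.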


Practical Applications of Functional Programming Techniques in R

Anirban Shaw continues a series on functional programming:

Functional programming in R is more than just a trendy buzzword; it’s a powerful approach that can dramatically simplify and enhance your data analysis tasks. In this section, we’ll explore real-world examples of common data analysis problems solved using functional programming in R, comparing them to traditional imperative methods. We’ll also highlight the conciseness and readability of functional code, demonstrating why it’s a game-changer for data professionals.

I think there’s a lot of value in understanding the functional programming approach for a few reasons:

  1. Functional programming makes more intuitive sense to data platform developers who are raised on “set-based” operations and thinking over columns rather than rows.
  2. Functional programming makes parallel computation easier. This is why Spark heavily emphasizes functional programming and why its code base is written in a functional programming language (Scala).
  3. Functional code is often terser than equivalent imperative code.
  4. Functional code allows you to build more “bottom-up,” starting with small functions and composing them rather than thinking in a “top-down” approach of classes and interactions between classes. For people who are not full-time application developers, the ease of writing a quick function is huge, which is part of why shell scripting languages like PowerShell and Bash emphasize this approach as well.
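
As a small illustration of points 3 and 4, here is a Python sketch that builds a pipeline bottom-up from tiny functions and compares it with an imperative loop. The series itself works in R, where base `Map()`, `Filter()`, and `Reduce()` play the same roles as Python’s `map`, `filter`, and `functools.reduce`. The temperature-conversion task and every function name are made up for the example.

```python
from functools import reduce

def compose(*funcs):
    """Right-to-left composition: compose(f, g)(x) == f(g(x))."""
    return reduce(lambda f, g: lambda x: f(g(x)), funcs, lambda x: x)

# Small, single-purpose building blocks
def drop_missing(values):
    return list(filter(lambda v: v is not None, values))

def to_celsius(values):
    return list(map(lambda f: (f - 32) * 5 / 9, values))

def mean(values):
    return sum(values) / len(values)

# Functional: assemble the pipeline bottom-up from the pieces above
mean_celsius = compose(mean, to_celsius, drop_missing)

# Imperative equivalent: one loop that mixes all three concerns
def mean_celsius_loop(values):
    total, count = 0.0, 0
    for v in values:
        if v is None:
            continue
        total += (v - 32) * 5 / 9
        count += 1
    return total / count

readings = [32, None, 212, 50]   # Fahrenheit readings with one missing value
```

Each building block is a pure function with no shared state, so a step like `to_celsius` could be handed to a parallel map without rewriting it, which is the property point 2 relies on.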

H/T R-Bloggers.


Quadratic Regression in R

Steven Sanderson needs more than a line:

In the realm of data analysis, quadratic regression emerges as a powerful tool for uncovering the hidden patterns within datasets that exhibit non-linear relationships. Unlike its linear counterpart, quadratic regression ventures beyond straight lines, gracefully capturing curved relationships between variables. This makes it an essential technique for understanding a wide range of phenomena, from predicting stock prices to modeling population growth.

Embark on a journey into the world of quadratic regression using the versatile R programming language. We’ll explore the steps involved in fitting a quadratic model, interpreting its parameters, and visualizing the results. Along the way, you’ll gain hands-on experience with this valuable technique, enabling you to tackle your own data analysis challenges with confidence.

Read on to see how you can use lm() to model a quadratic relationship between the dependent variable and one independent variable (or multiple independent variables).
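
Under the hood, a quadratic model is still linear in its coefficients: you add x² as an extra column and run ordinary least squares, which is what a formula like `lm(y ~ x + I(x^2))` does in R. The Python sketch below mirrors that idea (the function names are hypothetical, not from the post) and also computes the vertex, x = -b1 / (2·b2), where the fitted curve turns.

```python
import numpy as np

def fit_quadratic(x, y):
    """Least-squares fit of y = b0 + b1*x + b2*x**2; returns (b0, b1, b2)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix with intercept, linear, and squared columns (like ~ x + I(x^2) in R)
    X = np.column_stack([np.ones_like(x), x, x**2])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return tuple(coef)

def vertex(b1, b2):
    """x-coordinate of the turning point of b0 + b1*x + b2*x**2."""
    return -b1 / (2 * b2)

# Noise-free example data generated from y = 1 - 2x + 0.5x^2
x = np.linspace(-5, 5, 21)
y = 1 - 2 * x + 0.5 * x**2
b0, b1, b2 = fit_quadratic(x, y)
```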


New Features in healthyR.ts 0.3

Steven Sanderson lays out some updates:

One of the standout additions is the introduction of util_log_ts(). This function seems like a game-changer, providing a streamlined way to log time series data. This is incredibly useful, especially when dealing with extensive datasets, making the whole process more efficient and user-friendly. This is a helper function for auto_stationarize().

There’s a lot in this update, and the blog post also includes several examples of automating stationarity and ARIMA modeling.


Creating Prediction Intervals in R

Steven Sanderson builds a prediction interval:

Prediction intervals are a powerful tool for understanding the uncertainty of your predictions. They allow you to specify a range of values within which you are confident that the true value will fall. This can be useful for many tasks, such as setting realistic goals, making informed decisions, and communicating your findings to others.

In this blog post, we will show you how to create a prediction interval in R using the mtcars dataset. The mtcars dataset is a built-in dataset in R that contains information about fuel economy, weight, displacement, and other characteristics of 32 cars.

Click through to see an example based on linear regression.
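
For a linear model, the prediction interval at a new predictor row x₀ has a closed form. Here n is the number of observations, p the number of estimated coefficients (including the intercept), and s² = RSS / (n − p); the "1 +" under the square root accounts for the noise in a single new observation and is what makes a prediction interval wider than a confidence interval for the mean response. In R, `predict(fit, newdata, interval = "prediction")` returns these bounds for an `lm` fit, for example `lm(mpg ~ wt, data = mtcars)`, a model chosen here only for illustration.

```latex
\hat{y}_0 = \mathbf{x}_0^{\top}\hat{\boldsymbol{\beta}}, \qquad
\hat{y}_0 \;\pm\; t_{1-\alpha/2,\; n-p}\; s\,
\sqrt{1 + \mathbf{x}_0^{\top}\left(\mathbf{X}^{\top}\mathbf{X}\right)^{-1}\mathbf{x}_0},
\qquad s^2 = \frac{\mathrm{RSS}}{n-p}
```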


Comparing Permutation SHAP and Kernel SHAP

Michael Mayer lays some groundwork:

SHAP is the predominant way to interpret black-box ML models, especially for tree-based models with the blazingly fast TreeSHAP algorithm.

For general models, two slower SHAP algorithms exist:

  1. Permutation SHAP (Štrumbelj and Kononenko, 2010)
  2. Kernel SHAP (Lundberg and Lee, 2017)

Read on to understand more about these two forms of SHAP, as well as how they compare in two datasets of differing levels of difficulty.
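
Permutation SHAP builds on the definition of Shapley values as average marginal contributions over feature orderings. The Python sketch below computes them exactly by enumerating every ordering, which is only feasible for a handful of features. As a simplifying assumption, it "removes" a feature by substituting a single baseline value rather than averaging over a background dataset and sampling orderings, as practical implementations do. Kernel SHAP targets the same values through a weighted linear regression over feature coalitions instead. The function name and toy model are hypothetical.

```python
from itertools import permutations
from math import factorial

def permutation_shap(model, x, baseline):
    """Exact Shapley values for one instance x by enumerating all feature orderings.

    Features not yet 'switched on' take their value from a single baseline row
    (a simplification of the background-dataset averaging used in practice).
    """
    p = len(x)
    phi = [0.0] * p
    for order in permutations(range(p)):
        z = list(baseline)            # start with every feature switched off
        prev = model(z)
        for j in order:
            z[j] = x[j]               # switch feature j on
            curr = model(z)
            phi[j] += curr - prev     # marginal contribution of j in this ordering
            prev = curr
    return [v / factorial(p) for v in phi]

def toy_model(z):
    # One additive effect plus an interaction between features 1 and 2
    return 2 * z[0] + 3 * z[1] * z[2]

phi = permutation_shap(toy_model, x=[1, 2, 3], baseline=[0, 0, 0])
```

The values add up to f(x) − f(baseline), and the interaction term 3·z1·z2 is split evenly between features 1 and 2.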


Simulating a Bivariate Normal Distribution in R

Steven Sanderson isn’t content with a univariate normal distribution:

Imagine two variables, like height and weight, that exhibit a joint distribution. The bivariate normal distribution captures the relationship between these variables, describing how their values tend to cluster around certain means and how they vary together. It’s like a two-dimensional bell curve, where the peak represents the most likely combination of values for both variables.

Click through to learn a bit more about bivariate normal distributions, including ways to plot one and show its density function.
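
One standard way to simulate the distribution is to draw independent standard normals and multiply them by the Cholesky factor of the covariance matrix, which induces the desired correlation. The Python sketch below does that and also evaluates the density; in R, `MASS::mvrnorm()` handles the sampling step. The function names and the height/weight parameters in the usage line are illustrative assumptions, not values from the post.

```python
import numpy as np

def bivariate_cov(sd, rho):
    """2x2 covariance matrix from two standard deviations and a correlation."""
    s1, s2 = sd
    return np.array([[s1**2, rho * s1 * s2],
                     [rho * s1 * s2, s2**2]])

def simulate_bivariate_normal(n, mean, sd, rho, seed=None):
    """Draw n samples from a bivariate normal via the Cholesky factor of its covariance."""
    L = np.linalg.cholesky(bivariate_cov(sd, rho))  # cov = L @ L.T
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n, 2))                  # independent N(0, 1) pairs
    return np.asarray(mean, dtype=float) + z @ L.T   # each row ~ N(mean, cov)

def bivariate_normal_pdf(point, mean, cov):
    """Density of a bivariate normal at a single point."""
    diff = np.asarray(point, dtype=float) - np.asarray(mean, dtype=float)
    quad = diff @ np.linalg.inv(cov) @ diff
    return np.exp(-0.5 * quad) / (2 * np.pi * np.sqrt(np.linalg.det(cov)))

# Illustrative parameters: height (cm) and weight (kg) with correlation 0.6
samples = simulate_bivariate_normal(100_000, mean=[170, 70], sd=[10, 12], rho=0.6, seed=1)
```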