Category: R

Exponential Regression in R

Steven Sanderson understands the power of compound interest:

Before we jump into the code, let’s quickly grasp the concept of exponential regression. In simple terms, it’s a statistical method used to model relationships where the rate of change of a variable is proportional to its current state. Think of scenarios like population growth, viral spread, or even financial investments.

Read on to see how you can perform a regression in this kind of scenario.
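
As a minimal sketch of the idea (not necessarily the approach in the linked post), one common way to fit an exponential curve y = a * exp(b * x) is to regress log(y) on x with lm(). The simulated data and the names a_hat and b_hat below are assumptions for illustration only.

```r
# Simulated data (an assumption for illustration): y grows about 30% per unit of x
set.seed(42)
x <- 1:20
y <- 5 * exp(0.3 * x) * exp(rnorm(20, sd = 0.1))

# A straight-line fit on log(y) corresponds to y = a * exp(b * x)
fit <- lm(log(y) ~ x)

a_hat <- exp(coef(fit)[["(Intercept)"]])  # estimated starting level a
b_hat <- coef(fit)[["x"]]                 # estimated growth rate b

# Predict for new x values, then back-transform to the original scale
y_hat <- exp(predict(fit, newdata = data.frame(x = 21:25)))
```

Back-transforming with exp() gives predictions on the original scale; with log-normal errors these are closer to a conditional median than a mean, so treat them accordingly.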

Practical Applications of Functional Programming Techniques in R

Anirban Shaw continues a series on functional programming:

Functional programming in R is more than just a trendy buzzword; it’s a powerful approach that can dramatically simplify and enhance your data analysis tasks. In this section, we’ll explore real-world examples of common data analysis problems solved using functional programming in R, comparing them to traditional imperative methods. We’ll also highlight the conciseness and readability of functional code, demonstrating why it’s a game-changer for data professionals.

I think there’s a lot of value in understanding the functional programming approach for a few reasons:

  1. Functional programming makes more intuitive sense to data platform developers who are raised on “set-based” operations and thinking over columns rather than rows.
  2. Functional programming allows for ease of parallel computation. This is why Spark heavily emphasizes functional programming and why the code base itself is written in a functional programming language.
  3. Functional code is often terser than equivalent imperative code.
  4. Functional code allows you to build more “bottom-up,” starting with small functions and composing them rather than thinking in a “top-down” approach of classes and interactions between classes. For people who are not full-time application developers, the ease of writing a quick function is huge, which is part of why shell scripting languages like PowerShell and Bash emphasize this approach as well.
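
To make the terseness and bottom-up points concrete, here is a small base R sketch that does the same job imperatively and functionally. The helper name standardize is hypothetical, and the example is mine rather than one from the linked series.

```r
# Imperative: compute each column mean of mtcars with an explicit loop
means_loop <- setNames(numeric(ncol(mtcars)), names(mtcars))
for (col in names(mtcars)) {
  means_loop[col] <- mean(mtcars[[col]])
}

# Functional: apply mean() to every column in one expression
means_fun <- vapply(mtcars, mean, numeric(1))

# Bottom-up composition: write a small function, then map it over the columns
standardize <- function(v) (v - mean(v)) / sd(v)   # hypothetical helper
scaled <- as.data.frame(lapply(mtcars, standardize))

# Reduce() folds a binary function over a vector
total <- Reduce(`+`, 1:10)
```

Both versions of the column means give the same result; the functional one has no index bookkeeping, and vapply() also checks that each result is a single number.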

H/T R-Bloggers.

Quadratic Regression in R

Steven Sanderson needs more than a line:

In the realm of data analysis, quadratic regression emerges as a powerful tool for uncovering the hidden patterns within datasets that exhibit non-linear relationships. Unlike its linear counterpart, quadratic regression ventures beyond straight lines, gracefully capturing curved relationships between variables. This makes it an essential technique for understanding a wide range of phenomena, from predicting stock prices to modeling population growth.

Embark on a journey into the world of quadratic regression using the versatile R programming language. We’ll explore the steps involved in fitting a quadratic model, interpreting its parameters, and visualizing the results. Along the way, you’ll gain hands-on experience with this valuable technique, enabling you to tackle your own data analysis challenges with confidence.

Read on to see how you can use lm() to model a quadratic relationship between one or more independent variables and the dependent variable.
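
For a quick feel of what this looks like, the sketch below fits a quadratic with lm() on simulated data. The data and coefficients are assumptions chosen for illustration, not values from the linked post.

```r
# Simulated data (an assumption): a downward-opening parabola plus noise
set.seed(1)
x <- seq(-3, 3, length.out = 50)
y <- 2 + 1.5 * x - 0.8 * x^2 + rnorm(50, sd = 0.5)

# I() protects x^2 so the formula treats it as arithmetic, not formula syntax
fit <- lm(y ~ x + I(x^2))

# An equivalent fit using raw polynomial terms
fit_poly <- lm(y ~ poly(x, 2, raw = TRUE))

# The fitted curve turns at x = -b1 / (2 * b2)
b <- coef(fit)
vertex_x <- -b[["x"]] / (2 * b[["I(x^2)"]])
```

Calling summary(fit) shows whether the squared term is doing real work; a coefficient near zero suggests a plain line may be enough.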

New Features in healthyR.ts 0.3

Steven Sanderson lays out some updates:

One of the standout additions is the introduction of util_log_ts(). This function seems like a game-changer, providing a streamlined way to log time series data. This is incredibly useful, especially when dealing with extensive datasets, making the whole process more efficient and user-friendly. This is a helper function for auto_stationarize().

There’s a lot in this update, and the blog post also includes several examples of automating stationarity and ARIMA.

Creating Prediction Intervals in R

Steven Sanderson builds a prediction interval:

Prediction intervals are a powerful tool for understanding the uncertainty of your predictions. They allow you to specify a range of values within which you are confident that the true value will fall. This can be useful for many tasks, such as setting realistic goals, making informed decisions, and communicating your findings to others.

In this blog post, we will show you how to create a prediction interval in R using the mtcars dataset. The mtcars dataset is a built-in dataset in R that contains information about fuel economy, weight, displacement, and other characteristics of 32 cars.

Click through to see an example based on linear regression.
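
As a rough sketch of the mechanics, predict() on an lm fit can return a prediction interval directly. The choice of wt as the only predictor and the new car weighing 3,000 lbs are my assumptions and may differ from the model in the linked post.

```r
# Simple linear regression of fuel economy on weight (in 1000 lbs)
fit <- lm(mpg ~ wt, data = mtcars)

# A hypothetical new car weighing 3,000 lbs
new_car <- data.frame(wt = 3)

# Prediction interval: the range for a single new observation
pred <- predict(fit, newdata = new_car, interval = "prediction", level = 0.95)

# Confidence interval: the range for the mean response at this weight
conf <- predict(fit, newdata = new_car, interval = "confidence", level = 0.95)
```

Both results are one-row matrices with columns fit, lwr, and upr. The prediction interval is the wider of the two because it includes the noise of an individual observation on top of the uncertainty in the fitted line.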

Comparing Permutation SHAP and Kernel SHAP

Michael Mayer lays some groundwork:

SHAP is the predominant way to interpret black-box ML models, especially for tree-based models with the blazingly fast TreeSHAP algorithm.

For general models, two slower SHAP algorithms exist:

  1. Permutation SHAP (Štrumbelj and Kononenko, 2010)
  2. Kernel SHAP (Lundberg and Lee, 2017)

Read on to understand more about these two forms of SHAP, as well as how they compare in two datasets of differing levels of difficulty.
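
To give a sense of what permutation SHAP computes, here is a simplified sampling sketch in base R. The function perm_shap is hypothetical and is not the implementation from the linked post or any package; it only illustrates the core idea of walking through features in random order and crediting each with the change in average prediction.

```r
# Hypothetical helper: sampling-based permutation SHAP for one observation x,
# using the rows of bg as background data
perm_shap <- function(predict_fun, x, bg, n_perm = 50) {
  p <- ncol(bg)
  phi <- setNames(numeric(p), colnames(bg))
  for (k in seq_len(n_perm)) {
    ord <- sample(p)                  # random feature order
    z <- bg                           # start from the background rows
    prev <- mean(predict_fun(z))
    for (j in ord) {
      z[[j]] <- x[[j]]                # switch feature j to its value in x
      cur <- mean(predict_fun(z))
      phi[j] <- phi[j] + (cur - prev) # credit feature j with the change
      prev <- cur
    }
  }
  phi / n_perm
}

# Illustrative model and data (assumptions, not from the post)
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)
X <- mtcars[, c("wt", "hp", "qsec")]
x <- X[1, ]

set.seed(1)
phi <- perm_shap(function(d) predict(fit, newdata = d), x, bg = X)
```

Within each permutation the changes telescope, so the values in phi sum to the prediction for x minus the average prediction over the background data. For a model without interactions, such as this linear one, the feature order does not matter; with interactions, more permutations reduce sampling noise.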

Simulating a Bivariate Normal Distribution in R

Steven Sanderson isn’t content with a univariate normal distribution:

Imagine two variables, like height and weight, that exhibit a joint distribution. The bivariate normal distribution captures the relationship between these variables, describing how their values tend to cluster around certain means and how they vary together. It’s like a two-dimensional bell curve, where the peak represents the most likely combination of values for both variables.

Click through to learn a bit more about bivariate normal distributions, including ways to plot one and show its density function.
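
For a self-contained sketch in base R, one way to simulate a bivariate normal is to transform independent standard normals with a Cholesky factor of the covariance matrix. The means, covariance values, and the helper dbvnorm below are assumptions for illustration, not taken from the linked post.

```r
set.seed(123)
mu <- c(170, 70)                        # assumed means: height (cm), weight (kg)
sigma <- matrix(c(100,  60,
                   60, 144),
                nrow = 2, byrow = TRUE) # sds of 10 and 12, correlation 0.5

n <- 5000
z <- matrix(rnorm(n * 2), ncol = 2)     # independent standard normal draws

# chol() returns an upper-triangular R with t(R) %*% R equal to sigma,
# so z %*% R has covariance sigma; sweep() then adds the means column-wise
draws <- sweep(z %*% chol(sigma), 2, mu, `+`)
colnames(draws) <- c("height", "weight")

# Hypothetical helper: bivariate normal density at a single point
dbvnorm <- function(point, mu, sigma) {
  d <- point - mu
  q <- drop(t(d) %*% solve(sigma) %*% d)
  exp(-q / 2) / (2 * pi * sqrt(det(sigma)))
}
peak <- dbvnorm(mu, mu, sigma)          # density is highest at the mean
```

MASS::mvrnorm(n, mu, Sigma) does the simulation step in one call if you prefer a packaged function, and plot(draws) gives a quick look at the elliptical cloud.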

Functions to Deal with Object Names in R

Maelle Salmon has a sticky note:

Interestingly the docs for setNames() sound as if it were created just for this use case!

“This is a convenience function that sets the names on an object and returns the object. It is most useful at the end of a function definition where one is creating the object to be returned and would prefer not to store it under a name just so the names can be assigned.”

For the opposite operation, removing the names of an object, we can use unname().

Read on for an overview of several of these functions. H/T R-Bloggers.
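
A short sketch of the two functions quoted above, using made-up values. The function make_scores and the data are hypothetical.

```r
# setNames() at the end of a function: name the result without a temporary variable
make_scores <- function() {
  setNames(c(90, 85, 77), c("ann", "bob", "cat"))
}
scores <- make_scores()

# unname() is the opposite operation: it strips the names
bare <- unname(scores)

# setNames(nm = x) names a vector by its own values, which is handy
# for getting a named list back from lapply()
cols <- setNames(nm = c("mpg", "wt"))
col_means <- lapply(cols, function(col) mean(mtcars[[col]]))
```

Here names(scores) returns the three names, names(bare) returns NULL, and col_means is a list whose elements are named mpg and wt.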

Plotting a Cumulative Distribution Function in R

Steven Sanderson builds a plot:

Before delving into the world of R programming, let’s first grasp the fundamental concept of a CDF. Imagine a group of students eagerly awaiting their exam results. The CDF for their scores would depict the probability of encountering a student with a score less than or equal to a specific value. For instance, if the CDF indicates a value of 0.7 at 80%, it implies that there’s a 70% chance of finding a student with a score of 80 or lower.

Read on to see how you can calculate this in a dataset and then plot the CDF.
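
As a minimal sketch using base R, ecdf() builds the empirical CDF of a sample and returns it as a function you can both evaluate and plot. The simulated exam scores are an assumption for illustration and are not the data from the linked post.

```r
# Simulated exam scores (an assumption for illustration)
set.seed(2024)
scores <- round(rnorm(200, mean = 75, sd = 10))

# ecdf() returns a step function giving the share of values <= its argument
Fn <- ecdf(scores)
p80 <- Fn(80)   # proportion of students scoring 80 or lower

# Plot the empirical CDF
plot(Fn,
     main = "Empirical CDF of exam scores",
     xlab = "Score",
     ylab = "Proportion of students at or below score")
```

The value of Fn at any score equals mean(scores <= score), so the curve rises from 0 below the lowest score to 1 at the highest.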
