Press "Enter" to skip to content

Category: Data Science

Most Business Ideas Fail

Eric Colson et al. have a humbling thought for us:

The introduction of data science into the business world has contributed far more than recommendation algorithms; it has also taught us a lot about the efficacy with which we manage our businesses. Specifically, data science has introduced rigorous methods for measuring the outcomes of business ideas. These are the strategic ideas that we implement in order to achieve our business goals. For example, “We’ll lower prices to increase demand by 10%” and “we’ll implement a loyalty program to improve retention by 5%.” Many companies simply execute on their business ideas without measuring if they delivered the impact that was expected. But, science-based organizations are rigorously quantifying this impact and have learned some sobering lessons:

1. The vast majority of business ideas fail to generate a positive impact.

2. Most companies are unaware of this.

3. It is unlikely that companies will increase the success rate for their business ideas.

Read the whole thing. It gives a lot of perspective on a difficult problem: there aren’t as many “free wins” in a business as you might expect. To paraphrase Adam Smith, there is a lot of ruin in a company…but that doesn’t mean you know exactly what it is or exactly how to fix it. Coming in with appropriate humility and a flexible mind (by which I mean a willingness to see reality even when it doesn’t comport with the mental model you’ve built over time) can help improve those odds.


Document Classification in Python

Brendan Tierney performs a bit of document classification with scikit-learn and nltk:

Text mining is a popular topic for exploring what text you have in documents etc. Text mining and NLP can help you discover different patterns in the text, from uncovering certain words or phrases which are commonly used, to identifying certain patterns and linkages between different texts/documents. Combining this work on Text Mining, you can use Word Clouds, time-series analysis, etc. to discover other aspects and patterns in the text. Check out my previous blog posts (post 1, post 2) on performing Text Mining on documents (manifestos from some of the political parties from the last two national government elections in Ireland). These two posts give you a simple indication of what is possible.

We can build upon these Text Mining examples to include other machine learning algorithms like those for Classification. With Classification we want to predict or label a record or document to have a particular value. This could involve labeling a document as being positive or negative (movie or book reviews), or determining if a document is for a particular domain such as Technology, Sports, Entertainment, etc.

Click through for a walkthrough of this process.
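
As a rough sketch of the idea (my own toy example rather than Brendan’s pipeline), here is what a minimal scikit-learn document classifier can look like: TF-IDF features feeding a Naive Bayes model on a handful of labeled documents.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy labeled documents; in practice these would be full documents or manifestos.
docs = [
    "the team won the match in extra time",
    "new processor doubles battery life",
    "the striker scored a late goal",
    "the phone ships with a faster chip",
]
labels = ["Sports", "Technology", "Sports", "Technology"]

# TF-IDF turns each document into a weighted term vector; Naive Bayes then learns
# which terms are indicative of each label.
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(docs, labels)

print(model.predict(["a chip that improves battery performance"]))
# expected: ['Technology']
```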


DBScan for Clustering in Python

Brendan Tierney takes us through the DBScan algorithm:

Let’s illustrate the use of DBScan (Density Based Spatial Clustering of Applications with Noise), using the scikit-learn Python package, for a “manufactured” dataset. This example will illustrate how this density based algorithm works (See my other blog post which compares different Clustering algorithms for this same dataset). DBSCAN is better suited for datasets that have disproportional cluster sizes (or densities), and whose data can be separated in a non-linear fashion.

Click through for an interesting read on a dataset which is historically difficult to cluster (unless you know the general shape and translate everything to polar coordinates).
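
To make the shape point concrete, here is a quick sketch (not the post’s code) of DBSCAN pulling apart two concentric rings, exactly the kind of non-linear separation k-means struggles with:

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_circles

# Two noisy concentric circles; factor controls the gap between the rings.
X, _ = make_circles(n_samples=500, factor=0.4, noise=0.05, random_state=42)

# eps is the neighborhood radius and min_samples the density threshold;
# points that sit in sparse regions get the noise label -1.
labels = DBSCAN(eps=0.15, min_samples=5).fit_predict(X)

print(set(labels))  # typically {0, 1}, plus -1 if any points land in sparse spots
```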


Understanding Support Vector Machines

Luis Valencia takes us through the algorithm for support vector machines:

A support vector machine (SVM) is a supervised machine learning model that uses classification algorithms for two-group classification problems. Compared to newer algorithms like neural networks, they have two main advantages: higher speed and better performance with a limited number of samples (in the thousands).

Pepperidge Farm remembers when we used genetic algorithms to solve problems because support vector machines were too slow.
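
For anyone who wants to kick the tires, here is a minimal two-class SVM sketch with scikit-learn (my illustration, not Luis’s code):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# A small synthetic two-class problem, in the "thousands of samples" range
# where SVMs tend to do well.
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# C trades margin width against training errors; the RBF kernel lets the
# decision boundary bend around the data.
clf = SVC(kernel="rbf", C=1.0).fit(X_train, y_train)
print(f"Test accuracy: {clf.score(X_test, y_test):.3f}")
```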


Word Stemming and Text Processing in R

Genrikh Ananiev takes us through some examples of text processing in R:

First, there are a lot of classes (in fact, as many classes as you have products). And if in this process you have to work not only with the company’s products but also with competitors’, new classes can appear every day, so it becomes meaningless to train a model once and reuse it to predict new products.

Second, the number of documents (different variations of the same product) in each class is not very balanced: some classes may have only a single document, while others have many more.

Click through for an example of the classical technique versus a classification-based technique.
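
The post itself works in R, but just to illustrate what stemming buys you before classification, here is a quick Python analogue with nltk’s SnowballStemmer (my own sketch, not the author’s code):

```python
from nltk.stem import SnowballStemmer

stemmer = SnowballStemmer("english")
words = ["running", "runs", "easily", "products", "production"]

# Stemming strips suffixes so related word forms collapse to a shared root,
# shrinking the feature space the classifier has to learn.
print([stemmer.stem(w) for w in words])
# ['run', 'run', 'easili', 'product', 'product']
```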


Understanding Logistic Regression

Luis Valencia explains the idea of logistic regression:

Logistic Regression is a Machine Learning classification algorithm that is used to predict the probability of a categorical dependent variable. In logistic regression, the dependent variable is a binary variable that contains data coded as 1 (yes, success, etc.) or 0 (no, failure, etc.). In other words, the logistic regression model predicts P(Y=1) as a function of X.

However, unlike ordinary linear regression, in its most basic form logistic regression’s target value is a binary variable instead of a continuous value.

Read on to learn more about logistic regression. The point I like to make about logistic regression is that people brand new to it say it’s regression, because hey, it has “regression” in its name! People who are more familiar with it say that’s a misnomer and it’s really a classification algorithm, not a regression algorithm. But as Luis shows, people who are very familiar with it understand that it is a regression algorithm, which just happens to have nice classification properties because in many cases, elements get pushed to the edges (0 and 1).
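
To see that behavior in code, here is a small sketch (mine, not Luis’s) showing the model fitting P(Y=1|X) first and only then thresholding it into classes:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# One informative feature; the class flips around x = 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1))
y = (X[:, 0] + rng.normal(scale=0.5, size=200) > 0).astype(int)

model = LogisticRegression().fit(X, y)

# The regression part: continuous probabilities, mostly pushed toward 0 or 1.
print(model.predict_proba([[-2.0], [0.0], [2.0]])[:, 1].round(3))

# The classification part: just a 0.5 threshold on those probabilities.
print(model.predict([[-2.0], [0.0], [2.0]]))
```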


Rolling Means with MazamaRollUtils

Jonathan Callahan has an interesting R package for us:

The initial release of MazamaRollUtils provides all the basic rolling functions with features like alignment and missing value removal, along with additional capabilities for smoothing, damping and outlier detection — all common activities in time series analysis.

Click through for an explanation of the process, and then check out the package itself on GitHub. H/T R-Bloggers.
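
MazamaRollUtils is an R package, so purely to illustrate the rolling-mean idea (and not the package’s own API), here is a pandas analogue that smooths a short series while skipping a missing value:

```python
import pandas as pd

values = pd.Series([2.0, 4.0, 6.0, None, 10.0, 12.0, 14.0])

# A centered window of width 3; min_periods=1 lets the mean work around the
# missing value instead of propagating NaN.
smoothed = values.rolling(window=3, center=True, min_periods=1).mean()
print(smoothed)
```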


De Moivre’s Equation and Sample Size-Based Variance

Holger von Jouanne-Diedrich demonstrates de Moivre’s equation:

Over one billion dollars have been spent in the US to split up big schools into smaller ones because small schools regularly show up in rankings as top performers.

In this post, I will show you why that money was wasted because of a widespread (but not so well known) statistical artifact, so read on!

Do read on to learn more about this paradox.
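
The statistical artifact in question is de Moivre’s equation, the fact that the standard error of a mean shrinks with sample size: SE = sigma / sqrt(n). A quick simulation (my own, not from the post) shows why small schools crowd both ends of a ranking even when every student is drawn from the same distribution:

```python
import numpy as np

rng = np.random.default_rng(1)
n_schools = 1000

# Every "school" samples students from the identical distribution;
# only the school size differs.
small = [rng.normal(loc=100, scale=15, size=50).mean() for _ in range(n_schools)]
large = [rng.normal(loc=100, scale=15, size=2000).mean() for _ in range(n_schools)]

print(np.std(small))  # about 15 / sqrt(50)   ~ 2.1
print(np.std(large))  # about 15 / sqrt(2000) ~ 0.34
```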


What is Pandas?

Lina Kovacheva starts a new series on Pandas:

First and foremost – what is Pandas?

Pandas is a popular Python library that allows users to easily analyse and manipulate data. It offers powerful and flexible data structures and is vastly popular among data scientists and analysts. As with any other library, to be able to use Pandas you have to import the library.

Click through to learn more.
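
To round it out, here is a first-contact example (mine, not Lina’s): import pandas, build a small DataFrame, and run a couple of basic operations on it.

```python
import pandas as pd

df = pd.DataFrame({
    "product": ["widget", "gadget", "gizmo"],
    "units_sold": [120, 85, 40],
    "price": [2.50, 3.75, 9.99],
})

print(df.head())               # the first rows of the DataFrame
print(df["units_sold"].sum())  # column-level operations: 245
print(df.describe())           # summary statistics for the numeric columns
```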
