R – Page 21 – Curated SQL

The Triangular Distribution in TidyDensity

Published 2024-01-11 by Kevin Feasel

Steven Sanderson unleashes the power of the triangle:

Welcome back, fellow data enthusiasts! Today, we embark on an exciting journey into the world of statistical distributions with a special focus on the latest addition to the TidyDensity package – the triangular distribution. Tightly packed and versatile, this distribution brings a unique flavor to your data simulations and analyses. In this blog post, we’ll delve into the functions provided, understand their arguments, and explore the wonders of the triangular distribution.

Read on to learn what the triangular distribution is and how you can work with it in TidyDensity.
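
If you want a feel for the distribution before clicking through, here is a minimal base R sketch of sampling from a triangular distribution by inverting its CDF. This is not TidyDensity's API: `rtriangular_sketch` and its arguments are hypothetical names I made up for illustration, and Steven's post covers the package's actual functions.

```r
# Inverse-CDF sampler for a triangular distribution (base R only).
# rtriangular_sketch is a hypothetical helper, not a TidyDensity function.
rtriangular_sketch <- function(n, min = 0, max = 1, mode = (min + max) / 2) {
  stopifnot(min < max, mode >= min, mode <= max)
  u <- runif(n)
  p_mode <- (mode - min) / (max - min)  # CDF value at the mode
  ifelse(
    u < p_mode,
    min + sqrt(u * (max - min) * (mode - min)),       # left of the mode
    max - sqrt((1 - u) * (max - min) * (max - mode))  # right of the mode
  )
}

set.seed(123)
draws <- rtriangular_sketch(10000, min = 0, max = 10, mode = 3)
mean(draws)  # should land near the theoretical mean, (0 + 10 + 3) / 3
```

The three parameters (minimum, maximum, and most likely value) are all you need, which is a big part of why the distribution is popular for quick simulations.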

TidyDensity 1.3.0 Released

Published 2024-01-10 by Kevin Feasel

Steven Sanderson has an update to the TidyDensity package:

The latest release of the TidyDensity R package brings some major changes and improvements that open up new possibilities for statistical analysis and data visualization. Version 1.3.0 includes breaking changes, new features, and a host of minor fixes and improvements that enhance performance and usability. Let’s dive into what’s new!

Read on for that change list and how you can get a copy of the TidyDensity R package.

Aggregating by Month and Year in R

Published 2024-01-09 by Kevin Feasel

Steven Sanderson groups by month and year:

Taming the beast of daily data can be daunting. While it captures every detail, sometimes you need a bird’s-eye view. Enter aggregation, your secret weapon for transforming daily data into monthly and yearly insights. In this post, we’ll dive into the world of R, where you’ll wield powerful tools like dplyr and lubridate to master this data wrangling art.

Click through for examples of summarizing daily data into monthly and annual data. One thing to keep in mind, however, is that the monthly aggregation in these examples groups by month alone, so if you have July 2023 and July 2024 data, you’ll get a single row back for July covering both years. It’s all about understanding what the grain of your data is, as well as your desired grain.
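
To make the point about grain concrete, here is a small sketch in base R rather than the dplyr and lubridate code from the post. The data frame and its values are made up for illustration.

```r
# Hypothetical daily data spanning two years (values are made up).
daily <- data.frame(
  date  = as.Date(c("2023-07-01", "2023-07-15", "2024-07-01", "2024-07-15")),
  value = c(10, 20, 30, 40)
)

# Two candidate grouping keys: month alone, and year plus month.
daily$month      <- format(daily$date, "%m")
daily$year_month <- format(daily$date, "%Y-%m")

# Grouping on month alone merges July 2023 and July 2024 into one row.
by_month <- aggregate(value ~ month, data = daily, FUN = sum)

# Grouping on year and month keeps the two Julys separate.
by_year_month <- aggregate(value ~ year_month, data = daily, FUN = sum)

by_month
by_year_month
```

If you are already using lubridate, grouping on `floor_date(date, "month")` is another way to get a year-and-month grain.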

Explaining Models with Classic Methods and SHAP

Published 2024-01-08 by Kevin Feasel

Michael Mayer has some ’splainin to do:

Let’s explain a {tidymodels} random forest by classic explainability methods (permutation importance, partial dependence plots (PDP), Friedman’s H statistics), and also fancy SHAP.

Disclaimer: {hstats}, {kernelshap} and {shapviz} are three of my own packages.

What I really appreciate is that Michael includes classic methods here. It can be easy to say “Oh, this is old and therefore no longer relevant.” But that would be quite wrong.

Rolling Averages in R

Published 2024-01-08 by Kevin Feasel

Steven Sanderson performs a moving average:

Ever felt those data points were a bit too jittery? Smoothing out trends and revealing underlying patterns is a breeze with rolling averages in R. Ready to roll? Let’s dive in!

Read on to see one way to do this in R.
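
For a quick taste, here is a minimal base R sketch of a trailing moving average. It may not match the approach in Steven's post; the input values and the window size are made up for illustration.

```r
# Trailing (right-aligned) moving average using base R only.
x <- c(1, 2, 3, 4, 5, 6)  # made-up values for illustration
k <- 3                    # window size

# stats::filter() applies equal weights of 1/k across each window;
# sides = 1 means each window uses only the current and previous values.
rolling_mean <- as.numeric(stats::filter(x, rep(1 / k, k), sides = 1))

rolling_mean  # the first k - 1 entries are NA because the window is incomplete
```

The choice of window size is the main trade-off: a larger `k` gives a smoother line but reacts more slowly to real changes.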

LOWESS Smoothing in R

Published 2024-01-05 by Kevin Feasel

Steven Sanderson had me thinking of LOESS but then, bam!, snuck this in on me:

Locally Weighted Scatterplot Smoothing, or Lowess, is a powerful technique for capturing trends in noisy data. It’s particularly useful when dealing with datasets that exhibit complex patterns that might be missed by other methods. So, let’s get our hands dirty and start coding!

Read on for an example of LOWESS smoothing, which actually is a little different from LOESS. If you’re interested in learning more about the differences between LOESS and LOWESS, this Stack Exchange question and answer page is really good.
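
As a rough illustration of the two functions side by side, here is a sketch using base R's `lowess()` and `loess()` on simulated data. The data and the smoothing parameters are my own assumptions, not taken from the post.

```r
set.seed(42)
x <- seq(0, 10, length.out = 100)
y <- sin(x) + rnorm(100, sd = 0.3)  # simulated noisy data

# lowess() takes x and y vectors; f is the fraction of points in each local fit.
fit_lowess <- lowess(x, y, f = 0.3)

# loess() takes a formula; span plays a role similar to f.
fit_loess <- loess(y ~ x, span = 0.3)

# Compare the first few smoothed values from each method.
head(data.frame(
  x      = x,
  lowess = fit_lowess$y,
  loess  = predict(fit_loess)
))
```

The two generally will not agree exactly, even with matching `f` and `span`: by default `lowess()` fits local lines with robustness iterations, while `loess()` fits local quadratics without them.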

Working with .I in data.table

Published 2024-01-04 by Kevin Feasel

John MacKintosh has eyes for .I:

In this post I’m using a small extract from the SIMD2020 dataset to figure out what the special operator .I does.

Files and code are on github if you’re interested

Click through for a demo-laden post. H/T R-Bloggers.

From Data Frame to Time Series in R

Published 2024-01-03 by Kevin Feasel

Steven Sanderson takes us through time series creation:

R offers two primary time series classes:

“ts”: Base R’s classic time series object, designed for regularly spaced data.

“xts”: Part of the ‘xts’ package, offering enhanced flexibility and features.

Click through for examples using both of these methods.
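
As a minimal sketch of the two classes, assuming a made-up monthly data frame (the column names and values are hypothetical), the conversion might look like this. The xts part only runs if the xts package is installed.

```r
# Made-up monthly data frame for illustration.
df <- data.frame(
  date  = seq(as.Date("2023-01-01"), by = "month", length.out = 12),
  value = c(5, 7, 6, 8, 9, 11, 12, 10, 9, 8, 7, 6)
)

# "ts" stores the values plus a start and a frequency, not the dates themselves.
monthly_ts <- ts(df$value, start = c(2023, 1), frequency = 12)
monthly_ts

# "xts" indexes the values by the actual dates; skipped if xts is not installed.
if (requireNamespace("xts", quietly = TRUE)) {
  monthly_xts <- xts::xts(df$value, order.by = df$date)
  print(head(monthly_xts))
}
```

Because `ts` only knows the start and the frequency, it suits regularly spaced data, whereas `xts` carries the dates along and can cope with irregular gaps.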

Quantile Regression using Random Forests

Published 2024-01-03 by Kevin Feasel

Norm Matloff answers a reader question:

In my December 22 blog, I first introduced the classic parametric quantile regression (QR) concept. I then showed how one could use the qeML package to perform quantile regression nonparametrically, using the package’s qeKNN function for a k-Nearest Neighbors approach. A reader then asked if this could be applied to random forests (RFs). The answer is yes, and this will be the topic of the current post.

Read on to learn more about how to do this, including some of the challenges you’ll face along the way. H/T R-Bloggers.

Reversion to the Mean

Published 2024-01-02 by Kevin Feasel

Holger von Jouanne-Diedrich explains an important statistical concept we all too often forget:

In the realm of business and leadership, one statistical phenomenon often goes unrecognized yet significantly influences our understanding of performance and success. This is the concept of reversion to the mean (also called regression to the mean). This seemingly simple statistical occurrence can profoundly impact how we perceive management strategies, leadership effectiveness, and even the fate of those gracing the covers of prominent magazines. To understand what is going on, read on!

Read on for a video in German and an article in English, with some bonus R code to sell the story.

Category: R