Category: R

Bubble Charts in ggplot2

Published 2023-10-24 by Kevin Feasel

Steven Sanderson creates a bubble chart:

Bubble charts are a great way to visualize data with three dimensions. The size of the bubbles represents a third variable, which can be used to show the importance of that variable or to identify relationships between the three variables.

To create a bubble chart in R using ggplot2, you will need to use the geom_point() function. This function will plot points on your chart, and you can use the size aesthetic to control the size of the points.

Click through for two examples, one which is a pretty good outcome for using a bubble chart, and one which exposes the key weakness of bubble charts.
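
As a rough idea of what the excerpt describes, here is a minimal sketch of a bubble chart. It assumes the built-in mtcars dataset and uses wt, mpg, and hp as stand-ins for three numeric variables; none of these choices come from the linked post.

```r
library(ggplot2)

# mtcars ships with R; wt, mpg, and hp are illustrative choices only
p <- ggplot(mtcars, aes(x = wt, y = mpg, size = hp)) +
  geom_point(alpha = 0.5, colour = "steelblue") +       # size is mapped in aes() above
  scale_size_area(max_size = 12, name = "Horsepower") + # bubble area scales with the value
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon")

print(p)
```

Using scale_size_area() ties the value to bubble area rather than radius, which tends to make the sizes easier to compare by eye.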

Comments closed

Several Useful R Functions

Published 2023-10-23 by Kevin Feasel

Maelle Salmon shows off four useful R functions:

Recently I caught myself using which(grepl(...)),

    animals <- c("cat", "bird", "dog", "fish")
    which(grepl("i", animals))
    #> [1] 2 4

when the simpler alternative is

    animals <- c("cat", "bird", "dog", "fish")
    grep("i", animals)
    #> [1] 2 4

Read on for another example of using grep() instead of grepl(), as well as three other functions you might want to keep in mind. H/T R-Bloggers.
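
To make the difference between the two functions concrete, here is a small base R sketch reusing the animals vector from the excerpt. The variable names idx, matches, and flags are mine.

```r
animals <- c("cat", "bird", "dog", "fish")

# grep() returns the positions of matching elements
idx <- grep("i", animals)

# value = TRUE returns the matching elements themselves
matches <- grep("i", animals, value = TRUE)

# grepl() returns one logical per element, handy for filtering or ifelse()
flags <- grepl("i", animals)

# which(grepl(...)) and grep(...) give the same positions
same <- identical(which(flags), idx)

print(idx)
print(matches)
print(flags)
print(same)
```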

Creating Pareto Charts in R with qcc

Published 2023-10-23 by Kevin Feasel

Steven Sanderson builds a Pareto chart:

A Pareto chart is a type of bar chart that shows the frequency of different categories in a dataset, ordered by frequency from highest to lowest. It is often used to identify the most common problems or causes of a problem, so that resources can be focused on addressing them.

To create a Pareto chart in R, we can use the qcc package. The qcc package provides a number of functions for quality control, including the pareto.chart() function for creating Pareto charts.

Manufacturing companies love Pareto charts.
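
Here is a minimal sketch of the idea, assuming the qcc package is installed. The defect counts are made up for illustration and do not come from the linked post. The first half computes by hand the ordering and cumulative percentages that a Pareto chart displays, and the second half hands the same counts to pareto.chart().

```r
library(qcc)

# Illustrative (made-up) defect counts by category
defects <- c(Scratch = 42, Dent = 27, Misalignment = 15,
             Discoloration = 9, Other = 7)

# What the chart encodes: categories sorted by frequency, plus cumulative share
sorted  <- sort(defects, decreasing = TRUE)
cum_pct <- cumsum(sorted) / sum(sorted) * 100
print(round(cum_pct, 1))

# Draw the chart: bars for counts, a line for the cumulative percentage
pc <- pareto.chart(defects)
```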

Exploring Poker Hands in R

Published 2023-10-23 by Kevin Feasel

Benjamin Smith sorts and deals:

Recently, I have been reading “Mathematical Statistics” by Professor Keith Knight and I noticed an interesting passage he mentions when discussing finite sample spaces:

*In some cases, it may be possible to enumerate all possible outcomes, but in general such enumeration is physically impossible; for example, enumerating all possible 5 card poker hands dealt from a deck of 52 cards would take several months under the most favourable conditions.* (Knight 2000)

While this quote is taken out of context, with the advent of modern computing this is a task which is definitely possible to do computationally!

Click through to see how you can do this in R, at least for 5-card stud. 5-card draw would have the same number of final combinations, though if you also track intermediary combinations, it would grow rather considerably.
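
For a sense of scale, here is a base R sketch of the enumeration. It is not the approach from the linked post, just one way to do it with combn(). Cards are represented as integers 1 to 52, and the mapping of each block of 13 consecutive integers to a suit is my own assumption. Expect the combn() call to take a few seconds and on the order of 50 MB of memory.

```r
# Number of distinct 5-card hands from a 52-card deck
n_hands <- choose(52, 5)
print(n_hands)  # 2598960

# Enumerate every hand: one column per hand, cards numbered 1..52
hands <- combn(52, 5)

# Assumed encoding: cards 1-13 are suit 0, 14-26 suit 1, and so on
suits <- (hands - 1L) %/% 13L

# A hand is a flush (straight flushes included) when all five suits match
is_flush <- suits[1, ] == suits[2, ] & suits[1, ] == suits[3, ] &
            suits[1, ] == suits[4, ] & suits[1, ] == suits[5, ]
n_flush <- sum(is_flush)
print(n_flush)  # 4 * choose(13, 5) = 5148
```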

Making a Time Series Stationary in R

Published 2023-10-20 by Kevin Feasel

Steven Sanderson puts a halt to things:

When working with time series data, one common challenge is dealing with non-stationary data. Non-stationary time series can be a headache for analysts, but fear not, because we have a handy tool to make your life easier. Say hello to the auto_stationarize() function from the {healthyR.ts} package.

Read on to learn why you want stationary data for time series analysis and how the auto_stationarize() function works.
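
To illustrate the underlying idea without relying on {healthyR.ts}, here is a base R sketch of first differencing, one common way to make a series stationary. This is not auto_stationarize() itself, and the simulated random walk is an assumption chosen because its differences are known to be stationary.

```r
set.seed(42)

# A random walk is the cumulative sum of white noise, so its level wanders
increments <- rnorm(200)
walk <- ts(cumsum(increments))

# First differencing recovers the white-noise increments, which are stationary
walk_diff <- diff(walk)

# Side-by-side view: wandering level versus a series hovering around zero
op <- par(mfrow = c(2, 1))
plot(walk, main = "Random walk (non-stationary)")
plot(walk_diff, main = "First difference (stationary)")
par(op)
```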

Time Series Stationarity Testing in R

Published 2023-10-18 by Kevin Feasel

Steven Sanderson isn’t just spinning in place:

Before we delve into the ts_adf_test() function, let’s understand the concept behind it. The Augmented Dickey-Fuller (ADF) test is a crucial tool in time series analysis. It’s like the Sherlock Holmes of time series data, helping us detect whether a series is stationary or not. Stationarity is a fundamental assumption in time series modeling because many models work best when applied to stationary data.

So, why “Augmented”? Well, it’s an extension of the original Dickey-Fuller test that accounts for more complex relationships within the time series data.

Click through to see how you can use the ts_adf_test() function to get a better feel for whether a time series is stationary.
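
For comparison, here is a sketch of the same test using adf.test() from the tseries package rather than ts_adf_test(). That is a swap on my part, and it assumes tseries is installed. The two simulated series are assumptions too: white noise, which is stationary, and a random walk, which is not.

```r
library(tseries)

set.seed(123)
noise <- rnorm(200)    # stationary by construction
walk  <- cumsum(noise) # random walk: has a unit root

# Null hypothesis of the ADF test: the series has a unit root (non-stationary)
res_noise <- adf.test(noise)
res_walk  <- adf.test(walk)

# A small p-value is evidence against the unit root, i.e. in favor of stationarity
print(res_noise$p.value)
print(res_walk$p.value)
```

adf.test() may warn that the true p-value is smaller than the printed one. That only means the test statistic fell outside its lookup table.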

New R Package: hstats

Published 2023-10-17 by Kevin Feasel

Michael Mayer has a new package:

The current version offers:

- H statistics per feature, feature pair, and feature triple
- multivariate predictions at no additional cost
- a convenient API
- other important tools from explainable ML:
    - performance calculations
    - permutation importance (e.g., to select features for calculating H-statistics)
    - partial dependence plots (including grouping, multivariate, multivariable)
    - individual conditional expectations (ICE)

Case-weights are available for all methods, which is important, e.g., in insurance applications.

Click through for an example of how it works, followed by some simple benchmarking to give you an idea of how it performs compared to similar tools.
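
As a rough sketch of what calling the package looks like, assuming hstats is installed: the linear model with a single interaction term and the iris data are my own illustrative choices, not the example from the linked post.

```r
library(hstats)

# A simple model with one deliberate interaction (Petal.Width x Species)
fit <- lm(Sepal.Length ~ . + Petal.Width:Species, data = iris)

# Compute the interaction statistics; X holds the feature columns only
s <- hstats(fit, X = iris[-1])

# Per-feature and per-pair H-statistics
print(h2_overall(s))
print(h2_pairwise(s))
```

Because the only interaction in this model is between Petal.Width and Species, that pair is the one you would expect to stand out in the pairwise statistics.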

Creating Horizontal Legends in R

Published 2023-10-13 by Kevin Feasel

Steven Sanderson flattens the legend:

Creating a horizontal legend in base R can be a useful skill when you want to label multiple categories in a plot without taking up too much vertical space. In this blog post, we’ll explore various methods to create horizontal legends in R and provide examples with clear explanations.

Read on for two demos, one with a single legend and one which creates two legends. I’m not so sure about how valuable the latter is (because you’re splitting valuable information into two places, losing some of the glanceability of a chart along the way), but it is interesting that you can do it.
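
The single-legend case can be sketched in base R with the horiz argument of legend(). The curves and colors below are placeholders of my own choosing.

```r
x <- 1:10

# Two illustrative curves on one plot
plot(x, x^2, type = "l", col = "blue", xlab = "x", ylab = "y")
lines(x, x^1.5, col = "red")

# horiz = TRUE lays the entries out side by side instead of stacking them;
# bty = "n" drops the box around the legend
lg <- legend("top", legend = c("x^2", "x^1.5"),
             col = c("blue", "red"), lty = 1,
             horiz = TRUE, bty = "n")
```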

Changing the Style of a Legend in R

Published 2023-10-11 by Kevin Feasel

Steven Sanderson is a legend:

Before diving into code examples, let’s understand the basics. In R, legends are essential for explaining the meaning of different elements in your plot, such as colors, lines, or shapes. Legends help your audience interpret the data effectively.

In most cases, R’s base plotting system provides you with control over the legend’s size. The key functions we’ll explore are legend() and guides(). We’ll also delve into how to modify legend size in popular plotting packages like ggplot2.

Click through for those demonstrations.
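
Here is a short sketch of both routes, assuming ggplot2 is installed and using mtcars as placeholder data. In base R, the cex argument of legend() scales the legend. In ggplot2, theme() controls text and key sizes, while guides() can override how the keys themselves are drawn.

```r
library(ggplot2)

# Base R: cex shrinks or enlarges the legend text and symbols
plot(mtcars$wt, mtcars$mpg, col = mtcars$cyl, pch = 19,
     xlab = "Weight", ylab = "MPG")
lg <- legend("topright", legend = c(4, 6, 8), col = c(4, 6, 8),
             pch = 19, cex = 0.8, title = "Cylinders")

# ggplot2: theme() sizes the legend text and keys,
# guides() enlarges the points shown inside the legend only
p <- ggplot(mtcars, aes(wt, mpg, colour = factor(cyl))) +
  geom_point() +
  guides(colour = guide_legend(override.aes = list(size = 4))) +
  theme(legend.text     = element_text(size = 12),
        legend.title    = element_text(size = 13),
        legend.key.size = unit(1.2, "lines")) +
  labs(colour = "Cylinders")

print(p)
```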

Bionic Reading in R

Published 2023-10-10 by Kevin Feasel

Tomaz Kastrun says reading is fundamental:

Trick your brain into faster reading with the help of Bionic Reading. With the help of highlighting part of the words, it “guides your eyes over the text and the brain remembers previously learned words more quickly.” (source: br-about)

Here is a beautiful example of how text with the use of opacity, colours, size and many other elements can be quickly achieved for faster reading.

Click through for an example and how to implement it in R.
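
The core idea can be sketched in a few lines of base R. This is a hypothetical helper of my own (bionic() and its frac parameter are not from the linked post), and it assumes you want HTML output where the leading part of each word is wrapped in bold tags.

```r
# Hypothetical helper: bold the first `frac` of each word's characters
bionic <- function(text, frac = 0.5) {
  words  <- strsplit(text, " ", fixed = TRUE)[[1]]
  n_bold <- ceiling(nchar(words) * frac)  # at least one character per word
  paste0("<b>", substr(words, 1, n_bold), "</b>",
         substr(words, n_bold + 1, nchar(words)),
         collapse = " ")
}

out <- bionic("Reading is fundamental")
print(out)
# [1] "<b>Read</b>ing <b>i</b>s <b>fundam</b>ental"
```

This sketch splits on single spaces only, so punctuation attached to a word counts toward its length.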

