Press "Enter" to skip to content

Category: R

Building a Bland-Altman Plot in R

Steven Sanderson performs a comparison:

Before we dive into the code, let’s briefly understand what a Bland-Altman plot is. It’s a graphical method to visualize the agreement between two measurement techniques, often used in fields like medicine or any domain with comparative measurements. The plot displays the differences between two measurements (Y-axis) against their means (X-axis).

Click through to see how this works and how you can interpret the results.
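As a rough sketch of the idea (not the post's exact code, and with made-up measurement vectors method_a and method_b), a Bland-Altman plot can be drawn in base R like this:

set.seed(123)
method_a <- rnorm(50, mean = 100, sd = 10)           # hypothetical measurements, method A
method_b <- method_a + rnorm(50, mean = 1, sd = 5)   # method B, with some bias and noise

avgs  <- (method_a + method_b) / 2    # X-axis: mean of the two measurements
diffs <- method_a - method_b          # Y-axis: difference between the measurements

plot(avgs, diffs, xlab = "Mean of the two methods", ylab = "Difference",
     main = "Bland-Altman plot")
abline(h = mean(diffs), lty = 2)                                # average bias
abline(h = mean(diffs) + c(-1.96, 1.96) * sd(diffs), lty = 3)   # 95% limits of agreement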


Functional Programming and R

Anirban Shaw ties functional programming to R:

Functional Programming's relevance in the R programming language, a language primarily known for its prowess in data analysis and statistical computing, is particularly noteworthy. By leveraging functional programming, organizations can improve operational efficiency and gain a competitive edge.

R’s ecosystem is enriched by functional programming paradigms, which enable developers and data scientists to write concise and expressive code for tasks such as data manipulation, transformation, and visualization.

In this article, we take a deep dive into the fundamental characteristics of R, the advantages of adopting functional programming within it, and the essential concepts ingrained in the core of R.

Read on to see how the two fit together. H/T R-Bloggers.
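For a flavor of what that looks like in practice, here is a minimal sketch (not from the article) of the functional style in base R: functions passed to functions instead of explicit loops.

squares <- vapply(1:5, function(x) x^2, numeric(1))   # apply a function over a vector
squares
#> [1]  1  4  9 16 25

Reduce(`+`, 1:5)   # fold a binary function over a vector
#> [1] 15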


Scree Plots in R

Steven Sanderson builds a scree plot:

A scree plot is a line plot that shows the eigenvalues or variance explained by each principal component (PC) in a Principal Component Analysis (PCA). It is a useful tool for determining the number of PCs to retain in a PCA model.

In this blog post, we will show you how to create a scree plot in base R. We will use the iris dataset as an example.

Read on to learn more about the plot, as well as examples of how to create scree plots.
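As a quick sketch of the general approach in base R (the post's own code may differ), a scree plot for the iris measurements looks like this:

pca <- prcomp(iris[, 1:4], scale. = TRUE)   # PCA on the four numeric columns
var_explained <- pca$sdev^2                 # eigenvalues of the correlation matrix

plot(var_explained, type = "b",
     xlab = "Principal component", ylab = "Eigenvalue",
     main = "Scree plot")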


Bubble Charts in ggplot2

Steven Sanderson creates a bubble chart:

Bubble charts are a great way to visualize data with three dimensions. The size of the bubbles represents a third variable, which can be used to show the importance of that variable or to identify relationships between the three variables.

To create a bubble chart in R using ggplot2, you will need to use the geom_point() function. This function will plot points on your chart, and you can use the size aesthetic to control the size of the points.

Click through for two examples, one which is a pretty good outcome for using a bubble chart, and one which exposes the key weakness of bubble charts.
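A minimal sketch of the pattern, using mtcars rather than the post's data:

library(ggplot2)

ggplot(mtcars, aes(x = wt, y = mpg, size = hp)) +    # hp drives the bubble size
  geom_point(alpha = 0.6) +
  scale_size(range = c(2, 10), name = "Horsepower")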


Several Useful R Functions

Maelle Salmon shows off four useful R functions:

Recently I caught myself using which(grepl(...)),

animals <- c("cat", "bird", "dog", "fish")
which(grepl("i", animals))
#> [1] 2 4

when the simpler alternative is

animals <- c("cat", "bird", "dog", "fish")
grep("i", animals)
#> [1] 2 4

Read on for another example of using grep() instead of grepl(), as well as three other functions you might want to keep in mind. H/T R-Bloggers.
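One related pattern in the same spirit (a guess at the kind of example meant, not necessarily the post's own): grep(..., value = TRUE) replaces indexing a vector with grepl().

animals <- c("cat", "bird", "dog", "fish")
animals[grepl("i", animals)]
#> [1] "bird" "fish"

grep("i", animals, value = TRUE)
#> [1] "bird" "fish"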


Creating Pareto Charts in R with qcc

Steven Sanderson builds a Pareto chart:

A Pareto chart is a type of bar chart that shows the frequency of different categories in a dataset, ordered by frequency from highest to lowest. It is often used to identify the most common problems or causes of a problem, so that resources can be focused on addressing them.

To create a Pareto chart in R, we can use the qcc package. The qcc package provides a number of functions for quality control, including the pareto.chart() function for creating Pareto charts.

Manufacturing companies love Pareto charts.
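A small example of the call, with made-up defect counts:

library(qcc)

defects <- c(Scratch = 27, Dent = 12, Misalignment = 8, Discoloration = 5, Other = 3)
pareto.chart(defects, main = "Defects by type")   # bars sorted by frequency plus a cumulative-percentage line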


Exploring Poker Hands in R

Benjamin Smith sorts and deals:

Recently, I have been reading “Mathematical Statistics” by Professor Keith Knight and I noticed an interesting passage he mentions when discussing finite sample spaces:

In some cases, it may be possible to enumerate all possible outcomes, but in general such enumeration is physically impossible; for example, enumerating all possible 5 card poker hands dealt from a deck of 52 cards would take several months under the most favourable conditions. (Knight 2000)

While this quote is taken out of context, with the advent of modern computing this is a task which is definitely possible to do computationally!

Click through to see how you can do this in R, at least for 5-card stud. 5-card draw would have the same number of final combinations, though if you also track intermediary combinations, it would grow rather considerably.
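For a sense of scale, counting and enumerating the hands is well within reach of a laptop (a rough sketch, not the post's code):

choose(52, 5)              # number of distinct 5-card hands
#> [1] 2598960

hands <- combn(52, 5)      # one column per hand, as indices into a 52-card deck
dim(hands)
#> [1]       5 2598960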


Making a Time Series Stationary in R

Steven Sanderson puts a halt to things:

When working with time series data, one common challenge is dealing with non-stationary data. Non-stationary time series can be a headache for analysts, but fear not, because we have a handy tool to make your life easier. Say hello to the auto_stationarize() function from the {healthyR.ts} package.

Read on to learn why you want stationary data for time series analysis and how the auto_stationarize() function works.
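As an illustration of the idea the function automates (this sketch uses base R only, not {healthyR.ts}), differencing a trending series is the classic manual way to make it stationary:

ap <- log(AirPassengers)   # trending, seasonal series
ap_diff <- diff(ap)        # first difference removes the trend

op <- par(mfrow = c(2, 1))
plot(ap, main = "log(AirPassengers): non-stationary")
plot(ap_diff, main = "First difference: roughly stationary in the mean")
par(op)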


Time Series Stationarity Testing in R

Steven Sanderson isn’t just spinning in place:

Before we delve into the ts_adf_test() function, let’s understand the concept behind it. The Augmented Dickey-Fuller (ADF) test is a crucial tool in time series analysis. It’s like the Sherlock Holmes of time series data, helping us detect whether a series is stationary or not. Stationarity is a fundamental assumption in time series modeling because many models work best when applied to stationary data.

So, why “Augmented”? Well, it’s an extension of the original Dickey-Fuller test that accounts for more complex relationships within the time series data.

Click through to see how you can use the ts_adf_test() function to get a better feel for whether a time series is stationary.
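Since the exact ts_adf_test() signature isn't shown here, a hedged equivalent with the widely used {tseries} package illustrates the idea:

library(tseries)

adf.test(log(AirPassengers))         # trending series: typically fails to reject a unit root
adf.test(diff(log(AirPassengers)))   # differenced series: typically looks stationary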


New R Package: hstats

Michael Mayer has a new package:

The current version offers:

  • H statistics per feature, feature pair, and feature triple
  • multivariate predictions at no additional cost
  • a convenient API
  • other important tools from explainable ML:
    • performance calculations
    • permutation importance (e.g., to select features for calculating H-statistics)
    • partial dependence plots (including grouping, multivariate, multivariable)
    • individual conditional expectations (ICE)
  • Case-weights are available for all methods, which is important, e.g., in insurance applications.

Click through for an example of how it works, followed by some simple benchmarking to give you an idea of how it performs compared to similar tools.
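A rough sketch of the workflow, assuming the hstats() entry point works roughly as the package README describes (the exact arguments may differ, so treat this as illustrative only):

library(hstats)

fit <- lm(Sepal.Length ~ ., data = iris)   # any model with a predict() method
s <- hstats(fit, X = iris[, -1])           # compute Friedman's H statistics
s                                          # overall interaction strength
plot(s)                                    # per-feature and pairwise H statistics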
