Press "Enter" to skip to content

Category: Visualization

Simulating a Bivariate Normal Distribution in R

Steven Sanderson isn’t content with a univariate normal distribution:

Imagine two variables, like height and weight, that exhibit a joint distribution. The bivariate normal distribution captures the relationship between these variables, describing how their values tend to cluster around certain means and how they vary together. It’s like a two-dimensional bell curve, where the peak represents the most likely combination of values for both variables.

Click through to learn a bit more about bivariate normal distributions, including ways to plot one and show its density function.

Comments closed

Plotting a Cumulative Distribution Function in R

Steven Sanderson builds a plot:

Before delving into the world of R programming, let’s first grasp the fundamental concept of a CDF. Imagine a group of students eagerly awaiting their exam results. The CDF for their scores would depict the probability of encountering a student with a score less than or equal to a specific value. For instance, if the CDF indicates a value of 0.7 at 80%, it implies that there’s a 70% chance of finding a student with a score of 80 or lower.

Read on to see how you can calculate this in a dataset and then plot the CDF.

Comments closed

Log-Log Plots in R

Steven Sanderson thinks in percentages:

A log-log plot is a type of graph where both the x-axis and y-axis are in logarithmic scales. This is particularly useful when dealing with data that spans several orders of magnitude. By taking the logarithm of the data, we can compress large values and reveal patterns that might be hidden on a linear scale.

Let’s start with a simple example using base R.

Read on to see how you can create these plots and what you can do to customize them.

Comments closed

Building a Bland-Altman Plot in R

Steven Sanderson performs a comparison:

Before we dive into the code, let’s briefly understand what a Bland-Altman plot is. It’s a graphical method to visualize the agreement between two measurement techniques, often used in fields like medicine or any domain with comparative measurements. The plot displays the differences between two measurements (Y-axis) against their means (X-axis).

Click through to see how this works and how you can interpret the results.

Comments closed

Scree Plots in R

Steven Sanderson builds a scree plot:

A scree plot is a line plot that shows the eigenvalues or variance explained by each principal component (PC) in a Principal Component Analysis (PCA). It is a useful tool for determining the number of PCs to retain in a PCA model.

In this blog post, we will show you how to create a scree plot in base R. We will use the iris dataset as an example.

Read on to learn more about the plot, as well as examples of how to create scree plots.

Comments closed

Bubble Charts in ggplot2

Steven Sanderson creates a bubble chart:

Bubble charts are a great way to visualize data with three dimensions. The size of the bubbles represents a third variable, which can be used to show the importance of that variable or to identify relationships between the three variables.

To create a bubble chart in R using ggplot2, you will need to use the geom_point() function. This function will plot points on your chart, and you can use the size aesthetic to control the size of the points.

Click through for two examples, one which is a pretty good outcome for using a bubble chart, and one which exposes the key weakness of bubble charts.

Comments closed

Creating Pareto Charts in R with qcc

Steven Sanderson builds a Pareto chart:

A Pareto chart is a type of bar chart that shows the frequency of different categories in a dataset, ordered by frequency from highest to lowest. It is often used to identify the most common problems or causes of a problem, so that resources can be focused on addressing them.

To create a Pareto chart in R, we can use the qcc package. The qcc package provides a number of functions for quality control, including the pareto.chart() function for creating Pareto charts.

Manufacturing companies love Pareto charts

Comments closed

Creating Horizontal Legends in R

Steven Sanderson flattens the legend:

Creating a horizontal legend in base R can be a useful skill when you want to label multiple categories in a plot without taking up too much vertical space. In this blog post, we’ll explore various methods to create horizontal legends in R and provide examples with clear explanations.

Read on for two demos, one with a single legend and one which creates two legends. I’m not so sure about how valuable the latter is (because you’re splitting valuable information into two places, losing some of the glanceability of a chart along the way), but it is interesting that you can do it.

Comments closed

A Simple Chart Gone Wrong

Mike Cisneros debugs a visual:

Looking at the chart, you might think, “For such a simple chart, why am I so confused?” It’s a bar chart with one data series, a title, a subtitle, and more info at the bottom. But the real challenge isn’t the design, layout, or labeling. It’s the assumptions made before creating the chart. For example, the chart uses the acronym “ADS” without explaining it. Does everyone know it means Average Daily Sales? And does “Gap Analysis” make sense to coffee shop owners?

One additional thing that I would point out is that column charts tend to imply time series, which adds even more to the confusion, as the presentation makes it look like you’re seeing the comparison of Mellow Bean versus its competition over seven time periods. Bar charts, meanwhile, tend to imply static data, so the move to a bar chart makes sense.

Comments closed