Press "Enter" to skip to content

Category: Visualization

Faceted Images in ggplot2

Steven Sanderson shows multiple plots on one image:

Data visualization is a crucial tool in the data scientist’s toolkit. It allows us to explore and communicate complex patterns and insights effectively. In the world of R programming, one of the most powerful and versatile packages for data visualization is ggplot2. Among its many features, ggplot2 offers the facet_grid() function, which enables you to create multiple plots arranged in a grid, making it easier to visualize different groups of data simultaneously.

In this blog post, we’ll dive into the fascinating world of facet_grid() using a practical example. We’ll generate some synthetic data, split it into multiple groups, and then use facet_grid() to create a visually appealing grid of plots.

Read on for the demo script. The text talks about facet_grid() and the demo is facet_wrap(). The two behave very similarly, though they have slightly different use cases.

Comments closed

Visualizing Data in R with ggplot2

Adrian Tam continues a series on R:

One of the most popular plotting libraries in R is not the plotting function in R base, but the ggplot2 library. People use that because it is flexible. This library also works using the philosophy of “grammar of graphics”, which is not to generate a visualization upon a function call, but to define what should be in the plot, and you can refine it further before setting it into a picture. In this post, you will learn about ggplot2 and see some examples. In particular, you will learn:

  • How to make use of ggplot2 to create a plot from a dataset
  • How to create various charts and graphics with multiple facades using ggplot2

It takes a little while to understand the grammar of graphics approach that ggplot2 takes, but once you do, you realize just how good this library is for generating static images.

Comments closed

Pairs Plots in Base R

Steven Sanderson shows how we can create a pairs plot using the pairs() function in R:

A pairs plot, also known as a scatterplot matrix, is a grid of scatterplots that displays pairwise relationships between multiple variables in a dataset. Each cell in the grid represents the relationship between two variables, and the diagonal cells display histograms or kernel density plots of individual variables. Pairs plots are incredibly versatile, helping us to identify patterns, correlations, and potential outliers in our data.

Click through for one example, how to interpret it, and how to customize the outputs.

Comments closed

ggplot2 in Python Notebooks

John Mount runs R in Python with rpy2:

For an article on A/B testing that I am preparing, I asked my partner Dr. Nina Zumel if she could do me a favor and write some code to produce the diagrams. She prepared an excellent parameterized diagram generator. However being the author of the book Practical Data Science with R, she built it in R using ggplot2. This would be great, except the A/B testing article is being developed in Python, as it targets programmers familiar with Python.

As the production of the diagrams is not part of the proposed article, I decided to use the rpy2 package to integrate the R diagrams directly into the new worksheet. Alternatively, I could translate her code into Python using one of: Seaborn objectsplotnineggpy, or others. The large number of options is evidence of how influential Leland Wilkinson’s grammar of graphics (gg) is.

Click through to see how you can execute R code within the context of Python, similar to how you can use the reticulate package to execute Python code in the context of R.

Comments closed

Creating Confidence Intervals on a Linear Model in R

Steven Sanderson goes frequentist on us:

Linear regression is a fundamental statistical technique used to model the relationship between a dependent variable and one or more independent variables. While fitting a linear model is relatively straightforward in R, it’s also essential to understand the uncertainty associated with our model’s predictions. One way to visualize this uncertainty is by creating confidence intervals around the regression line. In this blog post, we’ll walk through how to perform linear regression and plot confidence intervals using base R with the popular Iris dataset.

Click through to see how, even if you’re a Bayesian who considers confidence intervals to overstate precision in reality.

Comments closed

Dynamic Highlighting of Data Points Based on Slicer Selection

Nikola Ilic shines a light on the data:

To quickly explain: when a user selects, for example, Contoso in the slicer, the Contoso bar should be highlighted by using a different color. As much as this sounds like a very basic and common business request, there is no straightforward solution in Power BI (or, at least, I’m not aware of it:))

However, the client’s wish is (almost) always our command – so, let’s see how this feature can be implemented with a little bit of data model tweaking and leveraging some DAX code.

Spoilers: there is a solution, though it does involve quite a few steps.

Comments closed

Grouped Scatter Plots in R

Steven Sanderson builds a scatter plot:

Data visualization is a powerful tool for gaining insights from your data. Scatter plots, in particular, are excellent for visualizing relationships between two continuous variables. But what if you want to compare multiple groups within your data? In this blog post, we’ll explore how to create engaging scatter plots by group in R. We’ll walk through the process step by step, providing several examples and explaining the code blocks in simple terms. So, whether you’re a data scientist, analyst, or just curious about R, let’s dive in and discover how to make your data come to life!

Click through for several examples of plot generation.

Comments closed

An Overview of Sankey Diagrams

Simon Rowe explains what a Sankey diagram is:

In the image, you can see how Sankey used arrows to show the flow of energy with the widths of the shaded areas proportional to the amount of heat loss as it progresses through the engine’s cycle. This series of complex relationships would be difficult for a reader to understand at a glance were they simply presented in text and data tables. Making just such a sophisticated system easier to understand is the purpose of a Sankey diagram, which visually summarises the volume and direction of flows through the stages of a process or system.

Click through for several good examples, some advice on when a Sankey diagram could make sense, and the times in which you should not use this visual.

Comments closed

Working with Histogram Breaks in R

Steven Sanderson divvies out buckets for a histogram:

Histograms divide data into bins, or intervals, and then count how many data points fall into each bin. The breaks parameter in R allows you to control how these bins are defined. By specifying breaks thoughtfully, you can highlight specific patterns and nuances in your data.

Click through to see how you can use the breaks parameter in a few different ways to customize your histogram. The default breaks in R are often reasonable, but trying a few different breaks can help you get a better understanding of the actual distribution of the data.

Comments closed

Multivariate Histograms in R

Steven Sanderson wants multiple breakdowns:

Histograms are powerful tools for visualizing the distribution of a single variable, but what if you want to compare the distributions of two variables side by side? In this blog post, we’ll explore how to create a histogram of two variables in R, a popular programming language for data analysis and visualization.

We’ll cover various scenarios, from basic histograms to more advanced techniques, and explain the code step by step in simple terms. So, grab your favorite dataset or generate some random data, and let’s dive into the world of dual-variable histograms!

Click through for several techniques.

Comments closed