Category: Visualization

Grouped Scatter Plots in R

Steven Sanderson builds a scatter plot:

Data visualization is a powerful tool for gaining insights from your data. Scatter plots, in particular, are excellent for visualizing relationships between two continuous variables. But what if you want to compare multiple groups within your data? In this blog post, we’ll explore how to create engaging scatter plots by group in R. We’ll walk through the process step by step, providing several examples and explaining the code blocks in simple terms. So, whether you’re a data scientist, analyst, or just curious about R, let’s dive in and discover how to make your data come to life!

Click through for several examples of plot generation.
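Under the hood, a grouped scatter plot is one series per group, each drawn with its own colour or marker. In R you would normally map the group variable to colour (for example aes(color = group) in ggplot2) and let the plotting system do the splitting. The Python sketch below is not from the post. It uses made-up data and shows only that grouping step, and split_by_group is a hypothetical helper name.

```python
from collections import defaultdict

def split_by_group(xs, ys, groups):
    """Return {group: ([x...], [y...])}, keeping groups in first-seen order."""
    if not (len(xs) == len(ys) == len(groups)):
        raise ValueError("xs, ys and groups must have the same length")
    series = defaultdict(lambda: ([], []))
    for x, y, g in zip(xs, ys, groups):
        series[g][0].append(x)  # x values for this group
        series[g][1].append(y)  # matching y values
    return dict(series)

# Made-up example data: six points in three groups.
xs = [1, 2, 3, 4, 5, 6]
ys = [2.0, 3.1, 1.0, 1.5, 4.2, 5.0]
groups = ["a", "a", "b", "b", "a", "c"]

by_group = split_by_group(xs, ys, groups)
for g, (gx, gy) in by_group.items():
    # A plotting library would draw each of these as its own coloured series.
    print(g, gx, gy)
```

Each key then becomes one legend entry, which is why a grouping variable with many levels quickly makes a scatter plot hard to read.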

An Overview of Sankey Diagrams

Simon Rowe explains what a Sankey diagram is:

In the image, you can see how Sankey used arrows to show the flow of energy with the widths of the shaded areas proportional to the amount of heat loss as it progresses through the engine’s cycle. This series of complex relationships would be difficult for a reader to understand at a glance were they simply presented in text and data tables. Making just such a sophisticated system easier to understand is the purpose of a Sankey diagram, which visually summarises the volume and direction of flows through the stages of a process or system.

Click through for several good examples, some advice on when a Sankey diagram makes sense, and cases where you should avoid this visual.
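The two properties the quote describes, band widths proportional to the amount flowing and flows passing through the stages of a system, can be sketched in a few lines. The numbers below are illustrative only (not Sankey's engine figures), and node_totals and band_widths are hypothetical helper names rather than part of any charting library.

```python
from collections import defaultdict

# Illustrative (source, target, amount) flows through a simple engine.
links = [
    ("Fuel", "Engine", 100.0),
    ("Engine", "Useful work", 30.0),
    ("Engine", "Exhaust heat", 50.0),
    ("Engine", "Friction loss", 20.0),
]

def node_totals(links):
    """Sum the flow entering and leaving each node."""
    inflow, outflow = defaultdict(float), defaultdict(float)
    for src, dst, amount in links:
        if amount < 0:
            raise ValueError("Sankey flows must be non-negative")
        outflow[src] += amount
        inflow[dst] += amount
    return inflow, outflow

def band_widths(links, max_width=50.0):
    """Scale each link so its drawn width is proportional to its amount."""
    largest = max(amount for _, _, amount in links)
    return {(s, d): max_width * a / largest for s, d, a in links}

inflow, outflow = node_totals(links)
# An intermediate stage should conserve flow: what goes in comes out.
print(inflow["Engine"], outflow["Engine"])
print(band_widths(links))
```

If an intermediate node's inflow and outflow disagree, the diagram will visibly "leak", which is often a sign the underlying data is incomplete.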

Working with Histogram Breaks in R

Steven Sanderson divvies out buckets for a histogram:

Histograms divide data into bins, or intervals, and then count how many data points fall into each bin. The breaks parameter in R allows you to control how these bins are defined. By specifying breaks thoughtfully, you can highlight specific patterns and nuances in your data.

Click through to see how you can use the breaks parameter in a few different ways to customize your histogram. The default breaks in R are often reasonable, but trying a few different breaks can help you get a better understanding of the actual distribution of the data.
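To see why the choice of breaks matters, it helps to know how values get assigned to bins. By default, R's hist() uses right-closed intervals (right = TRUE) and also counts a value sitting exactly on the lowest break (include.lowest = TRUE). The Python sketch below imitates that counting rule on made-up data; hist_counts is a hypothetical name, and it reproduces only the counting, not R's plotting.

```python
from bisect import bisect_left

def hist_counts(values, breaks):
    """Count values per bin, R-style: bins are (b[i], b[i+1]], and the
    first bin also includes its left edge."""
    if any(b2 <= b1 for b1, b2 in zip(breaks, breaks[1:])):
        raise ValueError("breaks must be strictly increasing")
    counts = [0] * (len(breaks) - 1)
    for v in values:
        if v < breaks[0] or v > breaks[-1]:
            raise ValueError(f"value {v} lies outside the breaks")
        # bisect_left returns the index of the first break >= v,
        # so v belongs to the interval ending at that break.
        i = bisect_left(breaks, v)
        counts[max(i - 1, 0)] += 1
    return counts

data = [0.5, 1.0, 2.0, 2.5, 3.9, 4.0, 5.5, 6.0]
print(hist_counts(data, [0, 2, 4, 6]))  # three equal-width bins
print(hist_counts(data, [0, 3, 6]))     # two wider bins
print(hist_counts(data, [0, 1, 4, 6]))  # unequal widths
```

The same eight values give noticeably different shapes depending on the breaks. With unequal widths, raw counts exaggerate wide bins, which is why R's hist() switches to plotting densities by default when the breaks are not equidistant.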

Multivariate Histograms in R

Steven Sanderson wants multiple breakdowns:

Histograms are powerful tools for visualizing the distribution of a single variable, but what if you want to compare the distributions of two variables side by side? In this blog post, we’ll explore how to create a histogram of two variables in R, a popular programming language for data analysis and visualization.

We’ll cover various scenarios, from basic histograms to more advanced techniques, and explain the code step by step in simple terms. So, grab your favorite dataset or generate some random data, and let’s dive into the world of dual-variable histograms!

Click through for several techniques.
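Whatever technique you pick for showing two variables together, the bars are only comparable if both histograms use the same breaks. In R that usually means passing one shared breaks vector to both hist() calls (with add = TRUE on the second). The Python sketch below, with made-up data and hypothetical helper names, builds equal-width breaks over the combined range and prints a crude text version of the side-by-side counts.

```python
from bisect import bisect_left

def shared_breaks(a, b, n_bins):
    """Equal-width breaks spanning the combined range of both variables."""
    lo = min(min(a), min(b))
    hi = max(max(a), max(b))
    width = (hi - lo) / n_bins
    # Use hi itself as the last break so float rounding cannot drop the maximum.
    return [lo + i * width for i in range(n_bins)] + [hi]

def counts(values, breaks):
    """Right-closed bins; the first bin also includes its left edge."""
    out = [0] * (len(breaks) - 1)
    for v in values:
        out[max(bisect_left(breaks, v) - 1, 0)] += 1
    return out

a = [1, 2, 2, 3, 4, 5]
b = [3, 4, 4, 5, 6, 7]
breaks = shared_breaks(a, b, 3)
for lo, hi, ca, cb in zip(breaks, breaks[1:], counts(a, breaks), counts(b, breaks)):
    print(f"({lo:g}, {hi:g}]  a: {'#' * ca:<5}  b: {'#' * cb}")
```

Computing breaks separately for each variable would give bins of different widths and positions, so a taller bar in one histogram would not necessarily mean more observations.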

Multi-Plot Graphs in R

Steven Sanderson needs more than one line:

Data visualization is a crucial aspect of data analysis. In R, the flexibility and power of its plotting capabilities allow you to create compelling visualizations. One common scenario is the need to display multiple plots on the same graph. In this blog post, we’ll explore three different approaches to achieve this using the same dataset. We’ll use the set.seed(123) and generate data with x and y equal to cumsum(rnorm(25)) for consistency across examples.

Click through for three common techniques.
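The quoted setup, two random walks built as cumsum(rnorm(25)) after set.seed(123), can be mirrored in Python to see what is being plotted. This is only an analogue: Python's random module is a different generator, so the values will not match R's. The long-format reshaping at the end reflects how ggplot2-style tools generally like to receive several series for one set of axes; random_walk and long_rows are hypothetical names.

```python
import random
from itertools import accumulate

def random_walk(rng, n):
    """Cumulative sum of n standard-normal draws, like R's cumsum(rnorm(n))."""
    return list(accumulate(rng.gauss(0.0, 1.0) for _ in range(n)))

rng = random.Random(123)  # Python's generator: values will NOT match R's set.seed(123)
x = random_walk(rng, 25)
y = random_walk(rng, 25)

# Long ("tidy") layout: one row per (series, step, value), so a plotting
# tool can draw one line per series on shared axes.
long_rows = (
    [("x", i + 1, v) for i, v in enumerate(x)]
    + [("y", i + 1, v) for i, v in enumerate(y)]
)
print(long_rows[:3])
```

Seeding once and drawing x and y from the same generator keeps the whole example reproducible while still giving two different walks.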

Plotting SVM Decision Boundaries in R

Steven Sanderson goes right up to the edge:

Support Vector Machines (SVM) are a powerful tool in the world of machine learning and classification. They excel in finding the optimal decision boundary between different classes of data. However, understanding and visualizing these decision boundaries can be a bit tricky. In this blog post, we’ll explore how to plot an SVM object using the e1071 library in R, making it easier to grasp the magic happening under the hood.

Read on to see how you can perform this analysis as well.
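Decision-boundary plots such as e1071's generally work by evaluating the fitted model over a grid of points in the two plotted features and shading each point by its predicted class. The Python sketch below shows that idea with a hypothetical linear decision function f(x) = w·x + b; the weights are made up rather than fitted, and no SVM training is performed.

```python
# Hypothetical weights, standing in for an already-fitted linear SVM.
W = (1.0, -1.0)
B = 0.0

def decision(x1, x2):
    """Signed decision value: positive on one side of the boundary, negative on the other."""
    return W[0] * x1 + W[1] * x2 + B

def classify_grid(x_min, x_max, y_min, y_max, steps):
    """Label each grid point by the side of the boundary f(x) = 0 it falls on."""
    rows = []
    for j in range(steps):
        y = y_max - j * (y_max - y_min) / (steps - 1)  # top row first, like a map
        row = ""
        for i in range(steps):
            x = x_min + i * (x_max - x_min) / (steps - 1)
            f = decision(x, y)
            row += "+" if f > 0 else "-" if f < 0 else "0"
        rows.append(row)
    return rows

for row in classify_grid(-2, 2, -2, 2, 5):
    print(row)
```

The "0" cells trace the boundary itself. For a trained SVM with canonically scaled weights, the margins lie where the decision value equals +1 and -1, and a finer grid gives a smoother shaded region.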

Appropriate Uses of Jitter in Graphs

Steven Sanderson shakes things up:

As an R programmer, one of the most useful functions to know is the jitter function. The jitter function is used to add random noise to a numeric vector, which can be helpful when visualizing data in a scatterplot. By using the jitter function, we can get a better picture of the true underlying relationship between two variables in a dataset.

Read on to see how to use jitter, though I recommend making it very clear to chart viewers that you are, in fact, using jitter, as they can easily mistake the jittered positions for actual values.
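Mechanically, jittering adds a small amount of uniform random noise to each value before plotting, so that overlapping points separate. R's jitter() works along these lines, with a default amount derived from the data. The Python sketch below uses an explicit, assumed amount of 0.2 and a hypothetical function name. Because the noise stays below half the spacing between the integer scores, every point still rounds back to its true value.

```python
import random

def jitter(values, amount, rng=None):
    """Add uniform noise in [-amount, amount] to each value (for display only)."""
    rng = rng if rng is not None else random.Random()
    return [v + rng.uniform(-amount, amount) for v in values]

# Discrete scores where many points would overlap exactly.
scores = [1, 1, 1, 2, 2, 3, 3, 3, 3]
shown = jitter(scores, amount=0.2, rng=random.Random(42))
print([round(s, 2) for s in shown])
```

Keep the original values for labels, tooltips, and any statistics; only the plotted positions should use the jittered copy.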

Kernel Density Plots in R

Steven Sanderson explains one common type of plot in R:

Kernel Density Plots are a type of plot that displays the distribution of values in a dataset using one continuous curve. They are similar to histograms, but they are even better at displaying the shape of a distribution since they aren’t affected by the number of bins used in the histogram. In this blog post, we will discuss what Kernel Density Plots are in simple terms, what they are useful for, and show several examples using both base R and ggplot2.

Read on to learn more, including how to generate these in base R, ggplot2, and with the TidyDensity package.
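A kernel density estimate places a small bump (the kernel) on every observation and averages them: f(x) = (1 / (n·h)) · Σ K((x − xᵢ) / h). R's density() uses a Gaussian kernel by default. While a KDE has no bins, the bandwidth h plays a similar smoothing role and shapes the curve. The Python sketch below implements a Gaussian KDE with a fixed, assumed bandwidth on made-up data. It does not reproduce R's automatic bandwidth selection.

```python
import math

def gaussian_kde(data, bandwidth):
    """Return f(x) = 1/(n*h) * sum of standard-normal kernels K((x - xi) / h)."""
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    n = len(data)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data)

    return density

data = [1.0, 2.0, 2.5, 3.0, 7.0]
for h in (0.3, 1.0):
    f = gaussian_kde(data, h)
    # Small h: spiky curve that follows each point; large h: smoother curve.
    print(f"h={h}: " + "  ".join(f"{f(x):.3f}" for x in (2.0, 5.0, 7.0)))
```

Whatever the bandwidth, the estimate still integrates to 1, so the curves are directly comparable as densities.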

The Value of KPIs and Cards in Power BI

Kurt Buhler and Stepan Resl give you a card:

When a user arrives at your report, they should be able to answer their most important questions in a few seconds. To do this, we typically put the most critical information in the top-left of the report (where we often look first). This information should provide a high-level overview, whereas additional details should be placed at the bottom of the report, behind interactions, or on later pages.

An effective and popular way to call attention to important numbers in Power BI is by using cards and KPI core visuals.

Read on for several examples and a breakdown of how they work best.
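What sets a KPI visual apart from a plain card is that it puts a number in context by comparing it against a target. The Python sketch below shows that comparison logic with a hypothetical kpi_status function and made-up values; it is not how Power BI computes its visuals internally. It also handles metrics where lower is better, such as costs or defect rates.

```python
def kpi_status(actual, target, higher_is_better=True):
    """Return (percent difference from target, status label)."""
    if target == 0:
        raise ValueError("target must be non-zero to compute a percentage")
    pct = (actual - target) / abs(target) * 100
    on_track = actual >= target if higher_is_better else actual <= target
    return round(pct, 1), ("on track" if on_track else "off track")

print(kpi_status(1150, 1000))  # (15.0, 'on track')
print(kpi_status(80, 100))     # (-20.0, 'off track')
print(kpi_status(4.2, 5.0, higher_is_better=False))  # a cost under budget
```

Stating the direction explicitly avoids the common mistake of colouring a favourable drop in costs red.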