Category: Visualization

Pre-Sketching Data Visualizations

Laura Ellis explains the benefits of pre-sketching data visualizations:

When you take on a new data visualization project, it can be tempting to jump in and create visualizations right away with the idea that after enough exploring, the final format will present itself to you. And while it is important to dedicate time to EDA (exploratory data analysis), it can also be very beneficial to define a high-level plan early in the process.

Over time, I’ve found that producing an early sketch has been helpful in reducing the total amount of time and iterations taken towards building the end product.

Read on for the reasons why.

Color and Emotion

Cedric Scherer explains some of the psychology behind how we perceive color in visuals:

Without any intention, the two variations of my visualization triggered different emotional reactions. While the red chart likely leads you to think “Wow, Berlin summers are quite hot,” the blue version may push you to think of summers as rainy and rather cold.

In general, we should have in mind that different details might spark different emotions and expectations in our viewers. Some of these details will make it easier for them to understand the chart in the manner the designer intended. 

Having experienced (parts of) a Berlin summer, let me confirm that they are not hot when compared to the midwestern or southeastern US.

Visualizing with Text

Alex Velez shows how you can use simple text to share information:

If you’re unsure about what I mean when I say simple text, I’m referring to the idea that just because you have numbers doesn’t mean you need to build a graph—as in this simple text example. Sometimes words with big numbers written in bold fonts (aka BANs) are more effective, especially when communicating one or two data points.

Some people’s hesitancy with simple text is that they think it’s unrealistic: the notion that you might only share one or two numbers with an audience. It’s a fair point; it does seem a bit silly to think that you’d only talk through a couple of specific values when presenting data, even though I’m sure there are such occasions. Rather than thinking of the ideal use-case for simple text as when you only have one or two numbers in totality, consider when it may make sense to draw attention to one or two numbers in your larger story.

This is a good reminder that you don’t need everything to be fancy, shiny, and visual-laden. A little bit of text can go a long way in laying out a visual. That said, the warning is that text seems to be a little easier for people to miss, especially if there’s a lot of it. That’s where Alex’s explanation really pays off.

Running Dask on AKS

Tsuyoshi Matsuzaki sets up Dask as a distributed service:

In my last post, I showed you a tutorial for running Apache Spark on managed Kubernetes with Azure Kubernetes Service (AKS).
In this post, I’ll show you a tutorial for running distributed Dask workloads on AKS.

By using Dask, you can run scikit-learn-compatible functions and jobs on data that cannot fit in memory, or run them in a distributed manner. For simplicity, I’ll use a built-in Dask-ML function (dask_ml.linear_model.LinearRegression) in this tutorial. (In the same manner, you can also run regular sklearn functions.)
Cloud-managed Kubernetes can help you speed up these large ML workloads.

Click through for the process. I’ve had some positive experiences with Dask as a dashboarding tool. It’s definitely one of the better ones if you’re big into Python.

The Value of Bubble Charts

Elizabeth Ricks takes us through a surprisingly tricky chart:

An extension of a scatterplot, a bubble chart is commonly used to visualize relationships between three or more numeric variables. Each bubble in a chart represents a single data point. The values for each bubble are encoded by 1) its horizontal position on the x-axis, 2) its vertical position on the y-axis, and 3) the size of the bubble. Sometimes, the color of the bubble or its movement in animation can represent more dimensions.

I say surprisingly tricky because it’s easy to overwhelm the user when trying to view bubble charts. I think the best scenarios are cases in which you have relatively few data points and the size element is mandatory.

Hans Rosling (RIP) did an outstanding job of displaying this kind of chart with the Gapminder dataset.
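
The three encodings described above (x position, y position, and bubble size) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not anything from Elizabeth's post: the helper names bubble_areas and plot_bubbles and the sample numbers are made up, and the plotting half assumes matplotlib is installed. The one design choice worth noting is that the value is mapped to bubble area rather than radius, since scaling the radius exaggerates large values.

```python
def bubble_areas(values, max_area=1200.0):
    """Scale non-negative values so bubble AREA is proportional to the value.

    The largest value gets max_area (in points squared); the rest scale linearly.
    """
    if min(values) < 0:
        raise ValueError("bubble sizes need non-negative values")
    largest = max(values)
    if largest == 0:
        raise ValueError("need at least one positive value")
    return [max_area * v / largest for v in values]


def plot_bubbles(x, y, size_values, labels):
    """Draw a labeled bubble chart and return the matplotlib figure."""
    # Imported here so the scaling helper above works without matplotlib.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    # scatter's `s` argument is the marker area in points squared.
    ax.scatter(x, y, s=bubble_areas(size_values), alpha=0.5)
    for xi, yi, label in zip(x, y, labels):
        ax.annotate(label, (xi, yi), ha="center", va="center")
    ax.set_xlabel("x variable")
    ax.set_ylabel("y variable")
    return fig


# Made-up data: a handful of points, which is where bubble charts work best.
income = [1.2, 3.5, 8.0, 22.0]        # x-axis
life_expectancy = [58, 66, 72, 80]    # y-axis
population = [30, 120, 45, 10]        # bubble size
print(bubble_areas(population))
```

Calling plot_bubbles(income, life_expectancy, population, ["A", "B", "C", "D"]) and saving the returned figure draws the chart itself.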

Preventing Calendar Overrun in Power BI

Matt Allington updates an older article:

Consider the example below where the CalendarYear is filtered for 2019 and the values of the measures Total Sales and Total Sales YTD are displayed by month. As you can see, the total sales are shown up to July 2019. This is because with the sample data, the last sales date is somewhere in July 2019. However, the values of Total Sales YTD are repeated all the way until the end of year (July 2019 to December 2019). This is what I call Calendar Over Run. It is common to want to prevent this overrun.

Read on for two separate methods of preventing this visual issue.
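
The general idea of suppressing the overrun can be sketched outside of DAX. To be clear, this is not either of Matt's two methods; it is a plain Python illustration in which the function name ytd_without_overrun and the monthly figures are made up. It assumes one value per month, with None marking months that have no sales rows.

```python
from itertools import accumulate


def ytd_without_overrun(monthly_sales):
    """Running total per month, blanked (None) after the last month with sales.

    monthly_sales: one value per month, None where a month has no sales rows.
    """
    # Index of the last month that actually has data, or -1 if none do.
    last_with_data = max(
        (i for i, v in enumerate(monthly_sales) if v is not None),
        default=-1,
    )
    # Naive year-to-date: keeps repeating the final total through December.
    running = list(accumulate(v or 0 for v in monthly_sales))
    # Keep the running total only up to the last month with data.
    return [
        total if i <= last_with_data else None
        for i, total in enumerate(running)
    ]


# Made-up 2019 figures where sales stop in July, as in the quoted example.
sales_2019 = [120, 95, 130, 110, 105, 140, 60, None, None, None, None, None]
print(ytd_without_overrun(sales_2019))
```

A gap in the middle of the year still carries the running total forward; only the months after the last sale are blanked.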

Plotting Correlation Analyses in R

Finnstats shows a few techniques for plotting correlation in R:

Correlation is a measure of the strength of a relationship between two variables.

Pearson’s Product-Moment Correlation

One of the most common measures of correlation is Pearson’s product-moment correlation, which is commonly referred to simply as the correlation, or just the letter r.

Correlation shows the strength of a relationship between two variables and is expressed numerically by the correlation coefficient.

Click through for examples from several packages. H/T R-Bloggers.
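
As a quick illustration of what the coefficient r measures, here is a minimal sketch in Python rather than R. It is not taken from the Finnstats post: the function name pearson_r and the sample data are made up, and it uses only the standard library. It computes r as the sum of the products of the deviations from each mean, divided by the square root of the product of the two sums of squared deviations.

```python
import math


def pearson_r(xs, ys):
    """Pearson's product-moment correlation for two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Deviations of each observation from its variable's mean.
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    # Sum of cross-products and sums of squared deviations.
    cross = sum(a * b for a, b in zip(dx, dy))
    ss_x = sum(a * a for a in dx)
    ss_y = sum(b * b for b in dy)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cross / math.sqrt(ss_x * ss_y)


# Made-up sample data, purely for illustration.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 70, 74]
r = pearson_r(hours, scores)
print(f"r = {r:.3f}")
```

Values near 1 or -1 indicate a strong linear relationship, and values near 0 indicate a weak one.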

Table Design in R with mmtable2

Matt Dancho walks through a package to make tables look great in R:

I love ggplot2 for plotting. The grammar of graphics allows us to add elements to plots. Tables seem to be forgotten in terms of an intuitive grammar with tidy data philosophy, until now. mmtable2 aims to be the ggplot2 for tables, leveraging the awesome GT table package.

The mmtable2 package aims to make it easy to create tables by:

1. Using a ggplot2-style syntax for a grammar of table operations.

2. Extending the amazing GT table package.

Read on for the process and a demonstration.

Plotting XGBoost Trees with R

Andrew Treadway shows off a method to visualize the results of training an XGBoost model:

In this post, we’re going to cover how to plot XGBoost trees in R. XGBoost is a very popular machine learning algorithm, which is frequently used in Kaggle competitions and has many practical use cases.

Let’s start by loading the packages we’ll need. Note that plotting XGBoost trees requires the DiagrammeR package to be installed, so even if you have xgboost installed already, you’ll need to make sure you have DiagrammeR also.

Click through for the process. H/T R-Bloggers.

