Category: Visualization

Color and Emotion

Cedric Scherer explains some of the psychology behind understanding of color in visuals:

Without any intention, the two variations of my visualization triggered different emotional reactions. While the red chart likely leads you to think “Wow, Berlin summers are quite hot,” the blue version may push you to think of summers as rainy and rather cold.

In general, we should have in mind that different details might spark different emotions and expectations in our viewers. Some of these details will make it easier for them to understand the chart in the manner the designer intended. 

Having experienced (parts of) a Berlin summer, let me confirm that they are not hot when compared to the midwestern or southeastern US.

Comments closed

Visualizing with Text

Alex Velez shows how you can use simple text to share information:

If you’re unsure about what I mean when I say simple text, I’m referring to the idea that just because you have numbers doesn’t mean you need to build a graph—as in this simple text example. Sometimes words with big numbers written in bold fonts (aka BANs) are more effective, especially when communicating one or two data points.

Some people’s hesitancy with simple text is that they think it’s unrealistic: the notion that you might only share one or two numbers with an audience. It’s a fair point; it does seem a bit silly to think that you’d only talk through a couple of specific values when presenting data—even though I’m sure there are such occasions. Rather than thinking of the ideal use-case for simple text as when you only have one or two numbers in totality, consider when it may make sense to draw attention to one or two numbers in your larger story.

This is a good reminder that you don’t need everything to be fancy, shiny, and visual-laden. A little bit of text can go a long way in laying out a visual. That said, the warning is that text seems to be a little easier for people to miss, especially if there’s a lot of it. That’s where Alex’s explanation really pays off.

Running Dask on AKS

Tsuyoshi Matsuzaki sets up Dask as a distributed service:

In my last post, I showed you a tutorial for running Apache Spark on managed Kubernetes, Azure Kubernetes Service (AKS).
In this post, I’ll show you a tutorial for running distributed Dask workloads on AKS.

By using Dask, you can run scikit-learn-compliant functions and jobs on data that cannot fit in memory, or run them in a distributed manner. For simplicity, I’ll use a built-in Dask ML function (dask_ml.linear_model.LinearRegression) in this tutorial. (In the same manner, you can also run regular sklearn functions.)
Cloud-managed Kubernetes will help you speed up these large ML workloads.

Click through for the process. I’ve had some positive experiences with Dask as a dashboarding tool. It’s definitely one of the better ones if you’re big into Python.

The Value of Bubble Charts

Elizabeth Ricks takes us through a surprisingly tricky chart:

An extension of a scatterplot, a bubble chart is commonly used to visualize relationships between three or more numeric variables.  Each bubble in a chart represents a single data point. The values for each bubble are encoded by 1) its horizontal position on the x-axis, 2) its vertical position on the y-axis, and 3) the size of the bubble. Sometimes, the color of the bubble or its movement in animation can represent more dimensions. 

I say surprisingly tricky because it’s easy to overwhelm the user when trying to view bubble charts. I think the best scenarios are cases in which you have relatively few data points and the size element is mandatory.

Hans Rosling (RIP) did an outstanding job of displaying this kind of chart with the Gapminder dataset.
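
As a rough illustration of the size encoding described in the quote, here is a minimal Python sketch that scales values to bubble areas. The helper names (`bubble_areas`, `radius_from_area`), the `max_area` default, and the sample values are all hypothetical. The assumption baked in is that a bubble's area, rather than its radius, should be proportional to the value, which is a common convention and not something the linked article is quoted as saying.

```python
from math import pi, sqrt


def bubble_areas(values, max_area=2000.0):
    """Scale non-negative values so the largest gets max_area.

    Area (not radius) is made proportional to the value, so a value
    four times larger covers four times the space on screen.
    """
    if not values or min(values) < 0:
        raise ValueError("values must be a non-empty list of non-negative numbers")
    largest = max(values)
    if largest == 0:
        return [0.0 for _ in values]
    return [max_area * v / largest for v in values]


def radius_from_area(area):
    """Radius of a circle with the given area."""
    return sqrt(area / pi)


# Hypothetical third variable for three data points.
populations = [10, 40, 160]
areas = bubble_areas(populations)
radii = [radius_from_area(a) for a in areas]
```

With these inputs, each value is four times the previous one, so each bubble has four times the area but only twice the radius. Areas like these can typically be handed to a plotting library's marker-size argument; matplotlib's `scatter`, for example, interprets `s` as an area in points squared.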

Preventing Calendar Overrun in Power BI

Matt Allington updates an older article:

Consider the example below where the CalendarYear is filtered for 2019 and the values of the measures Total Sales and Total Sales YTD are displayed by month. As you can see, the total sales are shown up to July 2019. This is because with the sample data, the last sales date is somewhere in July 2019. However, the values of Total Sales YTD are repeated all the way until the end of year (July 2019 to December 2019). This is what I call Calendar Over Run. It is common to want to prevent this overrun.

Read on for two separate methods of preventing this visual issue.
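
To make the overrun concrete outside of Power BI, here is a small Python sketch of the same behavior. It is not either of the two methods from the linked article, just one plausible guard: blank the year-to-date value for any month that starts after the last sales date. The function name `ytd_by_month`, the sample figures, and the exact last sales date are made up for illustration.

```python
from datetime import date


def ytd_by_month(year, sales_by_month, last_sales_date):
    """Return (month, sales, naive_ytd, guarded_ytd) for each month of a year.

    naive_ytd keeps repeating the final total after sales stop (the overrun).
    guarded_ytd is None for months that begin after last_sales_date.
    """
    rows = []
    running = 0.0
    for month in range(1, 13):
        sales = sales_by_month.get(month)  # None when the month has no sales
        running += sales or 0.0
        month_start = date(year, month, 1)
        guarded = running if month_start <= last_sales_date else None
        rows.append((month, sales, running, guarded))
    return rows


# Hypothetical sample data: sales stop partway through July 2019.
sales_2019 = {1: 100.0, 2: 120.0, 3: 90.0, 4: 110.0, 5: 130.0, 6: 95.0, 7: 40.0}
rows = ytd_by_month(2019, sales_2019, last_sales_date=date(2019, 7, 18))
```

In this sketch the naive column carries the July total through December, while the guarded column goes blank from August onward, which is the visual outcome the article is after.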

Plotting Correlation Analyses in R

Finnstats shows a few techniques for plotting correlation in R:

Correlation is a measure of the strength of a relationship between two variables.

Pearson’s Product-Moment Correlation

One of the most common measures of correlation is Pearson’s product-moment correlation, which is commonly referred to simply as the correlation, or just the letter r.

Correlation shows the strength of a relationship between two variables and is expressed numerically by the correlation coefficient.

Click through for examples from several packages. H/T R-Bloggers.
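
For readers who want to see what the coefficient r actually computes, here is a minimal sketch in plain Python rather than R. The function name `pearson_r` and the sample data are hypothetical; the formula is the standard one, the sum of products of deviations divided by the square root of the product of the two sums of squared deviations.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return Pearson's product-moment correlation coefficient r."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Deviations of each observation from its own mean.
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    co_dev = sum(a * b for a, b in zip(dx, dy))
    ss_x = sum(a * a for a in dx)
    ss_y = sum(b * b for b in dy)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when one variable is constant")
    return co_dev / sqrt(ss_x * ss_y)


# Hypothetical paired observations.
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]
r = pearson_r(hours, scores)
```

The result always falls between -1 and 1: values near 1 indicate a strong positive linear relationship, values near -1 a strong negative one, and values near 0 little linear relationship.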

Table Design in R with mmtable2

Matt Dancho walks through a package to make tables look great in R:

I love ggplot2 for plotting. The grammar of graphics allows us to add elements to plots. Tables seem to be forgotten in terms of an intuitive grammar with tidy data philosophy – Until now. mmtable2 aims to be the ggplot2 for tables, leveraging the awesome GT table package.

The mmtable2 package aims to make it easy to create tables by:

1. Using a ggplot2-style syntax for using a grammar of table operations.

2. Extending the amazing GT table package.

Read on for the process and a demonstration.

Plotting XGBoost Trees with R

Andrew Treadway shows off a method to visualize the results of training an XGBoost model:

In this post, we’re going to cover how to plot XGBoost trees in R. XGBoost is a very popular machine learning algorithm, which is frequently used in Kaggle competitions and has many practical use cases.

Let’s start by loading the packages we’ll need. Note that plotting XGBoost trees requires the DiagrammeR package to be installed, so even if you have xgboost installed already, you’ll need to make sure you have DiagrammeR also.

Click through for the process. H/T R-Bloggers.

Grafana Changing License

Alex Woodie has some bad news for us:

Grafana is switching licensing of its core products from Apache License 2.0 to the more restrictive Affero General Public License (AGPL) v3. The company made the change in an attempt to balance the value of open source with Grafana’s monetization strategy, CEO Raj Dutt announced yesterday.

Grafana has been considering a license change for some time, Dutt wrote in a blog post on April 20. This week, the company finally felt the time was right to move.

“Oof” was my first response. I know that a pretty large percentage of companies won’t touch AGPL. I don’t know if we’ll see these companies adopt the commercial version of Grafana, see the companies switch over to something else, or see developers fork Grafana and come up with some other product. AGPL is not quite as scary for companies when a product is at the end of the chain, as visualization and dashboarding products tend to be, but for many companies, that doesn’t matter.
