Category: Visualization

Themes And Legends In ggplot2

I have another part of my ggplot2 series up, this time on themes and legends:

You are not limited to using defaults in your graphs.  Let’s go back to the minimal theme but change the fonts a bit.  I want to make the following changes:

  1. Use Gill Sans fonts instead of the default

  2. Increase the title font size a little bit

  3. Decrease the X axis font size a little bit

  4. Remove the Y axis; the subtitle makes it clear what the Y axis contains
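The four changes above might look roughly like the sketch below. It is an illustration under assumptions, not the code from the post: the data is ggplot2's built-in mpg data set, the font name "Gill Sans MT" is a guess at what is installed, "X axis font size" is read as the axis title size, and "remove the Y axis" is read as removing the Y axis title.

```r
library(ggplot2)

# Stand-in data: ggplot2's built-in mpg data set, not the data used in the post.
themed_plot <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  labs(
    title = "Bigger engines, fewer miles per gallon",
    subtitle = "Highway miles per gallon by engine displacement (litres)",
    x = "Engine displacement"
  ) +
  theme_minimal() +
  theme(
    # 1. Font family. "Gill Sans MT" is an assumed name; it must match a font
    #    that your graphics device can actually find.
    text = element_text(family = "Gill Sans MT"),
    # 2. Slightly larger title (theme_minimal's default is about 13 points).
    plot.title = element_text(size = 16),
    # 3. Slightly smaller X axis title (the default is 11 points).
    axis.title.x = element_text(size = 10),
    # 4. No Y axis title; the subtitle carries that information.
    axis.title.y = element_blank()
  )

# print(themed_plot) draws the plot once the font is available to the device.
```

If the font is not registered with the graphics device, drawing the plot may warn or fail, which is the Windows pain point the rant refers to.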

By the time we’re through this, we have publication-quality visuals in a few dozen lines of code.  I also have provided a bonus rant on Windows and R and fonts because that’s a nasty experience.

Labels And Annotations In ggplot2

I have another post in my ggplot2 series:

Annotations are useful for marking out important comments in your visual.  For example, going back to our wealth and longevity chart, there was a group of Asian countries with extremely high GDP but relatively low average life expectancy.  I’d like to call out that section of the visual and will use an annotation to do so.  To do this, I use the annotate() function.  In this case, I’m going to create a text annotation as well as a rectangle annotation so you can see exactly the points I mean.
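As a rough sketch of that pattern, the example below adds a rectangle and a text annotation to a scatterplot. It uses ggplot2's built-in mpg data rather than the wealth and longevity data from the post, and the coordinates and label text are made up for illustration.

```r
library(ggplot2)

# Stand-in data: mpg, where a few large-engine cars get unusually good mileage.
annotated_plot <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  # Translucent rectangle around the cluster of interest.
  annotate("rect", xmin = 5.5, xmax = 7.2, ymin = 22, ymax = 27,
           fill = "red", alpha = 0.2) +
  # Text label placed just above the rectangle.
  annotate("text", x = 6.35, y = 28.5,
           label = "Large engines, better mileage than expected")

annotated_plot
```

Because annotate() takes fixed coordinates instead of a data mapping, each call adds a single shape or label as its own layer.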

By this point, we’re getting closer and closer to high-quality graphics.

ggplot2 Scales And Coordinates

I continue my series on ggplot2:

The other thing I want to cover today is coordinate systems.  The ggplot2 documentation shows seven coordinate functions.  There are good reasons to use each, but I’m only going to demonstrate one.  By default, we use the Cartesian coordinate system and ggplot2 sets the viewing space.  This viewing space covers the fullness of your data set and generally is reasonable, though you can change the viewing area using the xlim and ylim parameters.

The special coordinate system I want to point out is coord_flip, which flips the X and Y axes.  This allows us, for example, to turn a column chart into a bar chart.  Taking our life expectancy by continent data, I can create a bar chart, whereas before we’ve been looking at column charts.
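A small sketch of both ideas, assuming ggplot2's built-in mpg data in place of the life expectancy data from the post. The variable names and the zoom limits are illustrative only.

```r
library(ggplot2)

# Stand-in data: average highway mileage per vehicle class from mpg.
avg_hwy <- aggregate(hwy ~ class, data = mpg, FUN = mean)

# Column chart: categories on the X axis, values rising vertically.
column_chart <- ggplot(avg_hwy, aes(x = class, y = hwy)) +
  geom_col()

# Bar chart: the same plot with the axes flipped.
bar_chart <- column_chart + coord_flip()

# Narrowing the viewing area on a Cartesian plot; the limits are arbitrary.
zoomed <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  coord_cartesian(xlim = c(2, 5), ylim = c(15, 35))
```

Passing limits to coord_cartesian() zooms the view without dropping any rows from the data, which matters once statistics such as smoothers are involved.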

There are a lot of pictures and more step-by-step work.  Most of these are still 3-4 lines of code, so again, pretty simple.

ggplot2 Mappings And Geoms

I continue my ggplot2 series:

We have used a new geom here, geom_smooth.  The geom_smooth function creates a smoothed conditional mean.  Basically, we’re fitting a model to the input data and drawing the fitted line.  Notice that there are two parameters that I set:  method and se.  The method parameter tells the function which fitting method to use.  Common choices include a Generalized Additive Model (gam), Locally Weighted Scatterplot Smoothing (loess), and three varieties of Linear Models (lm, glm, and rlm, the last of which comes from the MASS package).  The se parameter controls whether we want to see the confidence band drawn around the line.
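To make the two parameters concrete, here is a minimal sketch using ggplot2's built-in mpg data as a stand-in for the data in the post. The formula argument is spelled out only to avoid the message ggplot2 prints when it has to pick one.

```r
library(ggplot2)

base_plot <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point()

# Straight-line fit with the band around the line hidden.
linear_fit <- base_plot +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# Local regression fit with the band around the line shown.
loess_fit <- base_plot +
  geom_smooth(method = "loess", formula = y ~ x, se = TRUE)
```

Comparing linear_fit and loess_fit side by side shows how much the choice of method changes the story the line tells about the same points.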

I don’t cover all of the mapping options and all of the geoms, but I think it’s enough to get a grip on the concept.

The Grammar of Graphics

I’ve started a new series:

Instead, we will start with Wickham’s paper on ggplot2.  This gives us the basic motivation behind the grammar of graphics by covering what a grammar does for us:  “A grammar provides a strong foundation for understanding a diverse range of graphics. A grammar may also help guide us on what a well-formed or correct graphic looks like, but there will still be many grammatically correct but nonsensical graphics. This is easy to see by analogy to the English language: good grammar is just the first step in creating a good sentence” (3).

With a language, we have different language components like nouns (which can be subjects, direct objects, or indirect objects), verbs, adjectives, adverbs, etc.  We put together combinations of those individual components to form complete sentences and transmit ideas.  Our particular word choice and language component usage will affect the likelihood of success in idea transmission, but to an extent, we can work iteratively on a sentence, switching words or adding phrases to get the point across the way we desire.

With graphics, we can do the same thing.  Instead of thinking of “a graph” as something which exists in and of itself, we should think of different objects that we combine into its final product:  a graph.
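One way to picture that combination in ggplot2 terms is the sketch below, where each added piece is one component of the graph. It assumes the built-in mpg data set, and the particular scale and theme are arbitrary choices for illustration.

```r
library(ggplot2)

graph <- ggplot(data = mpg) +                  # data
  aes(x = displ, y = hwy, colour = drv) +      # mappings from data to aesthetics
  geom_point() +                               # geometric object
  scale_colour_brewer(palette = "Dark2") +     # scale
  coord_cartesian() +                          # coordinate system
  labs(
    x = "Engine displacement (litres)",
    y = "Highway miles per gallon",
    colour = "Drive train"
  ) +                                          # labels
  theme_minimal()                              # theme

# Swapping or adding a component changes the graph without starting over.
graph_by_class <- graph + facet_wrap(~ class)
```

Much like revising a sentence one word at a time, each component can be replaced independently of the others.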

I call this first post the poor man’s literature review.  The rest of the series is code- and visual-heavy.

Visualizing Progress Using Power BI

Stacia Varga shows methods of visualizing progress toward a goal using Power BI:

Another interesting way to look at goal tracking for a goal in which time is an important element, such as my daily Move goal, is to use a KPI visualization.

Just as many businesses use KPIs, which is an abbreviation for key performance indicators, I can use a KPI to see my current metric value, as of the last date for which I have collected data. In Power BI, not only can I see this value, but I can also see how it compares to the target at a glance, through the use of color. Red is bad and green is good, by default, but I can use the formatting options to change this. And I can see how the value trends over time, much like my current line and clustered column chart does.

Click through for several techniques.

“Pretty But Useless” Visuals

I continue my dashboard visualization series with a bit of an extended rant:

The best use of a pie chart is to show a simple share of a static total.  Here, we can see that Daredevil has almost half of the critics’ reviews, and that The Punisher and Jessica Jones are split.

This simple pie chart also shows some of the problems of pie charts.  The biggest issue is that people have trouble with angle, making it hard to distinguish relative slices.  For example, is Jessica Jones’s slice larger or is The Punisher’s?  It’s really hard to tell in this case, and if that difference is significant, you’re making life harder for your viewers.

Second, as slice percentages get smaller, it becomes harder to differentiate slices.  In this case, we can see all three pretty clearly, but if we start getting 1% or 2% slices, they end up as slivers on the pie, making it hard to distinguish one slice from another.

Third, pie charts usually require one color per slice.  This can lead to an explosion of color usage.  Aside from potential risks of using colors which in concert are not CVD-friendly, adding all of these colors has yet another unintended consequence.  If you use the same color in two different pie charts to mean different things, you can confuse people, as they will associate color with some category, and so if they see the same color twice, they will implicitly assign both things the same category.  Yes, careful reading of your legend disabuses people of that notion, but by the time they see the legend, they’ve already implicitly mapped out what this color represents.

Fourth, pie charts often require legends, which increases eye scanning.

Click through to read me complain about other types of visuals, too.

Visuals I Like

I continue my series on dashboard visualization:

This leads me to a little bit of advice for choosing bars versus columns.  You will want to choose a bar chart if the following are true:

  1. Category names are long, where by “long” I mean more than 2-3 characters.
  2. You have a lot of categories.
  3. You have relatively few periods—ideally, you’ll only have one period with a bar chart.

By contrast, you would choose a column chart if:

  1. Viewing across periods is important.  For example, I want to see the number of critic reviews fluctuate across the season for each of the TV shows.
  2. You have many periods with relatively few categories.  The more periods and the fewer categories, the more likely you are to want a column chart.
  3. Category names are short, by which I mean approximately 1-3 characters.

Some people will rotate text 90 degrees to try to turn a bar chart into a column chart.  I don’t like that because then people need to rotate the page or crane their necks.  In that case, just use the bar chart.

I like Cleveland dot plots, but they’re not implemented at all in Power BI and the two add-ons in the store aren’t that great either.  Also, there’s bonus material explaining why The Punisher season 1 was better than Daredevil season 1.

Visual Principles And Dashboards

I continue my series on dashboard visualization by looking at pictures:

In a bit more detail, you can make a dashboard glanceable by following these guidelines:

  1. Ensure that there is clear purpose in your metric design and display.  In other words, think about which metrics you want to show, how you want to show them, and where you put metrics in relation to one another.

  2. Group metrics by function into sections.  Look at the dashboard above.  It has four clusters of metrics:  those around revenue, new customers, revenue per customer, and customer acquisition cost.  All of the revenue metrics are clustered in the top-left quadrant of the dashboard.  Furthermore, all revenue-related metrics (that is, revenue metrics and revenue per customer metrics) are on the left-hand side of the dashboard, so the CEO can focus on that half and learn about revenue and revenue per customer.  She doesn’t need to look in the top-left corner for one revenue measure and in the bottom right for another; she can focus down to a portion of the dashboard and get an answer.

  3. It should be easy to see and differentiate those clusters of metrics.  Our natural instinct might be to put borders around the clusters, but whitespace is your friend—remember, less is more.  If you add a bit more whitespace between clusters of measures, you’ll make it easy for people to see that there’s a difference without distracting them with unnecessary lines.

I cover the Rule of Thirds, Glanceability, and Color Vision Deficiency, three important considerations for a designer.
