Press "Enter" to skip to content

Category: Visualization

Visualizing Cholesterol Data With ggplot2

Anisa Dhana uses the National Health and Nutrition Examination Survey and visualizes results with ggplot2:

From the plots above I find that regardless the different levels of diastolic and systolic blood pressure there is no substantial correlation between cholesterol and blood pressure. However, it is better to build the correlation line with geom_smooth or to calculate the Spearman correlation, although in this post we focus only on the visualization.

Lets build the correlation line.

Click through for several examples of visuals.

Comments closed

Using cowplot With ggplot2

I have a post on extending ggplot2’s functionality with cowplot:

Notice that I used geom_path().  This is a geom I did not cover earlier in the series.  It’s not a common geom, though it does show up in charts like this where we want to display data for three variables.  The geom_line() geom follows the basic rules for a line:  that the variable on the y axis is a function of the variable on the x axis, which means that for each element of the domain, there is one and only one corresponding element of the range (and I have a middle school algebra teacher who would be very happy right now that I still remember the definition she drilled into our heads all those years ago).

But when you have two variables which change over time, there’s no guarantee that this will be the case, and that’s where geom_path() comes in.  The geom_path() geom does not plot y based on sequential x values, but instead plots values according to a third variable.  The trick is, though, that we don’t define this third variable—it’s implicit in the data set order.  In our case, our data frame comes in ordered by year, but we could decide to order by, for example, life expectancy by setting data = arrange(global_avg, m_lifeExp).  Note that in a scenario like these global numbers, geom_line() and geom_path() produce the same output because we’ve seen consistent improvements in both GDP per capita and life expectancy over the 55-year data set.  So let’s look at a place where that’s not true.

The cowplot library gives you an easier way of linking together different plots of different sizes in a couple lines of code, which is much easier than using ggplot2 by itself.

Comments closed

Faceted ggplot2

I have another post in my ggplot2 series, this time covering facets:

Notice that we create a graph per continent by setting facets = ~continent.  The tilde there is important—it’s a one-sided formula.  You could also write c("continent") if that’s clearer to you.

I also set the number of columns, guaranteeing that we see no more than 3 columns of grids. I could alternatively set nrow, which would guarantee we see no more than a certain number of rows.

There are a couple other interesting features in facet_wrap. First, we can set scales = "free" if we want to draw each grid as if the others did not exist. By default, we use a scale of “fixed” to ensure that everything plots on the same scale. I prefer that for this exercise because it lets us more easily see those continental clusters.

Facets let you compare multiple graphs quickly.  They’re great for fast comparison, but as I show in the post, you can distort the way the data looks by lining it up horizontally or vertically.

Comments closed

Themes And Legends In ggplot2

I have another part of my ggplot2 series up, this time on themes and legends:

You are not limited to using defaults in your graphs.  Let’s go back to the minimal theme but change the fonts a bit.  I want to make the following changes:

  1. Use Gill Sans fonts instead of the default

  2. Increase the title font size a little bit

  3. Decrease the X axis font size a little bit

  4. Remove the Y axis; the subtitle makes it clear what the Y axis contains

By the time we’re through this, we have publication-quality visuals in a few dozen lines of code.  I also have provided a bonus rant on Windows and R and fonts because that’s a nasty experience.

Comments closed

Labels And Annotations In ggplot2

I have another post in my ggplot2 series:

Annotations are useful for marking out important comments in your visual.  For example, going back to our wealth and longevity chart, there was a group of Asian countries with extremely high GDP but relatively low average life expectancy.  I’d like to call out that section of the visual and will use an annotation to do so.  To do this, I use the annotate() function.  In this case, I’m going to create a text annotation as well as a rectangle annotation so you can see exactly the points I mean.

By this point, we’re getting closer and closer to high-quality graphics.

Comments closed

ggplot2 Scales And Coordinates

I continue my series on ggplot2:

The other thing I want to cover today is coordinate systems.  The ggplot2 documentation shows seven coordinate functions.  There are good reasons to use each, but I’m only going to demonstrate one.  By default, we use the Cartesian coordinate system and ggplot2 sets the viewing space.  This viewing space covers the fullness of your data set and generally is reasonable, though you can change the viewing area using the xlim and ylim parameters.

The special coordinate system I want to point out is coord_flip, which flips the X and Y axes.  This allows us, for example, to turn a column chart into a bar chart.  Taking our life expectancy by continent, data I can create a bar chart whereas before, we’ve been looking at column charts.

There are a lot of pictures and more step-by-step work.  Most of these are still 3-4 lines of code, so again, pretty simple.

Comments closed

ggplot2 Mappings And Geoms

I continue my ggplot2 series:

We have used a new geom here, geom_smooth.  The geom_smooth function creates a smoothed conditional mean.  Basically, we’re drawing some line as a result of a function based on this input data.  Notice that there are two parameters that I set:  method and se.  The method parameter tells the function which method to use.  There are five methods available, including using a Generalized Additive Model (gam), Locally Weighted Scatterplot Smoothing (loess), and three varieties of Linear Models (lm, glm, and rlm).  The se parameter controls whether we want to see the standard error bar.

I don’t cover all of the mapping options and all of the geoms, but I think it’s enough to get a grip on the concept.

Comments closed

The Grammar of Graphics

I’ve started a new series:

Instead, we will start with Wickham’s paper on ggplot2.  This gives us the basic motivation behind the grammar of graphics by covering what a grammar does for us:  “A grammar provides a strong foundation for understanding a diverse range of graphics. A grammar may also help guide us on what a well-formed or correct graphic looks like, but there will still be many grammatically correct but nonsensical graphics. This is easy to see by analogy to the English language: good grammar is just the first step in creating a good sentence” (3).

With a language, we have different language components like nouns (which can be subjects, direct objects, or indirect objects), verbs, adjectives, adverbs, etc.  We put together combinations of those individual components to form complete sentences and transmit ideas.  Our particular word choice and language component usage will affect the likelihood of success in idea transmission, but to an extent, we can work iteratively on a sentence, switching words or adding phrases to get the point across the way we desire.

With graphics, we can do the same thing.  Instead of thinking of “a graph” as something which exists in and of itself, we should think of different objects that we combine into its final product:  a graph.

I call this first post the poor man’s literature review.  The rest of the series is code- and visual-heavy.

Comments closed

Visualizing Progress Using Power BI

Stacia Varga methods of visualizing progress toward a goal using Power BI:

Another interesting way to look at goal tracking for a goal in which time is an important element, such as my daily Move goal, is to use a KPI visualization.

Just as many businesses use KPIs, which is an abbreviation for key performance indicators, I can use a KPI to see my current metric value, as of the last date for which I have collected data. In Power BI, not only can I see this value, but I can also see how it compares to the target at a glance, through the use of color. Red is bad and green is good, by default, but I can use the formatting options to change this. And I can see how the value trends over time, much like my current line and clustered column chart does.

Click through for several techniques.

Comments closed