Category: Visualization

Plotting In R Using ggplot2

Published 2018-03-30 by Kevin Feasel

The folks at Sharp Sight Labs have another nice demo of ggplot2:

You’ve heard me say it a thousand times: to master data science, you need to practice.

You need to “practice small” by practicing individual techniques and functions. But you also need to “practice big” by working on larger projects.

To get some practice, my recommendation is to find reasonably sized datasets online and plot them.

Wikipedia is a nearly-endless source of good datasets. The great thing about Wikipedia is that many of the datasets are small and well contained. They are also fairly clean, with just enough messiness to make them a bit of a challenge.

As a quick example, this week, we’ll plot some economic data.

The code is deceptively easy considering the scope of the problem.

ggplot2 Geoms And Aesthetics

Published 2018-03-22 by Kevin Feasel

Tyler Rinker digs into ggplot2’s geoms and aesthetics:

I thought it might be fun to use the geoms’ aesthetics to see if we could cluster aesthetically similar geoms closer together. The heatmap below uses cosine similarity and hierarchical clustering to reorder the matrix so that like geoms can be found closer to one another (note that today I learned from “R for Data Science” about the seriation package [https://cran.r-project.org/web/packages/seriation/index.html], which may make this matrix reordering task much easier).

It’s an interesting analysis of what’s available within ggplot2 and a detailed look at how different geoms fit together with respect to aesthetic options.
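
As a rough illustration of the technique described in the quote, the sketch below builds a small geom-by-aesthetic matrix, computes cosine similarity between geoms, and reorders the matrix with hierarchical clustering. It is not Tyler’s code: the choice of five geoms, the variable names, and the use of each Geom object’s aesthetics() method to list its aesthetics are all assumptions made for this example.

```r
library(ggplot2)

# Assumption: a small, arbitrary sample of geoms chosen for illustration.
geoms <- list(
  point = GeomPoint,
  line  = GeomLine,
  path  = GeomPath,
  bar   = GeomBar,
  text  = GeomText
)

# List the aesthetics each geom understands.
aes_by_geom <- lapply(geoms, function(g) g$aesthetics())
all_aes <- sort(unique(unlist(aes_by_geom)))

# Binary incidence matrix: one row per geom, one column per aesthetic.
incidence <- t(sapply(aes_by_geom, function(a) as.numeric(all_aes %in% a)))
colnames(incidence) <- all_aes

# Cosine similarity between every pair of rows.
norms <- sqrt(rowSums(incidence^2))
cos_sim <- (incidence %*% t(incidence)) / (norms %o% norms)

# Hierarchical clustering on cosine distance, then reorder the matrix
# so that similar geoms end up next to one another.
clusters <- hclust(as.dist(1 - cos_sim))
reordered <- cos_sim[clusters$order, clusters$order]
print(round(reordered, 2))
```

The reordered matrix could then be passed to a heatmap function; the seriation package mentioned in the quote is an alternative way to do the reordering step.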

Defending Pie Charts

Published 2018-03-19 by Kevin Feasel

Bobby Johnson makes a valiant effort at defending the indefensible:

In the world of data analysis, there are few things more reviled than the pie chart. Among “serious” data people, it is at best trivial and naive, and at worst downright evil.

I do not agree with this. The pie chart is simple, but that is its beauty. It does exactly one thing and it does it well: it shows you how much different parts contribute to a whole. This isn’t the only question you ever have about your data, but when it’s the question you do have, the pie chart is perfect. That is not evil and it is not naive. It is data visualization doing what it should: taking something large and abstract and saying something simple about it that your brain can easily internalize.

I strongly disagree with the arguments in the article, but I do respect the attempt. In each of the cases, at least one of a bar chart, a 100% stacked bar chart, or a dot plot could give at least the same amount of information with lower mental overhead.

Using Telegraf To Display SQL Server Metrics In Grafana

Published 2018-02-26 by Kevin Feasel

Tracy Boggiano has a writeup showing how to use Telegraf + InfluxDB + Grafana to view SQL Server metrics:

In the middle, we have an open source time series database called InfluxDB, which is designed for collecting timestamped data such as performance metrics. Into that, we feed data from an open source project called Telegraf, which can feed in more than just SQL Server statistics. And to show the data in nice, pretty graphs that we can manipulate, drill down on, and even set up alerts for, we display it using Grafana. You will find links to all of these products as we go through the setup of the solution.

Tracy’s post is dedicated to installation and configuration more than defining metrics, but it does get you on the road to custom metrics visualization.

Radar Charts With ggplot2

Published 2018-02-15 by Kevin Feasel

I have wrapped up my ggplot2 series, with the last post being on radar charts:

First, we need to install ggradar and load our relevant libraries. Then, I create a quick standardization function which divides our variable by the max value of that variable in the vector. It doesn’t handle niceties like divide by 0, but we won’t have any zero values in our data frames.

The radar_data data frame starts out simple: build up some stats by continent. Then I call the mutate_each_ function to call standardize for each variable in the vars set. mutate_each_ is deprecated and I should use something different like mutate_at, but this does work in the current version of dplyr at least.

Finally, I call the ggradar() function. This function has a large number of parameters, but the only one you absolutely need is plot.data. I decided to change the sizes because by default, it doesn’t display well at all on Windows.

It was a lot of fun putting this series together. I think the most important part of the series was learning just how easy ggplot2 is once you sit down and think about it in a systematic manner.
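
A minimal sketch of the standardization step described in the quote, using mutate_at in place of the deprecated mutate_each_. The data frame values, the column names, and the vars_to_scale vector are invented for illustration and are not the data from the post. The ggradar() call is left as a comment because ggradar is a separate package that may not be installed.

```r
library(dplyr)

# Divide each value by the maximum of its vector.
# Like the version described in the post, this does not guard against a max of 0.
standardize <- function(x) {
  x / max(x, na.rm = TRUE)
}

# Hypothetical per-continent stats (made-up numbers, not real data).
radar_data <- data.frame(
  continent    = c("Africa", "Americas", "Asia"),
  avg_life_exp = c(50, 70, 60),
  avg_gdp      = c(2000, 8000, 4000)
)

# Columns to rescale (assumed names).
vars_to_scale <- c("avg_life_exp", "avg_gdp")

# mutate_at applies standardize to each named column.
scaled <- mutate_at(radar_data, vars_to_scale, standardize)
print(scaled)

# With ggradar installed, the scaled frame could then be plotted;
# per the post, plot.data is the only argument that is strictly required.
# ggradar::ggradar(plot.data = scaled)
```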

Creating Modal Dialogues In Shiny

Published 2018-02-14 by Kevin Feasel

Dean Attali announces a new shiny package:

shinyalert uses the sweetalert JavaScript library to create simple and elegant modals in Shiny. Modals can contain text, images, OK/Cancel buttons, an input to get a response from the user, and many more customizable options. A modal can also have a timer to close automatically.

Simply call shinyalert() with the desired arguments, such as a title and text, and a modal will show up. In order to be able to call shinyalert() in a Shiny app, you must first call useShinyalert() anywhere in the app’s UI.

It does look nice. Check out Dean’s GitHub repo for more information. H/T R-Bloggers
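
A minimal sketch of the usage pattern described in the quote, assuming the shiny and shinyalert packages are installed. The button id, labels, and modal text are placeholders invented for this example. Newer releases of shinyalert may treat the useShinyalert() call as optional, so take it here as following the quoted instructions rather than as a requirement of every version.

```r
library(shiny)
library(shinyalert)

ui <- fluidPage(
  useShinyalert(),                      # set up shinyalert in the UI, as the quote describes
  actionButton("show", "Show a modal")  # hypothetical button id and label
)

server <- function(input, output, session) {
  # Show a modal each time the button is clicked.
  observeEvent(input$show, {
    shinyalert(
      title = "Hello",
      text  = "This is a modal created with shinyalert.",
      type  = "info"
    )
  })
}

# Build the app object; call runApp(app) in an interactive session to launch it.
app <- shinyApp(ui, server)
```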

Visualizing Cholesterol Data With ggplot2

Published 2018-02-14 by Kevin Feasel

Anisa Dhana uses the National Health and Nutrition Examination Survey and visualizes results with ggplot2:

From the plots above I find that, regardless of the different levels of diastolic and systolic blood pressure, there is no substantial correlation between cholesterol and blood pressure. However, it is better to build the correlation line with geom_smooth or to calculate the Spearman correlation, although in this post we focus only on the visualization.

Let’s build the correlation line.

Click through for several examples of visuals.
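
For a feel of the two approaches named in the quote, here is a minimal sketch that adds a fitted line with geom_smooth and computes a Spearman correlation. It uses simulated values, not the NHANES data, and the column names and simulation parameters are arbitrary assumptions.

```r
library(ggplot2)

set.seed(42)

# Simulated stand-in data (NOT the survey data used in the post).
n <- 200
bp <- data.frame(systolic = rnorm(n, mean = 120, sd = 15))
bp$cholesterol <- 190 + 0.1 * bp$systolic + rnorm(n, mean = 0, sd = 35)

# Scatter plot with a linear fit and its confidence band.
p <- ggplot(bp, aes(x = systolic, y = cholesterol)) +
  geom_point(alpha = 0.5) +
  geom_smooth(method = "lm", se = TRUE)

# Rank-based (Spearman) correlation between the two variables.
rho <- cor(bp$systolic, bp$cholesterol, method = "spearman")
print(rho)
```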

Using cowplot With ggplot2

Published 2018-02-14 by Kevin Feasel

I have a post on extending ggplot2’s functionality with cowplot:

Notice that I used geom_path(). This is a geom I did not cover earlier in the series. It’s not a common geom, though it does show up in charts like this where we want to display data for three variables. The geom_line() geom follows the basic rules for a line: that the variable on the y axis is a function of the variable on the x axis, which means that for each element of the domain, there is one and only one corresponding element of the range (and I have a middle school algebra teacher who would be very happy right now that I still remember the definition she drilled into our heads all those years ago).

But when you have two variables which change over time, there’s no guarantee that this will be the case, and that’s where geom_path() comes in. The geom_path() geom does not plot y based on sequential x values, but instead plots values according to a third variable. The trick is, though, that we don’t define this third variable—it’s implicit in the data set order. In our case, our data frame comes in ordered by year, but we could decide to order by, for example, life expectancy by setting data = arrange(global_avg, m_lifeExp). Note that in a scenario like these global numbers, geom_line() and geom_path() produce the same output because we’ve seen consistent improvements in both GDP per capita and life expectancy over the 55-year data set. So let’s look at a place where that’s not true.

The cowplot library lets you combine plots of different sizes in a couple of lines of code, which is much easier than doing the same with ggplot2 by itself.
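
The difference between geom_line() and geom_path() can be sketched with a tiny data set in which the x variable does not rise steadily over time. The numbers below are made up for illustration, and the side-by-side layout with cowplot’s plot_grid() is one possible arrangement rather than the one used in the post.

```r
library(ggplot2)
library(cowplot)

# Hypothetical series, ordered by year, where gdp goes up and down.
trend <- data.frame(
  year     = c(1990, 1995, 2000, 2005, 2010),
  gdp      = c(1000, 1400, 1200, 900, 1500),
  life_exp = c(60, 63, 61, 58, 65)
)

# geom_line connects points in order of x; geom_path connects them in row order.
p_line <- ggplot(trend, aes(x = gdp, y = life_exp)) + geom_line()
p_path <- ggplot(trend, aes(x = gdp, y = life_exp)) + geom_path()

# Inspect the order in which each geom will draw the points.
line_x <- layer_data(p_line)$x
path_x <- layer_data(p_path)$x

# Place the two plots side by side, giving the first twice the width.
combined <- plot_grid(p_line, p_path,
                      labels = c("geom_line", "geom_path"),
                      rel_widths = c(2, 1))
```

Because the rows are ordered by year, the geom_path() version traces the trajectory through time, while the geom_line() version sorts by gdp first.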

Faceted ggplot2

Published 2018-02-09 by Kevin Feasel

I have another post in my ggplot2 series, this time covering facets:

Notice that we create a graph per continent by setting facets = ~continent. The tilde there is important—it’s a one-sided formula. You could also write c("continent") if that’s clearer to you.

I also set the number of columns, guaranteeing that we see no more than 3 columns of grids. I could alternatively set nrow, which would guarantee we see no more than a certain number of rows.

There are a couple other interesting features in facet_wrap. First, we can set scales = "free" if we want to draw each grid as if the others did not exist. By default, we use a scale of “fixed” to ensure that everything plots on the same scale. I prefer that for this exercise because it lets us more easily see those continental clusters.

Facets let you compare multiple graphs quickly. They’re great for fast comparison, but as I show in the post, you can distort the way the data looks by lining it up horizontally or vertically.
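
A minimal sketch of the facet_wrap options discussed in the quote. It uses the mpg data set that ships with ggplot2 and facets by class, which is an assumption for illustration; the post itself facets by continent.

```r
library(ggplot2)

base_plot <- ggplot(mpg, aes(x = displ, y = hwy)) + geom_point()

# One panel per class via a one-sided formula, at most 3 columns, shared scales (the default).
p_fixed <- base_plot + facet_wrap(facets = ~class, ncol = 3)

# The same facets given as a character vector, with each panel drawn on its own scales.
p_free <- base_plot + facet_wrap(facets = c("class"), ncol = 3, scales = "free")
```

Shared scales make panels directly comparable; free scales make the pattern inside each panel easier to see, at the cost of that comparability.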

Themes And Legends In ggplot2

Published 2018-02-08 by Kevin Feasel

I have another part of my ggplot2 series up, this time on themes and legends:

You are not limited to using defaults in your graphs. Let’s go back to the minimal theme but change the fonts a bit. I want to make the following changes:

Use Gill Sans fonts instead of the default
Increase the title font size a little bit
Decrease the X axis font size a little bit
Remove the Y axis; the subtitle makes it clear what the Y axis contains

By the time we’re through this, we have publication-quality visuals in a few dozen lines of code. I also have provided a bonus rant on Windows and R and fonts because that’s a nasty experience.
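
The theme changes listed in the quote could be sketched roughly as follows. This is not the code from the post: the mpg data set, the titles, and the relative sizes are assumptions, "Remove the Y axis" is interpreted here as dropping the Y axis title, and the generic "sans" family stands in for Gill Sans because a named font has to be installed and registered with R before it can be used.

```r
library(ggplot2)

custom_theme <- theme_minimal() +
  theme(
    text         = element_text(family = "sans"),   # stand-in; swap for a registered font family
    plot.title   = element_text(size = rel(1.4)),   # slightly larger title
    axis.title.x = element_text(size = rel(0.9)),   # slightly smaller X axis title
    axis.title.y = element_blank()                  # drop the Y axis title
  )

p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  labs(
    title    = "Highway mileage by engine size",
    subtitle = "Highway miles per gallon",          # the subtitle carries the Y axis meaning
    x        = "Engine displacement (litres)"
  ) +
  custom_theme
```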
