Press "Enter" to skip to content

Category: Visualization

A Layered Grammar Of Graphics

Hadley Wickham describes some of the decisions he made when putting together ggplot2:

In the examples above, we have seen some of the components that make up a plot:
• data and aesthetic mappings,
• geometric objects,
• scales, and
• facet specification.
We have also touched on two other components:
• statistical transformations, and
• the coordinate system.
Together, the data, mappings, statistical transformation, and geometric object form a layer. A plot may have multiple layers, for example, when we overlay a scatterplot with a smoothed line.

This isn’t an article about how to use ggplot2; rather, it’s an article about implementation decisions.  To that end, I think it’s useful to see some of the logic behind ggplot2’s decisions.

Comments closed

The Grammar Of Graphics

Leland Wilkinson has written the book on how we should write systems which visualize data:

This book was written for statisticians, computer scientists, geographers, research and applied scientists, and others interested in visualizing data. It presents a unique foundation for producing almost every quantitative graphic found in scientific journals, newspapers, statistical packages, and data visualization systems. This foundation was designed for a distributed computing environment (Internet, Intranet, client-server), with special attention given to conserving computer code and system resources.

There’s no free copy of this book, and it’s a very expensive textbook. For most people, you’ll get more from derivative works, but if you’ve thought about putting together a graphics library, this is a must-read.

Comments closed

Data Visualization For Social Science

I’ve started reading Kieran Healy’s book, Data Visualization For Social Science.  He has a free draft available online, and it automatically builds nightly so you’re seeing the latest version.  From the preface:

This book is a hands-on introduction to the principles and practice of looking at and presenting data using R and ggplot. R is a powerful, widely used, and freely available programming language for data analysis. You may be interested in exploring ggplot after having used R before, or be entirely new to both R and ggplot and just want to graph your data. I do not assume you have any prior knowledge of R.

After installing the software we need, we begin with an overview of some basic principles of visualization. We focus not just on the aesthetic aspects of good plots, but on how their effectiveness is rooted in the way we perceive properties like length, absolute and relative size, orientation, shape, and color. We then learn how to produce and refine plots using ggplot2, a powerful, versatile, and widely-used visualization library for R (Wickham 2016a). The ggplot2 library implements a “grammar of graphics” (Wilkinson 2005). This approach gives us a coherent way to produce visualizations by expressing relationships between the attributes of data and their graphical representation.

Through a series of worked examples, you will learn how to build plots piece by piece, beginning with scatterplots and summaries of single variables, then moving on to more complex graphics. Topics covered include plotting continuous and categorical variables, layering information on graphics; faceting grouped data to produce effective “small multiple” plots; transforming data to easily produce visual summaries on the graph such as trend lines, linear fits, error ranges, and boxplots; creating maps, and also some alternatives to maps worth considering when presenting country- or state-level data. We will also cover cases where we are not working directly with a dataset, but rather with estimates from a statistical model. From there, we will explore the process of refining plots to accomplish common tasks such as highlighting key features of the data, labeling particular items of interest, annotating plots, and changing their overall appearance. Finally we will examine some strategies for presenting graphical results in different formats, and to different sorts of audiences.

I’m less than halfway through the book so far, but it is quite an approachable look at the ggplot2 library with a bit of discussion on what makes for quality graphics.

Comments closed

Animated Dot Plots In R

John MacKintosh shows how to create dot plots in ggplot2, and then he uses gganimate to turn it into an animated plot:

If you want to follow along (go on) then you should head over to Neil’s site, download the excel file and take a look at the “how to” guide on the same page. Existing R users are already likely to be shuddering at all the manual manipulation required.

For the first attempt, I followed Neil’s approach pretty closely, resulting in a lot of code to sort and group, although ggplot2 made the actual plotting much simpler. I shared my very first attempt ( produce with barely any ggplot2 code) which was quite good, but there were a few issues – the ins/ outs being coloured blue instead of grey, and overplotting of several points.

Click through for code and explanation.  H/T R-Bloggers

Comments closed

Perceiving Forms: Gestalt Principles

Meagan Longoria describes the basics of the Gestalt school of thought:

Gestalt principles can help us highlight patterns and reduce noise in data visualization. They can also help us create a visual hierarchy and employ symmetry in our designs for a more pleasing user experience.

Some Gestalt principles are very similar to our understanding of preattentive attributes. This is demonstrated in the set of 3 charts below.

Meagan gives a good overview of the concepts and uses a Power BI example to apply them.  There’s some really good advice in here.

Comments closed

Keeping Report Decks Consistent

Tristan Robinson has tips for keeping your Power BI Enterprise report decks looking consistent and nice:

The next consideration is around the number of objects on a report – keep it simple.  Avoid building a giant monolithic report, the more objects you use, the slower the report will perform on PBI service, iPad’s and even to develop.  This is especially true for tables/matrices which will each need to fire off separate DAX queries to return the data elements. Too many objects also has knock on effects for exporting to PowerPoint as objects will overlap with one another more which may not be as much of a case within PBI service but will affect other apps. You can use the selection pane (in the view tab) so move objects above/below one another which will bring forward/push back the elements.

This is advice tailored toward Power BI in particular, but much of it also applies in general.

Comments closed

Network And Sankey Diagrams In Python And R

Tony Hirst has a roundup of various R and Python packages which build network charts or Sankey diagrams:

Another way we might be able to look at the data “out of time” to show flow between modules is to use a Sankey diagram that allows for the possibility of feedback loops.

The Python sankeyview package (described in Hybrid Sankey diagrams: Visual analysis of multidimensional data for understanding resource use looks like it could be useful here, if I can work out how to do the set-up correctly!

Sankey diagrams are on my list of dangerous visuals:  done right, they are informative, but it’s easy to try to put too much into the diagram and thereby confuse everybody.

Comments closed

“Caveman” Graphs In SQL

Denis Gobo puts together some basic Management Studio data visualization:

I found this technique on Rich Benner’s SQL Server Blog: Visualising the Marvel Cinematic Universe in T-SQL and decided to play around with it after someone asked me to give him the sizes of all databases on a development instance of SQL Server

The way it works is that you take the size of the database and then divide that number against the total size of all databases. You then use the replicate function with the | (pipe) character to generate the ‘graph’  so 8% will look like this ||||||||

You can use this for tables with most rows, a count per state etc etc. By looking at the output the graph column adds a nice visual effect to it IMHO

It does the job and doesn’t require you to go out to a different product, so it works pretty well for occasional administrative queries.

Comments closed