
Category: Visualization

Highlighting Data With gghighlight

Laura Ellis shows off the gghighlight package, which allows you to selectively highlight certain sets of data in ggplot2:

While the above methodology is quite easy, it can be a bit of a pain at times to create and add the new data frame.  Further, you have to tinker more with the labelling to really call out the highlighted data points.

Thanks to Hiroaki Yutani, we now have the gghighlight package, which does most of the work for us with a small function call! Please note that a lot of this code was created by looking at examples in the package's introduction document.

The new school way is even simpler:

  1. Using ggplot2, create a plot with your full data set.

  2. Add the gghighlight() function to your plot with the conditions set to identify your subset.

  3. Celebrate! This was one less step AND we got labels!

That’s a very cool package.  H/T R-Bloggers

Comments closed

Layout Images In Power BI

Meagan Longoria has some tips for using layout images in Power BI:

Using layout images in Power BI has become a popular design trend. When I say layout images, I’m referring to background images with shapes around areas where visuals are placed. This is different from the new wallpaper feature that became available in the July release, which can be used to format the grey area outside your report page and extend the main color of background images.

Layout images can help with spacing and alignment within a report and can help create consistency across reports. They can also help create affordances, using consistent layout and design to make it obvious how users should interact with our reports.

I use layout images in some of my reports, but I don’t think they are necessary on every report. There are a couple of things to consider when using layout images.

Read on for an example of a good layout image versus a bad layout image as well as tips and tricks on how to create good layout images.

Comments closed

Visualizing Linear Regression Results

Bernardo Lares gives us a few ways of visually interpreting a linear regression result in R:

The most obvious plot to study for a linear regression model, you guessed it, is the regression itself. If we plot the predicted values vs the real values we can see how close they are to our reference line of 45° (intercept = 0, slope = 1). If we’d had a very sparse plot where we can see no clear tendency over that line, then we have a bad regression. On the other hand, if we have all our points over the line, I bet you gave the model your wished results!

Then, the Adjusted R-Squared on the plot gives us an easy parameter for comparing models and for judging how well a model fits our reference line. The nearer this value gets to 1, the better. Without getting too technical, if you add more and more useless variables to a model, this value will decrease; but if you add useful variables, the Adjusted R-Squared will improve.
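To make the penalty for extra variables concrete, here is a minimal sketch of the standard Adjusted R-Squared formula. The function name and the sample numbers are illustrative assumptions, not code from the linked post.

```python
def adjusted_r_squared(r_squared: float, n_obs: int, n_predictors: int) -> float:
    """Standard formula: 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    if n_obs - n_predictors - 1 <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Hypothetical numbers: the same raw R^2 of 0.80 on 50 observations,
# once with 2 predictors and once with 10.
few = adjusted_r_squared(0.80, n_obs=50, n_predictors=2)
many = adjusted_r_squared(0.80, n_obs=50, n_predictors=10)

print(f"2 predictors:  {few:.4f}")
print(f"10 predictors: {many:.4f}")
```

With the raw R-Squared held fixed, the model with more predictors gets the lower adjusted score, which is the behavior the excerpt describes for variables that add no explanatory power.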

We also get the RMSE and MAE (Root-Mean Squared Error and Mean Absolute Error) for our regression’s results. MAE measures the average magnitude of the errors in a set of predictions, without considering their direction. On the other side we have RMSE, which is a quadratic scoring rule that also measures the average magnitude of the error. It’s the square root of the average of squared differences between prediction and actual observation. Both metrics can range from 0 to ∞ and are indifferent to the direction of errors. They are negatively-oriented scores, which means lower values are better.
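The two error metrics can be sketched in a few lines of plain Python. The helper names and the sample values are assumptions for illustration; the linked post works in R.

```python
import math


def mae(actual, predicted):
    """Mean Absolute Error: average of |actual - predicted|."""
    errors = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(errors) / len(errors)


def rmse(actual, predicted):
    """Root Mean Squared Error: square root of the mean squared difference."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical observations and model predictions.
actual = [3.0, 5.0, 2.5, 7.0]
predicted = [2.5, 5.0, 4.0, 8.0]

mae_value = mae(actual, predicted)
rmse_value = rmse(actual, predicted)
print(f"MAE:  {mae_value:.4f}")
print(f"RMSE: {rmse_value:.4f}")
```

Because RMSE squares each error before averaging, large misses weigh more heavily, so RMSE is never smaller than MAE on the same data.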

I like this approach to explaining models.

Comments closed

Scatterplots For Multivariate Analysis

Neil Saunders declutters a complicated visual with a simple scatterplot:

Sydney’s congestion at ‘tipping point’ blares the headline and to illustrate, an interactive chart with bars for city population densities, points for commute times and of course, dual-axes.

Yuck. OK, I guess it does show that Sydney is one of three cities that are low density, but have comparable average commute times to higher-density cities. But if you’re plotting commute time versus population density…doesn’t a different kind of chart come to mind first? y versus x. C’mon.

Let’s explore.

Simple is typically better, and that adage holds here.

Comments closed

Building Cone Plots In Plotly

The Plotly blog shows how to use Python to build 3D cone plots using Plotly:

This plot uses an explicitly defined vector field. A vector field refers to an assignment of a vector to each point in a subset of space.

In this plot, we visualize a collection of arrows that simply model the wind speed and direction at various levels of the atmosphere.
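As a rough illustration of "an assignment of a vector to each point", the sketch below builds a small explicit vector field in plain Python. The wind model is made up for the example and is not the data used in the linked post. Plotly's `Cone` trace takes the point coordinates (`x`, `y`, `z`) and vector components (`u`, `v`, `w`) as parallel lists, so the final lines reshape the field that way without calling Plotly.

```python
import math


def wind_vector(x, y, z):
    """Hypothetical wind model: speed grows and direction rotates with altitude z."""
    speed = 5.0 + 2.0 * z
    angle = math.radians(30.0 * z)
    return (speed * math.cos(angle), speed * math.sin(angle), 0.0)


# A 3 x 3 x 4 grid of sample points in space.
points = [(x, y, z) for x in range(3) for y in range(3) for z in range(4)]

# The vector field: one (u, v, w) vector assigned to every point.
field = {p: wind_vector(*p) for p in points}

# Parallel coordinate and component lists, the shape a cone plot expects.
xs, ys, zs = (list(axis) for axis in zip(*points))
us, vs, ws = (list(comp) for comp in zip(*(field[p] for p in points)))
```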

3-D weather plots can be useful to research scientists to gain a better understanding of the atmospheric profile, such as during the prediction of severe weather events like tornadoes and hurricanes.

Sometimes a 3D plot is the best answer.  When it is, this looks like a good solution.  H/T R-bloggers

Comments closed

Sorting When Your Measure Is Not In The Visual

Kasper de Jonge shows us different ways of sorting a visual by some unrelated measure:

So let’s start with the simple one: I want to sort a chart on a measure that is not part of the visual. Let’s take this visual:

Now instead of sorting by OrderQuantity I want to sort by the ListPrice. The trick here is to make the measure part of the query, and one way you can do that is by adding it to the tooltip.

Read on for examples for charts as well as matrices.

Comments closed

Building A Gantt Chart With ggplot2

Sebastian Sauer shows us how to build a Gantt chart in R:

Of importance are only TaskPrevious_Evnet and Duration. In addition, we need an overall start date (“2019-03-01” in this case). Each subsequent task is assumed to follow directly after its preceding event.

Our job is to compute the start date and end date of each task, given that we know the initial start date and the durations. As said, this procedure is based on the assumption that there is a frictionless and gapless sequence of tasks.
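The date arithmetic described here can be sketched as follows. This is a minimal Python illustration rather than the R code from the linked post; the task names and the choice of days as the duration unit are assumptions.

```python
from datetime import date, timedelta


def schedule(tasks, project_start):
    """Compute (name, start, end) for tasks that run back to back.

    `tasks` is a list of (name, duration_in_days) in execution order.
    """
    rows = []
    start = project_start
    for name, duration_days in tasks:
        end = start + timedelta(days=duration_days)
        rows.append((name, start, end))
        start = end  # gapless: the next task starts when this one ends
    return rows


# Hypothetical tasks; only the overall start date comes from the excerpt.
tasks = [("Plan", 7), ("Build", 14), ("Review", 5)]
plan = schedule(tasks, date(2019, 3, 1))

for name, start, end in plan:
    print(f"{name:<8}{start} -> {end}")
```

Each row then holds exactly what a Gantt bar needs: a label, a start date, and an end date.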

Read on for a code-heavy example.  I’ve always had a soft spot in my heart for Gantt charts.

Comments closed

Graphics In R

David Smith is following the kerfuffle that Edward Tufte unleashed on Twitter recently:

While graphics guru Edward Tufte recently claimed that “R coders and users just can’t do words on graphics and typography” and need additional tools to make graphics that aren’t “clunky”, data journalists at major publications beg to differ. The BBC has been creating graphics “purely in R” for some time, with a typography style matching that of the BBC website. Senior BBC Data Journalist Christine Jeavans offers several examples, including this chart of life expectancy differences between men and women:

I think Tufte’s off base here.

Comments closed

Scatterplot Matrices

The Plotly folks show off scatterplot matrices in Python:

The scatterplot matrix, known acronymically as SPLOM, is a relatively uncommon graphical tool that uses multiple scatterplots to determine the correlation (if any) between a series of variables.

These scatterplots are then organized into a matrix, making it easy to look at all the potential correlations in one place.
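To show how the matrix is laid out, the sketch below enumerates every row and column pairing of variables and computes the Pearson correlation that each panel's scatterplot would display visually. The data set and helper name are invented for illustration, and no plotting library is used.

```python
import math
from itertools import product


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical data set with three variables.
data = {
    "height": [150.0, 160.0, 170.0, 180.0],
    "weight": [50.0, 58.0, 69.0, 80.0],
    "age": [40.0, 25.0, 33.0, 29.0],
}
names = list(data)

# One panel per (row variable, column variable) pair: a 3 x 3 matrix here.
panels = {(r, c): pearson(data[r], data[c]) for r, c in product(names, repeat=2)}

for r in names:
    print(r.ljust(8), "  ".join(f"{panels[(r, c)]:+.2f}" for c in names))
```

The grid is symmetric and its diagonal pairs each variable with itself, which is why many SPLOM implementations put histograms or labels on the diagonal instead of scatterplots.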

SPLOMs, invented by John Hartigan in 1975, allow data aficionados to quickly realize any interesting correlations between parameters in the data set.

In this post, we’ll go over how to make SPLOMs in Plotly with Python. For extra insights, check out our SPLOM tutorial in Python and R.


Comments closed