Category: Visualization

Examples Of Charts In Different Languages

David Smith points out a great repository of information on generating different types of charts in different libraries:

The visualization tools include applications like Excel, Power BI and Tableau; languages and libraries including R, Stata, and Python’s matplotlib; and frameworks like D3. The data visualizations range from the standard to the esoteric, and follow the taxonomy of the book Data Visualisation (also by Andy Kirk). The chart categories are color coded by row: categorical (including bar charts, dot plots); hierarchical (donut charts, treemaps); relational (scatterplots, sankey diagrams); temporal (line charts, stream graphs) and spatial (choropleths, cartograms).

Check out the Chartmaker Directory.

Comments closed

Styling In ggplot2

The folks at Jumping Rivers show an example of creating a nice-looking plot with ggplot2:

The changes we’ve made so far would be impossible for any package to do for us – how would the package know the plot title? We can now improve the look and feel of the plot. There are two complementary ways of doing this: scales and themes. The ggplot scales control things like colours and point size. In the latest version of ggplot2, version 3.0.0, the Viridis colour palette was introduced. This palette is particularly useful for creating colour-blind friendly palettes:

g + scale_colour_viridis_d() # d for discrete

With a few lines of code, those default graphs can look a lot nicer.

NFL Player Stats In Power BI

Dustin Ryan shares his NFL player stats and analysis Power BI desktop file:

I’ve had a lot of people ask me for this over the past few months and it’s finally (mostly) ready! There are still a few things I’d like to do with the data models and reports but I wanted to go ahead and get the content shared out since I know many people use this for the Fantasy Football drafts which generally happen during the third week of the NFL preseason.

So here it is. I’ve spent a decent amount of time scraping the data from a few different websites in order to put something together I thought would be useful and fun, so please take a look and enjoy it!

Click through for the file and a YouTube video with more info.

Creating Timelines With dbatools

Marcin Gminski shows how to pull SQL Agent and backup history out of SQL Server and display it as a visual history timeline:

Currently, the output from the following commands is supported:

  • Get-DbaAgentJobHistory
  • Get-DbaBackupHistory

You will run the above commands as you would normally do but pipe the output to ConvertTo-DbaTimeline, the same way as you would with any other ConvertTo-* PowerShell function. The output is a string that most of the time you will save as a file using the Out-File command in order to open it in a browser.

Then, with the ConvertTo-DbaTimeline cmdlet, you can convert that into an HTML page which looks pretty good.

Making A Readable Presentation Template

Meagan Longoria has some advice for presentation templates:

The title text is 36pt Segoe UI Light, the subheading text is 24pt Segoe UI, and the speaker info text is 14 pt Segoe UI.

Those font sizes alone make it very hard to read from the back of even the smaller rooms at the conference.

In addition to being too small, the gray text for the speaker info doesn’t have enough contrast from the white background. We want to get a contrast ratio of at least 4.5:1 (but 7:1 would be better). The contrast ratio for these colors is 4.0.

While sans serif fonts are generally thought to be easier to read in presentations, it’s better to use fonts with a stroke width that is not too thin – not necessarily wider characters, but thicker lines that make up each letter. So Segoe UI Light would not be my first choice for a title font, but Segoe UI or Segoe UI Bold might be ok.

Also, the red used on the right half of the slide is VERY bright for an element that is purely decorative, to the point that it might be distracting for some people. And the reason we need to squish our title into two lines of too-small text is because that giant red shape takes up half the page. What is more important: a “pretty” red shape to make our slide look snazzy or being able to clearly read the title of the presentation?

There’s a lot along these lines, and it’s great food for thought.  Meagan includes a set of recommendations at the end, so be sure to catch those.
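The contrast ratio in the quote above can be checked by hand. Below is a minimal Python sketch, assuming the ratio is the one defined by WCAG 2.x (where 4.5:1 and 7:1 are the AA and AAA thresholds for normal-size text). The hex colours are hypothetical examples, not the actual colours from the template being reviewed, and the function names are purely illustrative.

```python
def _linearize(channel_8bit: int) -> float:
    """Convert an 8-bit sRGB channel to linear light (WCAG 2.x formula)."""
    c = channel_8bit / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a colour written as '#RRGGBB'."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(foreground: str, background: str) -> float:
    """Contrast ratio (lighter + 0.05) / (darker + 0.05); argument order does not matter."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


# Hypothetical text colours on a white background (assumed values for illustration).
for text_colour in ("#000000", "#767676", "#999999"):
    ratio = contrast_ratio(text_colour, "#FFFFFF")
    verdict = "meets" if ratio >= 4.5 else "misses"
    print(f"{text_colour} on white: {ratio:.2f}:1 ({verdict} the 4.5:1 threshold)")
```

Running a check like this against a template's real colours is a quick way to confirm whether light gray text is readable before a slide ever reaches a projector.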

Highlighting Data With gghighlight

Laura Ellis shows off the gghighlight package, which allows you to selectively highlight certain sets of data in ggplot:

While the above methodology is quite easy, it can be a bit of a pain at times to create and add the new data frame.  Further, you have to tinker more with the labelling to really call out the highlighted data points.

Thanks to Hiroaki Yutani, we now have the gghighlight package which does most of the work for us with a small function call!! Please note that a lot of this code was created by looking at examples in the package’s introduction document.

The new school way is even simpler:

  1. Using ggplot2, create a plot with your full data set.

  2. Add the gghighlight() function to your plot with the conditions set to identify your subset.

  3. Celebrate! This was one less step AND we got labels!

That’s a very cool package.  H/T R-Bloggers

Layout Images In Power BI

Meagan Longoria has some tips for using layout images in Power BI:

Using layout images in Power BI has become a popular design trend. When I say layout images, I’m referring to background images with shapes around areas where visuals are placed. This is different from the new wallpaper feature that became available in the July release, which can be used to format the grey area outside your report page and extend the main color of background images.

Layout images can help with spacing and alignment within a report and can help create consistency across reports. They can also help create affordances, using consistent layout and design to make it obvious how users should interact with our reports.

I use layout images in some of my reports, but I don’t think they are necessary on every report. There are a couple of things to consider when using layout images.

Read on for an example of a good layout image versus a bad layout image as well as tips and tricks on how to create good layout images.

Visualizing Linear Regression Results

Bernardo Lares gives us a few ways of visually interpreting a linear regression result in R:

The most obvious plot to study for a linear regression model, you guessed it, is the regression itself. If we plot the predicted values vs the real values we can see how close they are to our reference line of 45° (intercept = 0, slope = 1). If we had a very sparse plot where we could see no clear tendency along that line, then we would have a bad regression. On the other hand, if we have all our points over the line, I bet you gave the model the results you wished for!

Then, the Adjusted R2 on the plot gives us an easy parameter for comparing models and how well each one fits our reference line. The nearer this value gets to 1, the better. Without getting too technical, if you add more and more useless variables to a model, this value will decrease; but, if you add useful variables, the Adjusted R-Squared will improve.

We also get the RMSE and MAE (Root-Mean Squared Error and Mean Absolute Error) for our regression’s results. MAE measures the average magnitude of the errors in a set of predictions, without considering their direction. On the other side we have RMSE, which is a quadratic scoring rule that also measures the average magnitude of the error. It’s the square root of the average of squared differences between prediction and actual observation. Both metrics can range from 0 to ∞ and are indifferent to the direction of errors. They are negatively-oriented scores, which means lower values are better.

I like this approach to explaining models.
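To make the three metrics concrete, here is a minimal sketch of MAE, RMSE, and Adjusted R-Squared using their standard textbook formulas. The linked post works in R; this is plain Python used only to illustrate the arithmetic, and the function names and sample numbers are made up for the example.

```python
import math


def mae(actual, predicted):
    """Mean Absolute Error: average size of the errors, ignoring direction."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root Mean Squared Error: square root of the average squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def r_squared(actual, predicted):
    """Share of the variance in `actual` explained by `predicted`."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def adjusted_r_squared(actual, predicted, n_predictors):
    """R-Squared penalised for the number of predictors in the model."""
    n = len(actual)
    return 1 - (1 - r_squared(actual, predicted)) * (n - 1) / (n - n_predictors - 1)


# Made-up observations and predictions, for illustration only.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.0, 5.0, 8.0, 9.0]

print("MAE: ", mae(actual, predicted))
print("RMSE:", rmse(actual, predicted))
print("Adjusted R2, 1 predictor: ", adjusted_r_squared(actual, predicted, 1))
print("Adjusted R2, 2 predictors:", adjusted_r_squared(actual, predicted, 2))
```

The last two lines show the penalty described in the quote: with identical predictions, claiming a second predictor lowers the Adjusted R-Squared, so an extra variable only pays off if it improves the fit enough to offset that penalty.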

Scatterplots For Multivariate Analysis

Neil Saunders declutters a complicated visual with a simple scatterplot:

Sydney’s congestion at ‘tipping point’ blares the headline and to illustrate, an interactive chart with bars for city population densities, points for commute times and of course, dual-axes.

Yuck. OK, I guess it does show that Sydney is one of three cities that are low density, but have comparable average commute times to higher-density cities. But if you’re plotting commute time versus population density…doesn’t a different kind of chart come to mind first? y versus x. C’mon.

Let’s explore.

Simple is typically better, and that adage holds here.
