
Category: Visualization

Watch Those Aggregates

Ronald Dameron warns us about graphs in the Azure portal:

I would expect to see the CPU spike with the same value no matter what time range I selected. But, to see the spike that fired the alert, I had to “Edit” the chart and select different time ranges to see the differences. It wasn’t until I selected a narrow custom time range that the CPU graph would display the CPU spike that corresponded to the alert firing. The alert fires if the CPU percentage exceeds 80% over 15 minutes. So, if you “know” something happened, try different time ranges but especially the custom range to find what you are looking for.

For your highbrow reference of the quarter, FA Hayek on John Maynard Keynes’s Treatise on Money: “Mr. Keynes’s aggregates conceal the most fundamental mechanisms of change.”

This isn’t an Azure-specific problem; it’s something we have to think about whenever we aggregate data.
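To make the point concrete, here is a minimal Python sketch of how averaging over wider buckets flattens a spike. The data is made up for illustration (a 20% baseline with a 15-minute spike to 95%), and the `bucket_averages` helper is hypothetical; it is not how the Azure portal computes its charts, just the same general effect.

```python
from statistics import mean

# Hypothetical data: one CPU reading per minute for 24 hours.
# Baseline is 20%; a 15-minute spike to 95% starts at 10:00.
readings = [20.0] * (24 * 60)
spike_start = 10 * 60
for minute in range(spike_start, spike_start + 15):
    readings[minute] = 95.0


def bucket_averages(values, bucket_size):
    """Average consecutive, non-overlapping buckets of bucket_size points."""
    return [
        mean(values[i:i + bucket_size])
        for i in range(0, len(values), bucket_size)
    ]


# The same data, summarized at three different granularities.
peaks = {}
for label, size in [("1 minute", 1), ("15 minutes", 15), ("1 hour", 60)]:
    peaks[label] = max(bucket_averages(readings, size))
    print(f"{label:>10} buckets: peak = {peaks[label]:.2f}%")
```

With hourly buckets, the highest point on the chart sits well below the 80% alert threshold even though the underlying readings crossed it, which is why narrowing the time range can make a "missing" spike reappear.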

Comments closed

Play Axis Custom Visual

Devin Knight continues his Power BI custom visuals series:

In this module you will learn how to use the Play Axis Power BI Custom Visual.  The Play Axis visual works like a dynamic slicer that animates your other report visuals without needing to click every time you want to change your filter value.

This is a valuable custom visual when dealing with time series data, but as Devin shows, you can iterate through other sets, like a set of employee names.


Subplots In Maps

Ilya Kashnitsky shows how to embed subplots within a map using ggplot2:

So, with this map I want to show the location of more and less urbanized NUTS-2 regions of Europe. But I also want to show – with subplots – how I defined the three subregions of Europe (Eastern, Southern, and Western) and what is the relative frequency of the three categories of regions (Predominantly Rural, Intermediate, and Predominantly Urban) within each of the subregions. The logic of actions is simple: first prepare all the components, then assemble them in a composite plot. Let’s go!

This is very useful information, well worth the read.


Spark Changes In HDP 2.6

Vinay Shukla and Syed Mahmood talk about what’s new with Spark and Zeppelin in the Hortonworks Data Platform 2.6 update:

SPARKR & PYSPARK

Most data scientists use R & Python and with SparkR & PySpark respectively they can continue to leverage their familiarity with the R & Python languages. However, they need to use the Spark API to leverage machine learning with Spark and to take advantage of distributed computations. Both SparkR & PySpark are evolving rapidly and SparkR now supports a number of machine learning algorithms such as LDA, ALS, RF, GMM, GBT, etc. Another key improvement in SparkR is the ability to deploy a package interactively. This will help data scientists deploy their favorite R package in their own environment without stepping on other users.

PySpark now also supports deploying VirtualEnv and this will allow PySpark users to deploy their libraries in their own individual deployments.

There are several large changes, so check it out.


Custom ggplot2 Subplots

Ilya Kashnitsky shows how to create custom subplots using ggplot2:

Actually, ggplot2 is a very powerful and flexible tool that lets you draw figures with quite a complex layout. Today I want to show the code that aligns six square plots (actually, maps) just as in the figure above. And it’s all about the handy function ggplot2::annotation_custom(). Since I used the layout more than once, I wrapped the code that produced it into a function that takes a list of 6 square plots as an input and yields the arranged figure with arrows as an output. Here is the commented code of the function.

This is the difference between “I’m just going to throw some stuff on there” (which is how I tend to operate) and a well-thought-out visual layout.


Building A Concatenated Tooltip In Power BI

Devin Knight has started a new series, walking through problems his clients have faced implementing Power BI solutions.  In this edition, Devin wants to build a comma-delimited list to display on a tooltip:

This works perfectly for Stock because it automatically summarizes the value, but you’ll notice above that the tooltip for Subcategory has an interesting behavior. Rather than displaying the list of the values in Subcategory, it actually just shows the very first value. This happens because the Tooltip field requires that any column used in it be able to aggregate or roll up the values into what’s shown on the chart. Since Subcategory is just a text field, Power BI automatically applies the FIRST function to return the first value that appears. You could optionally change this from FIRST to either LAST, COUNT, or COUNTDISTINCT.

So the real problem I want to solve here is, rather than only showing the first subcategory, how do I list all the subcategories in a comma-separated list in the tooltip? Let’s walk through a couple of possible designs for this solution.

Read on for two different designs, including the code to implement the solutions.
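If you want a feel for the roll-up problem outside of Power BI, here is a rough Python sketch of the same idea. To be clear, this is not either of Devin's designs and it is not DAX; the sample rows and the `build_tooltips` helper are made up for illustration. It simply contrasts a "first value" roll-up with a comma-separated list of the distinct values.

```python
from collections import defaultdict

# Hypothetical product rows: (category, subcategory, units in stock).
rows = [
    ("Bikes", "Road Bikes", 120),
    ("Bikes", "Mountain Bikes", 80),
    ("Bikes", "Road Bikes", 40),
    ("Accessories", "Helmets", 200),
    ("Accessories", "Locks", 60),
]


def build_tooltips(rows):
    """Roll each category up to one tooltip record."""
    stock = defaultdict(int)
    subcategories = defaultdict(list)
    for category, subcategory, units in rows:
        stock[category] += units  # numeric column: summing is a natural roll-up
        if subcategory not in subcategories[category]:
            subcategories[category].append(subcategory)  # keep distinct values, in order seen
    return {
        category: {
            "stock": stock[category],
            "first_subcategory": subcategories[category][0],  # analogous to a FIRST roll-up
            "all_subcategories": ", ".join(subcategories[category]),  # the list we actually want
        }
        for category in stock
    }


tooltips = build_tooltips(rows)
for category, tip in tooltips.items():
    print(category, tip)
```

The numeric column rolls up naturally with a sum, while the text column needs an explicit choice: take one value, count them, or join them into a list.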


Building A Spinning Globe With R

James Cheshire shows how to use R to create an image of a spinning globe:

It has been a long-held dream of mine to create a spinning globe using nothing but R (I wish I was joking, but I’m not). Thanks to the brilliant mapmate package created by Matt Leonawicz and shed loads of computing power, today that dream became a reality. The globe below took 19 hours and 30 processors to produce from relatively low-resolution NASA Black Marble data, and so I accept R is not the best software to be using for this – but it’s amazing that you can do this in R at all!

Now all that is missing is a giant TV and an evil lair.


Pretty R Plots

Simon Jackson has a couple posts on how to use ggplot2 to make graphs prettier.  First, histograms:

Time to jazz it up with colour! The method I’ll present was motivated by my answer to this StackOverflow question.

We can add colour by exploiting the way that ggplot2 stacks colour for different groups. Specifically, we fill the bars with the same variable (x) but cut into multiple categories:

Then he follows up with scatter plots:

Shape and size

There are many ways to tweak the shape and size of the points. Here’s the combination I settled on for this post:

There are some nice tricks here around transparency, color scheme, and gradients, making it a great series.  As a quick note, the color scheme in the histogram headliner photo does not work at all for people with red-green color-blindness.  Using a URL color filter like Toptal’s is quite helpful in discovering these sorts of issues.


Narrative Custom Visual

Devin Knight continues his Power BI custom visuals series:

In this module you will learn how to use the Narrative Power BI Custom Visual.  The Narrative visual is developed by Narrative Science and it gives you the ability to automatically deliver analysis of your data.  The results look similar to a final report that an analyst might provide after spending weeks with your data.

It’s not going to replace a seasoned analyst (or any analyst at all, frankly), but if you need a couple of paragraphs of text summing up a trend, it’s a good start.
