Press "Enter" to skip to content

Category: Visualization

Data Visualization Basics

Kameerath Kareem describes a few basic visualizations and explains when you might use them:

Cumulative distribution graph is a commonly used chart type to express the performance metrics in percentile; it plots the percent of users who had performance metric greater or lesser than the threshold for the website.

The graph below shows the CDF graph for web page response time

From the CDF graph above, we see that at the 90th percentile, the web page response time of a website is 10.3 seconds. This means that 10% of the users in the time frame that the data was collected in had an overall web page load time of more than 10.3 seconds.

These are metrics as they relate to systems operations, but the general rules apply elsewhere as well.  Also, 10.3 seconds to load a webpage seems…slow.

Comments closed

Dot-Density Maps In R

Paul Campbell shows how to build a dot density map in R:

To get me started I invested in the expert guidance of data-visualiser-extraordinaire Nathan Yau, aka Flowing Data. Nathan has a whole host of tutorials on how to make really great visualisations in R (including a brand new course focused on mapping) and thankfully one of them deals with how to plot dot density using base R.

Now with a better understanding of the task at hand, I needed to find the required ethnicity data and shapefiles. I recently saw a video of Amelia McNamara’s great talk at the OpenVis Conference titled ‘How spatial polygons shape our world’. The .shp file really is a glorious thing and it seems that the spatial polygon makers are the unsung heros of the datavis world, so a big round of applause for all those guys is in order.

Anyway, I digress. Luckily for me, the good folks over at the London DataStore have a vast array of Shapefiles that go from Borough level all the way down to Super Output Area level. I’m going to use the Output Areas as the boundaries for the dots and the much broader Borough boundaries for ploting area labels and borders.

The ethnic group data itself was sourced from the Nomis website which has a handy 2011 Census table finder tool where you can easily download an Ethnic Group csv file for London output areas. Vamonos.

I’m going to give this a second reading; it’s a great example of how to go from functional to beautiful.  H/T David Smith

Comments closed

Creating Nicer Reports

Reid Havens has a few tips for making Power BI reports look nicer:

This is less of a single applied step as it is multiple formatting practices applied throughout the report. I’ve already hit on this subject a little bit in the two previous Power BI visual design practices in regards to using complimentary colors. The two key takeaways in this section are object formatting and color coordination.

Of all my best practices I’m showcasing here I’d say this one is the most subjective. However I think that maintaining complimentary colors goes a long ways to creating a professional looking report. I also have a strong dislike for the default title design for visualizations in Power BI. By default it is left aligned and a grey color (AGAIN…hard to read!). I center that sucker and color the background. An added benefit to coloring the title background is it actually forces me to make sure my objects are aligned, otherwise it is VERY noticeable now if they aren’t.

Definitely read the comments on this one, as some of these tips are subjective.

Comments closed

Watch Those Aggregates

Ronald Dameron warns us about graphs in the Azure portal:

I would expect to see the CPU spike with the same value no matter what time range I selected. But, to see the spike that fired the alert, I had to to “Edit” the chart and select different time ranges to see the differences. It wasn’t until I selected a narrow custom time range that the CPU graph would display the CPU spike that corresponded to the alert firing. The alert fires if the CPU percentage exceeds 80% over 15 minutes. So, if you “know” something happened, try different time ranges but especially the custom range to find what you are looking for.

For your highbrow reference of the quarter, FA Hayek on John Maynard Keynes’s Treatise of Money:  “Mr. Keynes’s aggregates conceal the most fundamental mechanisms of change.”

This isn’t an Azure-specific problem; it’s something we have to think about whenever we aggregate data.

Comments closed

Play Axis Custom Visual

Devin Knight continues his Power BI custom visuals series:

In this module you will learn how to use the Play Axis Power BI Custom Visual.  The Play Axis visual works like a dynamic slicer that animates your other report visuals without needing to click every time you want to change your filter value.

This is a valuable custom visual when dealing with time series data, but as Devin shows, you can iterate through other sets, like a set of employee names.

Comments closed

Subplots In Maps

Ilya Kashnitsky shows how to embed subplots within a map using ggplot2:

So, with this map I want to show the location of more and less urbanized NUTS-2 regions of Europe. But I also want to show – with subplots – how I defined the three subregions of Europe (Eastern, Southern, and Western) and what is the relative frequency of the three categories of regions (Predominantly Rural, Intermediate, and Predominantly Rural) within each of the subregions. The logic of actions is simple: first prepare all the components, then assemble them in a composite plot. Let’s go!

This is very useful information, well worth the read.

Comments closed

Spark Changes In HDP 2.6

Vinay Shukla and Syed Mahmood talk about what’s new with Spark and Zeppelin in the Hortonworks Data Platform 2.6 update:

SPARKR & PYSPARK

Most data scientists use R & Python and with SparkR & PySpark respectively they can continue to leverage their familiarity with the R & Python languages. However, they need to use the Spark API to leverage Machine learning with Spark and to take advantage of distributed computations. Both SparkR & PySpark are evolving rapidly and SparkR now supports a number of machine learning algorithms such as LDA, ALS, RF, GMM GBT etc. Another key improvement in SparkR is the ability to deploy a package interactively. This will help Data Scientists deploy their favorite R package in their own environment without stepping on other users.

PySpark now also supports deploying VirtualEnv and this will allow PySpark users to deploy their libraries in their own individual deployments.

There are several large changes, so check it out.

Comments closed

Custom ggplot2 Subplots

Ilya Kashnitsky shows how to create custom subplots using ggplot2:

Actually, ggplot2 is a very powerful and flexible tool that allows to draw figures with quite a complex layout. Today I want to show the code that aligns six square plots (actually, maps) just as in the figure above. And it’s all about the handy function ggplot2::annotation_custom(). Since I used the layout more than once, I wrapped the code that produced it into a function that takes a list of 6 square plots as an input and yields the arranged figure with arrows as an output. Here is the commented code of the function.

This is the difference between “I’m just going to throw some stuff on there” (which is how I tend to operate) versus well thought out visual layout.

Comments closed

Building A Concatenated Tooltip In Power BI

Devin Knight has started a new series, walking through problems his clients have faced implementing Power BI solutions.  In this edition, Devin wants to build a comma-delimited list to display on a tooltip:

This works perfectly for Stock because it automatically summarizes the value but, you’ll notice above that the tooltip for Subcategory has an interesting behavior. Rather than displaying the list of the values in Subcategory it actually just show the very first value. This happens because the Tooltip field requires that any column used in it be able to aggregate or roll up the values into what’s shown on the chart. Since Subcategory is just a text field Power BI automatically applies the FIRST function to return back the first value that appears. You could optionally change this from FIRST to either LAST, COUNT, or COUNTDISTINCT.

So the real problem I want to solve here is rather than only showing the first subcategory how do I list all the subcategories in a comma separated list in the tooltip? Let’s walk through a couple possible designs to this solution.

Read on for two different designs, including the code to implement the solutions.

Comments closed