Press "Enter" to skip to content

Category: Visualization

Apache Zeppelin 0.7.0

Vinay Shulka announces Apache Zeppelin 0.7.0:

SPARK IMPROVEMENTS

This release also adds support for Spark 2 including version Spark 2.1. Zeppelin now also links to Spark History Server UI from Zeppelin so users can more easily track Spark jobs. The Livy interpreter now supports specifying packages with the job.

SECURITY IMPROVEMENTS

The major security improvement in Zeppelin 0.7.0 is using Apache Knox’s LDAP Realm to connect to LDAP. Zeppelin home page now lists only the nodes to which the user is authorized to access. Zeppelin now also has the ability to support PAM based authentication.

The full list of improvements is available here

This visualization platform is growing up nicely.

Comments closed

Dual KPI Custom Visual

Adam Saxton has a quick video demonstrating a dual KPI custom visual:

The Dual KPI efficiently visualizes two measures over time. It shows their trend based on a joint timeline, while absolute values may use different scales, for example Profit and Market share or Sales and Profit.

Each KPI can be visualized as line chart or area chart. The visual has dynamic behavior and can show historical value and the change from the latest value when you hover over it. It also has small icons and labels to convey KPI definitions and alerts about data freshness.

I looks cool, but I dunno; my philosophy is that man cannot serve two KPIs.

Comments closed

Gap Analysis Custom Visual

Devin Knight continues his Power BI custom visuals series:

In this module you will learn how to use the Gap Analysis Power BI Custom Visual.  The Gap Analysis visual is used to analyze the difference between two different groups of data you have.  For example, you might use it to analyze the gap between two answers people gave in survey response data.

I like the gap analysis visual; it works well as a cross-category comparison visual, giving you an idea of the relative importance of each category as well as the change from one time period to the next.  It’s a good way of fitting two useful pieces of information into the same visual.

Comments closed

Mapping Geospatial Data

The folks at Sharp Sight Labs have a great blog post on mapping geospatial data using R:

If you’ve learned the basics of data visualization in R (namely, ggplot2) and you’re interested in geospatial visualization, use this as a small, narrowly-defined exercize to practice some intermediate skills.

There are at least three things that you can learn and practice with this visualization:

  1. Learn about color: Part of what makes this visualization compelling are the colors. Notice that in the area surrounding the US, we’re not using pure black, but a dark grey. For the title, we’re not using white, but a medium grey. Also, notice that for the rivers, we’re not using “blue” but a very specific hexadecimal color. These are all deliberate choices. As an exercise, I highly recommend modifying the colors. Play around a bit and see how changing the colors changes the “feel” of the visualization.

  2. Learn to build visualizations in layers: I’ve emphasized this several times recently, but layering is an important principle of data visualization. Notice that we’re layering the river data over the USA country map. As an exercise, you could also layer in the state boundaries between the country map and the rivers. To do this, you can use map_data().

  3. Learn about ‘Spatial’ data: R has several classes for dealing with ‘geospatial’ data, such as ‘SpatialLines‘, ‘SpatialPoints‘, and others. Spatial data is a whole different animal, so you’ll have to learn its structure. This example will give you a little experience dealing with it.

I also like the iterative approach they discuss.  You’ll almost never get it right the first go-around, but one of the nice things about ggplot2 is that it’s designed to be iterative:  you layer your changes on, making it a bit easier to fiddle with them to get the visualization just right.

Comments closed

Using Sparklyr To Analyze Flight Data

Aki Ariga uses sparklyr on Apache Spark 2.0 to analyze flight data living in S3:

Using sparklyr enables you to analyze big data on Amazon S3 with R smoothly. You can build a Spark cluster easily with Cloudera Director. sparklyr makes Spark as a backend database of dplyr. You can create tidy data from huge messy data, plot complex maps from this big data the same way as small data, and build a predictive model from big data with MLlib. I believe sparklyr helps all R users perform exploratory data analysis faster and easier on large-scale data. Let’s try!

You can see the Rmarkdown of this analysis on RPubs. With RStudio, you can share Rmarkdown easily on RPubs.

Sparklyr is an exciting technology for distributed data analysis.

Comments closed

Superheat

David Smith shows off a very cool heatmap package called superheat:

While the superheat pacakge uses the ggplot2 package internally, it doesn’t itself follow the grammar of graphics paradigm: the function is more like a traditional base R graphics function with a couple of dozen options, and it creates a plot directly rather than returning a ggplot2 object that can be further customized. But as long as the options cover your heatmap needs (and that’s likely), you should find it a useful tool next time you need to represent data on a grid.

The superheat package apparently works with any R version after 3.1 (and I can confirm it works on the most recent, R 3.3.2). This arXiv paper provides some details and several case studies, and you can find more examples here. Check out the vignette for detailed usage instructions, and download it from its GitHub repository linked below.

Click through for some great-looking examples.

Comments closed

R Visuals In Power BI

Ryan Wade ties ggplot2 visuals into Power BI:

The package that we are going to use to develop our custom visualization is ggplot2. The ggplot2 package is arguably the most popular data visualization package in R. It is based on the “grammar of graphics” concept that was created by the statistician, Leland Wilkinson. The ggplot2 package allows you to approach creating charts and graphs in the same manner that Bob Ross approached painting trees in the forest. With ggplot2 you are able to start with a blank canvas and add layers upon layers via short code snippets that builds on each other until you end up with the desired visualization.

The pbix file that is being used in this blog can be found here: http://bit.ly/2jwoCyP. The GentleIntroToR_ChartExample.pbix file contains an example of using R to create a box plot chart that shows the distribution of player scores for the L.A. Lakers. Chiclet slicers were added that allows you to filter by division and/or opponent. The R visualization was created in four steps.

Check out the PBIX file.

Comments closed

Animating Visuals In R

Tomaz Kastrun shows how to create animated charts in R using ggplot2:

In addition to R code, the ImageMagic program needs to be installed on your machine, as well. Also the speed, quality and many other parameters can be set, when creating animated gif.

Animated gif can be also included into your SSRS report, your Sharepoint site or any other site – like my blog 🙂 and it will stay interactive. In Power BI, importing animated gif as a picture, unfortunately will not work.

Be very careful with this, as not everything supports animated GIFs and you can make some really painful graphs if you try hard enough…

Comments closed

Sankey Bar Charts

Devin Knight continues his custom visuals series:

In this module you will learn how to use the Sankey Bar Chart Power BI Custom Visual.  The Sankey Bar Chart is used to show a flow of data between different stages of a process.

It’s an interesting mix of sankey, bar chart, and funnel.  In other words, you may only have one thing you can use it for, but it’ll be a really good use.

Comments closed