Press "Enter" to skip to content

Category: Visualization

Network And Sankey Diagrams In Python And R

Tony Hirst has a roundup of various R and Python packages which build network charts or Sankey diagrams:

Another way we might be able to look at the data “out of time” to show flow between modules is to use a Sankey diagram that allows for the possibility of feedback loops.

The Python sankeyview package (described in Hybrid Sankey diagrams: Visual analysis of multidimensional data for understanding resource use looks like it could be useful here, if I can work out how to do the set-up correctly!

Sankey diagrams are on my list of dangerous visuals:  done right, they are informative, but it’s easy to try to put too much into the diagram and thereby confuse everybody.

Comments closed

“Caveman” Graphs In SQL

Denis Gobo puts together some basic Management Studio data visualization:

I found this technique on Rich Benner’s SQL Server Blog: Visualising the Marvel Cinematic Universe in T-SQL and decided to play around with it after someone asked me to give him the sizes of all databases on a development instance of SQL Server

The way it works is that you take the size of the database and then divide that number against the total size of all databases. You then use the replicate function with the | (pipe) character to generate the ‘graph’  so 8% will look like this ||||||||

You can use this for tables with most rows, a count per state etc etc. By looking at the output the graph column adds a nice visual effect to it IMHO

It does the job and doesn’t require you to go out to a different product, so it works pretty well for occasional administrative queries.

Comments closed

Conditional Formatting With Power BI Line Charts

Daniil Maslyuk shows how to perform conditional formatting on a line chart in Power BI:

Have you ever wished you could change the line colour depending on the overall trend? For example, if your sales increase over time, the line is green; if there is a decline, then the line is red. While this functionality is not yet natively available in Power BI Desktop, it does not mean this cannot be done! In this article, I am going to show you how to achieve this effect.

Read on to see how he does it.

Comments closed

Getting Started With Zeppelin

Sangeeta Gulia shows us how to get started building notebooks with Apache Zeppelin on top of Spark:

There are 3 interpreter modes available in Zeppelin.

1) Shared Mode

In Shared mode, a SparkContext and a Scala REPL is being shared among all interpreters in the group. So every Note will be sharing single SparkContext and single Scala REPL. In this mode, if NoteA defines variable ‘a’ then NoteB not only able to read variable ‘a’ but also able to override the variable.

2) Scoped Mode

In Scoped mode, each Note has its own Scala REPL. So variable defined in a Note can not be read or overridden in another Note. However, still single SparkContext serves all the Interpreter Groups. And all the jobs are submitted to this SparkContext and fair scheduler schedules the job. This could be useful when user does not want to share Scala session, but want to keep single Spark application and leverage its fair scheduler.

3) Isolated Mode

In Isolated mode, each Note has its own SparkContext and Scala REPL.

The default mode of %spark interpreter is ‘Globally Shared’.

This is mostly a step-by-step on installing Zeppelin, but does go into some detail on how Zeppelin works.

Comments closed

Stack Shuffle Custom Visual

Devin Knight continues his Power BI custom visuals series:

In this module you will learn how to use the Enlighten Stack Shuffle Custom Visual.  The Enlighten Stack Shuffle is helpful when you want to display a Top N set of values.  For example if you want to display your top 5 selling employees this visual can make that very easy.

This looks pretty good on a dashboard, especially if you have a top-heavy data set, where the top few items are by far the most important.

Comments closed

Making a Shiny Dashboard

Anish Sing Walia walks us through creating a dashboard using Shiny:

Shiny is an amazing R package which lets the R developers and users build amazing web apps using R itself. It lets the R users analyze, visualize and deploy their machine learning models directly in the form of the web app. This package lets you host standalone apps on a webpage or embed them in R markdown documents or build dashboards and various forecasting applications. You can also extend your Shiny apps with CSS themes, htmlwidgets, and JavaScript actions. Shiny lets us write client-side front-end code in R itself and also lets users write server-side script in R itself. More details on this package can be found here.

I recently learned Shiny and started developing a web application using it.And since then I have been in love with it and have been using it in each and every data science and analytics project. The syntax is super easy to understand and there are lots of amazing articles and documentation available for you to learn it and use it. I personally had a background of developing full-stack web applications using HTML, CSS and javascript and other JS based scripting languages so I found the syntax easy.

I keep meaning to learn Shiny and someday I will, just to prove to my intern that she’s not the only one here who can…

1 Comment

ggplot2 Basics

Bharani Akella has an introduction to ggplot2:

Plot10: Scatter-plot

ggplot(data = mtcars,aes(x=mpg,y=hp,col=factor(cyl)))+geom_point()
  • mpg(miles/galloon) is assigned to the x-axis

  • hp(Horsepower) is assigned to the y-axis

  • factor(cyl) {Number of cylinders} determines the color

  • The geometry used is scatter plot. We can create a scatter plot by using the geom_point() function.

He has a number of similar examples showing several variations on bar, line, and scatterplot charts.

Comments closed