Category: Visualization

Creating Map Plots With ggmap

Laura Ellis shows how to use the ggmap package to create choropleth maps in R:

In the last map, it was a bit tricky to see the density of the incidents because all the graphed points were sitting on top of each other.  In this scenario, we are going to make the data all one color and set the alpha parameter, which makes the dots transparent.  This helps display the density of the plotted points.

Also note that we can reuse the base map created in the first step, “p”, to plot the new map.

Check it out.  It is an introduction to creating choropleths, which makes it a good place to start.
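
Why does a low alpha reveal density? Under standard source-over alpha compositing, each semi-transparent point lets a fraction (1 - alpha) of whatever lies beneath it show through, so n overlapping points of the same color reach an effective opacity of 1 - (1 - alpha)^n. The short Python sketch below works through that arithmetic. It is general compositing math rather than anything from the linked post, the function name is hypothetical, and it assumes same-colored points and a graphics device that uses source-over blending.

```python
def effective_opacity(alpha: float, n: int) -> float:
    """Opacity of n overlapping same-colored points, each drawn with `alpha`.

    Assumes source-over compositing: every layer lets (1 - alpha) of
    whatever is beneath it show through.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    if n < 0:
        raise ValueError("n must be non-negative")
    return 1.0 - (1.0 - alpha) ** n


if __name__ == "__main__":
    # With a low alpha, isolated points stay faint while dense clusters saturate.
    for n in (1, 5, 10, 50):
        print(f"{n:>3} points -> opacity {effective_opacity(0.1, n):.3f}")
```

With alpha = 0.1, a lone point is drawn at 10% opacity while fifty stacked points exceed 99%, which is why dense clusters stand out once the points are made transparent.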

Faceting With R And SQL Server ML Services

Marlon Ribunal has a quick example showing how to build faceted plots with SQL Server ML Services and ggplot2:

In my previous post, I demonstrated how easy it is to create a bar graph in SQL Server 2017 In-Database Machine Learning using R.

We’re going to build upon that basic graph.

Sometimes data analysis requires us to look at an overview of our data across specific partitions, such as a year. For example, we may want to see how our product groups fare on a month-to-month basis across the last four years.

From a data analytics perspective, this requirement involves quite a few data points: the aggregate (quantity), monthly periods, and yearly partitions.

One approach to handling such a requirement is faceting. Faceting plots subsets of the data in a matrix of panels based on one or more variables, or facets.
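
Conceptually, faceting splits the rows by the value of the facet variable and gives each subset its own panel in a grid. Here is a rough Python sketch of that idea, assuming hypothetical (year, month, quantity) rows and a wrap-style layout that fills panels left to right, then top to bottom. It only computes the panel assignment and draws nothing, so it should not be read as how ggplot2 implements facet_wrap() internally.

```python
from collections import defaultdict


def facet_layout(rows, facet_key, ncol):
    """Group rows by facet_key and place each group in a panel grid.

    Panels are filled left to right, then top to bottom, in sorted order of
    the facet values. Returns (panels, positions), where positions maps each
    facet value to its (panel_row, panel_col).
    """
    panels = defaultdict(list)
    for row in rows:
        panels[row[facet_key]].append(row)
    positions = {
        level: (i // ncol, i % ncol) for i, level in enumerate(sorted(panels))
    }
    return dict(panels), positions


if __name__ == "__main__":
    # Hypothetical monthly quantities for four years.
    rows = [
        {"year": year, "month": month, "quantity": (year - 2014) * 10 + month}
        for year in (2015, 2016, 2017, 2018)
        for month in range(1, 13)
    ]
    panels, positions = facet_layout(rows, "year", ncol=2)
    for year, (r, c) in positions.items():
        print(f"{year}: panel row {r}, column {c}, {len(panels[year])} months")
```

Each panel then gets the same month-by-quantity chart, which is what makes year-over-year comparison easy at a glance.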

Click through for the example and code.  Facets are quite useful, but they can mislead if you squeeze too many onto the screen.  The same line can look quite different in a “tall” facet versus a “wide” facet, and that can change how people interpret your visual.
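
To make the “tall” versus “wide” point concrete: the angle at which a line is drawn depends on its data slope multiplied by the panel's aspect ratio. The Python sketch below assumes a panel whose axes span fixed data ranges and computes the on-screen angle of the same line in two panel shapes. The function name and panel dimensions are hypothetical.

```python
import math


def drawn_angle_degrees(data_slope: float, x_range: float, y_range: float,
                        width: float, height: float) -> float:
    """Angle (in degrees) at which a line with the given data slope is drawn.

    One data unit on x spans width / x_range screen units and one data unit
    on y spans height / y_range screen units, so the on-screen slope is the
    data slope rescaled by the ratio of those two factors.
    """
    screen_slope = data_slope * (height / y_range) / (width / x_range)
    return math.degrees(math.atan(screen_slope))


if __name__ == "__main__":
    # Hypothetical series with slope 1 over x and y ranges of 12 data units.
    tall = drawn_angle_degrees(1.0, 12, 12, width=2, height=6)
    wide = drawn_angle_degrees(1.0, 12, 12, width=6, height=2)
    print(f"tall panel: {tall:.1f} degrees, wide panel: {wide:.1f} degrees")
```

Here the identical line climbs at roughly 72 degrees in the tall panel and roughly 18 degrees in the wide one, which is the kind of shift that can make a modest trend look steep or a steep one look flat.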

Building Forest Plots With ggplot2

Faisal Atakora shows how to build a forest plot using ggplot2:

The forestplot package is often used to build forest plots in R. However, I find that ggplot2 has more advantages for making forest plots, such as enabling the inclusion of several variables with many categories in a lattice form. You can also use any scale you choose, such as a log scale. In this post, I will show how to plot risk ratios and their confidence intervals for several conditions.

Click through for the script.  You might also want to compare it to the forestplot package to see how these differ.
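
The numbers a forest plot displays are straightforward to compute. For a 2×2 table with a cases among the a + b exposed and c cases among the c + d unexposed, the risk ratio is RR = (a/(a+b)) / (c/(c+d)), and a common approximate 95% interval is exp(ln RR ± 1.96·SE), where SE = sqrt(1/a - 1/(a+b) + 1/c - 1/(c+d)). The Python sketch below uses that standard log-scale interval; it is not taken from the linked post, the counts are made up, and it assumes a and c are nonzero. Because the interval is symmetric on the log scale, it also shows why a log axis suits these plots.

```python
import math


def risk_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Risk ratio and approximate confidence interval from a 2x2 table.

    a, b: exposed group with / without the outcome
    c, d: unexposed group with / without the outcome
    Uses the standard error of log(RR); requires a > 0 and c > 0.
    """
    if a <= 0 or c <= 0 or b < 0 or d < 0:
        raise ValueError("a and c must be positive; b and d non-negative")
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    rr = risk_exposed / risk_unexposed
    se_log = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper


if __name__ == "__main__":
    # Hypothetical counts for two conditions (not from the linked post).
    rows = {"Condition A": (30, 70, 15, 85), "Condition B": (12, 88, 10, 90)}
    for name, counts in rows.items():
        rr, lo, hi = risk_ratio_ci(*counts)
        print(f"{name}: RR = {rr:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

Each (RR, lower, upper) triple becomes one row of the forest plot: a point at RR with a horizontal bar from lower to upper.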

Dynamically Showing Or Hiding Columns In SSRS With Parameters

Sander Stad shows how to show or hide columns at runtime in SQL Server Reporting Services reports using parameters:

I regularly have reports with an extensive number of columns. Because of the number of columns, reports tend to become inefficient and contain too much information that we don’t always need. Users may want to select certain columns to make the report easier to read.

Hiding and showing columns in SSRS reports using parameters is a solution to make reports more dynamic and easier to use.

At the time of writing, SQL Server Reporting Services did not support checkbox parameters. Instead, we will use a multi-value text parameter to show or hide our columns.

Click through to see how to do this.

Line Drawing And The Traveling Salesman Problem

Antonio Sanchez Chinchon builds a shortest-path portrait generator:

In this experiment, I apply a heuristic algorithm for the TSP to draw a portrait. The idea is pretty simple:

  • Load a photo

  • Convert it to black and white

  • Choose a sample of black points

  • Solve the TSP to calculate a route among the points

  • Plot the route

Click through for the code.  This is an interesting application of the traveling salesman problem.
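
A rough Python sketch of the last three steps follows, assuming the photo has already been converted to black and white (here a tiny hypothetical grid of 0/1 values stands in for a real image). It swaps in a simple nearest-neighbor construction heuristic for the TSP, which may well differ from the solver used in the linked post, and it returns the ordered route instead of plotting it.

```python
import math
import random


def sample_black_points(grid, k, seed=0):
    """Return up to k (row, col) coordinates of black pixels (value 1)."""
    black = [(r, c) for r, row in enumerate(grid) for c, v in enumerate(row) if v == 1]
    rng = random.Random(seed)
    return rng.sample(black, min(k, len(black)))


def nearest_neighbor_tour(points):
    """Greedy TSP heuristic: always move to the closest unvisited point."""
    if not points:
        return []
    unvisited = list(points[1:])
    tour = [points[0]]
    while unvisited:
        last = tour[-1]
        nxt = min(unvisited, key=lambda p: math.dist(last, p))
        unvisited.remove(nxt)
        tour.append(nxt)
    return tour


def tour_length(tour):
    """Length of the closed route, including the leg back to the start."""
    return sum(math.dist(tour[i], tour[(i + 1) % len(tour)]) for i in range(len(tour)))


if __name__ == "__main__":
    # Hypothetical 5x5 "image": 1 = black pixel, 0 = white pixel.
    grid = [
        [0, 1, 1, 1, 0],
        [1, 0, 0, 0, 1],
        [1, 0, 1, 0, 1],
        [1, 0, 0, 0, 1],
        [0, 1, 1, 1, 0],
    ]
    points = sample_black_points(grid, k=10)
    route = nearest_neighbor_tour(points)
    print(route)
    print(f"closed route length: {tour_length(route):.2f}")
```

Nearest neighbor is fast but usually leaves some long jumps; a real portrait with thousands of points would typically add an improvement pass such as 2-opt before plotting.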

Contrasting Plotly And Seaborn

Natasha Sharma contrasts the Seaborn and Plotly libraries for visualizing data:

It was important to use a library that can provide easy, high-class interactivity. Before embedding the plots into my website code, I tested a few different libraries, like Matplotlib and Seaborn, in order to visualize the results and see how different they can look. After a few trials, I came across the Plotly library and found it valuable for my project because of its built-in functionality, which gives users high-class interactivity.

In this post, I am going to compare Seaborn and Plotly using a bar chart and a heatmap. I will be using the breast cancer dataset to visualize these plots. But before jumping into the comparison, the dataset needed preprocessing, such as data cleaning, so I worked through some preprocessing steps first.

In this case, the contrast is mostly lines of code versus visual quality; read on for more.

Visualizing Geo-Spatial Data In R

Carson Sievert shows off the plotly library:

You might be wondering, “What can plotly offer over other interactive mapping packages such as leaflet, mapview, mapedit, etc?”. One big feature is the linked brushing framework, which works best when linking plotly together with other plotly graphs (i.e., only a subset of brushing features are supported when linking to other crosstalk-compatible htmlwidgets). Another is the ability to leverage the plotly.js API to make efficient updates in shiny apps via plotlyProxy(). Speaking of efficiency, plotly.js keeps on improving the performance of their WebGL-based rendering, so I recommend trying plot_ly() (with toWebGL()) and/or plot_mapbox() if you have lots of graphical elements to render. Also, by having a consistent interface between these various mapping approaches, it’s much quicker and easier to switch from one approach to another when you need to leverage a different set of strengths and weaknesses.

Plotly’s on my list of things I’ll eventually get to one of these days.  H/T R-Bloggers

Plotting In R Using ggplot2

The folks at Sharp Sight Labs have another nice demo of ggplot2:

You’ve heard me say it a thousand times: to master data science, you need to practice.

You need to “practice small” by practicing individual techniques and functions. But you also need to “practice big” by working on larger projects.

To get some practice, my recommendation is to find reasonably sized datasets online and plot them.

Wikipedia is a nearly-endless source of good datasets. The great thing about Wikipedia is that many of the datasets are small and well contained. They are also fairly clean, with just enough messiness to make them a bit of a challenge.

As a quick example, this week, we’ll plot some economic data.

The code is deceptively easy considering the scope of the problem.

ggplot2 Geoms And Aesthetics

Tyler Rinker digs into ggplot2’s geoms and aesthetics:

I thought it might be fun to use the geoms’ aesthetics to see if we could cluster aesthetically similar geoms closer together. The heatmap below uses cosine similarity and hierarchical clustering to reorder the matrix so that similar geoms end up closer to one another (note that today I learned from “R for Data Science” about the seriation package [https://cran.r-project.org/web/packages/seriation/index.html], which may make this matrix reordering task much easier).

It’s an interesting analysis of what’s available within ggplot2 and a detailed look at how different geoms fit together with respect to aesthetic options.
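
The core of that analysis is a similarity score between geoms described by which aesthetics they accept. Here is a minimal Python sketch of the cosine-similarity part, assuming each geom is encoded as a 0/1 vector over the union of aesthetics. The aesthetic sets are simplified by hand for illustration (they are not ggplot2’s complete lists), and the sketch simply ranks pairs instead of performing the hierarchical clustering used in the post.

```python
import math
from itertools import combinations

# Simplified, illustrative aesthetic sets (not ggplot2's complete lists).
GEOM_AESTHETICS = {
    "point": {"x", "y", "colour", "fill", "shape", "size", "alpha"},
    "line": {"x", "y", "colour", "linetype", "linewidth", "alpha"},
    "path": {"x", "y", "colour", "linetype", "linewidth", "alpha"},
    "bar": {"x", "y", "colour", "fill", "linetype", "linewidth", "alpha"},
    "text": {"x", "y", "label", "colour", "size", "angle", "alpha"},
}


def to_vectors(geom_aes):
    """Encode each geom as a 0/1 vector over the union of all aesthetics."""
    all_aes = sorted(set().union(*geom_aes.values()))
    return {g: [1 if a in aes else 0 for a in all_aes] for g, aes in geom_aes.items()}


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    vectors = to_vectors(GEOM_AESTHETICS)
    # Rank every pair of geoms from most to least similar.
    pairs = sorted(
        ((cosine_similarity(vectors[a], vectors[b]), a, b)
         for a, b in combinations(vectors, 2)),
        reverse=True,
    )
    for score, a, b in pairs:
        print(f"{a:>5} ~ {b:<5} {score:.3f}")
```

Geoms with identical aesthetic sets score 1, and a clustering step would then group high-scoring geoms into adjacent rows and columns of the heatmap.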

Defending Pie Charts

Bobby Johnson makes a valiant effort at defending the indefensible:

In the world of data analysis, there are few things more reviled than the pie chart. Among “serious” data people, it is at best trivial and naive, and at worst downright evil.

I do not agree with this. The pie chart is simple, but that is its beauty. It does exactly one thing and it does it well: it shows you how much different parts contribute to a whole. This isn’t the only question you ever have about your data, but when it’s the question you do have, the pie chart is perfect. That is not evil and it is not naive. It is data visualization doing what it should: taking something large and abstract and saying something simple about it that your brain can easily internalize.

I strongly disagree with the arguments in the article, but I do respect the attempt.  In each of the cases, at least one of a bar chart, a stacked 100% bar chart, or a dot plot could convey at least as much information with lower mental overhead.