Category: Visualization

Running Dask on AKS

Tsuyoshi Matsuzaki sets up Dask as a distributed service:

In my last post, I showed you a tutorial for running Apache Spark on managed Kubernetes, Azure Kubernetes Service (AKS).
In this post, I’ll show you a tutorial for running distributed Dask workloads on AKS.

By using Dask, you can run scikit-learn-compatible functions and jobs on data that cannot fit in memory, or run them in a distributed manner. For simplicity, I’ll use a built-in Dask-ML function (dask_ml.linear_model.LinearRegression) in this tutorial. (In the same manner, you can also run regular sklearn functions.)
Cloud-managed Kubernetes will help you speed up these large ML workloads.

Click through for the process. I’ve had some positive experiences with Dask as a dashboarding tool. It’s definitely one of the better ones if you’re big into Python.

The Value of Bubble Charts

Elizabeth Ricks takes us through a surprisingly tricky chart:

An extension of a scatterplot, a bubble chart is commonly used to visualize relationships between three or more numeric variables.  Each bubble in a chart represents a single data point. The values for each bubble are encoded by 1) its horizontal position on the x-axis, 2) its vertical position on the y-axis, and 3) the size of the bubble. Sometimes, the color of the bubble or its movement in animation can represent more dimensions. 

I say surprisingly tricky because it’s easy to overwhelm the user when trying to view bubble charts. I think the best scenarios are cases in which you have relatively few data points and the size element is mandatory.

Hans Rosling (RIP) did an outstanding job of displaying this kind of chart with the Gapminder dataset.
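
One detail worth sketching is the size encoding. A common convention (my addition, not something the quoted article spells out) is to make the bubble's area, rather than its radius, proportional to the value, so a value four times larger does not look sixteen times larger. A minimal Python sketch, with made-up numbers and a hypothetical helper name:

```python
import math


def bubble_radius(value, max_value, max_radius=40.0):
    """Return a radius so that bubble AREA is proportional to value.

    max_radius is an arbitrary display size (assumed, e.g. in pixels)
    given to the largest value.
    """
    if value < 0 or max_value <= 0:
        raise ValueError("value must be >= 0 and max_value must be > 0")
    # Area scales with radius squared, so take the square root of the ratio.
    return max_radius * math.sqrt(value / max_value)


# Made-up values for the third (size) variable.
populations = [1_000_000, 4_000_000, 16_000_000]
radii = [bubble_radius(p, max(populations)) for p in populations]
print(radii)
```

Each fourfold jump in value only doubles the radius, which keeps the visual comparison honest.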

Preventing Calendar Overrun in Power BI

Matt Allington updates an older article:

Consider the example below where the CalendarYear is filtered for 2019 and the values of the measures Total Sales and Total Sales YTD are displayed by month. As you can see, the total sales are shown up to July 2019. This is because with the sample data, the last sales date is somewhere in July 2019. However, the values of Total Sales YTD are repeated all the way until the end of year (July 2019 to December 2019). This is what I call Calendar Over Run. It is common to want to prevent this overrun.

Read on for two separate methods of preventing this visual issue.
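
To picture the problem outside of Power BI, here is a minimal Python sketch. It is not either of Matt's DAX methods; the sales figures, the last sale date, and the function name are all made up for illustration. The idea is simply to blank out the year-to-date value for any month that starts after the last date with sales.

```python
from datetime import date

# Made-up 2019 sales by month number; nothing after July.
monthly_sales = {1: 100, 2: 120, 3: 90, 4: 110, 5: 130, 6: 95, 7: 105}
last_sale_date = date(2019, 7, 18)  # hypothetical last date with sales


def ytd_by_month(sales, year, last_sale, guard=True):
    """Running year-to-date total per month.

    With guard=True, months starting after last_sale get None
    (a blank) instead of repeating the final total.
    """
    running = 0
    result = {}
    for month in range(1, 13):
        running += sales.get(month, 0)
        if guard and date(year, month, 1) > last_sale:
            result[month] = None
        else:
            result[month] = running
    return result


overrun = ytd_by_month(monthly_sales, 2019, last_sale_date, guard=False)
guarded = ytd_by_month(monthly_sales, 2019, last_sale_date)
print(overrun[12], guarded[7], guarded[8])
```

Without the guard, August through December keep repeating July's total; with it, those months come back blank.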

Plotting Correlation Analyses in R

Finnstats shows a few techniques for plotting correlation in R:

Correlation is a measure of the strength of a relationship between two variables.

Pearson’s Product-Moment Correlation

One of the most common measures of correlation is Pearson’s product-moment correlation, which is commonly referred to simply as the correlation, or just the letter r.

Correlation shows the strength of a relationship between two variables and is expressed numerically by the correlation coefficient.

Click through for examples from several packages. H/T R-Bloggers.
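
For reference, Pearson's r is the covariance of the two variables divided by the product of their standard deviations. The linked post works in R; the following is only a minimal Python sketch of the formula, with made-up data and a hypothetical function name.

```python
import math


def pearson_r(xs, ys):
    """Pearson's product-moment correlation for two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


x = [1, 2, 3, 4, 5]
r_up = pearson_r(x, [2, 4, 6, 8, 10])    # perfectly increasing relationship
r_down = pearson_r(x, [10, 8, 6, 4, 2])  # perfectly decreasing relationship
print(r_up, r_down)
```

The coefficient always falls between -1 and 1, with values near zero indicating little linear relationship.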

Table Design in R with mmtable2

Matt Dancho walks through a package to make tables look great in R:

I love ggplot2 for plotting. The grammar of graphics allows us to add elements to plots. Tables seem to be forgotten in terms of an intuitive grammar with tidy data philosophy – until now. mmtable2 aims to be the ggplot2 for tables, leveraging the awesome gt table package.

The mmtable2 package aims to make it easy to create tables by:

1. Using a ggplot2-style syntax for a grammar of table operations.

2. Extending the amazing gt table package.

Read on for the process and a demonstration.

Plotting XGBoost Trees with R

Andrew Treadway shows off a method to visualize the results of training an XGBoost model:

In this post, we’re going to cover how to plot XGBoost trees in R. XGBoost is a very popular machine learning algorithm, which is frequently used in Kaggle competitions and has many practical use cases.

Let’s start by loading the packages we’ll need. Note that plotting XGBoost trees requires the DiagrammeR package to be installed, so even if you have xgboost installed already, you’ll need to make sure you have DiagrammeR also.

Click through for the process. H/T R-Bloggers.

Grafana Changing License

Alex Woodie has some bad news for us:

Grafana is switching licensing of its core products from Apache License 2.0 to the more restrictive Affero General Public License (AGPL) v3. The company made the change in an attempt to balance the value of open source with Grafana’s monetization strategy, CEO Raj Dutt announced yesterday.

Grafana has been considering a license change for some time, Dutt wrote in a blog post on April 20. This week, the company finally felt the time was right to move.

“Oof” was my first response. I know that a pretty large percentage of companies won’t touch AGPL. I don’t know if we’ll see these companies adopt the commercial version of Grafana, see the companies switch over to something else, or see developers fork Grafana and come up with some other product. AGPL is not quite as scary for companies when a product is at the end of the chain, as visualization and dashboarding products tend to be, but for many companies, that doesn’t matter.

Displaying Metrics on Graphite Dashboards

Nick Campion takes us through working with Graphite:

Graphite is free and open-source software. It is used as a time-series database and monitoring tool, where you can collect, store and display time-series data in real time. Because you can monitor certain metrics of this data using Graphite, it has a very useful and simple dashboard for visualizing these metrics.

This article will show you how to display a metric on your Graphite dashboard.

Click through for more information.
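
Before a metric can show up on a dashboard, it has to be sent to Graphite. The sketch below is my own illustration, not taken from Nick's article: it formats a data point for Carbon's plaintext protocol (metric path, value, and Unix timestamp on one line). The metric name, host, and helper names are hypothetical, and port 2003 is assumed to be the plaintext listener, which is the usual default.

```python
import socket
import time


def format_metric(path, value, timestamp=None):
    """Build one plaintext-protocol line: '<path> <value> <timestamp>\\n'."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"


def send_metric(line, host="localhost", port=2003):
    """Send a formatted line to a Carbon plaintext listener (assumed host/port)."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))


# Hypothetical metric name and value; the timestamp is fixed for repeatability.
line = format_metric("servers.web01.cpu_load", 0.42, timestamp=1619000000)
print(line, end="")
# send_metric(line)  # uncomment when a Graphite/Carbon instance is reachable
```

The dotted path determines where the metric appears in the dashboard's metric tree.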

Creating a Rose Chart in R

Neil Saunders takes a look at a classic chart:

I first heard Florence Nightingale and her Geeks Declare War on Death, an episode of the Cautionary Tales podcast, premiered as a special episode of 99% Invisible. It discusses Nightingale’s work as a statistician and in particular, her visualisation of mortality causes in the Crimean War using the famous “rose chart”, or polar area diagram.

I’m sure you’re thinking: how can I explore that using R? 

Read on to find out.
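
The geometry behind a polar area diagram is simple: every wedge spans the same angle, and the wedge's area (not its radius) carries the value. Neil's post uses R; the following is just a minimal Python sketch of that calculation, with made-up counts and a hypothetical function name.

```python
import math


def wedge_radius(value, n_wedges=12):
    """Radius of an equal-angle wedge whose AREA equals value.

    A circular sector has area 0.5 * r**2 * theta, so
    r = sqrt(2 * value / theta).
    """
    if value < 0 or n_wedges <= 0:
        raise ValueError("value must be >= 0 and n_wedges must be > 0")
    theta = 2 * math.pi / n_wedges  # same angle for every wedge
    return math.sqrt(2 * value / theta)


# Made-up monthly counts, not Nightingale's actual figures.
counts = {"Jan": 100, "Feb": 400}
radii = {month: wedge_radius(v) for month, v in counts.items()}
print(radii)
```

Quadrupling the count only doubles the radius, which is why the wedges stay readable even when one month dwarfs the others.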
