Grouping And Aggregating In SQL, R, And Python

Dejan Sarka has a few examples of aggregation in different languages, including SQL, R, and Python:

The query calculates the coefficient of variation (defined as the standard deviation divided the mean) for the following groups, in the order as they are listed in the GROUPING SETS clause:

  • Country and education – expression (g.EnglishCountryRegionName, c.EnglishEducation)
  • Country only – expression (g.EnglishCountryRegionName)
  • Education only – expression (c.EnglishEducation)
  • Over all dataset- expression ()

Note also the usage of the GROUPING() function in the query. This function tells you whether the NULL in a cell comes because there were NULLs in the source data and this means a group NULL, or there is a NULL in the cell because this is a hyper aggregate. For example, NULL in the Education column where the value of the GROUPING(Education) equals to 1 indicates that this is aggregated in such a way that education makes no sense in the context, for example aggregated over countries only, or over the whole dataset. I used ordering by NEWID() just to shuffle the results. I executed query multiple times before I got the desired order where all possibilities for the GROUPING() function output were included in the first few rows of the result set. Here is the result.

GROUPING SETS is an underappreciated bit of SQL syntax.

Related Posts

Building TensorFlow Neural Networks On Spark With Keras

Jules Damji has an example of using the PyCharm IDE to use Keras to build TensorFlow neural network models on the Databricks MLflow library: Our example in the video is a simple Keras network, modified from Keras Model Examples, that creates a simple multi-layer binary classification model with a couple of hidden and dropout layers and […]

Read More

Scatterplots For Multivariate Analysis

Neil Saunders declutters a complicated visual with a simple scatterplot: Sydney’s congestion at ‘tipping point’ blares the headline and to illustrate, an interactive chart with bars for city population densities, points for commute times and of course, dual-axes. Yuck. OK, I guess it does show that Sydney is one of three cities that are low density, […]

Read More

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Categories

July 2018
MTWTFSS
« Jun  
 1
2345678
9101112131415
16171819202122
23242526272829
3031