R – Page 72 – Curated SQL

United States Maps in R

Published 2019-12-02 by Kevin Feasel

Laura Ellis shows how to use the usmap package in R:

Today, I’d like to share the package ‘usmap’ which enables incredibly easy and fast creation of US maps in R.
In honor of US Thanksgiving tomorrow, I’m going to make this blog post Thanksgiving-themed! In this tutorial, we will use the gtrendsR package to pull US Google search results for the keyword “thanksgiving” and plot its popularity by state.

Click through for that demo, as well as links to more demos on map usage.

Fun With Waffle Plots

Published 2019-11-25 by Kevin Feasel

Sebastian Sauer has a two-parter on waffle plots. The first part is an introduction:

A waffle diagram is a variant of (stacked) bar plots or pie plots. I’d suspect they do not have great perceptual properties, but for some purposes they may be adequate. This is best explored by example. This post draws heavily on hrbrmstr’s introduction to his waffle package.

The second part uses emojifont to show pictograms as well:

A pictogram may be defined as a (statistical) diagram that uses icons or similar “iconic” graphics to illustrate stuff. The waffle plot (see this post) is a natural place to combine waffles and pictograms. This post was originally inspired by hrbrmstr’s waffle package (see this post), but I could not get it running.
Maybe the easiest way is to work through an example (spoiler: see below for where we’re heading).

This type of plot doesn’t work for everything, but I can think of a few places where it’d be the right choice.
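
The sketch below shows how a 10 x 10 waffle grid gets filled. It is a minimal base-R version that assumes the counts sum to 100, so each tile stands for 1%. It uses only `image()`, not hrbrmstr's waffle package or emojifont. The category names and counts are made up for illustration.

```r
# Made-up counts that sum to 100, so each tile in a 10 x 10 grid is 1%.
counts <- c(Apples = 45, Pears = 30, Plums = 25)
cols   <- c("tomato", "olivedrab", "orchid")

# Expand the counts into 100 category codes and fill the grid row by row.
cells <- rep(seq_along(counts), times = counts)
grid  <- matrix(cells, nrow = 10, ncol = 10, byrow = TRUE)

# image() draws z[i, j] at (x = i, y = j), so transpose and flip the rows
# to make the grid read left to right, starting at the top-left corner.
image(t(grid[10:1, ]), col = cols, axes = FALSE)
legend("bottomright", legend = names(counts), fill = cols, bg = "white")
```

Swapping the tiles for icons turns this grid into the pictogram variant described in the second post.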

Debugging Code in R

Published 2019-11-20 by Kevin Feasel

Marina Wyss walks us through debugging techniques in R:

There are many ways to approach these problems when they arise. For example, condition handling using tools like try(), tryCatch(), and withCallingHandlers() can increase your code’s robustness by proactively steering error handling.
R also includes several advanced debugging tools that can be very helpful for quickly and efficiently locating problems, which will be the focus of this article. To illustrate, we’ll use an example adapted from an excellent paper by Roger D. Peng, and show how these tools work along with some updated ways to interact with them via RStudio. In addition to working with errors, the debugging tools can also be used on warnings by converting them to errors via options(warn = 2).

Read on for a survey of what’s available in R. It’s a lot more than writing a bunch of print statements. H/T R-Bloggers
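
Here is a small base-R sketch of the three condition-handling approaches the quote mentions: `tryCatch()`, `withCallingHandlers()`, and `options(warn = 2)`. It is not taken from the article. The wrapper function names are hypothetical, and `log(-1)` is used only because it reliably signals the warning "NaNs produced".

```r
# Hypothetical wrappers around log(); log(-1) returns NaN and signals
# the warning "NaNs produced".

# 1. tryCatch(): exit on the warning and return a fallback value instead.
safe_log <- function(x) {
  tryCatch(
    log(x),
    warning = function(w) {
      message("Caught warning: ", conditionMessage(w))
      NA_real_
    }
  )
}

# 2. withCallingHandlers(): record the warning, muffle it, and let
#    evaluation continue so the original result is still returned.
logged_log <- function(x) {
  seen <- character(0)
  value <- withCallingHandlers(
    log(x),
    warning = function(w) {
      seen <<- c(seen, conditionMessage(w))
      invokeRestart("muffleWarning")
    }
  )
  list(value = value, warnings = seen)
}

# 3. options(warn = 2): promote warnings to errors, so tools that only
#    react to errors (traceback(), options(error = recover)) can be used.
strict_log <- function(x) {
  old <- options(warn = 2)
  on.exit(options(old))   # restore the previous setting on exit
  tryCatch(log(x), error = function(e) conditionMessage(e))
}

safe_log(-1)    # NA, plus the message "Caught warning: NaNs produced"
logged_log(-1)  # value NaN, warnings "NaNs produced"
strict_log(-1)  # the error message produced by the converted warning
```

The key difference is that `tryCatch()` unwinds to the handler and abandons the computation. `withCallingHandlers()` runs the handler in place and can resume.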

Using pdqr for Statistical Uncertainty

Published 2019-11-14 by Kevin Feasel

Evgeni Chasnovski has a new CRAN package:

I am glad to announce that my latest, long-in-the-making R package ‘pdqr’ has been accepted to CRAN. It provides tools for creating, transforming and summarizing custom random variables with distribution functions (as base R ‘p*()’, ‘d*()’, ‘q*()’, and ‘r*()’ functions). You can read a brief overview in one of my previous posts.

Click through for a description of the package.
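
The p/d/q/r naming convention the quote refers to is easiest to see with a built-in distribution. The sketch below first shows base R's normal-distribution quartet. It then hand-rolls the same four functions for a discrete variable defined by a sample. This only illustrates the idea: `empirical_discrete_rv()` is a hypothetical helper and is not part of pdqr's API.

```r
# Base R names distribution functions with a p/d/q/r prefix, e.g. for "norm":
pnorm(1.96)   # p: cumulative probability P(X <= 1.96), about 0.975
dnorm(0)      # d: density at 0, about 0.399
qnorm(0.975)  # q: quantile (inverse of p), about 1.96
set.seed(42)
rnorm(3)      # r: three random draws

# Hypothetical helper (NOT pdqr's API): build the same four functions for a
# discrete random variable whose distribution is given by a sample.
empirical_discrete_rv <- function(x) {
  vals  <- sort(unique(x))
  probs <- as.numeric(table(factor(x, levels = vals))) / length(x)
  cdf   <- cumsum(probs)
  list(
    p = function(q) vapply(q, function(v) sum(probs[vals <= v]), numeric(1)),
    d = function(v) ifelse(v %in% vals, probs[match(v, vals)], 0),
    # smallest value whose CDF reaches p (small tolerance for rounding)
    q = function(p) vals[vapply(p, function(pp) which(cdf >= pp - 1e-12)[1], integer(1))],
    r = function(n) sample(vals, n, replace = TRUE, prob = probs)
  )
}

rv <- empirical_discrete_rv(c(1, 2, 2, 3, 3, 3))
rv$p(2)    # 0.5
rv$d(3)    # 0.5
rv$q(0.5)  # 2
rv$r(5)    # five draws from {1, 2, 3}
```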

Important Assumptions with Linear Models

Published 2019-11-13 by Kevin Feasel

Sebastian Sauer takes us through two of the most important assumptions of linear models:

Additivity and linearity: the second most important assumptions in linear models
We assume that y is a linear function of the predictors. If y is not a linear function of the predictors, we cannot expect the model to deliver correct insights (predictions, causal coefficients). Let’s check an example.

Read on to understand what this means, as well as the most important assumption.
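
The sketch below shows a linearity violation on simulated data; it is not the example from the post. It generates an outcome that is quadratic in x, then fits both a straight-line model and a model with a squared term and compares them.

```r
set.seed(1)
n <- 200
x <- runif(n, -3, 3)
y <- 2 + 0.5 * x^2 + rnorm(n, sd = 0.5)   # the true relationship is quadratic

fit_linear    <- lm(y ~ x)              # assumes y is linear in x (misspecified)
fit_quadratic <- lm(y ~ x + I(x^2))     # includes the squared term

# The straight line explains little of the variation; the quadratic fit explains most.
summary(fit_linear)$r.squared
summary(fit_quadratic)$r.squared

# Residuals of the misspecified model form a U shape against x,
# a common visual sign that the linearity assumption does not hold.
plot(x, resid(fit_linear), ylab = "Residual (linear model)")
abline(h = 0, lty = 2)
```

"Linear" here means linear in the coefficients: the second model is still fit with `lm()` because the squared term enters as just another predictor.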

Updates to AzureR Packages

Published 2019-11-13 by Kevin Feasel

Hong Ooi announces changes to several AzureR packages:

AzureVM 2.1.0
You can now create VM scalesets with attached data disks. In addition, you can specify the disk type (Standard_LRS, StandardSSD_LRS, or Premium_LRS) for the OS disk and, for a Linux Data Science Virtual Machine, the supplied data disk. This enables using VM sizes that don’t support Premium storage.

Click through for the full set of updates.

Merging Datasets in R with the Tidyverse

Published 2019-11-04 by Kevin Feasel

Anisa Dhana shows off several tidyverse methods for combining data sets together:

semi_join
The semi_join function is different from the previous examples of joins. A semi join creates a new dataset containing all rows from data1 that have a matching value in data2. However, instead of merging the variables of both the first (data1) and second (data2) datasets, the result contains only the variables from the first one (data1).

Most of this looks like standard SQL joins, but read through to the end for a bonus which doesn’t typically appear in relational database products.
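
A semi join is essentially a filter: keep the rows of data1 that have a match in data2, and keep only data1's columns. The sketch below expresses that in base R with `%in%` on a single made-up key column `id`, contrasts it with an inner join via `merge()`, and notes the dplyr call in a comment. It assumes a single-column key; multi-column keys need a different matching step.

```r
data1 <- data.frame(id = c(1, 2, 3, 4), name = c("a", "b", "c", "d"))
data2 <- data.frame(id = c(2, 4, 4, 5), score = c(10, 20, 30, 40))

# Semi join: rows of data1 whose id appears in data2, with data1's columns only.
# Each data1 row appears at most once, even though id 4 matches twice in data2.
semi <- data1[data1$id %in% data2$id, , drop = FALSE]
semi   # the rows with id 2 and id 4

# For contrast, an inner join adds data2's columns and repeats
# the id-4 row once per match (three rows in total).
inner <- merge(data1, data2, by = "id")
inner

# With dplyr installed, the semi join above corresponds to:
# dplyr::semi_join(data1, data2, by = "id")
```

In SQL terms, a semi join behaves like `WHERE EXISTS` (or `WHERE id IN (...)`) rather than a `JOIN`.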

Mocking Objects with R

Published 2019-10-31 by Kevin Feasel

The R-hub blog has an interesting post on creating mocks in R for unit testing:

In some of these cases, the programming concept you’re after is mocking, i.e. making a function act as if something were a certain way! In this blog post we shall offer a round-up of resources around mocking, or not mocking, when unit testing an R package.

It’s interesting watching data scientists work through the same sorts of problems which traditional developers have hit, whether that be testing, deployment, or source control management. H/T R-bloggers
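
One lightweight alternative to mocking is to pass the dependency in as a function argument, so a test can substitute a fake. The sketch below does that in base R. The names `fetch_status()`, `report()`, and the fake functions are hypothetical. Dedicated mocking packages, which the linked post surveys, take other approaches.

```r
# Hypothetical "real" dependency; in practice this might call a web API.
fetch_status <- function() {
  stop("network access is not available in this example")
}

# Code under test: the dependency is an argument with the real function as default.
report <- function(get_status = fetch_status) {
  status <- get_status()
  if (identical(status, "ok")) "All systems go" else paste("Problem:", status)
}

# In a test, hand report() a fake that behaves as if the service
# returned a specific value, so the test never touches the network.
fake_ok   <- function() "ok"
fake_down <- function() "timeout"

stopifnot(identical(report(fake_ok), "All systems go"))
stopifnot(identical(report(fake_down), "Problem: timeout"))
```

This design keeps production calls unchanged (`report()` still uses the real function by default). The trade-off is that the function signature has to expose the dependency.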

Plotting Three-Dimensional Linear Models

Published 2019-10-29 by Kevin Feasel

Sebastian Sauer shows a few techniques for visualizing linear models with two predictors:

Linear models are a standard way of predicting or explaining some data. Visualizing data is not only of didactic value but provides heuristic value too, as demonstrated by Anscombe’s Quartet.
Visualizing linear models in 2D is straightforward, but visualizing linear models with more than one predictor is much less so. The aim of this post is to demonstrate some ways to visualize linear models with more than one predictor, using popular R packages. We will focus on 3D examples, that is, two predictors.

I have a strong bias against 3D visuals because they tend to be so difficult to see clearly. There are times when they’re necessary, though.
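
With two predictors, the fitted model is a plane. One base-graphics way to show it is to evaluate the fit on a grid and draw it with `persp()`. The sketch below uses simulated data and is not one of the package-based techniques from the post.

```r
set.seed(123)
n  <- 100
x1 <- runif(n, 0, 10)
x2 <- runif(n, 0, 10)
y  <- 1 + 2 * x1 - 1.5 * x2 + rnorm(n)

fit <- lm(y ~ x1 + x2)   # with two predictors the fitted surface is a plane

# Evaluate the fitted plane on a regular grid; expand.grid() varies x1
# fastest, so filling the matrix by column gives z[i, j] = f(g1[i], g2[j]).
g1   <- seq(0, 10, length.out = 25)
g2   <- seq(0, 10, length.out = 25)
grid <- expand.grid(x1 = g1, x2 = g2)
z    <- matrix(predict(fit, newdata = grid), nrow = length(g1), ncol = length(g2))

# Draw the plane, then project the observed points into the same view.
view <- persp(g1, g2, z, theta = 30, phi = 20,
              xlab = "x1", ylab = "x2", zlab = "y", ticktype = "detailed")
points(trans3d(x1, x2, y, pmat = view), pch = 16, cex = 0.6)
```

Changing `theta` and `phi` rotates the view. That is often necessary with a static 3D plot, for exactly the readability reasons mentioned above.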

Re-Introducing rquery

Published 2019-10-28 by Kevin Feasel

John Mount has a new introduction to rquery:

rquery is a data wrangling system designed to express complex data manipulation as a series of simple data transforms. This is in the spirit of R’s base::transform(), or dplyr’s dplyr::mutate() and uses a pipe in the style popularized in R with magrittr. The operators themselves follow the selections in Codd’s relational algebra, with the addition of the traditional SQL “window functions.” More on the background and context of rquery can be found here.
The R/rquery version of this introduction is here, and the Python/data_algebra version of this introduction is here.

Check it out.
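
To give a feel for the "series of simple data transforms" idea without installing anything, here is a base-R sketch. It chains an extend step, a window-function step, and a row selection. It uses `transform()` and `ave()`, not rquery's own operators, and the data are made up.

```r
d <- data.frame(
  group = c("a", "a", "b", "b", "b"),
  value = c(1, 3, 2, 5, 4)
)

# Step 1 (extend): add a derived column, in the spirit of base::transform().
step1 <- transform(d, value_sq = value^2)

# Step 2 (window functions): row number and running total within each group,
# like ROW_NUMBER() and SUM() OVER (PARTITION BY group ORDER BY value) in SQL.
step2 <- step1[order(step1$group, step1$value), ]
step2$row_number    <- ave(step2$value, step2$group, FUN = seq_along)
step2$running_total <- ave(step2$value, step2$group, FUN = cumsum)

# Step 3 (select rows and columns): keep the first row of each group.
step3 <- step2[step2$row_number == 1, c("group", "value", "running_total")]
step3
```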

Category: R