Press "Enter" to skip to content

Category: R

Reviewing Word Associations With R

Julia Silge does some exploratory analysis on the Small World of Words project:

The Small World of Words project focuses on word associations. You can try it out for yourself to see how it works, but the general idea is that the participant is presented with a word (from “telephone” to “journalist” to “yoga”) and is then asked to give their immediate association with that word. The project has collected more than 15 million responses to date, and is still collecting data. You can check out some pre-built visualizations the researchers have put together to explore the dataset, or you can download the data for yourself.

It’s an interesting analysis of the data set, mixed in with some good R code.

Comments closed

Deploying An R Service To Azure Kubernetes Service

Hong Ooi shows us how we can use Azure Container Registry and Azure Kubernetes Service to deploy an R model via Plumber:

If you run this code, you should see a lot of output indicating that R is downloading, compiling and installing randomForest, and finally that the image is being pushed to Azure. (You will see this output even if your machine already has the randomForest package installed. This is because the package is being installed to the R session inside the container, which is distinct from the one running the code shown here.)
All docker calls in AzureContainers, like the one to build the image, return the actual docker commandline as the cmdline attribute of the (invisible) returned value. In this case, the commandline is docker build -t bos_rf . Similarly, the push() method actually involves two Docker calls, one to retag the image, and the second to do the actual pushing; the returned value in this case will be a 2-component list with the command lines being docker tag bos_rf deployreg.azurecr.io/bos_rf and docker push deployreg.azurecr.io/bos_rf.

I love this confluence of technologies and at the same time get a “descent into madness” feeling from the sheer number of worlds colliding.

Comments closed

Road Construction Incentive Contracts And R

Sebastian Kranz promotes an interesting RTutor project:

Patrick Bajari and Gregory Lewis have collected a detailed sample of 466 road construction projects in Minnesota to study this question in their very interesting article Moral Hazard, Incentive Contracts and Risk: Evidence from Procurement in the Review of Economic Studies, 2014.
They estimate a structural econometric model and find that changes in contract design could substantially reduce the duration of road blockages and largely increase total welfare at only minor increases in the risk that road construction firms face.
As part of his Master Thesis at Ulm University, Claudius Schmid has generated a nice and detailed RTutor problem set that allows you to replicate the findings in an interactive fashion. You learn a lot about the structure and outcomes of the currently used contracts, the theory behind better contract design and how the structural model to assess the quantitative effects can be estimated and simulated. At the same time, you can hone your general data science and R skills.

Click through to a couple of ways to get to this RTutor project and learn a bit about building incentive contracts to modify behavior.  H/T R-Bloggers

Comments closed

Analyzing Customer Churn With Keras And H2O

Shirin Glander has released code pertaining to a forthcoming book chapter:

This is code that accompanies a book chapter on customer churn that I have written for the German dpunkt Verlag. The book is in German and will probably appear in February: https://www.dpunkt.de/buecher/13208/9783864906107-data-science.html.
The code you find below can be used to recreate all figures and analyses from this book chapter. Because the content is exclusively for the book, my descriptions around the code had to be minimal. But I’m sure, you can get the gist, even without the book. 😉

Click through for the code.  This is using the venerable AT&T customer churn data set.

Comments closed

Donating To The R Foundation

Mark Niemann-Ross explains how you can donate to the R Foundation:

I benefit from the work of the R Foundation. They oversee the language, but also encourage a healthy ecosystem. CRAN happens because of them. Updates to R happen because of them. useR! happens because of them. Every day, you and I are the recipients of some part of their time.
The least we can do is show them some appreciation. If you point your web browser at https://www.r-project.org/foundation/donations.html you’ll find a convenient (and surprisingly inexpensive) place to express your appreciation. As an individual, you can send these kind folks twenty-five euros to tell them you’re in favor of what they do.

But be sure to read the whole thing, especially if you are an American who wants the donation to be tax-deductible.  I believe that earmarking in this case is adding special instructions on SIAA’s PayPal page.

Comments closed

Timing Means Of Groups With R

John Mount shares some performance measures pitting data.table against various dplyr methods for calculating grouped means:

In this reproduction attempt we see:
– The dplyr time being around 0.05 seconds. This is about 5 times slower than claimed.
– The dplyr sum()/n() time is about 0.2 seconds, about 5 times faster than claimed.
– The data.table time being around 0.004 seconds. This is about three times as fast as the dplyr claims, and over ten times as fast as the actual observed dplyr behavior.

Read the whole thing.  If you want to replicate it yourself, check out the RMarkdown file.

Comments closed

ggmap Tutorial

Laura Ellis has an updated ggmap tutorial:

For those of you who have been following along with issue #51 in the ggmap repo, you’ll notice that there have been a number of changes in the Google Maps Static API service. Unfortunately these have caused some breakage in previous ggmap functionality.
If you used this package prior to July 2018, you may were likely able to do so without signing up for the Google Static Map API service yourself. As indicated on the the ggmap github repo – “Google has recently changed its API requirements, and ggmap users are now required to provide an API key and enable billing.  The billing enablement especially is a bit of a downer, but you can use the free tier without incurring charges. Also, the service being exposed through an easy to use r package that extends ggplot2 is pretty great so I’ll allow it.

This recent API change hurts.  But click through for the tutorial, which doesn’t hurt.

Comments closed

Reporting On Unit Tests In R With covrpage

Maelle Salmon recaps Locke Data’s involvement with the covrpage package:

To read more about getting started with covrpage in your own package in a few lines of code only, we recommend checking out the “get started” vignette. It explains more how to setup the Travis deploy, mentions which functions power the covrpage report, and gives more motivation for using covrpage.
And to learn how the information provided by covrpage should be read, read the “How to read the covrpage report” vignette.

Check it out.

Comments closed

The Intuition Behind Principal Component Analysis

Holger von Jouanne-Diedrich gives us an intuition behind how principal component analysis (PCA) works:


Principal component analysis (PCA) is a dimension-reduction method that can be used to reduce a large set of (often correlated) variables into a smaller set of (uncorrelated) variables, called principal components, which still contain most of the information.
PCA is a concept that is traditionally hard to grasp so instead of giving you the n’th mathematical derivation I will provide you with some intuition.
Basically PCA is nothing else but a projection of some higher dimensional object into a lower dimension. What sounds complicated is really something we encounter every day: when we watch TV we see a 2D-projection of 3D-objects!

Click through for the rest of the story.

Comments closed