Category: R

Deploying An R Service To Azure Kubernetes Service

Hong Ooi shows us how we can use Azure Container Registry and Azure Kubernetes Service to deploy an R model via Plumber:

If you run this code, you should see a lot of output indicating that R is downloading, compiling and installing randomForest, and finally that the image is being pushed to Azure. (You will see this output even if your machine already has the randomForest package installed. This is because the package is being installed to the R session inside the container, which is distinct from the one running the code shown here.)
All docker calls in AzureContainers, like the one to build the image, return the actual docker commandline as the cmdline attribute of the (invisible) returned value. In this case, the commandline is docker build -t bos_rf . Similarly, the push() method actually involves two Docker calls, one to retag the image, and the second to do the actual pushing; the returned value in this case will be a 2-component list with the command lines being docker tag bos_rf deployreg.azurecr.io/bos_rf and docker push deployreg.azurecr.io/bos_rf.

I love this confluence of technologies and at the same time get a “descent into madness” feeling from the sheer number of worlds colliding.
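
As a rough illustration of the two command lines the excerpt says a push involves, here is a minimal base R sketch. The helper push_cmdlines() is hypothetical and is not part of AzureContainers; it only builds the strings quoted above and does not call Docker or Azure.

```r
# Hypothetical helper (NOT an AzureContainers function): build the two
# Docker command lines that pushing a local image to a registry involves.
push_cmdlines <- function(image, registry) {
  remote <- paste0(registry, "/", image)
  list(
    tag  = paste("docker tag", image, remote),  # retag the local image
    push = paste("docker push", remote)         # push the retagged image
  )
}

cmds <- push_cmdlines("bos_rf", "deployreg.azurecr.io")
cmds$tag   # "docker tag bos_rf deployreg.azurecr.io/bos_rf"
cmds$push  # "docker push deployreg.azurecr.io/bos_rf"
```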

Road Construction Incentive Contracts And R

Sebastian Kranz promotes an interesting RTutor project:

Patrick Bajari and Gregory Lewis have collected a detailed sample of 466 road construction projects in Minnesota to study this question in their very interesting article Moral Hazard, Incentive Contracts and Risk: Evidence from Procurement in the Review of Economic Studies, 2014.
They estimate a structural econometric model and find that changes in contract design could substantially reduce the duration of road blockages and largely increase total welfare at only minor increases in the risk that road construction firms face.
As part of his Master Thesis at Ulm University, Claudius Schmid has generated a nice and detailed RTutor problem set that allows you to replicate the findings in an interactive fashion. You learn a lot about the structure and outcomes of the currently used contracts, the theory behind better contract design and how the structural model to assess the quantitative effects can be estimated and simulated. At the same time, you can hone your general data science and R skills.

Click through to a couple of ways to get to this RTutor project and learn a bit about building incentive contracts to modify behavior.  H/T R-Bloggers

Analyzing Customer Churn With Keras And H2O

Shirin Glander has released code pertaining to a forthcoming book chapter:

This is code that accompanies a book chapter on customer churn that I have written for the German dpunkt Verlag. The book is in German and will probably appear in February: https://www.dpunkt.de/buecher/13208/9783864906107-data-science.html.
The code you find below can be used to recreate all figures and analyses from this book chapter. Because the content is exclusively for the book, my descriptions around the code had to be minimal. But I’m sure you can get the gist, even without the book. 😉

Click through for the code.  This is using the venerable AT&T customer churn data set.

Donating To The R Foundation

Mark Niemann-Ross explains how you can donate to the R Foundation:

I benefit from the work of the R Foundation. They oversee the language, but also encourage a healthy ecosystem. CRAN happens because of them. Updates to R happen because of them. useR! happens because of them. Every day, you and I are the recipients of some part of their time.
The least we can do is show them some appreciation. If you point your web browser at https://www.r-project.org/foundation/donations.html you’ll find a convenient (and surprisingly inexpensive) place to express your appreciation. As an individual, you can send these kind folks twenty-five euros to tell them you’re in favor of what they do.

But be sure to read the whole thing, especially if you are an American who wants the donation to be tax-deductible.  I believe that earmarking in this case is adding special instructions on SIAA’s PayPal page.

Timing Means Of Groups With R

John Mount shares some performance measures pitting data.table against various dplyr methods for calculating grouped means:

In this reproduction attempt we see:
– The dplyr time being around 0.05 seconds. This is about 5 times slower than claimed.
– The dplyr sum()/n() time is about 0.2 seconds, about 5 times faster than claimed.
– The data.table time being around 0.004 seconds. This is about three times as fast as the dplyr claims, and over ten times as fast as the actual observed dplyr behavior.

Read the whole thing.  If you want to replicate it yourself, check out the RMarkdown file.
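
To make the mean() versus sum()/n() distinction concrete, here is a small base R analog. It is a sketch on synthetic data, not a reproduction of the benchmark: it uses neither dplyr nor data.table, and the variable names are made up for illustration. It only shows that the two formulations give the same grouped means, and how one might time them.

```r
set.seed(1)
n <- 1e5
g <- sample(letters[1:10], n, replace = TRUE)  # synthetic grouping column
x <- runif(n)                                  # synthetic values

# Approach 1: call mean() once per group.
means_direct <- tapply(x, g, mean)

# Approach 2: grouped sums divided by grouped counts (the sum()/n() idea).
sums <- rowsum(x, g)[, 1]
counts <- table(g)
means_ratio <- sums / as.vector(counts[names(sums)])

# Both approaches agree up to floating-point tolerance.
all.equal(as.vector(means_direct), as.vector(means_ratio))

# Timings vary by machine, so no expected numbers are shown here.
system.time(for (i in 1:20) tapply(x, g, mean))
system.time(for (i in 1:20) rowsum(x, g)[, 1] / as.vector(table(g)))
```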

ggmap Tutorial

Laura Ellis has an updated ggmap tutorial:

For those of you who have been following along with issue #51 in the ggmap repo, you’ll notice that there have been a number of changes in the Google Maps Static API service. Unfortunately these have caused some breakage in previous ggmap functionality.
If you used this package prior to July 2018, you were likely able to do so without signing up for the Google Static Map API service yourself. As indicated on the ggmap GitHub repo: “Google has recently changed its API requirements, and ggmap users are now required to provide an API key and enable billing.” The billing enablement especially is a bit of a downer, but you can use the free tier without incurring charges. Also, the service being exposed through an easy-to-use R package that extends ggplot2 is pretty great so I’ll allow it.

This recent API change hurts.  But click through for the tutorial, which doesn’t hurt.

Reporting On Unit Tests In R With covrpage

Maelle Salmon recaps Locke Data’s involvement with the covrpage package:

To read more about getting started with covrpage in your own package in a few lines of code only, we recommend checking out the “get started” vignette. It explains more how to setup the Travis deploy, mentions which functions power the covrpage report, and gives more motivation for using covrpage.
And to learn how the information provided by covrpage should be read, read the “How to read the covrpage report” vignette.

Check it out.

The Intuition Behind Principal Component Analysis

Holger von Jouanne-Diedrich gives us an intuition behind how principal component analysis (PCA) works:


Principal component analysis (PCA) is a dimension-reduction method that can be used to reduce a large set of (often correlated) variables into a smaller set of (uncorrelated) variables, called principal components, which still contain most of the information.
PCA is a concept that is traditionally hard to grasp so instead of giving you the n’th mathematical derivation I will provide you with some intuition.
Basically PCA is nothing else but a projection of some higher dimensional object into a lower dimension. What sounds complicated is really something we encounter every day: when we watch TV we see a 2D-projection of 3D-objects!

Click through for the rest of the story.
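
The projection idea can be tried in a few lines with prcomp() from base R's stats package. This is a minimal sketch on synthetic data, not the example from the linked post: three correlated variables are reduced to two uncorrelated principal components, much like viewing a 3D object in 2D.

```r
set.seed(42)
n <- 200
x <- rnorm(n)
y <- 2 * x + rnorm(n, sd = 0.3)   # strongly correlated with x
z <- -x + rnorm(n, sd = 0.3)      # strongly (negatively) correlated with x
dat <- cbind(x, y, z)

# Center and scale the variables, then rotate onto the principal axes.
pca <- prcomp(dat, center = TRUE, scale. = TRUE)

# Keep the first two components: a 2D projection of the 3D point cloud.
projected <- pca$x[, 1:2]

# Share of total variance carried by each component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
round(var_explained, 3)

# The components are uncorrelated with each other.
cor(projected[, 1], projected[, 2])
```

Because the three variables are nearly collinear by construction, most of the variance should land on the first component; real data will rarely be this tidy.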

Working With Missing Values In R

Anisa Dhana has a few examples of ways we can work with data containing missing values in R:

Imputation is a complex process that requires a good knowledge of your data. For example, it is crucial to know whether the data are missing at random or not before you impute them. I have read a nice tutorial which visualizes the missing data and helps to understand the type of missingness, and another post showing how to impute the data with the MICE package.

In this short post, I will focus on management of the missing data using the tidyverse package. Specifically, I will show how to manage missing values in the long data format (i.e., more than one observation per id).

Anisa shows a few different techniques, depending upon what you need to do with the data.  I’d caution against using the mean in the second example and would typically prefer the median: it is robust to outliers and skew, so imputed values are not dragged toward extreme observations the way they can be with the mean.
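
Here is a small base R sketch of per-id imputation in long format, comparing mean and median fills. It is an analog under stated assumptions, not the tidyverse code from the linked post: the data frame and the helper impute_with() are made up for illustration.

```r
# Synthetic long-format data: several observations per id, with gaps.
long <- data.frame(
  id    = c(1, 1, 1, 1, 2, 2, 2, 2),
  visit = c(1, 2, 3, 4, 1, 2, 3, 4),
  value = c(10, NA, 12, 11, 5, 6, 100, NA)  # id 2 contains an outlier (100)
)

# Hypothetical helper: replace NAs in x with a summary f of the observed values.
impute_with <- function(x, f) {
  x[is.na(x)] <- f(x, na.rm = TRUE)
  x
}

# ave() applies the function within each id and keeps the original row order.
long$value_mean   <- ave(long$value, long$id,
                         FUN = function(x) impute_with(x, mean))
long$value_median <- ave(long$value, long$id,
                         FUN = function(x) impute_with(x, median))

long
```

For id 2, the mean fill is 37 because the outlier pulls it upward, while the median fill is 6, which sits with the bulk of that id's observations.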
