New RTVS Instructions

Kevin Feasel



Ginger Grant has updated her instructions for installing R Tools for Visual Studio and getting R Services to work on SQL Server:

In addition to having an SQL Server 2016 instance with R Server installed, the following components are needed on a client machine

The Comprehensive R Archive Network

RStudio (optional)

Visual Studio 2015 R Tools

This list is a change from the previous list I have provided as RTVS contains an installation of R Client, there is no need to download that as well. You do not need to download Microsoft R Open if you are using R Server either.  Once RTVS is installed, there is a menu option on the R Tools window. Selecting Install R Client from the menu will handle the information. Unfortunately, there is no change to the menu option once R Client is installed, it always looks like you should install it.  To find out if R Client has been installed, look in the Workspaces window.

In other words, fewer dependencies and an easier installation process.  Read the whole thing to avoid RevoScaleR errors in your code post-upgrade.

Generating R Services Stored Procedures From R

Kevin Feasel



David Smith describes sqlrutils, an R function to generate SQL Server R Services stored procedures:

If you’ve created an R function (say, a routine to clean up missing values in a data set, or a function to make forecasts using a machine learning model), and you want to make it easy for DBAs to use it, it’s now possible to publish R functions as a SQL Server 2016 stored procedure. The sqlrutils package provides tools to convert an existing R function to a stored procedure which can then be executed by anyone with authenticated access to the database — even if they don’t know any R.

To use an R function as a stored procedure, you’ll need SQL Server 2016 with R Services installed. You’ll also need to use the sqlrutils package to publish the function as a stored procedure: it’s included with both Microsoft R Client (available free) and Microsoft R Server (included with SQL Server 2016), version 9.0 or later.

Compare this against R Tools for Visual Studio, with which you can generate stored procedures from the IDE.

ggedit 0.2.0

Jonathan Sidi announces ggedit 0.2.0:

ggedit is an R package that is used to facilitate ggplot formatting. With ggedit, R users of all experience levels can easily move from creating ggplots to refining aesthetic details, all while maintaining portability for further reproducible research and collaboration.
ggedit is run from an R console or as a reactive object in any Shiny application. The user inputs a ggplot object or a list of objects. The application populates Bootstrap modals with all of the elements found in each layer, scale, and theme of the ggplot objects. The user can then edit these elements and interact with the plot as changes occur. During editing, a comparison of the script is logged, which can be directly copied and shared. The application output is a nested list containing the edited layers, scales, and themes in both object and script form, so you can apply the edited objects independent of the original plot using regular ggplot2 grammar.

This makes modifying ggplot2 visuals a lot easier for people who aren’t familiar with the concept of aesthetics and layers—like, say, the marketing team or management.

Scalable Data Analytics

Kevin Feasel


Cloud, R

David Smith covers a recent Microsoft Data Science team talk at Strata:

The tutorial covers many different techniques for training predictive models at scale, and deploying the trained models as predictive engines within production environments. Among the technologies you’ll use are Microsoft R Server running on Spark, the SparkR package, the sparklyr package and H20 (via the rsparkling package). It also touches on some non-Spark methods, like the bigmemory and ff packages for R (and various other packages that make use of them), and using the foreach package for coarse-grained parallel computations. You’ll also learn how to create prediction engines from these trained models using the mrsdeploy package.

Check out the post as well as the tutorial David links.


Kevin Feasel


Cloud, R

JS Tan announces a new R package:

For users of the R language, scaling up their work to take advantage of cloud-based computing has generally been a complex undertaking. We are therefore excited to announce doAzureParallel, a lightweight R package built on Azure Batch that allows you to easily use Azure’s flexible compute resources right from your R session. The doAzureParallel package complements Microsoft R Server and provides the infrastructure you need to run massively parallel simulations on Azure directly from R.

The doAzureParallel package is a parallel backend for the popular foreach package, making it possible to execute multiple processes across a cluster of Azure virtual machines with just a few lines of R code. The package helps you create and manage the cluster in Azure, and register it as a parallel backend to be used with foreach.

It’s an interesting alternative to building beefy R servers.

Linear Support Vector Machines

Ananda Das explains how linear Support Vector Machines work in classifying spam messages:

Linear SVM assumes that the two classes are linearly separable that is a hyper-plane can separate out the two classes and the data points from the two classes do not get mixed up. Of course this is not an ideal assumption and how we will discuss it later how linear SVM works out the case of non-linear separability. But for a reader with some experience here I pose a question which is like this Linear SVM creates a discriminant function but so does LDA. Yet, both are different classifiers. Why ? (Hint: LDA is based on Bayes Theorem while Linear SVM is based on the concept of margin. In case of LDA, one has to make an assumption on the distribution of the data per class. For a newbie, please ignore the question. We will discuss this point in details in some other post.)

This is a pretty math-heavy post, so get your coffee first. h/t R-Bloggers.

Get RTVS To Stop Opening Notepad

Kevin Feasel



Sarah Dutkiewicz figures out how to get R Tools for Visual Studio to stop having R files open in Notepad:

As I have been going through my courses – which use swirl() – I have been looking at how things work, comparing RStudio to RTVS.  One of the things that was maddening for me was going through one of the courses in RTVS and having R files open in Notepad.  Notepad?!?  RStudio wasn’t doing this, so I was even more frustrated.  I could also open R files with Visual Studio right from the file system, so the file association was already in place.  This didn’t make sense.  However… RTVS is an open source project, as is swirl().  So I spent tonight looking at code in GitHub.

Read on for the answer.

Using mrsdeploy To Run R On Azure

Kevin Feasel


Cloud, R

John-Mark Agosta shows how to use mrsdeploy to send R batch jobs up to an Azure VM:

Alternately there are other Azure platforms for operationalization using R Server in the Marketplace, with other operating systems and platforms including HDInsight, Microsoft’s Hadoop offering. Or, equivalently one could use the Data Science VM available in the Marketplace, since it has a copy of R Server installed. Configuration of these platforms is similar to the example covered in this posting.

Provisioning an R Server VM, as reference in the documentation, takes a few steps that are detailed here, which consist of configuring the VM and setting up the server account to authorize remote access. To set up the server you’ll use the system account you set up as a user of the Linux machine. The server account is used for client interaction with the R Server, and should not be confused with the Linux system account. This is a major difference with the Windows version of the R Server VM that uses Active Directory services for authentication.

You can also use mrsdeploy to run batch jobs against Microsoft R Server on a local Hadoop cluster.

The Tidyverse Curse

Kevin Feasel



Bob Muenchen notes a structural conflict between R and its most common set of packages:

There’s a common theme in many of the sections above: a task that is hard to perform using base a R function is made much easier by a function in the dplyr package. That package, and its relatives, are collectively known as the tidyverse. Its functions help with many tasks, such as selecting, renaming, or transforming variables, filtering or sorting observations, combining data frames, and doing by-group analyses. dplyr is such a helpful package that shows that it is the single most popular R package (as of 3/23/2017.) As much of a blessing as these commands are, they’re also a curse to beginners as they’re more to learn. The main packages of dplyr, tibble, tidyr, and purrr contain a few hundred functions, though I use “only” around 60 of them regularly. As people learn R, they often comment that base R functions and tidyverse ones feel like two separate languages. The tidyverse functions are often the easiest to use, but not always; its pipe operator is usually simpler to use, but not always; tibbles are usually accepted by non-tidyverse functions, but not always; grouped tibbles may help do what you want automatically, but not always (i.e. you may need to ungroup or group_by higher levels). Navigating the balance between base R and the tidyverse is a challenge to learn.

Interesting read.  As Bob notes in the comments, he’s still a fan of the tidyverse, but it’s important to recognize that there are pain points there.

Writing A Better Abstract

Kevin Feasel



Adnan Fiaz reviews conference abstracts for patterns:

Certainly an interesting graph! It may have been better to show the proportions instead of counts as the number of abstracts in each category are not equal. Nevertheless, the conclusion remains the same. The words “r” and “data” are clearly the most common. However, what is more interesting is that abstracts in the “yes” category use certain words significantly more often than abstracts in the “no” category and vice versa (more often because a missing bar doesn’t necessarily mean a zero observation). For example, the words “science”, “production” and “performance” occur more often in the “yes” category. Vice versa, the words “tools”, “product”, “package” and “company(ies)” occur more often in the “no” category. Also, the word “application” occurs in its singular form in the “no” category and in its plural form in the “yes” category. Certainly, at EARL we like our applications to be plural, it is in the name after all.

Granted, this is only abstracts for one conference, but it’s an interesting idea.


April 2017
« Mar