Press "Enter" to skip to content

Category: R

Custom Snippets In RStudio

Mara Averick shows how to create a custom code snippet in RStudio:

The RStudio Support Code Snippets post is a great step-by-step for adding snippets of your own. The gist of it for a markdown snippet is as follows:

  1. Open RStudio Preferences

  2. Go to the Code section

  3. Click the Edit Snippets button3

  4. Select Markdown

  5. Add your snippet and Save

Click through for a demo of how to embed tweets into your RMarkdown documentation.

Comments closed

Regular Expressions With R

Dave Mason looks at using SQL Server R Services to execute regular expressions against a T-SQL data set:

Have you ever had the need to use Regular Expressions directly in SQL Server? I sometimes hear or see others refer to using RegEx in TSQL. But I always assume they’re talking about the TSQL LIKE operator, because RegEx isn’t natively supported. In TSQL’s defence, you can get a lot of mileage out of LIKE and some clever pattern matching strings, even though it’s not authentic RegEx. You can leverage RegEx libraries in the .NET Framework via a CLR stored procedure. You should also be able to do something similar with an old-school extended stored procedure.

I discussed all of this during a recent interview. It was a day or two afterwards (of course) when it dawned on me that there’s another way to leverage RegEx from TSQL: the R language. Prior to this mini-revelation, I had always thought of R (and Python) as strictly a means to an end for Data Science and related disciplines. Now I am thinking I’ve been looking at R and Python through too narrow of a lens and I should take a larger view.

I think I’d prefer CLR for this because there’s additional overhead to making R Services calls, but it’s a clever use of R Services.

Comments closed

SQL Server R Services Troubleshooting

Ginger Grant walks us through a few troubleshooting tactics with SQL Server 2016 R Services:

Much to my surprise after this I received an error

Msg 39019, Level 16, State 1, Line 1
An external script error occurred:
Unable to launch the runtime. ErrorCode 0x80070490: 1168(Element not found.).
Msg 11536, Level 16, State 1, Line 1
EXECUTE statement failed because its WITH RESULT SETS clause specified 1 result set(s), but the statement only sent 0 result set(s) at run time.

I looked in the log files and didn’t find any errors.  I checked the configuration manager to ensure that I had some user ids configured in the configuration manager.  Nothing seemed to make any difference.  Looking online, the only error that I saw which might possibly be close was a different error message about 8.3 naming and the working directory.

This service is somewhat finicky to set up in my experience, though once you have it configured, it tends to be pretty stable.

Comments closed

Recognizing Wood Knot Images

Bob Horton and Vanja Paunic walk through a lumber grading scenario with Microsoft R Server:

Here we use the rxFeaturize function from Microsoft R Server, which allows us to perform a number of transformations on the knot images in order to produce numerical features. We first resize the images to fit the dimensions required by the pre-trained deep neural model we will use, then extract the pixels to form a numerical data set, then run that data set through a DNN pre-trained model. The result of the image featurization is a numeric vector (“feature vector”) that represents key characteristics of that image.

Image featurization here is accomplished by using a deep neural network (DNN) model that has already been pre-trained by using millions of images. Currently, MRS supports four types of DNNs – three ResNet models (18, 50, 101)[1] and AlexNet [8].

This is a practical example of how to use image recognition to facilitate machine learning.

Comments closed

A Cheat Sheet For Purrr

Mara Averick has a list of resources for learning more about purrr:

Purrr royal decree (ok, I’ll stop with the 🐱  puns now), the purrr 📦  now has its very own official RStudio cheat sheetApply Functions Cheat Sheet

The purrr package makes it easy to work with lists and functions. This cheatsheet will remind you how to manipulate lists with purrr as well as how to apply functions iteratively to each element of a list or vector. The back of the cheatsheet explains how to work with list-columns. With list columns, you can use a simple data frame to organize any collection of objects in R.

So, I thought we’d celebrate with a bit of a purrr 🐦  tweet roundup:

Purrr is one of those libraries I know I need to learn more about, and this looks like a good starting point.

Comments closed

Visualizing Networks With R

Arthur Charpentier shows off some of the functionality of igraph:

The good thing is that a lot of functions are available. For instance, we can get shortest paths between two specific nodes. And we can give appropriate colors to the nodes that we’ll cross:

> AP=all_shortest_paths(iflo,
+ from=”Peruzzi”,
+ to=”Ginori”)
> L=AP$res[[1]]
> V(iflo)$color=”yellow”
> V(iflo)$color[L[2:4]]=”light blue”
> V(iflo)$color[L[c(1,5)]]=”blue”
> plot(iflo)

Click through for a demo-heavy example.

Comments closed

Machine Learning Services Updates

Umachandar Jayachandran and team have been busy.  First, they announced a preview of SQL Server ML Services in Azure SQL Database:

In-database Machine Learning support was added in SQL Server 2016 and we are now bringing the same functionality to Azure SQL Database. You can now train and score machine learning models in Azure SQL Database and the predictions can be exposed to any application using your database, easily and seamlessly.

The preview functionality allows you to train and score machine learning models using data that fits in memory (in R data frame). Please note that the amount of memory available for R scripts execution depends on the edition of the Azure SQL database and cannot be modified.

No Python support there yet, but it’s upcoming.  Second, we can use the PREDICT function in Azure SQL Database:

Today we are announcing the general availability of the native PREDICT Transact-SQL function in Azure SQL Database. The PREDICT function allows you to perform scoring in real-time using certain RevoScaleR or revoscalepy models in a SQL query without invoking the R or Python runtime.

The PREDICT function support was added in SQL Server 2017. It is a table-valued function that takes a RevoScaleR or revoscalepy model & data (in the form of a table or view or query) as inputs and generates predictions based on the machine learning model. More details of the PREDICT function can be found here.

There are a limited number of models which support PREDICT—things like linear and logistic regression, RevoScaleR’s fast decision trees, etc.  If you have this type of model, however, the predictions stay within SQL Server and end up being much faster than going out to R.

Comments closed

dplyr Mutate Quirks

John Mount explains a quirk in dplyr’s mutate function:

It is hard for experts to understand how frustrating the above is to a new R user or to a part time R user. It feels like any variation on the original code causes it to fail. None of the rules they have been taught anticipate this, or tell them how to get out of this situation.

This quickly leads to strong feelings of learned helplessness and anxiety.

Our rule for dplyr::mutate() has been for some time:

Each column name used in a single mutate must appear only on the left-hand-side of a single assignment, or otherwise on the right-hand-side of any number of assignments (but never both sides, even if it is different assignments).

If you do data analysis with R, you’ve probably run into this before.  I certainly have, and it’s nice to understand why this is the case.

Comments closed

Regular Expression Cheat Sheets

Mara Averick shows off a collection of regular expression guides:

There are helpful string-related R packages 📦, stringr (which is built on top of the more comprehensive stringi package) comes to mind. But, at some point in your computing life, you’re gonna need to get down with regular expressions.

And so, here’s a collection of some of the Regex-related links I’ve tweeted 🐦:

Click through for links to regular expression resources.

Comments closed

Visualizing A Single Number

Tim Bock shows a dozen methods for visualizing a single number:

There are a number of situations in which it can be advantageous to create a visualization to represent a single number:

  • To communicate with less numerate viewers/readers;

  • Infographics and dashboards commonly use one important number;

  • To attract the attention of distracted or busy viewers/readers;

  • To add some humanity or “color”, to create an emotional connection;

  • Or to increase the redundancy of the presentation (see Improve the Quality of Data Visualizations Using Redundancy).

To a great extent, my favorite is the first.  There are good cases for many of the others—primarily the shock value of the uncountable pictogram—but typically, the best visualization is simple.

Comments closed