Press "Enter" to skip to content

Category: R

Get RTVS To Stop Opening Notepad

Sarah Dutkiewicz figures out how to get R Tools for Visual Studio to stop having R files open in Notepad:

As I have been going through my courses – which use swirl() – I have been looking at how things work, comparing RStudio to RTVS.  One of the things that was maddening for me was going through one of the courses in RTVS and having R files open in Notepad.  Notepad?!?  RStudio wasn’t doing this, so I was even more frustrated.  I could also open R files with Visual Studio right from the file system, so the file association was already in place.  This didn’t make sense.  However… RTVS is an open source project, as is swirl().  So I spent tonight looking at code in GitHub.

Read on for the answer.

Comments closed

Using mrsdeploy To Run R On Azure

John-Mark Agosta shows how to use mrsdeploy to send R batch jobs up to an Azure VM:

Alternately there are other Azure platforms for operationalization using R Server in the Marketplace, with other operating systems and platforms including HDInsight, Microsoft’s Hadoop offering. Or, equivalently one could use the Data Science VM available in the Marketplace, since it has a copy of R Server installed. Configuration of these platforms is similar to the example covered in this posting.

Provisioning an R Server VM, as reference in the documentation, takes a few steps that are detailed here, which consist of configuring the VM and setting up the server account to authorize remote access. To set up the server you’ll use the system account you set up as a user of the Linux machine. The server account is used for client interaction with the R Server, and should not be confused with the Linux system account. This is a major difference with the Windows version of the R Server VM that uses Active Directory services for authentication.

You can also use mrsdeploy to run batch jobs against Microsoft R Server on a local Hadoop cluster.

Comments closed

The Tidyverse Curse

Bob Muenchen notes a structural conflict between R and its most common set of packages:

There’s a common theme in many of the sections above: a task that is hard to perform using base a R function is made much easier by a function in the dplyr package. That package, and its relatives, are collectively known as the tidyverse. Its functions help with many tasks, such as selecting, renaming, or transforming variables, filtering or sorting observations, combining data frames, and doing by-group analyses. dplyr is such a helpful package that Rdocumentation.org shows that it is the single most popular R package (as of 3/23/2017.) As much of a blessing as these commands are, they’re also a curse to beginners as they’re more to learn. The main packages of dplyr, tibble, tidyr, and purrr contain a few hundred functions, though I use “only” around 60 of them regularly. As people learn R, they often comment that base R functions and tidyverse ones feel like two separate languages. The tidyverse functions are often the easiest to use, but not always; its pipe operator is usually simpler to use, but not always; tibbles are usually accepted by non-tidyverse functions, but not always; grouped tibbles may help do what you want automatically, but not always (i.e. you may need to ungroup or group_by higher levels). Navigating the balance between base R and the tidyverse is a challenge to learn.

Interesting read.  As Bob notes in the comments, he’s still a fan of the tidyverse, but it’s important to recognize that there are pain points there.

Comments closed

Writing A Better Abstract

Adnan Fiaz reviews conference abstracts for patterns:

Certainly an interesting graph! It may have been better to show the proportions instead of counts as the number of abstracts in each category are not equal. Nevertheless, the conclusion remains the same. The words “r” and “data” are clearly the most common. However, what is more interesting is that abstracts in the “yes” category use certain words significantly more often than abstracts in the “no” category and vice versa (more often because a missing bar doesn’t necessarily mean a zero observation). For example, the words “science”, “production” and “performance” occur more often in the “yes” category. Vice versa, the words “tools”, “product”, “package” and “company(ies)” occur more often in the “no” category. Also, the word “application” occurs in its singular form in the “no” category and in its plural form in the “yes” category. Certainly, at EARL we like our applications to be plural, it is in the name after all.

Granted, this is only abstracts for one conference, but it’s an interesting idea.

Comments closed

RTVS 1.0

Shahrokh Mortazavi announces that R Tools for Visual Studio 1.0 is officially out:

RTVS builds on Visual Studio, which means you get numerous features for free: from using multiple languages to word-class Editing and Debugging to over 7,000 extensions for every need:

  • A polyglot IDE – VS supports R, Python, C++, C#, Node.js, SQL, etc. projects simultaneously.

  • Editor – complete editing experience for R scripts and functions, including detachable/tabbed windows, syntax highlighting, and much more.

  • IntelliSense – (aka auto-completion) available in both the editor and the Interactive R window.

  • R Interactive Window – work with the R console directly from within Visual Studio.

  • History window – view, search, select previous commands and send to the Interactive window.

  • Variable Explorer – drill into your R data structures and examine their values.

  • Plotting – see all of your R plots in a Visual Studio tool window.

  • Debugging – breakpoints, stepping, watch windows, call stacks and more.

  • R Markdown – R Markdown/knitr support with export to Word and HTML.

  • Git – source code control via Git and GitHub.

  • Extensions – over 7,000 Extensions covering a wide spectrum from Data to Languages to Productivity.

  • Help – use ? and ?? to view R documentation within Visual Studio.

I’ve been using it for a little while and it’s pretty snazzy for integrating with SQL Server R Services.  R Studio is still more feature-rich, but RTVS is definitely catching up.

Comments closed

Datashader

John Mount is a bit jazzed when it comes to a new package:

I recently got back from Strata West 2017 (where I ran a very well received workshop on R and Spark). One thing that really stood out for me at the exhibition hall was Bokeh plus datashader from Continuum Analytics.

I had the privilege of having Peter Wang himself demonstrate datashaderfor me and answer a few of my questions.

I am so excited about datashader capabilities I literally will not wait for the functionality to be exposed in R through rbokeh. I am going to leave my usual knitr/rmarkdown world and dust off Jupyter Notebook just to use datashader plotting. This is worth trying, even for diehard R users.

For the moment, it looks like datashader is only available for Python, but it’s coming to R.

Comments closed

Visualizing Market Basket Analyses With Power BI

Leila Etaati explains how to use Power BI and a Force-Directed Graph custom visual to display results of a market basket analysis:

By clicking on the “R transformation” a new windows will show up. This windows is a R editor that you can past your code here. however there are couple of things that you should consider.

1. there is a error message handling but always recommended to run and be sure your code work in R studio first (in our example we already tested it in Part 1).

2. the all data is holding in variable “dataset”.

3. you do not need to write “install.packages” to get packages here, but you should first install required packages into your R editor and here just call “library(package name)”

Leila takes this step-by-step, leading to a Power BI visual with drill-down.

Comments closed

R On Athena

Gopal Wunnava shows how to run R scripts using Amazon Athena as a data source:

Next, you’ll practice interactively querying Athena from R for analytics and visualization. For this purpose, you’ll use GDELT, a publicly available dataset hosted on S3.

Create a table in Athena from R using the GDELT dataset. This step can also be performed from the AWS management console as illustrated in the blog post “Amazon Athena – Interactive SQL Queries for Data in Amazon S3.”

This is an interesting use case for Athena.

Comments closed

RevoScaleR With Power BI

Tomaz Kastrun looks at several methods for using RevoScaleR packages in a Power BI dashboard:

I was invited to deliver a session for Belgium User Group on SQL Server and R integration. After the session – which we did online using web based Citrix  – I got an interesting question: “Is it possible to use RevoScaleR performance computational functions within Power BI?“. My first answer was,  a sceptical yes. But I said, that I haven’t used it in this manner yet and that there might be some limitations.

The idea of having the scalable environment and the parallel computational package with all the predictive analytical functions in Power BI is absolutely great. But something tells me, that it will not be that straight forward.

Read on for the rest of the story.

Comments closed

Graphing R Package Dependencies

Tomaz Kastrun uses the igraph package to graph package dependencies in R:

With importing package tools, we get many useful functions to find additional information on packages.

Function package.dependencies() parses and check dependencies of a package in current environment. Function package_dependencies()  (with underscore and not dot) will find all dependent and reverse dependent packages.

This probably tilts more toward “fun” than “practical,” but this will let you see the full set of dependencies for a package if, for example, you need to grab all of these packages for upgrading an offline instance.

Comments closed