Press "Enter" to skip to content

Category: R

Setting Up Your Own R Package Repository

Steve Belcher explains how to configure a custom package repository in your environment:

One of the strengths of the R language is the thousands of third-party packages that have been made publicly available via CRAN, the Comprehensive R Archive Network. R includes several functions that make it easy to download and install these packages. However, in many enterprise environments, access to the Internet is limited or non-existent. In such environments, it is useful to create a local package repository that users can access from within the corporate firewall.

Your local repository may contain source packages, binary packages, or both. If at least some of your users will be working on Windows systems, you should include Windows binaries in your repository. Windows binaries are R-version-specific; if you are running R 3.3.3, you need Windows binaries built under R 3.3. These versioned binaries are available from CRAN and other public repositories. If at least some of your users will be working on Linux systems, you must include source packages in your repository.

There are some tools which help out with this, so read the whole thing.

Comments closed

R 4.0 Improvements: stopifnot()

Bob Rudis looks at one of the R 4.0 changes hidden in the changelog:

R 4.0.0 has been out for a while, now, and — apart from a case where merge() was slower than dirt — it’s been really stable for at least me (I use it daily on macOS, Linux, and Windows). Sure, it came with some headline-grabbing features/upgrades, but I’ve started looking at what other useful nuggets might be in the changelog and decided to blog them as I find them.

Today’s nugget is the venerable stopifnot() function which was significantly enhanced by this PR by Neil Fultz.

Read on for a quality of life improvement with error handling in R.

Comments closed

Text Mining and Sentiment Analysis in R

Sanil Mhatre walks us through a sentiment analysis scenario in R:

Sentiments can be classified as positive, neutral or negative. They can also be represented on a numeric scale, to better express the degree of positive or negative strength of the sentiment contained in a body of text.

This example uses the Syuzhet package for generating sentiment scores, which has four sentiment dictionaries and offers a method for accessing the sentiment extraction tool developed in the NLP group at Stanford. The get_sentiment function accepts two arguments: a character vector (of sentences or words) and a method. The selected method determines which of the four available sentiment extraction methods will be used. The four methods are syuzhet (this is the default), bingafinn and nrc. Each method uses a different scale and hence returns slightly different results. Please note the outcome of nrc method is more than just a numeric score, requires additional interpretations and is out of scope for this article. The descriptions of the get_sentiment function has been sourced from : https://cran.r-project.org/web/packages/syuzhet/vignettes/syuzhet-vignette.html?

Comments closed

Accessing Azure Queue Storage from R

Hong Ooi announces a new package for R:

This post is to announce that the AzureQstor package is now on GitHub. AzureQstor provides an R interface to Azure queue storage, building on the facilities provided by AzureStor.

Queue Storage is a service for storing large numbers of messages, for example from automated sensors, that can be accessed remotely via authenticated calls using HTTP or HTTPS. A single queue message can be up to 64 KB in size, and a queue can contain millions of messages, up to the total capacity limit of a storage account. Queue storage is often used to create a backlog of work to process asynchronously.

Hong includes a couple of demos as well, so check it out.

Comments closed

Solving the Prisoner Coin Flipping Puzzle with R

David Robinson takes us through another problem-solving challenge:

You are locked in the dungeon of a faraway castle with three fellow prisoners (i.e., there are four prisoners in total), each in a separate cell with no means of communication. But it just so happens that all of you are logicians (of course)….

Each prisoner will be given a fair coin, which can either be fairly flipped one time or returned to the guards without being flipped. If all flipped coins come up heads, you will all be set free! But if any of the flipped coins comes up tails, or if no one chooses to flip a coin, you will all be doomed to spend the rest of your lives in the castle’s dungeon.

The only tools you and your fellow prisoners have to aid you are random number generators, which will give each prisoner a random number, uniformly and independently chosen between zero and one.

What are your chances of being released?

I’ll solve this with tidy simulation in R, in particular using one of my favorite functions, tidyr’s crossing(). In an appendix, I’ll show how to get a closed form solution for N = 4.

I’ve also posted a 30-minute screencast of how I first approached the simulation and visualization.

Click through for the solution and explanation.

Comments closed

R 4.0 Released

David Smith walks us through what’s new in R 4.0:

R 4.0.0 was released in source form on Friday, and binaries for Windows, Mac and Linux are available for download now.

As the version number bump suggests, this is a major update to R that makes some significant changes. Some of these changes — particularly the first one listed below — are likely to affect the results of R’s calculations, so I would not recommend running scripts written for prior versions of R without validating them first. In any case, you’ll need to reinstall any packages you were using for R 4.0.0. (You might find this R script useful for checking what packages you have installed for R 3.x.)

And I just got 3.6 into production yesterday. Them’s the breaks…

Comments closed

The Siren Song of High Accuracy

Holger von Jouanne-Diedrich notes that accuracy is not in itself necessarily a good thing for a machine learning model:

In one of my most popular posts So, what is AI really? I showed that Artificial Intelligence (AI) basically boils down to autonomously learned rules, i.e. conditional statements or simply, conditionals.

In this post, I create the simplest possible classifier, called ZeroR, to show that even this classifier can achieve surprisingly high values for accuracy (i.e. the ratio of correctly predicted instances)… and why this is not necessarily a good thing, so read on!

The nuanced answer here is that with classifiers, accuracy is not in itself a great measure in the case of class imbalance. The more balanced your classes are, the more likely it is that a model with high accuracy is a good model. That’s where other measures such as specificity and sensitivity, positive & negative predictive value, etc. come into play.

Comments closed

Useful R Packages for Data Scientists

Tomaz Kastrun has a nice collection of useful R packages for data scientists:

Among thousand of R packages available on CRAN (with all the  mirror sites) or Github and any developer’s repository.

Many useful functions are available in many different R packages, many of the same functionalities also in different packages, so it all boils down to user preferences and work, that one decides to use particular package. From the perspective of a statistician and data scientist, I will cover the essential and major packages in sections. And by no means, this is not a definite list, and only a personal preference.

Click through for Tomaz’s recommendations.

Comments closed

R Checkpoint Package Update Now in Beta

Hong Ooi announces that a revamp of the checkpoint package is now in beta:

Checkpoint has been around for nearly 6 years now, helping R users solve the reproducible research puzzle. In that time, it’s seen many changes, new features, and, inevitably, bug reports. Some of these bugs have been fixed, while others remain outstanding in the too-hard basket.

Many of these issues spring from the fact that it uses only base R functions, in particular install.packages, to do its work. The problem is that install.packages is meant for interactive use, and as an API, is very limited. For starters, it doesn’t return a result to the caller—instead, checkpoint has to capture and parse the printed output to determine whether the installation succeeded. This causes a host of problems, since the printout will vary based on how R is configured. Similarly, install.packages refuses to install a package if it’s in use, which means checkpoint must unload it first—an imperfect and error-prone process at best.

In addition to these, checkpoint’s age means that it has accumulated a significant amount of technical debt over the years. For example, there is still code to handle ancient versions of R that couldn’t use HTTPS, even though the MRAN site (in line with security best practice) now accepts HTTPS connections only.

Click through to see what’s in the new checkpoint package.

Comments closed