Taking Advantage Of Vectorization In R

Kevin Feasel



John Mount explains, using Conway’s Game of Life, the importance of using vectors in R over scalars:

R is an interpreted programming language with vectorized data structures. This means a single R command can ask for very many arithmetic operations to be performed. This also means R computation can be fast. We will show an example of this using Conway’s Game of Life.

Conway’s Game of Life is one of the most interesting examples of cellular automata. It is traditionally simulated on a rectangular grid (like a chessboard) and each cell is considered either live or dead. The rules of evolution are simple: the next life grid is computed as follows:

  • To compute the state of a cell on the next grid sum the number of live cells in the eight neighboring cells on the current grid.

  • If this sum is 3 or if the current cell is live and the sum is 2 or 3, then the cell in the next grid will be live.

Not only is the R code faster, but it’s also terser.

Visualizing A Correlation Matrix With corrplot

Kristian Larsen demonstrates the corrplot package in R:

First we need to read the packages into the R library. For descriptive statistics of the dataset we use the skimr package and for visualization of correlation matrix we use the corrplot package. We will work with windspeed dataset from the bReeze package:

# Read packages into R library

Click through for the demo.

Getting The Right R Version For Packages

Kevin Feasel



Colin Gillespie shows a couple methods for figuring out the minimum version of R needed for a set of packages:

In R, there is a handy function called available.packages() that returns a matrix of details corresponding to packages currently available at one or more repositories. Unfortunately, the format isn’t initially amenable to manipulation. For example, consider the readr package

readr_desc = available.packages() %>% as_tibble() %>% filter(Package == "readr")

I immediately converted the data to a tibble, as that

  • changed the rownames to a proper column

  • changed the matrix to a data frame/tibble, which made selecting easier

There’s a good use of R functionality to delve into package requirements, as well as a script to try it out yourself.

Packages For Testing R Packages

Kevin Feasel


R, Testing

Maelle Salmon shows us how to test our R packages within R:

If you’re brand-new to unit testing your R package, I’d recommend reading this chapter from Hadley Wickham’s book about R packages.

There’s an R package called RUnit for unit testing, but in the whole post we’ll mention resources around the testthat package since it’s the one we use in our packages, and arguably the most popular one. testthat is great! Don’t hesitate to reads its docs again if you started using it a while ago, since the latest major release added the setup() and teardown() functions to run code before and after all tests, very handy.

To setup testing in an existing package i.e. creating the test folder and adding testthat as a dependency, run usethis::use_testthat(). In our WIP pRojects package, we set up the tests directory for you so you don’t forget. Then, in any case, add new tests for a function using usethis::use_test().

The testthis package might help make your testing workflow even smoother. In particular, test_this() “reloads the package and runs tests associated with the currently open R script file.”, and there’s also a function for opening the test file associated with the current R script.

This is an area where I know I need to get better, and Maelle gives us a plethora of tooling for tests.

Reshaping Data Frames With tidyr

Kevin Feasel



Anisa Dhana shows off some of the data reshaping functionality available in the tidyr package:

As it is shown above, the variable agegp has 6 groups (i.e., 25-34, 35-44) which has different alcohol intake and smoking use combinations. I think it would be interesting to transform this dataset from long to wide and to create a column for each age group and show the respective cases. Let see how the dataset will look like.

dt %>% spread(agegp, ncases) %>% slice(1:5)

Click through for a few additional transformations.

Using cdata To Created Faceted Plots

Nina Zumel shows how to use the cdata package to create faceted ggplot2 plots:

First, load the packages and data:

iris <- data.frame(iris)

Now define the data-shaping transform, or control table. The control table is basically a picture that sketches out the final data shape that I want. I want to specify the x and y columns of the plot (call these the value columns of the data frame) and the column that I am faceting by (call this the key column of the data frame). And I also need to specify how the key and value columns relate to the existing columns of the original data frame.

Read on to see how you can use cdata to tie together different faceted plots.

Using wrapr For A Consistent Pipe With ggplot2

John Mount shows how you can use the wrapr pipe to perform data processing and building a ggplot2 visual:

Now we can run a single pipeline that combines data processing steps and ggplot plot construction.

data.frame(x = 1:20) %.>% mutate(., y = cos(3*x)) %.>% ggplot(., aes(x = x, y = y)) %.>% geom_point() %.>% geom_line() %.>% ggtitle("piped ggplot2")

Check it out.

Using R To Hit Azure ML From Power BI

Leila Etaati shows how you can use R to hit an Azure ML endpoint to populate a data set in Power BI:

You need to create a model in Azure ML Studio and create a web service for it.

The traditional example in Predict a passenger on Titanic ship is going to survived or not?

we have a dataset about passengers like their age, gender, and passenger class, then we are going to predict whether they are going to survive or not

Open Azure ML Studio and follow the steps to create a model for predicting this. Navigate to Azure ML Studio.

Then download the dataset for titanic from here

Click through for the step-by-step instructions.

Connecting To Elasticsearch With R

Jerod Johnson has a sample of connecting to Elasticsearch with R:

You will need the following information to connect to Elasticsearch as a JDBC data source:

  • Driver Class: Set this to cdata.jdbc.elasticsearch.ElasticsearchDriver.
  • Classpath: Set this to the location of the driver JAR. By default, this is the lib subfolder of the installation folder.

The DBI functions, such as dbConnect anddbSendQuery , provide a unified interface for writing data access code in R. Use the following line to initialize a DBI driver that can make JDBC requests to the CData JDBC Driver for Elasticsearch:

Read on for the full instructions.

Voice Control For Shiny Apps

Over at Jumping Rivers, an example of using a Javascript library to control a page using voice commands:

I have found that performance across all devices and browsers is definitely not equal. By far the best browser I have found for viewing the apps is Google Chrome. I have also tended to find that my Ubuntu machines don’t do as well as Microsoft machines in picking up words correctly. A chat I had with someone recently suggested this might be down to drivers under Ubuntu for the microphones but that is not my area of expertise. Voice recognition was also fine on both of my Blackberry phones (one running BB OS 10, the other running Android 7).

It is worth noting that this does require an internet connection to function, in Chrome the voice to text is performed in the cloud.

The other thing I have noticed is that annyang seems relatively sensitive to background noise. This isn’t so bad for functions called using specific phrases but does sometimes have a large effect on the multi-word splats. This is because the splats are greedy and the background noise makes the recognition engine think that you are still talking long after you finished which gives the appearance of the application hanging.

The solution is by no means perfect, but it does look quite interesting.


December 2018
« Nov