R is an interpreted programming language with vectorized data structures. This means a single R command can ask for very many arithmetic operations to be performed. This also means R computation can be fast. We will show an example of this using Conway’s Game of Life.
Conway’s Game of Life is one of the most interesting examples of cellular automata. It is traditionally simulated on a rectangular grid (like a chessboard), and each cell is considered either live or dead. The rules of evolution are simple. The next life grid is computed as follows:
- To compute the state of a cell on the next grid, sum the number of live cells in the eight neighboring cells on the current grid.
- If this sum is 3, or if the current cell is live and the sum is 2 or 3, then the cell in the next grid will be live. Otherwise it will be dead.
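The rules above can be sketched in vectorized base R. This is a minimal illustration, not the code from the linked post: the function name `life_step` is made up here, and the sketch assumes that cells beyond the edge of the grid count as dead, which the excerpt does not specify.

```r
# One generation of Conway's Game of Life on a logical matrix.
# Assumption: cells beyond the edge of the grid count as dead.
life_step <- function(grid) {
  nr <- nrow(grid)
  nc <- ncol(grid)

  # Surround the grid with a border of dead cells so every cell has 8 neighbors.
  padded <- matrix(0L, nrow = nr + 2, ncol = nc + 2)
  padded[2:(nr + 1), 2:(nc + 1)] <- grid

  # Add up the eight shifted copies of the grid; each addition is vectorized.
  neighbors <- matrix(0L, nrow = nr, ncol = nc)
  for (dr in -1:1) {
    for (dc in -1:1) {
      if (dr == 0 && dc == 0) next
      neighbors <- neighbors +
        padded[(2:(nr + 1)) + dr, (2:(nc + 1)) + dc, drop = FALSE]
    }
  }

  # Live next if the sum is 3, or if the cell is live and the sum is 2.
  neighbors == 3 | (grid & neighbors == 2)
}

# A "blinker": three live cells in a row that flip orientation each step.
blinker <- matrix(FALSE, nrow = 5, ncol = 5)
blinker[3, 2:4] <- TRUE
next_gen <- life_step(blinker)
```

The loop runs only eight times regardless of grid size; all per-cell arithmetic happens inside whole-matrix additions and comparisons, which is where the speed of the vectorized approach comes from.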
Not only is the R code faster, but it’s also terser.
First we need to load the packages. For descriptive statistics of the dataset we use the skimr package, and for visualization of the correlation matrix we use the corrplot package. We will work with the windspeed dataset from the bReeze package:
# Load packages
library(bReeze)
library(corrplot)
library(skimr)
Click through for the demo.
In R, there is a handy function called available.packages() that returns a matrix of details corresponding to packages currently available at one or more repositories. Unfortunately, the format isn’t initially amenable to manipulation. For example, consider the readr package:
library(dplyr)  # provides %>%, as_tibble() and filter()
readr_desc = available.packages() %>% as_tibble() %>% filter(Package == "readr")
I immediately converted the data to a tibble, as that:
- changed the rownames to a proper column
- changed the matrix to a data frame/tibble, which made selecting easier
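To make the shape of the problem concrete, here is a small base R sketch that needs no network access. The matrix below is a made-up stand-in for the output of available.packages() (the field values are illustrative, not real repository data), and `parse_deps` is a hypothetical helper, not a function from the linked post.

```r
# Toy stand-in for the character matrix returned by available.packages();
# the rows and field values are invented for illustration.
pkgs <- matrix(
  c("readr",  "R (>= 3.1)",   "cli (>= 3.0.0), tibble, R6",
    "tibble", "R (>= 3.4.0)", "pillar (>= 1.8.1), rlang"),
  ncol = 3, byrow = TRUE,
  dimnames = list(c("readr", "tibble"), c("Package", "Depends", "Imports"))
)

# A data frame allows column access with $, which a matrix does not.
pkgs_df <- as.data.frame(pkgs, stringsAsFactors = FALSE)

# Hypothetical helper: split a comma-separated dependency field and
# drop any version constraint in parentheses.
parse_deps <- function(field) {
  parts <- trimws(strsplit(field, ",")[[1]])
  sub("[[:space:]]*\\(.*\\)$", "", parts)
}

readr_imports <- parse_deps(pkgs_df$Imports[pkgs_df$Package == "readr"])
```

Each dependency field is a single string, so some splitting step like this is needed before the requirements can be counted or joined against other packages.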
There’s a good use of R functionality to delve into package requirements, as well as a script to try it out yourself.
If you’re brand-new to unit testing your R package, I’d recommend reading this chapter from Hadley Wickham’s book about R packages.
There’s an R package called RUnit for unit testing, but in the whole post we’ll mention resources around the testthat package since it’s the one we use in our packages, and arguably the most popular one. testthat is great! Don’t hesitate to read its docs again if you started using it a while ago, since the latest major release added the setup() and teardown() functions to run code before and after all tests, very handy.
To set up testing in an existing package, i.e. creating the test folder and adding testthat as a dependency, run usethis::use_testthat(). In our WIP pRojects package, we set up the tests directory for you so you don’t forget. Then, in any case, add new tests for a function using
The testthis package might help make your testing workflow even smoother. In particular, test_this() “reloads the package and runs tests associated with the currently open R script file”, and there’s also a function for opening the test file associated with the current R script.
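For readers new to testthat, a test file might look roughly like the sketch below. The function `celsius_to_fahrenheit` is a hypothetical example rather than anything from the linked post, and in a real package it would live under R/ while the test would sit in tests/testthat/.

```r
library(testthat)

# Hypothetical function under test; in a package this would be defined in R/.
celsius_to_fahrenheit <- function(celsius) {
  celsius * 9 / 5 + 32
}

# One test groups related expectations under a descriptive name.
test_that("celsius_to_fahrenheit converts known values", {
  expect_equal(celsius_to_fahrenheit(0), 32)
  expect_equal(celsius_to_fahrenheit(100), 212)
  # The function is vectorized, so a vector input should also work.
  expect_equal(celsius_to_fahrenheit(c(-40, 37)), c(-40, 98.6))
})
```

expect_equal() compares with a small numeric tolerance, which is usually what you want for floating-point results such as 98.6.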
This is an area where I know I need to get better, and Maelle gives us a plethora of tooling for tests.
As shown above, the variable agegp has 6 groups (e.g., 25-34, 35-44), each with different combinations of alcohol intake and smoking use. I think it would be interesting to transform this dataset from long to wide, creating a column for each age group that shows the respective cases. Let’s see what the dataset will look like.
dt %>% spread(agegp, ncases) %>% slice(1:5)
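The same long-to-wide idea can be tried without any packages. The sketch below swaps in base R’s reshape() for the tidyr spread() call shown above, and it uses a tiny made-up data frame (column names borrowed from the excerpt, values invented) so that the result is easy to inspect.

```r
# Made-up long data: one row per (alcohol group, age group) combination.
long <- data.frame(
  alcgp  = c("0-39", "0-39", "40-79", "40-79"),
  agegp  = c("25-34", "35-44", "25-34", "35-44"),
  ncases = c(0, 1, 2, 3)
)

# Base R equivalent of spreading agegp into columns:
# one row per alcgp, one ncases column per age group.
wide <- reshape(long, idvar = "alcgp", timevar = "agegp", direction = "wide")
```

reshape() prefixes each new column with the name of the value column, so the age groups end up as ncases.25-34 and ncases.35-44 rather than as bare group labels.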
Click through for a few additional transformations.
First, load the packages and data:
library("ggplot2")
library("cdata")
iris <- data.frame(iris)
Now define the data-shaping transform, or control table. The control table is basically a picture that sketches out the final data shape that I want. I want to specify the y columns of the plot (call these the value columns of the data frame) and the column that I am faceting by (call this the key column of the data frame). And I also need to specify how the key and value columns relate to the existing columns of the original data frame.
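To illustrate what a control table expresses, here is a base R sketch that applies one by hand. It is not cdata’s API: `apply_control` is a hypothetical helper, and the column names `flower_part`, `Length`, and `Width` are choices made for this example. The first column of the control table is the key, and the remaining cells name the original columns that supply each value.

```r
# Two rows of the built-in iris data, with an id so records stay identifiable.
iris_small <- cbind(id = 1:2, head(iris, 2))

# Control table: a "picture" of one output block. Each row is a key value,
# and each cell names the source column that fills that position.
control_table <- data.frame(
  flower_part = c("Petal", "Sepal"),
  Length      = c("Petal.Length", "Sepal.Length"),
  Width       = c("Petal.Width", "Sepal.Width"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: build one block of rows per control-table row.
apply_control <- function(data, control, key_col, id_cols) {
  value_cols <- setdiff(names(control), key_col)
  blocks <- lapply(seq_len(nrow(control)), function(i) {
    block <- data[, id_cols, drop = FALSE]
    block[[key_col]] <- control[[key_col]][i]
    for (v in value_cols) {
      block[[v]] <- data[[control[[v]][i]]]
    }
    block
  })
  do.call(rbind, blocks)
}

long_iris <- apply_control(iris_small, control_table, "flower_part", "id")
```

The result has one row per flower and flower part, with Length and Width as value columns, which is the shape needed to facet a plot by flower_part.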
Read on to see how you can use cdata to tie together different faceted plots.
Now we can run a single pipeline that combines data processing steps and plotting:
data.frame(x = 1:20) %.>%
  mutate(., y = cos(3*x)) %.>%
  ggplot(., aes(x = x, y = y)) %.>%
  geom_point() %.>%
  geom_line() %.>%
  ggtitle("piped ggplot2")
Check it out.
You need to create a model in Azure ML Studio and create a web service for it.
The traditional example is predicting whether a passenger on the Titanic is going to survive or not.
We have a dataset about passengers with attributes like their age, gender, and passenger class, and we are going to predict whether they are going to survive or not.
Navigate to Azure ML Studio, open it, and follow the steps to create a model for predicting this.
Then download the Titanic dataset from here.
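To get a feel for the prediction task before building it in Azure ML Studio, the same kind of model can be sketched locally in R. This is only an illustration under assumptions: it uses R’s built-in Titanic table instead of the dataset linked above, where age is recorded only as Child or Adult, and it fits a plain logistic regression rather than whatever algorithm the linked walkthrough selects.

```r
# The built-in Titanic table holds counts per Class/Sex/Age/Survived cell.
passengers <- as.data.frame(Titanic)

# Logistic regression on the aggregated counts, weighting each cell by Freq.
model <- glm(Survived ~ Class + Sex + Age,
             family = binomial,
             data = passengers,
             weights = Freq)

# Two hypothetical passengers to score.
new_passengers <- data.frame(
  Class = factor(c("1st", "3rd"), levels = levels(passengers$Class)),
  Sex   = factor(c("Female", "Male"), levels = levels(passengers$Sex)),
  Age   = factor(c("Adult", "Adult"), levels = levels(passengers$Age))
)

# Predicted probability of survival for each hypothetical passenger.
survival_prob <- predict(model, newdata = new_passengers, type = "response")
```

A web service built on such a model does essentially what the last line does: it accepts passenger attributes and returns a survival probability.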
Click through for the step-by-step instructions.
You will need the following information to connect to Elasticsearch as a JDBC data source:
- Driver Class: Set this to
- Classpath: Set this to the location of the driver JAR. By default, this is the lib subfolder of the installation folder.
The DBI functions, such as dbSendQuery, provide a unified interface for writing data access code in R. Use the following line to initialize a DBI driver that can make JDBC requests to the CData JDBC Driver for Elasticsearch:
Read on for the full instructions.
I have found that performance across all devices and browsers is definitely not equal. By far the best browser I have found for viewing the apps is Google Chrome. I have also tended to find that my Ubuntu machines don’t do as well as Microsoft machines in picking up words correctly. A chat I had with someone recently suggested this might be down to drivers under Ubuntu for the microphones but that is not my area of expertise. Voice recognition was also fine on both of my Blackberry phones (one running BB OS 10, the other running Android 7).
It is worth noting that this does require an internet connection to function; in Chrome, the voice-to-text conversion is performed in the cloud.
The other thing I have noticed is that annyang seems relatively sensitive to background noise. This isn’t so bad for functions called using specific phrases, but it does sometimes have a large effect on the multi-word splats. This is because the splats are greedy, and the background noise makes the recognition engine think that you are still talking long after you have finished, which gives the appearance of the application hanging.
The solution is by no means perfect, but it does look quite interesting.