Category: R

Recently, fate led me to try using {glue} in a package. I was very pleased with how it makes code more readable, which I believe is very important during package development. However, I stumbled upon this pretty unexpected behavior:
library(glue)
x <- 10
y <- NULL
paste("I have", x, "apples and", y, "oranges.")
## [1] "I have 10 apples and  oranges."
str(glue("I have {x} apples and {y} oranges."))
## Classes 'glue', 'character'  chr(0)
If any of the expressions evaluates to NULL, the whole output becomes a zero-length character vector (the chr(0) above) rather than a string with a gap in it.
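One way to guard against this is to replace NULL with a visible placeholder before interpolating. The following is a minimal base R sketch, not something from the quoted post: the helper name `null_to` and the placeholder text are hypothetical, and `sprintf()` stands in for the glue template so the snippet has no package dependencies.

```r
# Hypothetical helper: swap NULL (or any zero-length value) for a placeholder.
null_to <- function(value, default = "NULL") {
  if (length(value) == 0) default else value
}

x <- 10
y <- NULL

# Base R stand-in for the template. With glue, the same guard could be applied
# to y before calling glue("I have {x} apples and {y} oranges.").
msg <- sprintf("I have %s apples and %s oranges.", x, null_to(y, "no"))
print(msg)
```

The guard keeps the result at length one, so downstream code that expects a single message string does not silently receive an empty vector.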

glue reminds me of string formatting in .NET languages. On the whole, that’s a good thing.

Comments closed

Solving Linear Optimization Problems In R

Published 2018-08-23 by Kevin Feasel

Mic walks us through a linear optimization problem and solves it with the lpSolve package:

I’m going to implement in R an example of linear optimization that I found in the book “Modeling and Solving Linear Programming with R” by Jose M. Sallan, Oriol Lordan and Vincenc Fernandez. The example is named “Production of two models of chairs” and can be found at page 57, section 3.5. I’m going to solve only the first point.

The problem text is the following:

A company produces two models of chairs: 4P and 3P. The model 4P needs 4 legs, 1 seat and 1 back. On the other hand, the model 3P needs 3 legs and 1 seat. The company has an initial stock of 200 legs, 500 seats and 100 backs. If the company needs more legs, seats and backs, it can buy standard wood blocks, whose cost is 80 euro per block. The company can produce 10 seats, 20 legs and 2 backs from a standard wood block. The cost of producing the model 4P is 30 euro/chair, while the cost of the model 3P is 40 euro/chair. Finally, the company informs that the minimum number of chairs to produce is 1000 units per month. Define a linear programming model which minimizes the total cost (the production costs of the two chairs, plus the buying of new wood blocks).
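Read literally, the problem statement translates into the model below. This is a sketch under stated assumptions rather than the formulation from the book or the linked post: the variable names $x_{4P}$, $x_{3P}$ (chairs produced) and $x_{WB}$ (wood blocks bought) are labels chosen here, and each block is assumed to yield 10 seats, 20 legs and 2 backs together.

```latex
\begin{aligned}
\min\quad & 30\,x_{4P} + 40\,x_{3P} + 80\,x_{WB} \\
\text{s.t.}\quad
& 4\,x_{4P} + 3\,x_{3P} \le 200 + 20\,x_{WB} && \text{(legs)} \\
& x_{4P} + x_{3P} \le 500 + 10\,x_{WB} && \text{(seats)} \\
& x_{4P} \le 100 + 2\,x_{WB} && \text{(backs)} \\
& x_{4P} + x_{3P} \ge 1000 && \text{(minimum production)} \\
& x_{4P},\ x_{3P},\ x_{WB} \ge 0
\end{aligned}
```

If fractional chairs or blocks are not acceptable, the three variables would additionally be restricted to integers.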

I remember solving this exact problem (down to the four legs versus three legs bit) in grad school. We used LINGO to do this, though I haven’t seen that language since. H/T R-Bloggers


The Luminance Illusion With gganimate

Published 2018-08-23 by Kevin Feasel

David Smith highlights an example of the luminance illusion:

Colin created this animation in R using the gganimate package (available on GitHub from author Thomas Lin Pedersen), and the process is delightfully simple. It begins with a chart of 10 “points”, each being the same grey square equally spaced across the shaded background. Then, a simple command animates the transitions from one point to the next, and interpolates between them smoothly:
library(gganimate)
gg_animated <- gg + 
  transition_time(t) + 
  ease_aes('linear')

Check it out, both as a parlor trick and a way of getting a grip on the gganimate package.


Styling In ggplot2

Published 2018-08-23 by Kevin Feasel

The folks at Jumping Rivers show an example of creating a nice-looking plot with ggplot2:

The changes we’ve made so far would be impossible for any package to do for us – how would the package know the plot title? We can now improve the look and feel of the plot. There are two complementary ways of doing this: scales and themes. The ggplot scales control things like colours and point size. In the latest version of ggplot2, version 3.0.0, the Viridis colour palette was introduced. This palette is particularly useful for creating colour-blind friendly plots:
g + scale_colour_viridis_d() # d for discrete

With a few lines of code, those default graphs can look a lot nicer.


Matrix Math In R

Published 2018-08-22 by Kevin Feasel

Dave Mason continues his series on matrices in R:

Math operations between matrices are possible too. Here, the same matrix is added to itself. Since it’s the same matrix, both operands obviously have the same number of elements. The first element is added to the first element, the second element is added to the second element, etc.
> #Add two matrices.
> some_numbers + some_numbers
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    2    4    6    8   10   12
[2,]   14   16   18   20   22   24
[3,]   26   28   30   32   34   36
[4,]   38   40   42   44   46   48
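The excerpt does not show how `some_numbers` was built. The sketch below assumes it is the integers 1 to 24 filled row by row into a 4 x 6 matrix, which is consistent with the output above; the second matrix, `other`, is a hypothetical example added to show what happens when dimensions differ.

```r
# Assumed construction: 1..24 filled by row into 4 rows and 6 columns.
some_numbers <- matrix(1:24, nrow = 4, byrow = TRUE)

# Element-wise addition: each element is added to its counterpart.
doubled <- some_numbers + some_numbers
print(doubled)

# Adding a matrix to itself matches multiplying every element by 2.
same_as_scaling <- identical(doubled, some_numbers * 2L)

# Matrices with different dimensions cannot be added; R raises an error,
# which is captured here as a message instead of stopping the script.
other <- matrix(1:6, nrow = 2)
mismatch <- tryCatch(
  some_numbers + other,
  error = function(e) conditionMessage(e)
)
```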

This follows from Dave’s prior posts, but you can see some of the pieces start to fit together.


Dealing With Multicollinearity With R

Published 2018-08-21 by Kevin Feasel

Chaitanya Sagar explains the concept of multicollinearity in linear regressions and how we can mitigate this issue in R:

Perfect multicollinearity occurs when one independent variable is an exact linear combination of other variables. For example, you already have X and Y as independent variables and you add another variable, Z = a*X + b*Y, to the set of independent variables. This new variable, Z, adds no information beyond what X and Y already provide. The model can adjust its parameters so that this combination is accounted for when determining the coefficients.

Multicollinearity may arise from several factors. Inclusion or incorrect use of dummy variables in the system may lead to multicollinearity. Another reason could be the use of derived variables, i.e., one variable is computed from other variables in the system. This is similar to the example we took at the beginning of the article. A third reason could be including variables which are similar in nature, which provide similar information, or which are very highly correlated with each other.
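The Z = a*X + b*Y case can be reproduced on simulated data. This is a minimal base R sketch, not code from the linked article: the seed, sample size, coefficients and noise level are arbitrary assumptions. It shows `lm()` reporting an NA coefficient for the redundant variable, and a hand-computed variance inflation factor, 1 / (1 - R^2), for a nearly collinear version of it.

```r
set.seed(42)  # arbitrary seed, only for reproducibility
n <- 100
x <- rnorm(n)
y <- rnorm(n)
z <- 2 * x + 3 * y                 # exact linear combination of x and y
response <- 1 + x + y + rnorm(n)

# Perfect multicollinearity: lm() cannot estimate z separately and returns NA.
fit <- lm(response ~ x + y + z)
print(coef(fit))

# Near multicollinearity: add a little noise so z is no longer exact,
# then regress it on the other predictors to get its R-squared.
z_noisy <- z + rnorm(n, sd = 0.1)
r_squared <- summary(lm(z_noisy ~ x + y))$r.squared
vif_z <- 1 / (1 - r_squared)       # variance inflation factor for z_noisy
print(vif_z)
```

A commonly used rule of thumb treats a variance inflation factor above 5 or 10 as a sign that a predictor is largely explained by the others.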

Multicollinearity can make regression analysis trickier, and it’s worth knowing about. H/T R-bloggers.


Including R Visuals In Power BI Dashboards

Published 2018-08-21 by Kevin Feasel

Parker Stevens shows how to include R visuals in a Power BI dashboard:

Let’s finish up this post with a quick example of how to code the elusive line chart with two y-axes. This always seems to be asked in the forums and it’s pretty easy to implement.

Follow the same steps as shown above to bring in a new R visual. Since we need a column to pass into the visual and open up the editor, let’s just throw in the Angle field that we made previously. With the code editor available we can start writing the R script. In this example, we are going to need some data that is available in a specific R package, called “ggplot2.” Go ahead and install the package by typing the following code the same way we installed scatterplot3d:

install.packages("ggplot2")

There are two interesting examples here, including one which accepts an external parameter.


Faster User-Defined Functions In SparkR

Published 2018-08-20 by Kevin Feasel

Liang Zhang and Hossein Falaki note a major performance improvement for functions in SparkR using the latest version of the Databricks Runtime:

SparkR offers four APIs that apply a user-defined function in R to a SparkDataFrame:

dapply()

dapplyCollect()

gapply()

gapplyCollect()

dapply() allows you to run an R function on each partition of the SparkDataFrame and returns the result as a new SparkDataFrame, on which you may apply other transformations or actions. gapply() allows you to apply a function to each grouped partition consisting of a key and the corresponding rows in a SparkDataFrame. dapplyCollect() and gapplyCollect() are shortcuts if you want to call collect() on the result.

The following diagram illustrates the serialization and deserialization performed during the execution of the UDF. The data gets serialized twice and deserialized twice in total, all of which are row-wise.

By vectorizing data serialization and deserialization in Databricks Runtime 4.3, we encode and decode all the values of a column at once. This eliminates the primary bottleneck, which is row-wise serialization, and significantly improves SparkR’s UDF performance. Also, the benefit from the vectorization is more drastic for larger datasets.
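The difference between row-wise and column-wise encoding can be illustrated without Spark. The following is a toy base R sketch and not the Databricks implementation: the data frame is a hypothetical stand-in for one partition, and base R's `serialize()` stands in for whatever wire format SparkR actually uses.

```r
# Hypothetical data standing in for one partition of a SparkDataFrame.
df <- data.frame(id = 1:1000, value = seq(0.5, 500, by = 0.5))

# Row-wise: one serialize() call, and one payload, per row.
row_payloads <- lapply(
  seq_len(nrow(df)),
  function(i) serialize(df[i, ], connection = NULL)
)

# Column-wise: one serialize() call per column, covering all of its values.
col_payloads <- lapply(df, function(column) serialize(column, connection = NULL))

# The column payloads decode back to the original data.
restored <- as.data.frame(lapply(col_payloads, unserialize))

# Compare how many calls each approach needs and how many bytes it produces.
call_counts <- c(row_wise = length(row_payloads), column_wise = length(col_payloads))
byte_totals <- c(
  row_wise = sum(lengths(row_payloads)),
  column_wise = sum(lengths(col_payloads))
)
print(call_counts)
print(byte_totals)
```

The row-wise approach pays a fixed per-call overhead once per row, so its cost grows with the number of rows, which is in line with the quoted observation that vectorization helps more on larger datasets.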

It looks like they get some pretty serious gains from this change.


Subsetting Matrices In R

Published 2018-08-20 by Kevin Feasel

Dave Mason continues his look at matrices in R:

We can extract an entire row from a matrix. To do this, specify the desired row only within the square brackets [ ]. The placeholder where you would otherwise specify the column is left empty.
> #Points scored by Kendrick Perkins.
> points_scored_by_quarter[1,]
1st 2nd 3rd 4th 
  2   2   6   0 
> points_scored_by_quarter["Perkins",]
1st 2nd 3rd 4th 
  2   2   6   0 
Conversely, we can extract a column from a matrix. Specify the column within the square brackets [ ] and omit the row. The result is a vector, hence the pivot effect: the row names are displayed in the output (not the column name).
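A small self-contained sketch of both extractions follows. Only the Perkins row comes from the output above; the second player and that player's numbers are hypothetical, added so the matrix has more than one row. The `drop = FALSE` argument is not mentioned in the excerpt but is standard base R for keeping the result as a matrix.

```r
points_scored_by_quarter <- matrix(
  c(2, 2, 6, 0,    # Perkins, as shown in the output above
    5, 4, 7, 3),   # hypothetical second player
  nrow = 2, byrow = TRUE,
  dimnames = list(c("Perkins", "PlayerB"), c("1st", "2nd", "3rd", "4th"))
)

# Row extraction: column position left empty; result is a vector named by quarter.
perkins <- points_scored_by_quarter["Perkins", ]
print(perkins)

# Column extraction: row position left empty; result is a vector named by player.
first_quarter <- points_scored_by_quarter[, "1st"]
print(first_quarter)

# drop = FALSE keeps the result as a one-column matrix instead of a vector.
first_quarter_matrix <- points_scored_by_quarter[, "1st", drop = FALSE]
print(first_quarter_matrix)
```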

Dave points out that working with matrices is basically an extension of working with vectors.


Microsoft R Open 3.5.1

Published 2018-08-16 by Kevin Feasel

David Smith announces Microsoft R Open 3.5.1:

Microsoft R Open 3.5.1 has been released, combining the latest R language engine with multi-processor performance and tools for managing R packages reproducibly. You can download Microsoft R Open 3.5.1 for Windows, Mac and Linux from MRAN now. Microsoft R Open is 100% compatible with all R scripts and packages, and works with all your favorite R interfaces and development environments.

This update brings a number of minor fixes to the R language engine from the R core team. It also makes available a host of new R packages contributed by the community, including packages for downloading financial data, connecting with analytics systems, applying machine learning algorithms and statistical models, and many more. New R packages are released every day, and you can access packages released after the 1 August 2018 CRAN snapshot used by MRO 3.5.1 using the checkpoint package.

Read on for more and check out the updates.
