Category: R

Factors In R

Published 2018-08-31 by Kevin Feasel

Dave Mason continues his look at R, this time covering the concept of factors:

Factor data can be nominal or ordinal. In our examples so far, it is nominal. “C”, “G”, and “F” (and “Center”, “Guard”, and “Forward” for that matter) are names that have no comparative order to each other. It’s not meaningful to say a Center is greater than a Forward or a Forward is less than a Guard (keep in mind these are position names–don’t let height cloud your thinking). If we try making a comparison, we get a warning message:
> position_factor[1] > position_factor[2]
[1] NA
Warning message:
In Ops.factor(position_factor[1], position_factor[2]) :
  ‘>’ not meaningful for factors
Ordinal data, on the other hand, can be compared to each other in some ranked fashion–it has order. Take bed sizes, for instance. A “Twin” bed is smaller than a “Full”, which is smaller than a “Queen”, which is smaller than a “King”. To create a factor with ordered (ranked) levels, use the ordered parameter, which is a logical flag to indicate if the levels should be regarded as ordered (in the order given).

Check it out.

Comments closed

Examples Of Charts In Different Languages

Published 2018-08-29 by Kevin Feasel

David Smith points out a great repository of information on generating different types of charts in different libraries:

The visualization tools include applications like Excel, Power BI and Tableau; languages and libraries including R, Stata, and Python’s matplotlib); and frameworks like D3. The data visualizations range from the standard to the esoteric, and follow the taxonomy of the book Data Visualisation (also by Andy Kirk). The chart categories are color coded by row: categorical (including bar charts, dot plots); hierarchical (donut charts, treemaps); relational (scatterplots, sankey diagrams); temporal (line charts, stream graphs) and spatial (choropleths, cartograms).

Check out the Chartmaker Directory.

Comments closed

More On Radix Sorting In R

Published 2018-08-27 by Kevin Feasel

Inaki Ucar explains some of the nuance behind sorting in R:

The latest R tip in Win-Vector Blog encourages you to Use Radix Sort based on a simple benchmark showing a x35 speedup compared to the default method, but with no further explanation. In my opinion, though, the complete tip would be, instead, use radix sort… if you know what you are doing, because a quick benchmark shouldn’t spare you the effort of actually reading the docs. And here is a spoiler: you are already using it.

One may wonder why R’s default sorting algorithm is so bad, and why was even chosen. The thing is that there is a trick here, and to understand it, first we must understand the benchmark’s data and then read the docs.

Read the whole thing.

Comments closed

Your R Code Should Be In Source Control Too

Published 2018-08-27 by Kevin Feasel

Lindsay Carr explains the importance of storing your R code in source control:

But wait, I would need to learn an additional tool?

Yes, but don’t panic! Git is a tool with various commands that you can use to help track your changes. Luckily, you don’t need to know too many commands in Git to use the basic functionality. As an added bonus, using Git with RStudio takes away some of the burden of knowing Git commands by including buttons for common actions.

As with any tool that you pick up to help your scientific workflows, there is some upfront work before you can start seeing the benefits. Don’t let that deter you. Git can be very easy once you get the gist. Think about the benefits of being able to track changes: you can make some changes, have a record of that change and who made it, and you can tie that change to a specific problem that was reported or feature request that was noted.

It’s still code, and you gain a lot by keeping code in source control.

Comments closed

Using The glue Package In R

Published 2018-08-24 by Kevin Feasel

Evgeni Chasnovski shows the glue package and also works around some trickiness with NULL:

Recently, fate lead me to try using {glue} in a package. I was very pleased to how it makes code more readable, which I believe is a very important during package development. However, I stumbled upon this pretty unexpected behavior:
y <- NULL
paste("I have", x, "apples and", y, "oranges.")
## [1] "I have 10 apples and  oranges."
str(glue("I have {x} apples and {y} oranges."))
## Classes 'glue', 'character'  chr(0)
If one of the expressions is evaluated into NULL then the output becomes empty string.

glue reminds me of string formatting in .NET languages. On the whole, that’s a good thing.

Comments closed

Solving Linear Optimization Problems In R

Published 2018-08-23 by Kevin Feasel

Mic walks us through a linear optimization problem and solves it with the lpSolve package:

I’m going to implement in R an example of linear optimization that I found in the book “Modeling and Solving Linear Programming with R” by Jose M. Sallan, Oriol Lordan and Vincenc Fernandez. The example is named “Production of two models of chairs” and can be found at page 57, section 3.5. I’m going to solve only the first point.

The problem text is the following

A company produces two models of chairs: 4P and 3P. The model 4P needs 4 legs, 1 seat and 1 back. On the other hand, the model 3P needs 3 legs and 1 seat. The company has a initial stock of 200 legs, 500 seats and 100 backs. If the company needs more legs, seats and backs, it can buy standard wood blocks, whose cost is 80 euro per block. The company can produce 10 seats, 20 legs and 2 backs from a standard wood block. The cost of producing the model 4P is 30 euro/chair, meanwhile the cost of the model 3P is 40 euro/chair. Finally, the company informs that the minimum number of chairs to produce is 1000 units per month. Define a linear programming model, which minimizes the total cost (the production costs of the two chairs, plus the buying of new wood blocks).

I remember solving this exact problem (down to the four legs versus three legs bit) in grad school. We used LINGO to do this, though I haven’t seen that language since. H/T R-Bloggers

Comments closed

The Luminance Illusion With gganimate

Published 2018-08-23 by Kevin Feasel

David Smith highlights an example of the luminance illusion:

Colin created this animation in R using the gganimate package (available on GitHub from author Thomas Lin Pederson), and the process is delightfully simple. It begins with a chart of 10 “points”, each being the same grey square equally spaced across the shaded background. Then, a simple command animates the transitions from one point to the next, and interpolates between them smoothly:
library(gganimate)
gg_animated <- gg + 
  transition_time(t) + 
  ease_aes('linear')

Check it out, both as a parlor trick and a way of getting a grip on the gganimate package.

Comments closed

Styling In ggplot2

Published 2018-08-23 by Kevin Feasel

The folks at Jumping Rivers show an example of creating a nice-looking plot with ggplot2:

The changes we’ve made so far would impossible for any package to do for us – how would the package know the plot title? We can now improve the look and feel of the plot. There are two ways of complementary ways of doing this: scales and themes. The ggplot scales control things like colours and point size. In the latest version of ggplot2, version 3.0.0, the Viridis colour palette was introduced. This palette is particularly useful for creating colour-blind friendly palettes
g + scale_colour_viridis_d() # d for discrete

With a few lines of code, those default graphs can look a lot nicer.

Comments closed

Matrix Math In R

Published 2018-08-22 by Kevin Feasel

Dave Mason continues his series on matrices in R:

Math operations between matrices is possible too. Here, the same matrix is added to itself. Since it’s the same matrix, they obviously have the same number of elements. The first element is added to the first element, the second element is added to the second element, etc.
> #Add two matrices.
> some_numbers + some_numbers
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    2    4    6    8   10   12
[2,]   14   16   18   20   22   24
[3,]   26   28   30   32   34   36
[4,]   38   40   42   44   46   48

This follows from Dave’s prior posts, but you can see some of the pieces start to fit together.

Comments closed

Dealing With Multicollinearity With R

Published 2018-08-21 by Kevin Feasel

Chaitanya Sagar explains the concept of multicollinearity in linear regressions and how we can mitigate this issue in R:

Perfect multicollinearity occurs when one independent variable is an exact linear combination of other variables. For example, you already have X and Y as independent variables and you add another variable, Z = a*X + b*Y, to the set of independent variables. Now, this new variable, Z, does not add any significant or different value than provided by X or Y. The model can adjust itself to set the parameters that this combination is taken care of while determining the coefficients.

Multicollinearity may arise from several factors. Inclusion or incorrect use of dummy variables in the system may lead to multicollinearity. The other reason could be the usage of derived variables, i.e., one variable is computed from other variables in the system. This is similar to the example we took at the beginning of the article. The other reason could be taking variables which are similar in nature or which provide similar information or the variables which have very high correlation among each other.

Multicollinearity can make regression analysis trickier, and it’s worth knowing about. H/T R-bloggers.

Comments closed