Category: R

This works, and the paste() pattern is so useful we suggest researching and memorizing it.

However, the “call” portion of the model is reported as “formula = f” (the name of the variable carrying the formula) instead of something more detailed. Frankly, this printing issue never bothered us. None of our tools or workflows currently use the model call item, and for a very large number of variables, formatting the call contents in the model report becomes unwieldy. We also already have the formula in a variable, so if we need it we can save it or pass it along.

There is a much better place on many models to get model structure information from than the model call item: the model terms item. This item carries a lot of information and formats up quite nicely:
format(terms(model))
# [1] "mpg ~ cyl + disp + hp + carb"

Be sure to check out the comments too, as there are several solutions to this problem.
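
For reference, here is a minimal sketch of the paste() pattern the excerpt refers to, using the same mtcars variables that appear in the terms output. The names outcome and variables are our own choices for illustration and are not taken from the original post.

```r
# Sketch: build a model formula from variable names with paste().
# `outcome` and `variables` are illustrative names, not the author's.
outcome <- "mpg"
variables <- c("cyl", "disp", "hp", "carb")

# Collapse the predictors with " + ", then join outcome and predictors with " ~ ".
f <- as.formula(paste(outcome, paste(variables, collapse = " + "), sep = " ~ "))

model <- lm(f, data = mtcars)

# The call only shows "formula = f", but the terms item carries the full formula.
formula_text <- format(terms(model))
print(formula_text)
```

Because the variable names live in an ordinary character vector, the same code should work unchanged when the list of predictors is generated programmatically.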

Comments closed

Lists In R

Published 2018-09-05 by Kevin Feasel

Dave Mason continues his process of learning about R data structures with a survey of lists:

In previous lessons, we’ve noted vectors and matrices consist of data elements of the same class. R will coerce data elements to a single class if we attempt to create a vector or matrix with data elements of differing classes. Lists, on the other hand, can hold data elements of different classes, such as the integer, character, or logical class. In fact, a list can hold most anything in R, including vectors, matrices, and many more! To no one’s surprise, lists can be created with the list() function:

And if you want to work with lists, purrr is a great package to learn.
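
As a quick illustration of the coercion point (our own sketch, not Dave's example), the snippet below contrasts a vector, which is coerced to a single class, with a list, which keeps each element's class. The player data is made up.

```r
# A vector is coerced to one class: mixing integer, character, and logical
# yields a character vector.
mixed_vector <- c(1L, "a", TRUE)
vector_class <- class(mixed_vector)

# A list keeps each element's own class (hypothetical example data).
player <- list(
  name   = "Sam",
  number = 23L,
  active = TRUE,
  scores = c(12.5, 8, 19)
)
element_classes <- sapply(player, class)

print(vector_class)
print(element_classes)
```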

Comments closed

Values Belong In Columns

Published 2018-09-04 by Kevin Feasel

John Mount argues that to reduce ambiguity, ensure that your values are columns on appropriate data frames:

Here is an (artificial) example.
chamber_sizes <- mtcars$disp/mtcars$cyl
form <- hp ~ chamber_sizes
model <- lm(form, data = mtcars)
print(model)
# Call:
# lm(formula = form, data = mtcars)
#
# Coefficients:
#   (Intercept)  chamber_sizes  
#         2.937          4.104  
Notice: one of the variables came from a vector in the environment, not from the primary data.frame. chamber_sizes was first looked for in the data.frame, and then in the environment where the formula was defined (which happens to be the global environment), and (if that hadn’t worked) in the executing environment (which is again the global environment).

Our advice is: do not do that. Place all of your values in columns. Make it unambiguous that all variables are names of columns in your data.frame of interest. This allows you to write simple code that works over explicit data. The style we recommend looks like the following.

Read the whole thing.
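
One way the advice might look in practice is sketched below. This is our reading of the recommendation and may differ from the code in John's post; the data frame name d is an assumption.

```r
# Sketch: put the derived value in a column of the data frame first,
# so every variable in the formula is a column name.
d <- mtcars                              # `d` is an illustrative name
d$chamber_sizes <- d$disp / d$cyl

form <- hp ~ chamber_sizes
model <- lm(form, data = d)

# Check that no formula variable has to be found in the environment.
all_in_columns <- all(all.vars(form) %in% names(d))
print(all_in_columns)
```

The final check is a cheap guard that can be run before fitting, since lm() would otherwise fall back to the environment without complaint.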

Comments closed

Using R With Excel

Published 2018-09-04 by Kevin Feasel

David Smith walks us through various ways to integrate R and Excel:

If you’re familiar with analyzing data in Excel and want to learn how to work with the same data in R, Alyssa Columbus has put together a very useful guide: How To Use R With Excel. In addition to providing you with a guide for installing and setting up R and the RStudio IDE, it provides a wealth of useful tips for working with Excel data in R, including:

To import Excel data into R, use the readxl package
To export Excel data from R, use the openxlsx package
How to remove symbols like “$” and “%” from currency and percentage columns in Excel, and convert them to numeric variables suitable for analysis in R
How to do computations on variables in R, and a list of common Excel functions (like RAND and VLOOKUP) with their R equivalents
How to emulate common Excel chart types (like histograms and line plots) using R plotting functions

David also shows how to run R within Excel. One of the big benefits of readxl is that it doesn’t require Java; most other Excel readers do.
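
To make the symbol-removal tip concrete, here is a small base R sketch. It assumes the currency and percentage columns arrived in R as character strings; the sample values are made up, and this is not necessarily the approach the guide takes.

```r
# Hypothetical values as they might look after importing formatted Excel cells as text.
price  <- c("$1,200.50", "$35", "$0.99")
growth <- c("12%", "3.5%", "-4%")

# Strip "$" and thousands separators, then convert to numeric.
price_num <- as.numeric(gsub("[$,]", "", price))

# Strip "%" and rescale to a proportion.
growth_num <- as.numeric(gsub("%", "", growth, fixed = TRUE)) / 100

print(price_num)
print(growth_num)
```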

Comments closed

Explaining Keras Models With LIME

Published 2018-08-31 by Kevin Feasel

Shirin Glander shares her slide deck on explaining Keras image classification models with LIME:

Here I am sharing the slides for a webinar I gave for SAP about Explaining Keras Image Classification Models with LIME.

Slides can be found here: https://www.slideshare.net/ShirinGlander/sap-webinar-explaining-keras-image-classification-models-with-lime

Read on for links to additional resources as well.

Comments closed

Factors In R

Published 2018-08-31 by Kevin Feasel

Dave Mason continues his look at R, this time covering the concept of factors:

Factor data can be nominal or ordinal. In our examples so far, it is nominal. “C”, “G”, and “F” (and “Center”, “Guard”, and “Forward” for that matter) are names that have no comparative order to each other. It’s not meaningful to say a Center is greater than a Forward or a Forward is less than a Guard (keep in mind these are position names–don’t let height cloud your thinking). If we try making a comparison, we get a warning message:
> position_factor[1] > position_factor[2]
[1] NA
Warning message:
In Ops.factor(position_factor[1], position_factor[2]) :
  ‘>’ not meaningful for factors
Ordinal data, on the other hand, can be compared to each other in some ranked fashion–it has order. Take bed sizes, for instance. A “Twin” bed is smaller than a “Full”, which is smaller than a “Queen”, which is smaller than a “King”. To create a factor with ordered (ranked) levels, use the ordered parameter, which is a logical flag to indicate if the levels should be regarded as ordered (in the order given).

Check it out.
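
Here is a minimal sketch of the ordered parameter using the bed sizes from the excerpt. The variable names are ours, and the code is an illustration rather than Dave's original example.

```r
# Sketch: an ordered factor, with levels listed from smallest to largest.
bed_sizes <- c("Queen", "Twin", "King", "Full")
bed_factor <- factor(
  bed_sizes,
  levels  = c("Twin", "Full", "Queen", "King"),
  ordered = TRUE
)

# Comparisons are now meaningful: is "Twin" smaller than "Queen"?
twin_smaller <- bed_factor[2] < bed_factor[1]

# Ordered factors also support min() and max().
largest <- as.character(max(bed_factor))

print(twin_smaller)
print(largest)
```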

Comments closed

Examples Of Charts In Different Languages

Published 2018-08-29 by Kevin Feasel

David Smith points out a great repository of information on generating different types of charts in different libraries:

The visualization tools include applications like Excel, Power BI and Tableau; languages and libraries including R, Stata, and Python’s matplotlib; and frameworks like D3. The data visualizations range from the standard to the esoteric, and follow the taxonomy of the book Data Visualisation (also by Andy Kirk). The chart categories are color coded by row: categorical (including bar charts, dot plots); hierarchical (donut charts, treemaps); relational (scatterplots, sankey diagrams); temporal (line charts, stream graphs) and spatial (choropleths, cartograms).

Check out the Chartmaker Directory.

Comments closed

More On Radix Sorting In R

Published 2018-08-27 by Kevin Feasel

Inaki Ucar explains some of the nuance behind sorting in R:

The latest R tip in Win-Vector Blog encourages you to Use Radix Sort based on a simple benchmark showing a x35 speedup compared to the default method, but with no further explanation. In my opinion, though, the complete tip would be, instead, use radix sort… if you know what you are doing, because a quick benchmark shouldn’t spare you the effort of actually reading the docs. And here is a spoiler: you are already using it.

One may wonder why R’s default sorting algorithm is so bad, and why it was even chosen. The thing is that there is a trick here, and to understand it, first we must understand the benchmark’s data and then read the docs.

Read the whole thing.
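
As a small sketch of why reading the docs matters, the snippet below requests radix sort explicitly through the method argument of sort(). As we understand the documentation, radix sorting of character vectors does not respect the locale, so results can differ from the default for strings; the sample data is made up.

```r
# Sketch: explicit radix sort on integers and on character data.
set.seed(2018)
nums <- sample(1000)

# For integer input, the default and the radix method agree.
same_result <- identical(sort(nums), sort(nums, method = "radix"))

# For character input, radix ignores locale collation,
# so uppercase letters sort ahead of lowercase ones.
words <- c("banana", "apple", "Cherry")
radix_words <- sort(words, method = "radix")

print(same_result)
print(radix_words)
```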

Comments closed

Your R Code Should Be In Source Control Too

Published 2018-08-27 by Kevin Feasel

Lindsay Carr explains the importance of storing your R code in source control:

But wait, I would need to learn an additional tool?

Yes, but don’t panic! Git is a tool with various commands that you can use to help track your changes. Luckily, you don’t need to know too many commands in Git to use the basic functionality. As an added bonus, using Git with RStudio takes away some of the burden of knowing Git commands by including buttons for common actions.

As with any tool that you pick up to help your scientific workflows, there is some upfront work before you can start seeing the benefits. Don’t let that deter you. Git can be very easy once you get the gist. Think about the benefits of being able to track changes: you can make some changes, have a record of that change and who made it, and you can tie that change to a specific problem that was reported or feature request that was noted.

It’s still code, and you gain a lot by keeping code in source control.

Comments closed

Using The glue Package In R

Published 2018-08-24 by Kevin Feasel

Evgeni Chasnovski shows the glue package and also works around some trickiness with NULL:

Recently, fate led me to try using {glue} in a package. I was very pleased with how it makes code more readable, which I believe is very important during package development. However, I stumbled upon this pretty unexpected behavior:
library(glue)
x <- 10  # value implied by the paste() output below
y <- NULL
paste("I have", x, "apples and", y, "oranges.")
## [1] "I have 10 apples and  oranges."
str(glue("I have {x} apples and {y} oranges."))
## Classes 'glue', 'character'  chr(0)
If one of the expressions is evaluated into NULL then the output becomes an empty string.

glue reminds me of string formatting in .NET languages. On the whole, that’s a good thing.
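
One simple way to sidestep the NULL problem is to guard values before they are interpolated. The sketch below uses base R only, and the helper or_text is a hypothetical name of ours; it is not the workaround from Evgeni's post and not part of the glue package.

```r
# Sketch: replace NULL with a visible placeholder before building the string.
# `or_text` is a hypothetical helper, not a glue function.
or_text <- function(value, replacement = "NULL") {
  if (is.null(value)) replacement else value
}

x <- 10
y <- NULL

msg <- paste("I have", x, "apples and", or_text(y), "oranges.")
print(msg)
```

Since glue evaluates R expressions inside its braces, the same guard could presumably be applied there as well, for example by writing {or_text(y)} in the template.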

Comments closed
