Category: R

Graphics In R

Published 2018-06-28 by Kevin Feasel

David Smith is following the kerfuffle that Edward Tufte unleashed on Twitter recently:

While graphics guru Edward Tufte recently claimed that “R coders and users just can’t do words on graphics and typography” and need additional tools to make graphics that aren’t “clunky”, data journalists at major publications beg to differ. The BBC has been creating graphics “purely in R” for some time, with a typography style matching that of the BBC website. Senior BBC Data Journalist Christine Jeavans offers several examples, including this chart of life expectancy differences between men and women:

I think Tufte’s off base here.

Comments closed

Counting Arguments In R

Published 2018-06-27 by Kevin Feasel

Neil Saunders shares methods for interrogating argument lists in R:

“Some R functions have an awful lot of arguments”, you think to yourself. “I wonder which has the most?”

It’s not an original thought: the same question as applied to the R base package is an exercise in the Functions chapter of the excellent Advanced R. Much of the information in this post came from there.

There are lots of R packages. We’ll limit ourselves to those packages which ship with R, and which load on startup. Which ones are they?

It’s a fun exercise and helpful for learning a bit more about how to work with arguments when metaprogramming in R.
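As a rough sketch of the idea (not necessarily the code from the linked post), the argument counts for the base package can be computed with `formals()`. One assumption worth flagging: primitive functions such as `sum` have no formals, so they show up here with a count of zero.

```r
# Collect every object defined in the base package
base_objs <- mget(ls(baseenv()), envir = baseenv())

# Keep only the functions
base_funs <- Filter(is.function, base_objs)

# Count formal arguments; formals() returns NULL for primitives, so they count as 0
n_args <- vapply(base_funs, function(f) length(formals(f)), integer(1))

# The five base functions with the most formal arguments
head(sort(n_args, decreasing = TRUE), 5)
```

The same pattern should extend to other attached packages by swapping in `as.environment("package:stats")` and so on.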


Analyzing Federal Reserve Data With Ordinary Least Squares

Published 2018-06-27 by Kevin Feasel

Sam Shum has a tutorial walking us through extracting and analyzing data from the St. Louis Federal Reserve’s FRED economic database:

Download specific macroeconomic data from FRED St. Louis economic databases and ETL the data. Many other data series can be found at the FRED’s website.

# get unemployment data time series from FRED St. Louis
dfunrate <- get_fred_series("UNRATE", "unrate", observation_start = startdate, observation_end = enddate)

# get University of Michigan consumer sentiment index data time series from FRED St. Louis
dfumcsent <- get_fred_series("UMCSENT", "umcsent", observation_start = startdate, observation_end = enddate)

# combine the two time series data into one data frame
dfall <- cbind(dfunrate,dfumcsent)

# strip or remove redundant month field from data downloaded from FRED St. Louis
dfall <- dfall[,c(1,2,4)]

# build an index vector (1..n) over the rows of the data frame
mdx <- seq_len(nrow(dfall))

# convert FRED date field from string to R's date type
dfall$date <- as.Date(dfall$date)

There’s a nice chart builder on the FRED website too, but it’s good to be able to grab the data on your own.
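To illustrate the step the title refers to, here is a minimal sketch of joining the two series on date and fitting an ordinary least squares model. The data below are simulated placeholders rather than real FRED observations, and the choice of regressing sentiment on unemployment is an assumption for illustration, not necessarily the model in the tutorial.

```r
# Placeholder data standing in for the two FRED downloads
# (values are simulated, NOT real observations)
set.seed(42)
dates <- seq(as.Date("2010-01-01"), by = "month", length.out = 24)
dfunrate  <- data.frame(date = dates, unrate  = runif(24, 4, 10))
dfumcsent <- data.frame(date = dates, umcsent = runif(24, 60, 100))

# Join on date; unlike cbind(), this leaves no duplicate date column to strip
dfall <- merge(dfunrate, dfumcsent, by = "date")

# Ordinary least squares: sentiment as a linear function of unemployment
fit <- lm(umcsent ~ unrate, data = dfall)
coef(fit)
```

Merging by key also guards against the two series having different date ranges, which `cbind()` would silently misalign or reject.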


Reproducible Examples In R

Published 2018-06-27 by Kevin Feasel

Mara Averick shows us an example of a reproducible example in R, useful when reporting errors:

In honour of the triumphant return of reprex to CRAN, let’s revisit what I refer to as Jenny Bryan’s keys to reprex-cellence. The three keys are as follows:

code that actually runs
code that I don’t have to run
code that can be easily run

Very useful if you want to get help on a problem.


Converting Factors To Numbers In R

Published 2018-06-26 by Kevin Feasel

Sebastian Sauer shows us a pitfall of brute-force conversion of factors to integers:

Oh no! That’s not what we wanted! R has messed the thing up (?). The reason is that R sees the first factor level internally as the number 1. The second level as number 2. What’s the first factor level in our case? Let’s see:
factor(tips$sex) %>% head()
#> [1] Female Male   Male   Male   Female Male  
#> Levels: Female Male
factor(tips$sex_r) %>% head()
#> [1] 1 0 0 0 1 0
#> Levels: 0 1
That’s confusing: “0” is the first level of sex_r – internally for R represented by “1”. The second level of sex_r is “1” – internally represented by “2”.

Fortunately, we get the easy answer at the end of the post.
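A small self-contained sketch of the pitfall, using a made-up vector in place of the `tips` data: converting the factor directly yields the internal level codes, while going through the labels recovers the original numbers. This is a common base-R approach and may or may not be the one the linked post settles on.

```r
# A factor whose labels look like numbers (stand-in for tips$sex_r)
sex_r <- factor(c("1", "0", "0", "0", "1", "0"))

# Direct conversion returns the internal level codes, not the labels
codes <- as.integer(sex_r)
codes
#> [1] 2 1 1 1 2 1

# Convert the labels to character first, then to numeric
values <- as.numeric(as.character(sex_r))
values
#> [1] 1 0 0 0 1 0

# Equivalent: convert the levels once, then index by the factor
values2 <- as.numeric(levels(sex_r))[sex_r]
```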


Parallelizing Linear Regression With MapReduce

Published 2018-06-25 by Kevin Feasel

Arthur Charpentier shows us the math behind using MapReduce to parallelize a linear regression:

Sometimes, with big data, matrices are too big to handle, and it is possible to use tricks to numerically still do the map. Map-Reduce is one of those. With several cores, it is possible to split the problem, to map on each machine, and then to aggregate it back at the end.

Arthur gives us an interesting example in R to boot.
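As a hedged sketch of the split-and-aggregate idea, the example below uses the normal equations: each chunk contributes its own X'X and X'y (the map step), and the pieces are summed before solving (the reduce step). The data are simulated, and the linked post may well use a different decomposition, so treat this as one possible instance rather than Arthur's method.

```r
# Simulated regression data (illustrative only)
set.seed(1)
n <- 1000
X <- cbind(1, rnorm(n), rnorm(n))
y <- as.vector(X %*% c(2, -1, 0.5) + rnorm(n))

# Split the row indices into 4 chunks, as if each lived on its own core
chunks <- split(seq_len(n), rep(1:4, length.out = n))

# Map: each chunk computes its share of X'X and X'y
mapped <- lapply(chunks, function(idx) {
  Xi <- X[idx, , drop = FALSE]
  list(xtx = crossprod(Xi), xty = crossprod(Xi, y[idx]))
})

# Reduce: sum the per-chunk pieces, then solve the normal equations
xtx <- Reduce(`+`, lapply(mapped, `[[`, "xtx"))
xty <- Reduce(`+`, lapply(mapped, `[[`, "xty"))
beta_mr <- as.vector(solve(xtx, xty))

# Compare against a single-pass least squares fit
beta_full <- as.vector(lm.fit(X, y)$coefficients)
all.equal(beta_mr, beta_full)
```

Swapping `lapply` for `parallel::mclapply` or a cluster call would distribute the map step; the reduce step only ever handles small p-by-p matrices.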


Granting Non-Admin Users Access To Run ML Services

Published 2018-06-25 by Kevin Feasel

Niels Berglund walks through the rights needed for a non-administrative user to execute an external script using SQL Server Machine Learning Services:

Oops, something did go wrong, as it turns out that if you try to grant permissions on extended stored procedures, which SPEES is, you need to do it from the master database. Cool, let us switch to master and do it there. Well, if you try to do that – then you get another error: the user does not exist in master, sigh!

At this stage you have a couple of options:

~~Add the login for the user to the sysadmin role, or the user to the db_owner role in the actual database.~~ No do not do that, I am only kidding! Do.Not.Do.That!
Create the user in master and grant the permission. That would work.
Grant the permission to public.

Check it out, as there are two parts to the process.


Using DALEX To Explain Black-Box Models

Published 2018-06-20 by Kevin Feasel

Przemyslaw Biecek explains that there’s more than LIME for explaining black-box models:

I’ve heard about a number of consulting companies that decided to use a simple linear model instead of a black-box model with higher performance, because “the client wants to understand the factors that drive the prediction”.
And usually the discussion goes as follows: “We have tried LIME for our black-box model, it is great, but it is not working in our case”, “Have you tried other explainers?”, “What other explainers?”

So here you have a map of different visual explanations for black-box models.

Check out DALEX, which includes a Jupyter notebook example. H/T R-Bloggers


Comparing Keras In Python Versus R

Published 2018-06-20 by Kevin Feasel

Dmitry Kisler performs image classification using Keras in both Python and R:

From the plots above, one can see that:

the accuracy of your model doesn’t depend on the language you use to build and train it (the plot shows only train accuracy, but the model doesn’t have high variance and the bias accuracy is around 99% as well).
even though 10 measurements may not be convincing, Python would reduce (by up to 15%) the time required to train your CNN model. This is somewhat expected because R uses Python under the hood when it executes Keras functions.

This is just one example, but the results are about what I’d expect.


The Dangers Of The Ellipsis In R

Published 2018-06-19 by Kevin Feasel

John Mount shows us an example where ... (the ellipsis) can come back to hurt us:

The following code example contains an easy error in using the R function unique().
vec1 <- c("a", "b", "c")
vec2 <- c("c", "d")
unique(vec1, vec2)
# [1] "a" "b" "c"
Notice none of the novel values from vec2 are present in the result. Our mistake was: we (improperly) tried to use unique() with multiple value arguments, as one would use union(). Also notice no error or warning was signaled. We used unique() incorrectly and nothing pointed this out to us. What compounded our error was R’s “...” function signature feature.

John makes it clear that ... is not itself a bad thing, just that there is a time and a place for it and misusing it can lead to hard-to-understand bugs.
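For contrast with the excerpt above, here is a short sketch of the intended call plus one defensive pattern. The wrapper `strict_unique()` is a hypothetical helper written for this illustration, not part of base R or of John's post; it simply refuses any extra arguments instead of silently accepting them.

```r
vec1 <- c("a", "b", "c")
vec2 <- c("c", "d")

# What was actually wanted: combine first, then deduplicate
combined <- unique(c(vec1, vec2))
combined
# [1] "a" "b" "c" "d"

# union() does the same thing in one call
union(vec1, vec2)

# Hypothetical wrapper: fail loudly when anything lands in ...
strict_unique <- function(x, ...) {
  extra <- list(...)
  if (length(extra) > 0) {
    stop("strict_unique() received ", length(extra), " unexpected argument(s)")
  }
  unique(x)
}

# The mistaken call now raises an error instead of returning a partial answer
result <- tryCatch(strict_unique(vec1, vec2), error = function(e) "error")
result
```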

