Category: R

Reviewing the Stack Overflow Developer Survey

Michael Toth looks at the recently-released 2019 Stack Overflow Developer Survey:

Since 2011, Stack Overflow has been surveying their users each year to answer questions about the technologies they use, their work experience, their compensation, and their satisfaction at work. Given Stack Overflow’s place in the broader programming world, they are able to draw quite the audience for their annual surveys.

This year, nearly 90,000 developers participated in the survey! There’s a lot in this survey, and I recommend reviewing it yourself, but I wanted to surface some of the key findings that I thought were particularly relevant to data professionals here.

Stack Overflow says they will be releasing the underlying data for this survey in the coming weeks, so I hope to return to this for a deeper analysis once that’s made available. For now, let’s get into the results!

Michael’s lede involves R versus Python in terms of salaries, but for me, the top line is that functional programmers make more money. Clojure, F#, Scala, Elixir, and Erlang make the top 10 on the global list, including positions 1, 2, 4, and 5. Within the US, Scala, Clojure, Erlang, Kotlin, F#, and Elixir make the top 10, including positions 1, 2, and 4. H/T R-Bloggers

Dates in Base R

Michael Toth explains some of the functionality available in base R (that is, not packages like lubridate) for working with dates:

When working with R date formats, you’re generally going to be trying to accomplish one of two different but related goals:

1. Converting a character string like “Jan 30 1989” to a Date type

2. Getting an R Date object to print in a specific format for a graph or other output

You may need to handle both of these goals in the same analysis, but it’s best to think of them as two separate exercises. Knowing which goal you are trying to accomplish is important because you will need to use different functions to accomplish each of these. Let’s tackle them one at a time.

There are some good insights in the post. H/T R-bloggers
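
As a minimal sketch of those two goals in base R (the variable names are illustrative, and the `%b` month abbreviation assumes an English locale):

```r
# Goal 1: convert a character string to a Date.
# %b = abbreviated month name (locale-dependent), %d = day, %Y = 4-digit year.
birthday <- as.Date("Jan 30 1989", format = "%b %d %Y")
class(birthday)   # "Date"

# Goal 2: print a Date in a specific format for a graph or other output.
label <- format(birthday, "%d/%m/%Y")
label             # "30/01/1989"
```

Parsing goes through `as.Date()` and printing goes through `format()`, which is why it helps to treat them as separate exercises.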

Finding an Unfair Coin with R

Sebastian Sauer works out a coin flip problem:

A stochastic problem, with application to financial theory. Some say it goes back to Warren Buffett. I owe it to my colleague Norman Markgraf, who pointed it out to me.

Assume there are two coins. One is fair, one is loaded. The loaded coin has a bias of 60-40. Now, the question is: How many coin flips do you need to be “sure enough” (say, 95%) that you found the loaded coin?

Let’s simulate la chose.

It took a few more flips than I had expected but the number is not outlandish.
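
One way to frame the simulation, which is not necessarily how Sebastian sets it up, is to flip both coins n times each, guess that the coin with more heads is the loaded one, and look for the smallest n at which that guess is right about 95% of the time. The function name, the tie-breaking rule, and the grid of n values below are all assumptions made for illustration:

```r
set.seed(42)

# Estimated probability of picking the loaded coin after n flips of each coin.
# Each of the `reps` simulated experiments compares head counts;
# a tie counts as a 50/50 guess.
p_correct <- function(n, reps = 10000, p_loaded = 0.6) {
  fair   <- rbinom(reps, size = n, prob = 0.5)
  loaded <- rbinom(reps, size = n, prob = p_loaded)
  mean((loaded > fair) + 0.5 * (loaded == fair))
}

flips     <- seq(10, 300, by = 10)
estimates <- sapply(flips, p_correct)

# Smallest n on the grid whose estimated success rate reaches 95%
flips[which(estimates >= 0.95)[1]]
```

Because the estimates are themselves simulated, the answer moves a little with the seed and with `reps`. A different decision rule, such as testing a single coin against 50%, would give a different number.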

Standard and Non-Standard Evaluation in R

John Mount explains Standard Evaluation versus Non-Standard Evaluation in R:

In standard (or value-oriented) evaluation, code you type in is taken to be variable names, function names, operators, and even numeric literal values. String values or literals need extra marks, such as quotes.

John walks us through several examples along the way. In the end, John comes down firmly in favor of Standard Evaluation over Non-Standard Evaluation.
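
To make the distinction concrete, here is a small base R sketch (the data frame and variable names are made up for illustration and do not come from John's post):

```r
d <- data.frame(x = c(1, 2, 3), y = c(10, 20, 30))

# Standard evaluation: the column is chosen by a string *value*,
# so it can live in a variable and be passed around like any other data.
col <- "y"
se_result <- d[[col]]

# Non-standard evaluation: `y` is captured as unevaluated code
# and looked up inside the data frame rather than in the calling environment.
nse_result <- with(d, y)

identical(se_result, nse_result)   # TRUE
```

The standard-evaluation form is easier to parameterize, since `col` can be a function argument. The non-standard form is shorter to type interactively.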

Compacting R Libraries

Dirk Eddelbuettel shows how you can save a lot of space by stripping excess information from R packages:

Back in August of 2017, we wrote two posts, #9: Compacting your Shared Libraries and #10: Compacting your Shared Libraries, After The Build, about “stripping” shared libraries. This involves removing auxiliary information (such as debug symbols and more) from the shared libraries which can greatly reduce the installed size (on suitable platforms – it mostly matters where I work, i.e. on Linux).

There’s a pretty good space savings in the tidyverse package. H/T R-Bloggers.
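
If you want to check the effect on your own machine, a rough sketch like the following measures how much space shared libraries take up in an R library directory. The helper name `shared_lib_mb` is hypothetical and this is not code from Dirk's post. Running it before and after stripping gives the savings.

```r
# Total size, in MB, of shared objects found under an R library directory.
# Defaults to the first entry of .libPaths(); returns 0 if none are found.
shared_lib_mb <- function(lib = .libPaths()[1]) {
  so_files <- list.files(lib, pattern = "\\.(so|dll|dylib)$",
                         recursive = TRUE, full.names = TRUE)
  sum(file.size(so_files)) / 1024^2
}

shared_lib_mb()
```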

Residual Analysis with R

Abhijit Telang shares a few techniques for doing post-regression residual analysis using R:

Naturally, I would expect my model to be unbiased, at least in intention, and hence any leftovers on either side of the regression line that did not make it on the line are expected to be random, i.e. without any particular pattern.

That is, I expect my residual error distributions to follow a bland, normal distribution.

In R, you can do this elegantly with just two lines of code.

1. Plot a histogram of residuals

2. Add a Quantile-Quantile plot with a line that passes through the first and third quartiles.

There are several more techniques in here to analyze residuals, so check it out.
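
A minimal sketch of those two checks in base R, using a simple regression on the built-in `mtcars` data as a stand-in (the model and data are assumptions, not Abhijit's):

```r
# Fit a simple linear model and pull out its residuals
fit <- lm(mpg ~ wt, data = mtcars)
res <- residuals(fit)

# 1. Histogram of residuals: look for a roughly symmetric, bell-like shape
hist(res, main = "Histogram of residuals", xlab = "Residual")

# 2. Normal Q-Q plot; qqline() draws a line through the first and third quartiles
qqnorm(res)
qqline(res)
```

Points that hug the reference line in the Q-Q plot are consistent with normally distributed residuals. Systematic curvature at the ends suggests heavy tails or skew.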

Interactive ggplot Plots with plotly

Laura Ellis takes us through ggplotly:

As someone very interested in storytelling, ggplot2 is easily my data visualization tool of choice. It is like the Swiss army knife for data visualization. One of my favorite features is the ability to pack a graph chock-full of dimensions. This ability is incredibly handy during the data exploration phases. However, sometimes I find myself wanting to look at trends without all the noise. Specifically, I often want to look at very dense scatterplots for outliers. Ggplot2 is great at this, but when we’ve isolated the points we want to understand, we can’t easily examine all possible dimensions right in the static charts.

Enter plotly. The plotly package and ggplotly function do an excellent job at taking our high quality ggplot2 graphs and making them interactive.

Read on for several quality, interactive visuals.
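
The basic pattern is short. This sketch assumes the ggplot2 and plotly packages are installed and uses the built-in `mtcars` data rather than Laura's dataset:

```r
library(ggplot2)
library(plotly)

# Build an ordinary static ggplot2 scatterplot
p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point()

# Wrap it with ggplotly() to get hover tooltips, zooming, and panning
interactive_plot <- ggplotly(p)
interactive_plot
```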

Goodbye, gather and spread; Hello pivot_long and pivot_wide

John Mount covers a change in tidyr which mimics Mount and Nina Zumel’s pivot_to_rowrecs and unpivot_to_blocks functions in the cdata package:

If you want to work in the above way we suggest giving our cdata package a try. We named the functions pivot_to_rowrecs and unpivot_to_blocks. The idea was: by emphasizing the record structure one might eventually internalize what the transforms are doing. On the way to that we have a lot of documentation and tutorials.

This is your regular reminder that the Tidyverse is very useful, but it is not the entirety of R.
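
In that spirit, here is a sketch of the same pair of transforms using only base R's `reshape()`. It uses neither the tidyr nor the cdata functions, and the toy data frame and column names are made up:

```r
# One row per record, one column per measurement
wide <- data.frame(id = c(1, 2), a = c(10, 20), b = c(30, 40))

# Row records -> blocks: each measurement becomes its own row
long <- reshape(wide, direction = "long",
                varying = c("a", "b"), v.names = "value",
                timevar = "key", times = c("a", "b"), idvar = "id")

# Blocks -> row records: back to one row per id
back <- reshape(long, direction = "wide",
                idvar = "id", timevar = "key", v.names = "value")

long
back
```

The round trip returns one row per `id` with columns `value.a` and `value.b`, which is the row-record form under different column names.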
