Press "Enter" to skip to content

Category: R

New Version of ggforce Available

Thomas Lin Pedersen announces a new version of ggforce for R:

If there is one thing of general utility lacking in ggplot2 it is probably the ability to annotate data cleanly. Sure, there’s geom_text()/geom_label()but using them requires a fair bit of fiddling to get the best placement and further, they are mainly relevant for labeling and not longer text. ggrepelhas improved immensely on the fiddling part, but the lack of support for longer text annotation as well as annotating whole areas is still an issue.

In order to at least partly address this, ggforce includes a family of geoms under the geom_mark_*() moniker. They all behaves equivalently except for how they encircle the given area(s). 

There are some really interesting features in the ggforce package, so check them out.

Comments closed

Parameters in Rmarkdown Files

Neil Saunders shows how you can parameterize Rmarkdown files, consequently making changes easy later:

The reports follow a common template where the major difference is simply the hashtag. So one way to create these reports is to use the previous one, edit to find/replace the old hashtag with the new one, and save a new file.

That works…but what if we could define the hashtag once, then reuse it programmatically anywhere in the document? Enter Rmarkdown parameters.

The example is small but important.

Comments closed

Multi-Server Real-Time Scoring with R

David Smith points out a reference architecture for real-time scoring with R distributed through Kubernetes:

Let’s say you’ve developed a predictive model in R, and you want to embed predictions (scores) from that model into another application (like a mobile or Web app, or some automated service). If you expect a heavy load of requests, R running on a single server isn’t going to cut it: you’ll need some kind of distributed architecture with enough servers to handle the volume of requests in real time.

This reference architecture for real-time scoring with R, published in Microsoft Docs, describes a Kubernetes-based system to distribute the load to R sessions running in containers.

Looks interesting.

Comments closed

Pivot Tables and Grouping Sets With data.table in R

Jozef Hajnala shows how you can use data.table to perform fast pivoting and arbitrary grouping of data in R:

Data manipulation and aggregation is one of the classic tasks anyone working with data will come across. We of course can perform data transformation and aggregation with base R, but when speed and memory efficiency come into play, data.table is my package of choice.

In this post we will look at of the fresh and very useful functionality that came to data.table only last year – grouping sets, enabling us, for example, to create pivot table-like reports with sub-totals and grand total quickly and easily.

Grouping sets are also available in SQL dialects and tend to be something people tend not to be aware of. This is a shame because they’re quite powerful, and Jozef shows how powerful they can be in R.

Comments closed

Flowcharts in R

Anisa Dhana builds a sample flowchart with DiagrammeR:

After some search, I found that there are a few packages in R which allow making exemplary flowcharts. The one which I found easy to use was DiagrammeR. The advantage of this packages is that generate diagrams using code within R Markdown syntax.

The taped-glasses nerd in me wants to point out that flow charts use geometric shapes to show flow and that this is more properly labeled a graph (the examples are directed acyclic graphs), but hush, taped-glasses nerd self.

Comments closed

Robust Regressions in R

Michael Grogan shows how you can find and re-weigh outliers when performing regressions:

A useful way of dealing with outliers is by running a robust regression, or a regression that adjusts the weights assigned to each observation in order to reduce the skew resulting from the outliers.

In this particular example, we will build a regression to analyse internet usage in megabytes across different observations. You will see that we have several outliers in this dataset. Specifically, we have three incidences where internet consumption is vastly higher than other observations in the dataset.

Let’s see how we can use a robust regression to mitigate for these outliers.

Click through for a demonstration.

Comments closed

Power BI IntelliSense For Python and R

David Eldersveld makes me wonder about the value of Power BI’s IntelliSense for R and Python:

If I type the letter into the R Script editor, my code completion options are actsalwaysand, and as. Power BI’s editor is not offering any IntelliSense options from a Python or R dictionary. Instead, it’s pulling from the text already in the editor. Note the comment in Line 1 and the inclusion of words beginning with the letter a — always, and, acts, as.

By comparison, the DAX editor contains a detailed function list and helpful annotations for code completion. Can we get something similar for R and Python? Not exactly… But there’s a workaround that I’m almost embarrassed to suggest. If you are a user who codes directly into the script editor, the following hack could be helpful. If you use the option to Edit script in External IDE, keep doing that and ignore the following guidance.

As-is, this is worse than no IntelliSense because at least with no IntelliSense, it’ll never steal a mouse click or keystroke. I wouldn’t expect RStudio level quality out of the gate but unless I’m missing something, that’s pretty bad.

1 Comment

Economic Articles With Data Included

Sebastian Kranz has a Shiny app to help you find economic papers with included data:

One gets some information about the size of the data files and the used code files. I also tried to find and extract a README file from each supplement. Most README files explain whether all results can be replicated with the provided data sets or whether some results require confidential or proprietary data sets. The link allows you to look at the README without the need to download the whole data set.

The main idea is that such a search function could be helpful for teaching economics and data science. For example, my students can use the app to find an interesting topic for a Bachelor or Master Thesis in form of an interactive analysis with RTutor. You could also generate a topic list for a seminar, in which students shall replicate some key findings of a resarch article.

I like this idea, particularly because it promotes the notion that if you’re going to write a paper based on a data set, you ought to provide the data set. There are too many cases of typos or accidental miscodings which take an interesting result and render it mundane (or sometimes even the exact opposite of what the paper reads). H/T R-Bloggers

Comments closed

Giving A Name To The R Pipe

John Mount noodles an idea from Hadley Wickham:

I’d say this fails on at least two counts, the first “%then%” doesn’t seem grammatical (as d is a noun), and magrittr pipes can’t be associated with a new name (as they are implemented by looking for theirselves by name in captured unevaluated code).

However, the wrapr dot arrow pipe can take on new names.

Let’s try a variation, using a traditional pronunciation: “to”.

I don’t like “then” very much. I definitely prefer the C# lambda pronunciation of “goes to” for this.

Click through for John’s thoughts on right assignment as well, something I almost categorically dislike.

Comments closed