Category: R

Flowcharts in R

Anisa Dhana builds a sample flowchart with DiagrammeR:

After some searching, I found that there are a few packages in R which allow making exemplary flowcharts. The one which I found easy to use was DiagrammeR. The advantage of this package is that it generates diagrams using code within R Markdown syntax.

The taped-glasses nerd in me wants to point out that flow charts use geometric shapes to show flow and that this is more properly labeled a graph (the examples are directed acyclic graphs), but hush, taped-glasses nerd self.
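
As a minimal sketch of what "diagrams using code" looks like, the snippet below describes a small directed graph in Graphviz DOT and hands it to DiagrammeR's grViz(). The node names and labels are made up for illustration and are not taken from Anisa's post; the rendering step is skipped if DiagrammeR is not installed.

```r
# A small directed acyclic graph written in Graphviz DOT syntax.
# Node names and labels are illustrative assumptions.
dot_spec <- "
digraph flow {
  graph [rankdir = TB]
  node [shape = box]
  load  [label = 'Load data']
  clean [label = 'Clean data']
  model [label = 'Fit model']
  load -> clean -> model
}
"

# Build the diagram only when DiagrammeR is available.
if (requireNamespace("DiagrammeR", quietly = TRUE)) {
  flowchart <- DiagrammeR::grViz(dot_spec)
  # print(flowchart) renders it in the viewer or an R Markdown document
}
```

Because the diagram is just a string, it can be generated or edited programmatically before it is rendered.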

Robust Regressions in R

Michael Grogan shows how you can find and re-weigh outliers when performing regressions:

A useful way of dealing with outliers is by running a robust regression, or a regression that adjusts the weights assigned to each observation in order to reduce the skew resulting from the outliers.

In this particular example, we will build a regression to analyse internet usage in megabytes across different observations. You will see that we have several outliers in this dataset. Specifically, we have three instances where internet consumption is vastly higher than other observations in the dataset.

Let’s see how we can use a robust regression to mitigate these outliers.

Click through for a demonstration.
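
To make the reweighting idea concrete, here is a small sketch on simulated data (not Michael's dataset). It assumes a hypothetical predictor called income and plants three extreme usage values, then compares ordinary least squares with a Huber M-estimator fitted by MASS::rlm().

```r
library(MASS)  # provides rlm() and psi.huber

set.seed(42)
n <- 50
income <- 1:n                                      # hypothetical predictor
usage_mb <- 100 + 20 * income + rnorm(n, sd = 10)  # simulated usage, true slope 20

# Plant three observations with vastly higher consumption.
outlier_idx <- c(48, 49, 50)
usage_mb[outlier_idx] <- usage_mb[outlier_idx] + 2000
usage <- data.frame(income, usage_mb)

ols_fit    <- lm(usage_mb ~ income, data = usage)
robust_fit <- rlm(usage_mb ~ income, data = usage, psi = psi.huber, maxit = 50)

# Compare coefficients, then inspect the final weights of the planted outliers.
round(rbind(ols = coef(ols_fit), robust = coef(robust_fit)), 2)
round(robust_fit$w[outlier_idx], 3)
```

The robust fit should land much closer to the true slope because the three planted points end up with weights near zero, while ordinary observations keep weights at or near one.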

Power BI IntelliSense For Python and R

David Eldersveld makes me wonder about the value of Power BI’s IntelliSense for R and Python:

If I type the letter a into the R Script editor, my code completion options are acts, always, and, and as. Power BI’s editor is not offering any IntelliSense options from a Python or R dictionary. Instead, it’s pulling from the text already in the editor. Note the comment in Line 1 and the inclusion of words beginning with the letter a: always, and, acts, as.

By comparison, the DAX editor contains a detailed function list and helpful annotations for code completion. Can we get something similar for R and Python? Not exactly… But there’s a workaround that I’m almost embarrassed to suggest. If you are a user who codes directly into the script editor, the following hack could be helpful. If you use the option to Edit script in External IDE, keep doing that and ignore the following guidance.

As-is, this is worse than no IntelliSense because at least with no IntelliSense, it’ll never steal a mouse click or keystroke. I wouldn’t expect RStudio-level quality out of the gate, but unless I’m missing something, that’s pretty bad.

Economic Articles With Data Included

Sebastian Kranz has a Shiny app to help you find economic papers with included data:

One gets some information about the size of the data files and the used code files. I also tried to find and extract a README file from each supplement. Most README files explain whether all results can be replicated with the provided data sets or whether some results require confidential or proprietary data sets. The link allows you to look at the README without the need to download the whole data set.

The main idea is that such a search function could be helpful for teaching economics and data science. For example, my students can use the app to find an interesting topic for a Bachelor or Master Thesis in the form of an interactive analysis with RTutor. You could also generate a topic list for a seminar, in which students shall replicate some key findings of a research article.

I like this idea, particularly because it promotes the notion that if you’re going to write a paper based on a data set, you ought to provide the data set. There are too many cases of typos or accidental miscodings which take an interesting result and render it mundane (or sometimes even the exact opposite of what the paper reads). H/T R-Bloggers

Giving A Name To The R Pipe

John Mount noodles on an idea from Hadley Wickham:

I’d say this fails on at least two counts: first, “%then%” doesn’t seem grammatical (as d is a noun), and second, magrittr pipes can’t be associated with a new name (as they are implemented by looking for themselves by name in captured unevaluated code).

However, the wrapr dot arrow pipe can take on new names.

Let’s try a variation, using a traditional pronunciation: “to”.

I don’t like “then” very much. I definitely prefer the C# lambda pronunciation of “goes to” for this.

Click through for John’s thoughts on right assignment as well, something I almost categorically dislike.
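
For readers who have not defined an infix operator before, here is a minimal sketch of the naming idea. The %to% operator below is a hypothetical stand-in written in base R; it is not wrapr's dot arrow pipe or magrittr's pipe, and it only handles a bare function on the right-hand side.

```r
# Hypothetical infix operator: pass the left-hand value "to" the function on the right.
`%to%` <- function(lhs, rhs) rhs(lhs)

# Reads as: take the vector, to sqrt, to sum.
c(1, 4, 9) %to% sqrt %to% sum

# Right assignment stores the result at the end of the chain.
c(1, 4, 9) %to% sqrt %to% sum -> total
total
```

User-defined %op% operators bind more tightly than ->, so the whole chain is evaluated before the assignment happens.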

Conjoint Analysis In R

Abhijit Telang introduces the concept of conjoint analysis and shows how you can implement this in R:

We will need to typically transform the problem of utility modeling from its intangible, abstract form to something that is measurable. That is, we wish to assign a numeric value to the perceived utility by the consumer, and we want to measure that accurately and precisely (as much as possible).

This is where survey design comes in, where, as market researchers, we must design inputs (in the form of questionnaires) to have respondents do the hard work of transforming their qualitative, habitual, perceptual opinions into simplified, summarized aggregate values which are expressed either as a numeric value or on a rank scale.

I tend to shy away from this kind of analysis because it runs a huge risk of trying to turn ordinal utility rankings into cardinal functions.
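
As a rough illustration of assigning numeric values to perceived utility, the sketch below estimates part-worths with a plain linear model on dummy-coded attributes. The attributes, levels, and ratings are simulated assumptions, not Abhijit's data or necessarily his method, and the model treats the ratings as cardinal, which is exactly the leap described above.

```r
# Full factorial of two hypothetical product attributes.
profiles <- expand.grid(
  brand = factor(c("A", "B", "C"), levels = c("A", "B", "C")),
  price = factor(c("low", "high"), levels = c("low", "high"))
)

# Assumed "true" part-worths used to simulate noise-free ratings.
brand_worth <- c(A = 0, B = 1.5, C = 3)
price_worth <- c(low = 0, high = -2)
profiles$rating <- unname(
  5 + brand_worth[as.character(profiles$brand)] +
    price_worth[as.character(profiles$price)]
)

# Dummy-coded regression: each coefficient is a part-worth relative to the baseline level.
fit <- lm(rating ~ brand + price, data = profiles)
part_worths <- coef(fit)
round(part_worths, 2)
```

With noise-free ratings the regression recovers the assumed part-worths; with real survey responses the estimates would carry uncertainty, and a rank-based response would call for a different model.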

Improving Plots With ggformula

Sebastian Sauer shows how you can use the ggformula package combined with ggplot2 to enhance your R visuals:

For some time now, there has been a wrapper for ggplot2 available, bundled in the package ggformula. One nice thing is that it plays nicely with the popular R package mosaic. mosaic provides some useful functions for modeling along with a tamed and consistent syntax. In this post, we will discuss some “ornaments”, that is, some details of beautification of a plot. I confess that not everyone will deem it central, but in some cases it comes in handy to know how to “refine” a plot using ggformula.

Note that this “refinement” is primarily controlled via the functions gf_refine() (most stuff), gf_labs() (for labels), and gf_lims() (for axis limits). Themes can be adjusted using gf_theme().

Click through for several examples.
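
A short sketch of how those refinement functions chain together, using the built-in mtcars data rather than anything from Sebastian's post. It assumes ggformula is installed and R 4.1 or later for the native pipe; the title, limits, and axis breaks are arbitrary choices.

```r
# Build the plot only when ggformula is available.
if (requireNamespace("ggformula", quietly = TRUE)) {
  library(ggformula)

  refined_plot <- gf_point(mpg ~ hp, data = mtcars) |>
    gf_labs(
      title = "Fuel economy vs. horsepower",
      x = "Horsepower",
      y = "Miles per gallon"
    ) |>
    gf_lims(y = c(0, 40)) |>                      # axis limits
    gf_theme(theme = ggplot2::theme_minimal()) |> # swap in a complete theme
    gf_refine(                                    # any other ggplot2 component
      ggplot2::scale_x_continuous(breaks = seq(50, 350, by = 50))
    )
  # print(refined_plot) draws it
}
```

Each gf_ function takes the plot as its first argument, which is why the steps compose with a pipe.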

Installing R From PowerShell

Tomaz Kastrun shows us how to install R and RStudio via PowerShell:

For the brevity of this post, I will only download a couple of R packages from the CRAN repository, but this list can be extended indefinitely.
There are many ways to retrieve the CRAN packages for a particular R version using PowerShell. I will just demonstrate this by using the Invoke-WebRequest cmdlet.
Point your cmdlet to the URL https://cran.r-project.org/bin/windows/contrib/3.5, where a list of all packages for this version is available. But first we need to extract the HTML tag where the information is stored.

There’s quite a bit of code here, but the upside is that you get the ability to automate server installs.

Solving The Monty Hall Problem With R

Miroslav Rajter builds a Monty Hall problem simulator using R:

The original and simplest scenario of the Monty Hall problem is this: You are in a prize contest and in front of you there are three doors (A, B and C). Behind one of the doors is a prize (Car), while behind the others is a loss (Goat). You first choose a door (let’s say door A). The contest host then opens another door behind which is a goat (let’s say door B), and then he asks whether you will stay with your original choice or switch doors. The question behind this is: which is the better strategy?

This is something that puzzled me for a very long time. This is fundamentally a Bayesian problem built around processing new information, and once I understood that, the answer was a lot clearer. H/T R-Bloggers.
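
For anyone who wants to convince themselves before clicking through, here is a minimal base R simulation of the scenario as described. It is a sketch written for this summary, not Miroslav's simulator; the function name and the number of games are arbitrary.

```r
# Play one game; returns TRUE if the contestant wins the car.
play_monty_hall <- function(switch_door) {
  doors <- c("A", "B", "C")
  prize <- sample(doors, 1)
  first_pick <- sample(doors, 1)

  # The host opens a goat door that is neither the prize nor the contestant's pick.
  host_options <- setdiff(doors, c(prize, first_pick))
  opened <- sample(host_options, 1)

  # Switching means taking the one door that is neither picked nor opened.
  final_pick <- if (switch_door) {
    setdiff(doors, c(first_pick, opened))
  } else {
    first_pick
  }
  final_pick == prize
}

set.seed(2018)
n_games <- 10000
stay_rate   <- mean(replicate(n_games, play_monty_hall(switch_door = FALSE)))
switch_rate <- mean(replicate(n_games, play_monty_hall(switch_door = TRUE)))
c(stay = stay_rate, switch = switch_rate)
```

Staying should win roughly one third of the time and switching roughly two thirds. The host's choice is the new information: he never opens the prize door, so switching wins whenever the first pick was a goat.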
