One gets some information about the size of the data files and the used code files. I also tried to find and extract a README file from each supplement. Most README files explain whether all results can be replicated with the provided data sets or whether some results require confidential or proprietary data sets. The link allows you to look at the README without the need to download the whole data set.
The main idea is that such a search function could be helpful for teaching economics and data science. For example, my students can use the app to find an interesting topic for a Bachelor or Master Thesis in form of an interactive analysis with RTutor. You could also generate a topic list for a seminar, in which students shall replicate some key findings of a resarch article.
I like this idea, particularly because it promotes the notion that if you’re going to write a paper based on a data set, you ought to provide the data set. There are too many cases of typos or accidental miscodings which take an interesting result and render it mundane (or sometimes even the exact opposite of what the paper reads). H/T R-Bloggers
I’d say this fails on at least two counts, the first “
%then%” doesn’t seem grammatical (as
dis a noun), and
magrittrpipes can’t be associated with a new name (as they are implemented by looking for theirselves by name in captured unevaluated code).
wraprdot arrow pipe can take on new names.
Let’s try a variation, using a traditional pronunciation: “to”.
I don’t like “then” very much. I definitely prefer the C# lambda pronunciation of “goes to” for this.
Click through for John’s thoughts on right assignment as well, something I almost categorically dislike.
We will need to typically transform the problem of utility modeling from its intangible, abstract form to something that is measurable. That is, we wish to assign a numeric value to the perceived utility by the consumer, and we want to measure that accurately and precisely (as much as possible).
This is where survey design comes in, where, as a market researcher, we must design inputs (in the form of questionnaires) to have respondents do the hard work of transforming their qualitative, habitual, perceptual opinions into simplified, summarized aggregate values which are expressed either as a numeric value or on a rank scale.
I tend to shy away from this kind of analysis because it runs a huge risk of trying to turn ordinal utility rankings into cardinal functions.
Since some time, there’s a wrapper for
ggplot2available, bundled in the package
ggformula. One nice thing is that in that it plays nicely with the popular R package
mosaicprovides some useful functions for modeling along with a tamed and consistent syntax. In this post, we will discuss some “ornaments”, that is, some details of beautification of a plot. I confess that every one will deem it central, but in some cases in comes in handy to know how to “refine” a plot using
Note that this “refinement” is primarily controlled via the function
gf_lab()(for labs), and
gf_lims()(for axis limits). Themes can be adjusted using
Click through for several examples.
For the brevity of this post, I will only download couple of R packages from CRAN repository, but this list is indefinite.
There are ways many ways to retrieve the CRAN packages for particular R version using powershell. I will just demonstrate this by using Invoke-WebRequest cmdlet.
Pointing your cmdlet to URL: https://cran.r-project.org/bin/windows/contrib/3.5 where list of all packages for this version is available. But first we need to extract the HTML tag where information is stored.
There’s quite a bit of code here, but the upside is that you get the ability to automate server installs.
The original and most simple scenario of the Monty Hall problem is this: You are in a prize contest and in front of you there are three doors (A, B and C). Behind one of the doors is a prize (Car), while behind others is a loss (Goat). You first choose a door (let’s say door A). The contest host then opens another door behind which is a goat (let’s say door B), and then he ask you will you stay behind your original choice or will you switch the door. The question behind this is what is the better strategy?
This is something that puzzled me for a very long time. This is fundamentally a Bayesian problem built around processing new information, and once I understood that, the answer was a lot clearer. H/T R-Bloggers.
Rpackage and training materials we emphasize the record-oriented thinking and how to design a transform control table. We now have an additional exciting new feature: control table keys.
The user can now control which columns of a
cdatacontrol table are the keys, including now using composite keys (that is keys that are spread across more than one column). This is easiest to demonstrate with an example.
Read on for an example of how you can use this.
If you’re looking a guide to making publication-ready data visualizations in R, check out the BBC Visual and Data Journalism cookbook for R graphics. Announced in a BBC blog post this week, it provides scripts for making line charts, bar charts, and other visualizations like those below used in the BBC’s data journalism.
I’m still reading through the linked cookbook but it’s a good one.
For better or worse I spend some time each day at Stack Overflow [r], reading and answering questions. If you do the same, you probably notice certain features in questions that recur frequently. It’s as though everyone is copying from one source – perhaps the one at the top of the search results. And it seems highest-ranked is not always best.
Nowhere is this more apparent to me than in the way many users create data frames. So here is my introductory guide “how not to create data frames”, aimed at beginners writing their first questions.
Read on for a few tips. These are aimed at people asking questions but they’re sound advice in general.
The idea is: many important calculations can be considered as a sequence of transforms applied to a data set. Each step may be a function taking many arguments. It is often the case that only one of each function’s arguments is primary, and the rest are parameters. For data science applications this is particularly common, so having convenient pipeline notation can be a plus. An example of a non-trivial data processing pipeline can be found here.
As you’d expect from John, there’s a lot of detail and it’s an interesting approach.