Enter the Microsoft R Client. It includes Microsoft R Open, and adds in some of the ScaleR functions, which make processing data faster and more efficient. And again, it’s a full R environment – you can write and run code, right there on your desktop. But the important bit is that it can connect to a Microsoft R Server (MRS) by setting something called the “Compute Context”, which tells the R environment to run on a more powerful, scalable server environment, like you may be used to with SQL Server.
The naming is a bit of a head-scratcher, to be honest.
It’s not uncommon for tests to be written at the get-go and then forgotten about. Remember that as code changes or incorrect behavior is found, new tests need to be written or existing tests need to be modified. Possibly worse than having no tests is having a bunch of tests spitting out false positives. This is because humans are prone to habituation and desensitization. It’s easy to become habituated to false positives to the point where we no longer pay attention to them.
Temporarily disabling tests may be acceptable in the short term. A more strategic solution is to optimize your test writing. The easier it is to create and modify tests, the more likely they will be correct and continue to provide value. For my testing, I generally write code to automate a lot of wiring to verify results programmatically.
I started this article with almost no idea how to test R code. I still don’t…but this article does help. I recommend reading it if you want to write production-quality R code.
In this post we’ll try to replicate some of the charts created by the Federal Reserve which visualize some well known macroeconomic indicators. We’ll also showcase the new Plotly 4.0 syntax.
This is a very code-heavy blog post and is a good way to learn about plotly.
In this post, we focus on sourcing R and Python’s external dependencies, such as R libraries and Python modules, which are not already installed on Azure ML and require code compilation. Commonly the compiled code comes from a variety of other languages such as C, C++ and Fortran. One could also use this approach to wrap their compiled code with R or Python wrappers and run it on Azure ML.
To illustrate the process, we will build two MurmurHash modules from C++ for R and Python using the following two implementations on GitHub, and link them to Azure ML from a zipped folder.
Link via David Smith. I knew it was possible to call compiled C code from Python and R, but didn’t expect to be able to do it within Azure ML, so that’s good to know.
Now inside that file, you can add a number of functions that are based on a number of events like loading or closing R. I need a .First function for on load and whatever I produce has to be able to print to the console with
I’ve seen people do things like this in .bash_profile, but didn’t know about .Rprofile before.
For this workload the reporting speeds don’t line up well with the price differences between the RDS instances. I suspect this workload is biased towards R’s CPU consumption when generating PNGs rather than RDS’ performance when returning aggregate results. The RDS instances each have the same number of IOPS, which might erase any other performance advantage they could have over one another.
As for the money spent importing the data into RDS, I suspect scaling up is more helpful when you have a number of concurrent users rather than a single, large job to execute.
This is an interesting series Mark has going.
Ned Bicare provides us a sure-fire method for getting our academic papers published:
“If you torture the data long enough, it will confess.”
This aphorism, attributed to Ronald Coase, sometimes has been used in a disrespectful manner, as if it was wrong to do creative data analysis. In fact, the art of creative data analysis has experienced despicable attacks over the last years. A small but annoyingly persistent group of second-stringers tries to denigrate our scientific achievements. They drag psychological science through the mire.
This minor update, codenamed “Bug in Your Hair”, makes a few small fixes to the R 3.3.0 release. The bugs fixed are mostly rarely encountered cases, like generating Gamma random numbers with zero or infinite rate parameters, and correctly matching text (with the match function) that only differed in the encoding.
There are no new features in this update, and all R code and packages should work with R 3.3.1 just as they did with R 3.3.0. For a complete list of the fixes in R 3.3.1, follow the link below.
Even though this is a small update, it might be useful to check out.
Say you’ve got 30 numbers and a strong urge to estimate their standard deviation. But you’ve left your computer at home. Unless you’re really good at mentally squaring and summing, it’s pretty hard to compute a standard deviation in your head. But there’s a heuristic you can use:
Subtract the smallest number from the largest number and divide by four
Let’s call it the “range over four” heuristic. You could, and probably should, be skeptical. You could want to see how accurate the heuristic is. And you could want to see how the heuristic’s accuracy depends on the distribution of numbers you are dealing with.
Sometimes you just don’t have STDEV() available.
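A minimal sketch of the "range over four" heuristic, written in Python rather than R so that it needs nothing beyond the standard library. The function names (range_over_four, compare), the three test distributions, and the fixed seed are illustrative assumptions and do not come from the quoted post. It prints the heuristic estimate next to the sample standard deviation for 30 draws from each distribution, which is one rough way to see how the accuracy depends on the shape of the data.

```python
import random
import statistics


def range_over_four(values):
    """Estimate the standard deviation as (max - min) / 4."""
    return (max(values) - min(values)) / 4


def compare(sample):
    """Return (heuristic estimate, sample standard deviation) for one sample."""
    return range_over_four(sample), statistics.stdev(sample)


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed (an assumption) so runs are repeatable
    # Hypothetical test distributions, 30 draws each to match the "30 numbers" setup.
    samples = {
        "normal": [rng.gauss(0, 1) for _ in range(30)],
        "uniform": [rng.uniform(0, 1) for _ in range(30)],
        "exponential": [rng.expovariate(1.0) for _ in range(30)],
    }
    for name, sample in samples.items():
        estimate, actual = compare(sample)
        print(f"{name:12s} heuristic={estimate:.3f}  stdev={actual:.3f}")
```

A single sample per distribution is noisy, so repeating the comparison over many samples and averaging the ratio of the two numbers would give a steadier picture.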
We see a different behaviour: gather() has brought messy into a long data format with a warning by treating all columns as variables, while melt() has treated trt as an “id variable”. Id columns are the columns that contain the identifier of the observation that is represented as a row in our data set. Indeed, if melt() does not receive any id.variables specification, then it will use the factor or character columns as id variables. gather() requires the columns that need to be treated as ids; all the other columns are going to be used as key-value pairs.
Despite those last different results, we have seen that the two functions can be used to perform exactly the same operations on data frames, and only on data frames! Indeed, gather() cannot handle matrices or arrays, while melt() can as shown below.
It seems that these two tools have some overlap, but each has its own point of focus: tidyr is simpler for data tidying, whereas reshape2 has functionality (like data aggregation) which tidyr does not include.