R – Page 146 – Curated SQL

Introduction To R

Published 2016-07-12 by Kevin Feasel

Paul Hernandez has an introduction to using the R client and RODBC to connect to SQL Server:

The first step is to load the RevoScaleR library. This is an amazing library that allows you to create scalable and performant applications with R.

Then a connection string is defined, in my case using Windows Authentication. If you want to use SQL Server authentication the user name and password are needed.

We define a local folder as the compute context.

RxInSQLServer: generates a SQL Server compute context using SQL Server R Services.

Sample query: I already prepared the dataset in the view. This is a best practice because it reduces the size of the query in the R code, and for me it is also easier to maintain.

I think there’s a lot of value in learning R, regardless of whether you have “data analyst” in your role or job title.

Microsoft R Client

Published 2016-07-11 by Kevin Feasel

Buck Woody discusses whether Microsoft R Client really is a client:

Enter the Microsoft R Client. It includes Microsoft R Open, and adds in some of the ScaleR functions, which makes processing data faster and more efficient. And again, it’s a full R environment – you can write and run code, right there on your desktop. But the important bit is that it can connect to a Microsoft R Server (MRS) by setting something called the “Compute Context”, which tells the R environment to run on a more powerful, scalable server environment, like you may be used to with SQL Server.

The naming is a bit of a head-scratcher, to be honest.

Writing Good Tests In R

Published 2016-07-08 by Kevin Feasel

Brian Rowe discusses testing strategy in R:

It’s not uncommon for tests to be written at the get-go and then forgotten about. Remember that as code changes or incorrect behavior is found, new tests need to be written or existing tests need to be modified. Possibly worse than having no tests is having a bunch of tests spitting out false positives. This is because humans are prone to habituation and desensitization. It’s easy to become habituated to false positives to the point where we no longer pay attention to them.

Temporarily disabling tests may be acceptable in the short term. A more strategic solution is to optimize your test writing. The easier it is to create and modify tests, the more likely they will be correct and continue to provide value. For my testing, I generally write code to automate a lot of wiring to verify results programmatically.

I started this article with almost no idea how to test R code. I still don’t…but this article does help. I recommend reading it if you want to write production-quality R code.

Macroeconomic Charts

Published 2016-07-05 by Kevin Feasel

Riddhiman shows how to use R and plotly to build charts of Federal Reserve data sets:

In this post we’ll try to replicate some of the charts created by the Federal Reserve which visualize some well known macroeconomic indicators. We’ll also showcase the new Plotly 4.0 syntax.

This is a very code-heavy blog post and is a good way to learn about plotly.

Running Compiled Code In Azure ML

Published 2016-07-04 by Kevin Feasel

Max Kaznady shows how to use R or Python scripts to call compiled code within Azure ML:

In this post, we focus on sourcing R and Python’s external dependencies, such as R libraries and Python modules, which are not already installed on Azure ML and require code compilation. Commonly the compiled code comes from a variety of other languages such as C, C++ and Fortran. One could also use this approach to wrap their compiled code with R or Python wrappers and run it on Azure ML.

To illustrate the process, we will build two MurmurHash modules from C++ for R and Python using the following two implementations on GitHub, and link them to Azure ML from a zipped folder.

Link via David Smith. I knew it was possible to call compiled C code from Python and R, but didn’t expect to be able to do it within Azure ML, so that’s good to know.

Rprofile For Notifications

Published 2016-06-28 by Kevin Feasel

Steph Locke shows how to use .Rprofile to make your life easier:

First of all, you need a file called .Rprofile that’s stored in your working directory. Some useful resources about .Rprofile files can be found in the .Rprofile CRAN docs and an .Rprofile intro.

Now inside that file, you can add a number of functions that are based on a number of events like loading or closing R. I need a .First function for on load and whatever I produce has to be able to print to the console with cat().

With that in mind, instead of showing details, I chose to show the number of breaches I’m in. You can get HIBPwned from CRAN and use it to query the awesome website HaveIBeenPwned.com.

I’ve seen people do things like this in .bash_profile, but didn’t know about .Rprofile before.

Taxi Rides

Published 2016-06-24 by Kevin Feasel

Mark Litwintschik has an ongoing taxi ride data analysis series. This time, he gives PostgreSQL a run:

For this workload the reporting speeds don’t line up well with the price differences between the RDS instances. I suspect this workload is biased towards R’s CPU consumption when generating PNGs rather than RDS’ performance when returning aggregate results. The RDS instances each have the same number of IOPS, which might erase any other performance advantage they could have over one another.

As for the money spent importing the data into RDS, I suspect scaling up is more helpful when you have a number of concurrent users rather than a single, large job to execute.

This is an interesting series Mark has going.

Hack Those P Values!

Published 2016-06-24 by Kevin Feasel

Ned Bicare provides us a sure-fire method for getting our academic papers published:

“If you torture the data long enough, it will confess.”

This aphorism, attributed to Ronald Coase, has sometimes been used in a disrespectful manner, as if it were wrong to do creative data analysis.

In fact, the art of creative data analysis has experienced despicable attacks over the last years. A small but annoyingly persistent group of second-stringers tries to denigrate our scientific achievements. They drag psychological science through the mire.

Ned has a great tool to play around with as well, letting us Statistics our way to academic success.
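
To see why "torturing the data" works so reliably, here is a minimal Python sketch (not Ned's tool, and not taken from his post). It assumes a hypothetical study that measures 20 unrelated outcomes on 30 subjects, where every outcome is pure noise, and runs a simple two-sided z-test with known variance on each. The function names and the default parameter values are illustrative assumptions. With 20 independent tests at the 0.05 level, roughly 1 - 0.95^20, or about 64%, of such studies should find at least one "significant" result.

```python
import random
from statistics import NormalDist, mean


def p_value_mean_zero(sample, sigma=1.0):
    """Two-sided z-test p-value for H0: mean == 0, assuming a known sigma."""
    z = mean(sample) / (sigma / len(sample) ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def study_finds_something(rng, n_outcomes=20, n_subjects=30, alpha=0.05):
    """Simulate one study on pure noise; True if any outcome looks significant."""
    return any(
        p_value_mean_zero([rng.gauss(0, 1) for _ in range(n_subjects)]) < alpha
        for _ in range(n_outcomes)
    )


def false_positive_rate(n_studies=1000, seed=1):
    """Fraction of noise-only studies that report at least one 'finding'."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    hits = sum(study_finds_something(rng) for _ in range(n_studies))
    return hits / n_studies


if __name__ == "__main__":
    rate = false_positive_rate()
    print(f"Share of noise-only studies with a 'significant' result: {rate:.2f}")
```

Reporting only the outcome that crossed the threshold, without mentioning the other 19, is the kind of "creative data analysis" the satire is aimed at.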

R 3.3.1 Available

Published 2016-06-23 by Kevin Feasel

David Smith reports that a new version of R is now available, 3.3.1:

This minor update, codenamed “Bug in Your Hair”, makes a few small fixes to the R 3.3.0 release. Bugs fixed include mostly rarely-encountered cases like generating Gamma random numbers with zero or infinite rate parameters, and correctly matching text (with the match function) that only differed in the encoding.

There are no new features in this update, and all R code and packages should work with R 3.3.1 just as they did with R 3.3.0. For a complete list of the fixes in R 3.3.1, follow the link below.

Even though this is a small update, it might be useful to check out.

Standard Deviation Estimation

Published 2016-06-23 by Kevin Feasel

Dan Goldstein gives a rule of thumb for getting standard deviations for various distributions:

Say you’ve got 30 numbers and a strong urge to estimate their standard deviation. But you’ve left your computer at home. Unless you’re really good at mentally squaring and summing, it’s pretty hard to compute a standard deviation in your head. But there’s a heuristic you can use:

Subtract the smallest number from the largest number and divide by four

Let’s call it the “range over four” heuristic. You could, and probably should, be skeptical. You could want to see how accurate the heuristic is. And you could want to see how the heuristic’s accuracy depends on the distribution of numbers you are dealing with.

Sometimes you just don’t have STDEV() available.
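
For anyone who wants to try the "range over four" heuristic, here is a minimal Python sketch. It is not the code from Dan's post: the function names, the three test distributions, and the sample size of 30 (taken from the quoted example) are assumptions chosen for illustration. It computes the heuristic and prints it next to the usual sample standard deviation so you can judge the gap yourself.

```python
import random
import statistics


def range_over_four(values):
    """Estimate the standard deviation as (largest - smallest) / 4."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    return (max(values) - min(values)) / 4


def compare(sample):
    """Return (heuristic estimate, sample standard deviation)."""
    return range_over_four(sample), statistics.stdev(sample)


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so runs are repeatable
    samples = {
        "normal": [rng.gauss(0, 1) for _ in range(30)],
        "uniform": [rng.uniform(0, 1) for _ in range(30)],
        "exponential": [rng.expovariate(1.0) for _ in range(30)],
    }
    for name, sample in samples.items():
        estimate, actual = compare(sample)
        print(f"{name:12s} heuristic={estimate:.3f} stdev={actual:.3f}")
```

Changing the sample size or swapping in other distributions is a quick way to explore the two questions the quote raises: how accurate the heuristic is, and how that accuracy depends on the shape of the data.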

Category: R