
Category: R

satRdays

Steph Locke notes that the R Consortium has agreed to support satRdays:

I’m very pleased to say that the R Consortium agreed to support the satRday project!

The idea kicked off in November and I was over the moon with the response from the community, then we garnered support before submitting to the Consortium and I must have looped the moon a few times as we had more than 500 responses. Now the R Consortium are supporting us and we can turn all that enthusiasm into action.

This is great.  I’m looking forward to this taking off and being a nice complement to SQL Saturdays in cities.


HIBPwned

Steph Locke has created an R package to query Troy Hunt’s Have I Been Pwned? site:

The answer in life to the inevitable question of “How can I do that in R?” should be “There’s a package for that”. So when I wanted to query HaveIBeenPwned.com (HIBP) to check whether a bunch of emails had been involved in data breaches and there wasn’t an R package for HIBP, it meant that the responsibility for making one landed on my shoulders. Now, you can see if your accounts are at risk with the R package for HaveIBeenPwned.com, HIBPwned.

This is a nice confluence of two fun topics, so of course I like it.


R And SSH Tunnels

Steph Locke shows how to set up an SSH tunnel to connect to an external server within R:

Whilst down the rabbit hole, I discovered just in passing via a beanstalk article that there’s actually been a command line interface for PuTTY called plink. D’oh! This changed the whole direction of the solution to what I present throughout.

Using plink.exe as the command line interface for PuTTY we can then connect to our remote network using the key pre-authenticated via pageant. As a consequence, we can now use the shell() command in R to use plink. We can then connect to our database using the standard Postgres driver.

PuTTY is a must-have for any Windows box.
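
As a rough sketch of the approach described in the excerpt (not the code from the linked post), the plink command can be assembled in R and then handed to shell(). The helper name plink_tunnel_cmd, the host, the user names, and the port numbers are all hypothetical placeholders, and the sketch assumes the database listens on the SSH server itself.

```r
# Build a plink command that forwards a local port to the database port
# as seen from the SSH server. Every value passed in is a placeholder.
plink_tunnel_cmd <- function(user, host, local_port = 5432, remote_port = 5432) {
  sprintf("plink.exe -ssh -N -L %d:localhost:%d %s@%s",
          local_port, remote_port, user, host)
}

cmd <- plink_tunnel_cmd("analyst", "gateway.example.com", local_port = 5433)
print(cmd)

# On Windows, with pageant holding the key, the tunnel could then be opened
# without blocking the R session (left commented out, since it needs a real host):
# shell(cmd, wait = FALSE)
#
# The database would then be reachable through the forwarded local port,
# for example with the RPostgreSQL driver (assumed to be installed):
# con <- DBI::dbConnect(RPostgreSQL::PostgreSQL(), host = "localhost",
#                       port = 5433, dbname = "mydb", user = "dbuser")
```

Keeping the command construction in its own function makes it easy to inspect the exact string before anything is executed.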


Mockaroo

Steph Locke tells us about a way to mock data for R:

Mockaroo is a really impressive service with a wide spread of different data types. They also have simple ways of adding things like within-group differences to data so that you can mock realistic class differences. They use the freemium model so you can get a thousand rows per download, which is pretty sweet. The big BUT you can feel coming on is this – it’s a GUI! I don’t want to have to spend time hand cranking a data extract.

Thankfully, they have an API for getting data too and it’s pretty simple to use so I’ve started making a package for it.

Steph is working on an R package, so this is pretty exciting.


R Tools For Visual Studio Launched

R now integrates into Visual Studio:

RTVS is an IDE and as such you can use it with any recent version of R such as 3.2.x. If you install the free Microsoft R Open, you automatically get some turbo options such as threading support on multi-processor machines, providing significant speedup for a variety of analytical functions, as well as package collections check-pointed to a particular date/version. Microsoft R Server provides Big Data support and additional advanced features that can be used with SQL Server.

This is an early release, so expect a few bugs and some missing functionality.  It also isn’t RStudio—it’s RStudio several years ago.  But what it does nicely is integrate with the rest of your stack:  you can tie together the R code, the C#/F# code which helps clean data, the SQL Server project which holds your data, etc. etc.


Credit Card Fraud Detection Using R

David Smith gives us a tutorial on credit card fraud detection:

If you have a database of credit-card transactions with a small percentage tagged as fraudulent, how can you create a process that automatically flags likely fraudulent transactions in the future? That’s the premise behind the latest Data Science Deep Dive on MSDN. This tutorial provides a step-by-step guide to using the R language and the big-data statistical models of the RevoScaleR package of SQL Server 2016 R Services to build and use a predictive model to detect fraud.

This looks to be a follow-up from the fraud detection series.


Data Manipulation In R

Casimir Saternos has an article on matrix operations and other data transformations in R:

Operations that are conceptually simple can be difficult to perform using SQL. Consider the common requirements to pivot or transpose a dataset. Each of these actions is conceptually straightforward but is complex to implement using SQL. The examples that follow are somewhat verbose, but the details are not significant. The main point to illustrate is that, by using specialized functions outside of SQL, R makes trivial some of those operations that would otherwise require complex SQL statements. The contrast in the amount of code required is striking. The simpler approach allows you to focus attention on the scientific or business problem at hand, rather than expending energy reading documentation or laboriously testing complex statements.

I consider this to be where the second-order value of R comes in. The initial “wow” factor is in how easily you can plot things, and this ease of data cleansing is the next big time-saver.
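
To give a feel for the pivot and transpose operations mentioned in the excerpt, here is a minimal base R sketch. It is not taken from the linked article; the sales data frame and its column names are made up for illustration.

```r
# Hypothetical long-format data: one row per region/quarter pair.
sales <- data.frame(
  region  = c("North", "North", "South", "South"),
  quarter = c("Q1", "Q2", "Q1", "Q2"),
  amount  = c(100, 120, 80, 95)
)

# Pivot: regions become rows, quarters become columns.
wide <- tapply(sales$amount, list(sales$region, sales$quarter), sum)

# Transpose: swap rows and columns in a single call.
flipped <- t(wide)

print(wide)
print(flipped)
```

Each operation is a single function call here, whereas the SQL equivalent typically needs a PIVOT clause or a set of CASE expressions with the output columns spelled out by hand.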


Fixing SQL Server R Services Installation Issues

Cody Konior notes that upgrading from CTP 3.0 to CTP 3.2 can cause SQL Server R Services to break:

If you were using CTP 3.0 and later ran an in-place upgrade to CTP 3.2, this will silently break R Services. Uninstalling and reinstalling the R component will not fix the problem, but it can be fixed. There are a few interrelated issues here so bear with me.

Hopefully you don’t run into this issue, but if you do, at least there’s a fix.


Analyzing World Running Times

Andrie de Vries looks at average speed for different men’s running events:

However, it seems that there might be two kinks in the line:

  • The first kink occurs somewhere between the 800m distance and the mile. It seems that the sprinting distances (and the 800m is sometimes called a long sprint) have different dynamics from the events up to the marathon.

  • And then there is another kink for the ultra-marathon distances. The standard marathon is 42.2km, and distances longer than this are called ultramarathons.

The analysis is done in R, and the code is available in the post.  Check it out.
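
For readers who want the gist before clicking through, the core calculation is average speed as distance divided by time. The sketch below is not the code from the post: the three events, the column names, and the slope calculation are illustrative assumptions, and the times are commonly cited men's record marks used only as sample inputs.

```r
# Distances in metres and times in seconds; illustrative inputs only.
events <- data.frame(
  event    = c("100m", "800m", "mile"),
  distance = c(100, 800, 1609.344),
  time     = c(9.58, 100.91, 223.13)
)

# Average speed in metres per second.
events$speed <- events$distance / events$time

# Change in speed per unit of log10(distance) between neighbouring events.
# A marked shift in this slope is one simple way a "kink" could show up.
events$slope <- c(NA, diff(events$speed) / diff(log10(events$distance)))

print(events)
```

With only three events this cannot locate a kink reliably; it only shows the shape of the calculation that a fuller dataset would feed.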
