Press "Enter" to skip to content

Category: R

Fraud Detection With R And Azure

David Smith shows us an online fraud detection template:

Detecting fraudulent transactions is a key applucation of statistical modeling, especially in an age of online transactions. R of course has many functions and packages suited to this purpose, including binary classification techniques such as logistic regression.

If you’d like to implement a fraud-detection application, the Cortana Analytics gallery features an Online Fraud Detection Template. This is a step-by step guide to building a web-service which will score transactions by likelihood of fraud, created in five steps

Read through for the five follow-up articles.  This is a fantastic series and I plan to walk through it step by step myself.

Comments closed

Learning R

Grant Fritchey is learning R:

Awesome. Fixed that algorithm problem, right?

Wrong.

That’s because algorithms are not the problem… the only problem. The real problem is data preparation. A lot of the examples you’ll read online are very straight forward with nice neat data sets. That’s because they were carefully groomed and prepared. Here I am looking at the wooly wild real data and I’m utterly lost in how to properly prepare this so that it’s appropriately set up as a continuous distribution(or a distribution at all). WOOF! The reason this is so hard is because I actually don’t understand the data fundamentals of the problem I’m trying to solve in exactly the way needed to solve the problem. More cogitation is necessary.

Just because you can write R code doesn’t mean you are a data scientist.  Grant has the right mindset, but this post is fair warning that R’s complexity isn’t so much in its being a DSL, but rather in the domain itself.

Comments closed

Data Analysis With R: Tutorials

Revolution Analytics has a series of tutorials using SQL Server R Services:

You may have heard that R and the big-data RevoScaleR package have been integrated with with SQL Server 2016 as SQL Server R Services. If you’ve been wanting to try out R with SQL Server but haven’t been sure where to start, a new MSDN tutorial will take you through all the steps of creating a predictive model: from obtaining data for analysis, to building a statistical model, to creating a stored prodedure to make predictions from the model. To work through the tutorial, you’ll need a suitable Windows server on which to install the SQL Server 2016 Community Technology Preview, and make sure you have SQL Server R Services installed. You’ll also need a separate Windows machine (say a desktop or laptop) where you’ll install Revolution R Open and Revolution R Enterprise. Most of the computations will be happening in SQL Server, though, so this “data science client machine” doesn’t need to be as powerful.

The tutorial is made up of five lessons, which together should take you about 90 minutes to run though. If you run into problems, each lesson includes troubleshooting tips at the end.

SQL Server R Services has the potential to be a great tool.  The standard V1 warning obviously applies, but I’m excited.

Comments closed

Installing And Using SQL Server R Services

I have three blog posts on installing and using R in SQL Server.

First, installing SQL Server R Services:

I’m excited that CTP 3 of SQL Server 2016 is publicly available, in no small part because it is our first look at SQL Server R Services.  In this post, I’m going to walk through installing Don’t-Call-It-SSRS on a machine.

Then, using RODBC to connect a Linux machine with RStudio installed to a SQL Server instance:

Getting a Linux machine to talk to a SQL Server instance is harder than it should be.  Yes, Microsoft has a Linux ODBC driver and some easy setup instructions…if you’re using Red Hat or SuSE.  Hopefully this helps you get connected.

If you’re using RStudio on Windows, it’s a lot easier:  create a DSN using your ODBC Data Sources.

Finally, using SQL Server R Services:

So, what’s the major use of SQL Server R Services?  Early on, I see batch processing as the main driver here.  The whole point of getting involved with Revolution R is to create sever-quality R, so imagine a SQL Agent job which runs this procedure once a night against some raw data set.  The R job could build a model, process that data, and return a result set.  You take that result set and feed it into a table for reporting purposes.  I’d like to see more uses, but this is probably the first one we’ll see in the wild.

It’s a preview of a V1 product.  Keep that in mind.

The first and third posts are for CTP 3, so beware the time-sensitive material warnings.

Comments closed

Learning R

Jen Stirrup has started a new series on getting started with R.  First, installing R:

First up, what do you need to know about SQL Server installation with R? The installation sequence is well documented here. However, if you want to make sure that the R piece is installed, then you will need to make sure that you do one thing: tick the Advanced Analytics Extension box.

Her next post covers language basics in contrast to SQL Server:

There are similarities and differences between SQL and R, which might be confusing. However, I think it can be illuminating to understand these similarities and differences since it tells you something about each language. I got this idea from one of the attendees at PASS Summit 2015 and my kudos and thanks go to her. I’m sorry I didn’t get  her name, but if you see this you will know who you are, so please feel free to leave a comment so that I can give you a proper shout out.

I’m looking forward to the rest of this series.

Comments closed

Buck Woody On R & Python

Buck Woody’s back to blogging, and his focus is data science.  Over the past month, he’s looked at R and Python.

First, on installing R:

In future notebook entries we’ll explore working with R, but for now, we need to install it. That really isn’t that difficult, but it does bring up something we need to deal with first. While the R environment is truly amazing, it has some limitations. It’s most glaring issue is that the data you want to work with is loaded into memory as a frame, which of course limits the amount of data you can process for a given task. It’s also not terribly suited for parallelism – many things are handled as in-line tasks. And if you use a package in your script, you have to ensure others load that script, and at the right version.

Enter Revolution Analytics – a company that changed R to include more features and capabilities to correct these issues, along with a few others. They have a great name in the industry, bright people, and great products – so Microsoft bought them. That means the “RRE” engine they created is going to start popping up in all sorts of places, like SQL Server 2016, Azure Machine Learning, and many others. But the “stand-alone” RRE products are still available, and at the current version. So that’s what we’ll install.

Also on installing and getting started with Python:

Python has some distinct differences that make it attractive for working in data analytics. It scales well, is fairly easy to learn and use, has an extensible framework, has support for almost every platform around, and you can use it to write extensive programs that work with almost any other system and platform.

R and Python are the two biggest languages in this slice of the field, and you’ll gain a lot from learning at least one of these languages.

Comments closed