Jan Mulkens has started a series on combining Power BI and R.
Fact is, R is here to stay. Even Microsoft has integrated R with SQL Server 2016 and it has made R scripting possible in its great Azure Machine Learning service.
So it was only a matter of time before we were going to see R integrated in Power BI.
From the previous point, it seems that R is just running in the background and that most of the functionality can be used.
Testing some basic functionality like importing and transforming data in the R visual worked fine.
I haven’t tried any predictive modelling yet but I assume that will just work as well.
So instead of printing “Hello world” to the screen, we’ll use a simple graph to say hello to the world.
First we need some data. Power BI enables us to enter some data in a familiar Excel style.
Just select “Enter Data” and start bashing out some data.
I’m looking forward to the rest of the series.
So I went through and converted everything in my Rtraining to this and realised it messed up my slide decks – it’s been so long since I had built a pure knitr solution that I forgot the limits of knitr::knit. For my slide decks, if I wanted the ioslides_presentation format, I needed to use rmarkdown::render. The problem with that has been the relative references to the CSS and the logo.
To solve this I read about the custom render formats capability and created a function that produces an ioslides_presentation but with my CSS preloaded by default. This now means that I can produce slides with better file referencing.
Steph has put up all of her R-related presentations and documentation as well, so check that out.
Detecting fraudulent transactions is a key application of statistical modeling, especially in an age of online transactions. R of course has many functions and packages suited to this purpose, including binary classification techniques such as logistic regression.
If you’d like to implement a fraud-detection application, the Cortana Analytics gallery features an Online Fraud Detection Template. This is a step-by-step guide to building a web service which will score transactions by likelihood of fraud, created in five steps.
Read through for the five follow-up articles. This is a fantastic series and I plan to walk through it step by step myself.
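To make the binary-classification idea concrete, here is a minimal sketch of logistic regression scoring transactions by likelihood of fraud. It is written in Python, the other language this roundup touches on, and it is not the Cortana Analytics template or anything from the linked series. The toy transactions, the two features, and the helper names fit_logistic and fraud_score are all hypothetical, invented purely for illustration; a real model would need far more data and would normally use an established library.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit weights (bias first) by batch gradient descent on log loss.

    Hypothetical helper for illustration only.
    """
    w = [0.0] * (len(rows[0]) + 1)
    n = len(rows)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(rows, labels):
            p = sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))
            err = p - y  # derivative of log loss w.r.t. the linear term
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        w = [wi - lr * g / n for wi, g in zip(w, grad)]
    return w


def fraud_score(w, x):
    # Estimated probability (0..1) that transaction x is fraudulent.
    return sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))


# Made-up toy data: [amount in thousands, 1 if foreign card else 0].
transactions = [
    [0.02, 0], [0.05, 0], [0.10, 0], [0.08, 1],
    [1.50, 1], [2.20, 1], [1.80, 0], [3.00, 1],
]
is_fraud = [0, 0, 0, 0, 1, 1, 1, 1]

weights = fit_logistic(transactions, is_fraud)
low = fraud_score(weights, [0.03, 0])   # small domestic purchase
high = fraud_score(weights, [2.50, 1])  # large foreign purchase
print(f"small domestic purchase: {low:.3f}")
print(f"large foreign purchase:  {high:.3f}")
```

The scoring function is the part a web service would expose: it takes one transaction and returns a probability that a caller can compare against a threshold of their choosing.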
Awesome. Fixed that algorithm problem, right?
That’s because algorithms are not the problem… the only problem. The real problem is data preparation. A lot of the examples you’ll read online are very straightforward with nice neat data sets. That’s because they were carefully groomed and prepared. Here I am looking at the wooly wild real data and I’m utterly lost in how to properly prepare this so that it’s appropriately set up as a continuous distribution (or a distribution at all). WOOF! The reason this is so hard is that I actually don’t understand the data fundamentals of the problem I’m trying to solve in exactly the way needed to solve the problem. More cogitation is necessary.
Just because you can write R code doesn’t mean you are a data scientist. Grant has the right mindset, but this post is fair warning that R’s complexity isn’t so much in its being a DSL, but rather in the domain itself.
You may have heard that R and the big-data RevoScaleR package have been integrated with SQL Server 2016 as SQL Server R Services. If you’ve been wanting to try out R with SQL Server but haven’t been sure where to start, a new MSDN tutorial will take you through all the steps of creating a predictive model: from obtaining data for analysis, to building a statistical model, to creating a stored procedure to make predictions from the model. To work through the tutorial, you’ll need a suitable Windows server on which to install the SQL Server 2016 Community Technology Preview, and make sure you have SQL Server R Services installed. You’ll also need a separate Windows machine (say a desktop or laptop) where you’ll install Revolution R Open and Revolution R Enterprise. Most of the computations will be happening in SQL Server, though, so this “data science client machine” doesn’t need to be as powerful.
The tutorial is made up of five lessons, which together should take you about 90 minutes to run through. If you run into problems, each lesson includes troubleshooting tips at the end.
SQL Server R Services has the potential to be a great tool. The standard V1 warning obviously applies, but I’m excited.
When I needed to do an rmarkdown repository for making R Consortium Infrastructure Proposals, I was able to take the opportunity to take Jan’s code and move forward with it so that the ISC proposal is always web-facing. Here’s how I did it:
She’s using this to build the satRday planning site.
I have three blog posts on installing and using R in SQL Server.
First, installing SQL Server R Services:
I’m excited that CTP 3 of SQL Server 2016 is publicly available, in no small part because it is our first look at SQL Server R Services. In this post, I’m going to walk through installing Don’t-Call-It-SSRS on a machine.
Getting a Linux machine to talk to a SQL Server instance is harder than it should be. Yes, Microsoft has a Linux ODBC driver and some easy setup instructions…if you’re using Red Hat or SuSE. Hopefully this helps you get connected.
If you’re using RStudio on Windows, it’s a lot easier: create a DSN using your ODBC Data Sources.
Finally, using SQL Server R Services:
So, what’s the major use of SQL Server R Services? Early on, I see batch processing as the main driver here. The whole point of getting involved with Revolution R is to create server-quality R, so imagine a SQL Agent job which runs this procedure once a night against some raw data set. The R job could build a model, process that data, and return a result set. You take that result set and feed it into a table for reporting purposes. I’d like to see more uses, but this is probably the first one we’ll see in the wild.
It’s a preview of a V1 product. Keep that in mind.
The first and third posts are for CTP 3, so beware the time-sensitive material warnings.
Jen Stirrup has started a new series on getting started with R. First, installing R:
First up, what do you need to know about SQL Server installation with R? The installation sequence is well documented here. However, if you want to make sure that the R piece is installed, then you will need to make sure that you do one thing: tick the Advanced Analytics Extension box.
Her next post covers language basics in contrast to SQL Server:
There are similarities and differences between SQL and R, which might be confusing. However, I think it can be illuminating to understand these similarities and differences since it tells you something about each language. I got this idea from one of the attendees at PASS Summit 2015 and my kudos and thanks go to her. I’m sorry I didn’t get her name, but if you see this you will know who you are, so please feel free to leave a comment so that I can give you a proper shout out.
I’m looking forward to the rest of this series.
Buck Woody’s back to blogging, and his focus is data science. Over the past month, he’s looked at R and Python.
In future notebook entries we’ll explore working with R, but for now, we need to install it. That really isn’t that difficult, but it does bring up something we need to deal with first. While the R environment is truly amazing, it has some limitations. Its most glaring issue is that the data you want to work with is loaded into memory as a frame, which of course limits the amount of data you can process for a given task. It’s also not terribly suited for parallelism – many things are handled as in-line tasks. And if you use a package in your script, you have to ensure others load that script, and at the right version.
Enter Revolution Analytics – a company that changed R to include more features and capabilities to correct these issues, along with a few others. They have a great name in the industry, bright people, and great products – so Microsoft bought them. That means the “RRE” engine they created is going to start popping up in all sorts of places, like SQL Server 2016, Azure Machine Learning, and many others. But the “stand-alone” RRE products are still available, and at the current version. So that’s what we’ll install.
Python has some distinct differences that make it attractive for working in data analytics. It scales well, is fairly easy to learn and use, has an extensible framework, has support for almost every platform around, and you can use it to write extensive programs that work with almost any other system and platform.
R and Python are the two biggest languages in this slice of the field, and you’ll gain a lot from learning at least one of these languages.