Category: R

Data Analysis Basics In R

Published 2017-01-18 by Kevin Feasel

Sibanjan Das provides some of the basics of data analysis using R:

Let’s think logically about the steps one should perform once the data has been imported into R.

The first step would be to discover what’s in the data file that was exported. To do this, we can:

Use the head function to view a few rows from the data set. By default, head shows the first 6 rows. Ex: head(mtcars)

str to view the structure of the imported data. Ex: str(mtcars)

summary to view the data summary. Ex: summary(mtcars)
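
As a rough sketch of those three calls together, here they are run against the mtcars data set that ships with base R. The extra head(mtcars, n = 10) call is an illustrative addition and not part of the original post.

```r
# mtcars is a built-in data set in base R, so nothing needs to be imported.
first_rows <- head(mtcars)   # head() returns the first 6 rows by default
print(first_rows)

# The n argument changes how many rows come back (illustrative addition).
print(head(mtcars, n = 10))

# Column names, types, and a preview of the values.
str(mtcars)

# Min, quartiles, median, mean, and max for each numeric column.
print(summary(mtcars))
```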

There’s a lot to data analysis, but this is a good start.

Multiple Regression

Published 2017-01-17 by Kevin Feasel

Anastasios Markitsis is starting a series of exercises on multiple regression in R:

Exercise 1

a. Load the state datasets.
b. Convert the state.x77 dataset to a dataframe.
c. Rename the Life Exp variable to Life.Exp, and HS Grad to HS.Grad. (This avoids problems with referring to these variables when specifying a model.)
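
One possible way to approach Exercise 1 in base R is sketched below. This is not necessarily the author's answer (those are in the linked post); the state77 name and the example model at the end are assumptions made purely for illustration.

```r
# (a) Load the state data sets that ship with base R.
data(state)

# (b) state.x77 is a matrix; convert it to a data frame.
state77 <- as.data.frame(state.x77)

# (c) Rename the two columns whose names contain a space.
names(state77)[names(state77) == "Life Exp"] <- "Life.Exp"
names(state77)[names(state77) == "HS Grad"] <- "HS.Grad"

# With syntactic names, a model formula needs no backticks.
# This particular model is hypothetical, not part of the exercise.
fit <- lm(Life.Exp ~ Income + HS.Grad, data = state77)
print(coef(fit))
```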

Click through for the rest of the exercises as well as the answers.

Animating Visuals In R

Published 2017-01-17 by Kevin Feasel

Tomaz Kastrun shows how to create animated charts in R using ggplot2:

In addition to R code, the ImageMagick program needs to be installed on your machine as well. The speed, quality, and many other parameters can also be set when creating an animated GIF.

An animated GIF can also be included in your SSRS report, your SharePoint site, or any other site – like my blog 🙂 – and it will stay interactive. In Power BI, unfortunately, importing an animated GIF as a picture will not work.

Be very careful with this, as not everything supports animated GIFs and you can make some really painful graphs if you try hard enough…

R Tools For Visual Studio

Published 2017-01-16 by Kevin Feasel

Matt Willis has a two-parter on R Tools for Visual Studio. First, an introduction:

Once all the prerequisites have been installed, it is time to move on to the fun stuff! Open up Visual Studio 2015 and add an R Project: File > Add > New Project and select R. You will be presented with the screen below; name the project AutomobileRegression and select OK.

Microsoft have done a fantastic job realising that the settings and toolbar required in R are very different from those required when using Visual Studio, so they have split them out and made it very easy to switch between the two. To switch to the settings designed for using R, go to R Tools > Data Science Settings. You’ll be presented with two pop-ups; select Yes on both to proceed. This will now allow you to use all those nifty shortcuts you have learnt to use in RStudio. Any time you want to go back to the original settings, you can do so by going to Tools > Import/Export Settings.

Next is executing an Azure Machine Learning web service within RTVS:

Whilst in R you can implement very complex Machine Learning algorithms, for anyone new to Machine Learning I personally believe Azure Machine Learning is a more suitable tool for being introduced to the concepts.

Please refer to this blog where I have described how to create the Azure Machine Learning web service I will be using in the next section of this blog. You can either use your own web service or follow my other blog, which has been especially written to allow you to follow along with this blog.

Coming back to RTVS we want to execute the web service we have created.

RTVS has grown on me. It’s still not RStudio and may never be, but they’ve come a long way in a few months.

Subset And Apply Problems

Published 2017-01-11 by Kevin Feasel

Tom Martens explains a class of generic data processing problems:

Subset and Apply means that I have a dataset of some rows where due to some conditions all the rows have to be put into a bucket and then a function has to be applied to each bucket.

The simple problem can be solved by a GROUP BY using T-SQL; the not-so-simple problem requires that all columns and rows of the dataset be retained for further processing, even if these columns are not used to subset or bucket the rows in your dataset.

One quick example of this is running totals of orders for each customer, which Tom answers using T-SQL, R, and Power BI. Click through for those three solutions.
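
To make the running-total example concrete, here is a minimal base R sketch. It is not Tom's solution; the orders data frame and its column names are made up for illustration, and the rows are assumed to already be in order-date sequence within each customer.

```r
# Hypothetical orders; rows are assumed to be in date order per customer.
orders <- data.frame(
  customer = c("A", "B", "A", "B", "A"),
  amount   = c(10, 5, 20, 15, 30)
)

# ave() splits amount by customer, applies cumsum() to each bucket,
# and returns the results in the original row order, so every row
# and column of the data set is retained.
orders$running_total <- ave(orders$amount, orders$customer, FUN = cumsum)
print(orders)
```

Unlike a plain GROUP BY aggregate, which collapses each bucket to one row, this keeps all five rows and simply adds a column.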

Parsing JSON In R

Published 2017-01-10 by Kevin Feasel

Tomaz Kastrun shows how to feed a JSON data set into R and turn that into a proper data frame:

SQL Server has very powerful statements for converting to and from JSON and for storing it in the engine (FOR JSON, JSON_VALUE, etc.). And since JSON is gaining popularity for data exchange, I was curious to give it a try in combination with R.

I will simply convert a system table into an array using the FOR JSON clause.

There’s an R library. There’s always an R library.
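
As a minimal sketch of the R side, assuming the jsonlite package is installed (the linked post may well use a different library), fromJSON turns an array of JSON objects into a data frame. The JSON string below is a made-up stand-in for what a FOR JSON query would return.

```r
library(jsonlite)  # assumption: install.packages("jsonlite") has been run

# Hypothetical stand-in for the output of a FOR JSON query.
json_text <- '[{"name":"table_a","object_id":101},
               {"name":"table_b","object_id":102}]'

# An array of flat objects is simplified into a data frame by default.
tables <- fromJSON(json_text)
str(tables)
```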

Finding Clusters Of Queries Using R

Published 2017-01-09 by Kevin Feasel

Tomaz Kastrun shows how to use R to find clusters of queries which behave similarly:

So the R code said that there are three clusters, and I used medians to generate data around them. In addition, I have also tested the result with partitioning around medoids (as opposed to hierarchical clustering), and the results from both techniques yield clean clusters.

Clustering models can be powerful for discovering commonalities, and that might help you find a number of queries which all behave in some sub-optimal way without having to trawl through every procedure’s code.
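
Here is a small sketch of that kind of cross-check, using synthetic data rather than real query statistics. The metric names and the three cluster centers are invented for illustration; hclust comes from base R's stats package and pam from the cluster package, which ships with standard R distributions.

```r
library(cluster)  # recommended package bundled with R; provides pam()

set.seed(42)

# Synthetic stand-in for per-query metrics: 20 points around each of
# three well-separated, made-up centers.
centers <- matrix(c(0, 0, 10, 10, 20, 0), ncol = 2, byrow = TRUE)
queries <- do.call(rbind, lapply(1:3, function(i) {
  cbind(rnorm(20, mean = centers[i, 1]), rnorm(20, mean = centers[i, 2]))
}))
colnames(queries) <- c("duration", "reads")

# Hierarchical clustering, cut into three groups.
hc <- hclust(dist(queries), method = "ward.D2")
hc_clusters <- cutree(hc, k = 3)

# Partitioning around medoids with the same number of clusters.
pam_fit <- pam(queries, k = 3)

# Cross-tabulate the two assignments to see whether they agree.
print(table(hierarchical = hc_clusters, pam = pam_fit$clustering))
```

If the two methods agree, each row of the cross-tabulation has a single non-zero cell.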

Pipelearner

Published 2017-01-06 by Kevin Feasel

Simon Jackson introduces pipelearner, a tool to help with creating machine learning pipelines:

This post will demonstrate some examples of what pipelearner can currently do. For example, the Figure below plots the results of a model fitted to 10% to 100% (in 10% increments) of training data in 50 cross-validation pairs. Fitting all of these models takes about four lines of code in pipelearner.

Click through for some very interesting examples.

Using RTVS

Published 2017-01-05 by Kevin Feasel

David Eldersveld gives three reasons why you might be interested in R Tools for Visual Studio:

2. Incorporate R projects as part of a broader Visual Studio solution
Many Visual Studio solutions end up being a collection of individual projects. More often than not, these projects are logically joined by virtue of being part of the same business solution, but each one can incorporate different components or languages. For example, you may architect a solution that involves separate projects for loading data with Azure Data Factory, analysis with R, a front-end C# web app, etc. Rather than keep your R code siloed off in a separate solution, unite it with the rest of your code for development and source control.

This is my primary reason. RStudio is still my go-to option, but RTVS is maturing fairly nicely. It still feels slower than RStudio when displaying data on-screen (especially when you’re spitting out a couple hundred lines of text), but that Visual Studio integration will go far. A fourth reason that David does not mention: it generates the really ugly sp_execute_external_script code for SQL Server R Services.

Ten Notes On SparkR

Published 2017-01-03 by Kevin Feasel

Neil Dewar has a notebook with ten important things to know when migrating from R to SparkR:

Apache Spark Building Blocks. A high-level overview of Spark describes what is available for the R user.
SparkContext, SQLContext, and SparkSession. In Spark 1.x, SparkContext and SQLContext let you access Spark. In Spark 2.x, SparkSession becomes the primary method.
A DataFrame or a data.frame? Spark’s distributed DataFrame is different from R’s local data.frame. Knowing the differences lets you avoid simple mistakes.
Distributed Processing 101. Understanding the mechanics of Big Data processing helps you write efficient code—and not blow up your cluster’s master node.
Function Masking. Like all R libraries, SparkR masks some functions.
Specifying Rows. With Big Data and Spark, you generally select rows in DataFrames differently than in local R data.frames.
Sampling. Sample data in the right way, and use it as a tool for converting between big and small data.
Machine Learning. SparkR has a growing library of distributed ML algorithms.
Visualization. It can be hard to visualize big data, but there are tricks and tools which help.
Understanding Error Messages. For R users, Spark error messages can be daunting. Knowing how to parse them helps you find the relevant parts.

I highly recommend checking out the notebook.
