Press "Enter" to skip to content

Category: R

Animating Visuals In R

Tomaz Kastrun shows how to create animated charts in R using ggplot2:

In addition to R code, the ImageMagic program needs to be installed on your machine, as well. Also the speed, quality and many other parameters can be set, when creating animated gif.

Animated gif can be also included into your SSRS report, your Sharepoint site or any other site – like my blog 🙂 and it will stay interactive. In Power BI, importing animated gif as a picture, unfortunately will not work.

Be very careful with this, as not everything supports animated GIFs and you can make some really painful graphs if you try hard enough…

Comments closed

R Tools For Visual Studio

Matt Willis has a two-parter on R Tools for Visual Studio.  First, an introduction:

Once all the prerequisites have been installed it is time to move onto the fun stuff! Open up Visual Studio 2015 and add an R Project: File > Add > New Project and select R. You will be presented with the screen below, name the project AutomobileRegression and select OK.

Microsoft have done a fantastic job realising that the settings and toolbar required in R is very different to those required when using Visual Studio, so they have split them out and made it very easy to switch between the two. To switch to the settings designed for using R go to R Tools > Data Science Settings you’ll be presented with two pop ups select Yes on both to proceed. This will now allow you to use all those nifty shortcuts you have learnt to use in RStudio. Anytime you want to go back to the original settings you can do so by going to Tools > Import/Export Settings.

Next is executing an Azure Machine Learning web service within RTVS:

Whilst in R you can implement very complex Machine Learning algorithms, for anyone new to Machine Learning I personally believe Azure Machine Learning is a more suitable tool for being introduced to the concepts.

Please refer to this blog where I have described how to create the Azure Machine Learning web service I will be using in the next section of this blog. You can either use your own web service or follow my other blog, which has been especially written to allow you to follow along with this blog.

Coming back to RTVS we want to execute the web service we have created.

RTVS has grown on me.  It’s still not R Studio and may never be, but they’ve come a long way in a few months.

Comments closed

Subset And Apply Problems

Tom Martens explains a class of generic data processing problems:

Subset and Apply means that I have a dataset of some rows where due to some conditions all the rows have to be put into a bucket and then a function has to be applied to each bucket.

The simple problem can be solved by a GROUP BY using T-SQL, the not so simple problem requires that all columns and rows of the dataset have to be retained for further processing, even if these columns are not used to subset or bucket the rows in your dataset.

One quick example of this is running totals of orders for each customer, which Tom answers using T-SQL, R, and Power BI.  Click through for those three solutions.

Comments closed

Parsing JSON In R

Tomaz Kastrun shows how to feed a JSON data set into R and turn that into a proper data frame:

JSON has very powerful statements for converting to and from JSON for storing into / from SQL Server engine (FOR JSON and JSON VALUE, etc).  And since it is gaining popularity for data exchange, I was curious to give it a try with R combination.

I will simply convert a system table into array using for json clause.

There’s an R library.  There’s always an R library.

Comments closed

Finding Clusters Of Queries Using R

Tomaz Kastrun shows how to use R to find clusters of queries which behave similarly:

So the R code said that, there are three clusters generating And I used medians to generate data around it. In addition I have also tested the result with Partitioning around medoids (which is opposite to hierarchical clustering) and the results from both techniques yield clean clusters.

Clustering models can be powerful for discovering commonalities, and that might help you find a number of queries which all behave in some sub-optimal way without having to trawl through every procedure’s code.

Comments closed

Pipelearner

Simon Jackson introduces pipelearner, a tool to help with creating machine learning pipelines:

This post will demonstrate some examples of what pipeleaner can currently do. For example, the Figure below plots the results of a model fitted to 10% to 100% (in 10% increments) of training data in 50 cross-validation pairs. Fitting all of these models takes about four lines of code in pipelearner.

Click through for some very interesting examples.

Comments closed

Using RTVS

David Eldersveld gives three reasons why you might be interested in R Tools for Visual Studio:

2. Incorporate R projects as part of a broader Visual Studio solution
Many Visual Studio solutions end up being a collection of individual projects. More often than not, these projects are logically joined by virtue of being part of the same business solution, but each one can incorporate different components or languages. For example, you may architect a solution that involves separate projects for loading data­­ with Azure Data Factory, analysis with R, a front-end C# web app, etc. Rather than keep your R code siloed off in a separate solution, unite it with the rest of your code for development and source control.

This is my primary reason.  R Studio is still my go-to option, but RTVS is maturing fairly nicely.  It still feels slower than R Studio when displaying data on-screen (especially when you’re spitting out a couple hundred lines of text), but that Visual Studio integration will go far.  A fourth reason that David does not mention:  it generates the really ugly sp_execute_external_script code for SQL Server R Services.

Comments closed

Ten Notes On SparkR

Neil Dewar has a notebook with ten important things when migrating from R to SparkR:

  1. Apache Spark Building Blocks. A high-level overview of Spark describes what is available for the R user.

  2. SparkContext, SQLContext, and SparkSession. In Spark 1.x, SparkContext and SQLContext let you access Spark. In Spark 2.x, SparkSession becomes the primary method.

  3. A DataFrame or a data.frame? Spark’s distributed DataFrame is different from R’s local data.frame. Knowing the differences lets you avoid simple mistakes.

  4. Distributed Processing 101. Understanding the mechanics of Big Data processing helps you write efficient code—and not blow up your cluster’s master node.

  5. Function Masking. Like all R libraries, SparkR masks some functions.

  6. Specifying Rows. With Big Data and Spark, you generally select rows in DataFrames differently than in local R data.frames.

  7. Sampling. Sample data in the right way, and use it as a tool for converting between big and small data.

  8. Machine Learning. SparkR has a growing library of distributed ML algorithms.

  9. Visualization.It can be hard to visualize big data, but there are tricks and tools which help.

  10. Understanding Error Messages. For R users, Spark error messages can be daunting. Knowing how to parse them helps you find the relevant parts.

I highly recommend checking out the notebook.

Comments closed

Forecasting Restaurant Inspection Failures

David Smith writes about an R model which predicts which restaurants are more likely to fail inspection:

Chicago’s Department of Public Health used the R language to build and deploy the model, and made the code available as an open source project on GitHub. The reasons given are twofold:

An open source approach helps build a foundation for other models attempting to forecast violations at food establishments. The analytic code is written in R, an open source, widely-known programming language for statisticians. There is no need for expensive software licenses to view and run this code.

Read on for more details and check out their GitHub repo.

Comments closed

7 Visualizations In R

Dikesh Jariwala provides sample R code for seven common visualizations:

In your day-to-day activities, you’ll come across the below listed 7 charts most of the time.

  1. Scatter Plot
  2. Histogram
  3. Bar & Stack Bar Chart
  4. Box Plot
  5. Area Chart
  6. Heat Map
  7. Correlogram

We’ll use ‘Big Mart data’ example as shown below to understand how to create visualizations in R. You can download the full dataset from here.

That’s a nice set of visuals, covering a broad swath of potential visualization scenarios.

Comments closed