Controlling Azure Services In R With AzureR

Kevin Feasel

2018-11-12

Cloud, R

Hong Ooi announces a new set of packages called AzureR:

As background, some of you may remember the AzureSMR package, which was written a few years back as an R interface to Azure. AzureSMR was very successful and gained a significant number of users, but it was never meant to be maintainable in the long term. As more features were added it became more unwieldy until its design limitations became impossible to ignore.

The AzureR family is a refactoring/rewrite of AzureSMR that aims to fix the earlier package’s shortcomings.

The core package of the family is AzureRMR, which provides a lightweight yet powerful interface to Azure Resource Manager. It handles authentication (including automatically renewing when a session token expires), managing resource groups, and working with individual resources and templates. It also calls the Resource Manager REST API directly, so you don’t need to have PowerShell or Python installed; it depends only on commonly used R packages like httr, jsonlite and R6.

This won’t replace the Powershell libraries, but looks like it’d be useful for scenarios like if you need to set up a VM, train a model, and then shut down the VM.

Explaining Neural Networks With H2O

Shirin Glander explains some of the concepts behind neural networks using H2O as a guide:

Before, when describing the simple perceptron, I said that a result is calculated in a neuron, e.g. by summing up all the incoming data multiplied by weights. However, this has one big disadvantage: such an approach would only enable our neural net to learn linearrelationships between data. In order to be able to learn (you can also say approximate) any mathematical problem – no matter how complex – we use activation functions. Activation functions normalize the output of a neuron, e.g. to values between -1 and 1, (Tanh), 0 and 1 (Sigmoid) or by setting negative values to 0 (Rectified Linear Units, ReLU). In H2O we can choose between Tanh, Tanh with Dropout, Rectifier (default), Rectifier with Dropout, Maxout and Maxout with Dropout. Let’s choose Rectifier with Dropout. Dropout is used to improve the generalizability of neural nets by randomly setting a given proportion of nodes to 0. The dropout rate in H2O is specified with two arguments: hidden_dropout_ratios, which per default sets 50% of hidden (more on that in a minute) nodes to 0. Here, I want to reduce that proportion to 20% but let’s talk about hidden layers and hidden nodes first. In addition to hidden dropout, H2O let’s us specify a dropout for the input layer with input_dropout_ratio. This argument is deactivated by default and this is how we will leave it.

Read the whole thing and, if you understand German, check out the video as well.

Detecting Redirects With httr

Kevin Feasel

2018-11-07

R

Peter Meissner shows us how we can find redirects when using the httr package:

I am the creator and maintainer of the robotstxt package an R package that enables users to retrieve and parse robots.txt files and ultimately is designed to do access permission checking for web resources.

Recently a discussion came up about how to interpret permissions in case of sub-domains and HTTP redirects. Long story short: In case of robots.txt files redirects are suspicious and users should at least be informed about it happening so they might take appropriate action.

So, I set out to find a way to check whether or not a robots.txt files requested via the httr package has gone through one or more redirects prior to its retrieval.

Click through for the solution.

Coalesce In R With wrapr

Kevin Feasel

2018-11-06

R

John Mount shows off an infix operator for coalescing data in R:

coalesce is a classic useful SQL operator that picks the first non-NULLvalue in a sequence of values.

We thought we would share a nice version of it for picking non-NA R with convenient operator infix notation wrapr::coalesce().

Click through for an example.

Exploratory Data Analysis In R

Laura Ellis walks us through some easy techniques for learning about our data using R:

DIM AND GLIMPSE

Next, we will run the dim function which displays the dimensions of the table. The output takes the form of row, column.

And then we run the glimpse function from the dplyr package. This will display a vertical preview of the dataset. It allows us to easily preview data type and sample data.

Spending some quality time doing EDA can save you in the long run, as it can help you get a feel for things like data quality, the distributions of variables, and completeness of data.

Azure ML Studio Supports R 3.4

David Smith notes that Azure ML Studio now supports R version 3.4:

With the Execute R Script module you can immediately use more than 650 R packages which come preinstalled in the Azure ML Studio environment. You can also use other R packages (including packages not on CRAN) and source in R scripts you develop elsewhere (as shown above), although this does require the time to install them in the Studio environment. You can even create custom ML Studio models encapsulating R code for others to use in the drag-and-drop environment.

If you’re new to Azure ML Studio, check out the Quickstart Tutorial for R to learn how use the Execute R Script module, and to check out what’s new in the latest update follow the link below.

Click through for more details.

Investigating UK Traffic With Principal Component Analysis

Michael Grogan shows us how to use Principal Component Analysis (PCA) to classify route segments in UK transportation data:

Specifically, let us assume that we wish to analyze traffic density for buses and coaches. The main thing we are interested in is the frequency of traffic across a particular route.

Let’s take an example. If buses cover 100 miles on a route that is 5 miles long within a certain timeframe, then the frequency will be greater than 100 miles covered on a route that is 10 miles long over the same time period.

Read on for an interesting example.

Checking Functional Dependencies In R Data Frames

John Mount shows us how to use the psagg function in wrapr to ensure that functional dependencies are valid:

Notice only grouping columns and columns passed through an aggregating calculation (such as max()) are passed through (the column zis not in the result). Now because y is a function of x no substantial aggregation is going on, we call this situation a “pseudo aggregation” and we have taught this before. This is also why we made the seemingly strange choice of keeping the variable name y (instead of picking a new name such as max_y), we expect the y values coming out to be the same as the one coming in- just with changes of length. Pseudo aggregation (using the projection y[[1]]) was also used in the solutions of the column indexing problem.

Our wrapr package now supplies a special case pseudo-aggregator (or in a mathematical sense: projection): psagg(). It works as follows.

In this post, John calls the act of grouping functional dependencies (where we can determine the value of y based on the value of x, for any number of columns in y or x) pseudo-aggregation.

Using datapasta To Paste Spreadsheet Data In R

Kevin Feasel

2018-11-01

R

Mara Averick shows us how we can use datapasta with RStudio to create good representative examples when asking questions:

So, you’ve been asked to make a reprex and you want to include a bit of data that you have in a spreadsheet. Meet {datapasta}, a package by Miles McBain that can make your life a whole lot easier. Once you’ve installed datapasta, you simply copy a selected number of rows from your spreadsheet (remember, this is a minimal reproducible example), and click the Paste as tribble option from the DATAPASTA section of the Addins dropdown

Click through for a demo.

Building Custom R Visuals In Power BI

Kevin Feasel

2018-10-30

Power BI, R

Brad Lewellyn shows us how to create custom R visuals within Power BI:

Over the last few posts, we’ve shown how to use custom R visuals built by others.  Today, we’re going to build our own using the Custom R Visual available in Power BI Desktop.  If you haven’t read the second post in this series, Getting Started with R Scripts, it is highly recommended you do so now, as it provides necessary context for how to link Power BI to your local R ISE.

In the previous post, we created a bunch of log-transformed measures to find good predictors for Revenue.  We’re going to use these same measures today to create a basic linear regression model to predict Revenue.  If you want to follow along, the dataset can be found here.  Here’s the custom DAX we used to create the necessary measures.

Click through for the example.

Categories

December 2018
MTWTFSS
« Nov  
 12
3456789
10111213141516
17181920212223
24252627282930
31