Category: R

Building A Gantt Chart With Plotly

Published 2018-11-27 by Kevin Feasel

Ellen Talbot shows us how to embrace our inner micromanagers:

Something a little different today for a quick chat about my latest project and why I’m finding the plotly package so helpful!

Are you like me and physically can’t function unless you’ve got a to-do list in front of you? Well, even if you’re not, imagine my pain while I’m wearing my non-Locke Data hat and trying to plan out the final year of my PhD thesis!

I needed something that updated easily, something visual, and something to keep my supervisors in the know. I’ve previously made Gantt charts using LaTeX but found them ridiculously clunky to get working and decided there had to be a better way. And if I could include interactivity, then all the better, which is how I discovered plotly.

Admittedly, I like Gantt charts more than almost any developer I’ve ever met. They always look so pretty and are wonderful depictions of a world which will never be.

Working With Strings In Base R

Published 2018-11-26 by Kevin Feasel

Jozef Hajnala shows us that you don’t need stringr to do cool things with strings in R:

This post aims to serve as an overview of the functionality base R provides for working with strings. Note that the term “string” is used somewhat loosely and refers to both character vectors and character strings. In R documentation, references to a character string refer to a character vector of length 1.

Also, since this is an overview, we will not examine the details of the functions but rather list examples with simple, intuitive explanations, trading off some technical precision.

As much as I like the tidyverse for its approach to R, which is friendly to data platform professionals, it is good to know the base libraries (and other alternatives) as well. H/T R-Bloggers

Quick Geospatial Data Plots In R And Python

Published 2018-11-26 by Kevin Feasel

Harry McLellan shows us how we can use R and Python to generate quick-and-dirty plots of geospatial data:

Now, R has some useful packages like ggmap, mapdata and ggplot2, which allow you to source your map satellite images directly from Google Maps, but this does require a free Google API key to source from the cloud. These packages can also plot the map around the data, as I am currently trimming the map to fit the data. But for a fair test, I also used a simplistic pre-built map in R from the rworldmap package, which allows plotting at a country level with defined borders. Axes can be scaled to act like a zoom function, but without a higher-resolution map or a raster satellite image map, it is pointless to go past the country level.

There’s a lot more you can do with both languages, but when you just want a plot in a few lines of code, both are up to the task.

Dealing With Zero-Value Rows In dplyr

Published 2018-11-21 by Kevin Feasel

Kieran Healy shows an oddity in dplyr when dealing with zero-value records:

That looks fine. You can see in each panel the 2015 column is 100% Men. If we were working on this a bit longer we’d polish up the x-axis so that the dates were centered under the columns. But as an exploratory plot it’s fine.

But let’s say that, instead of a column plot, you looked at a line plot. This would be a natural thing to do given that time is on the x-axis, so you’re looking at a trend, albeit one over a small number of years.

This is behavior I hadn’t run into, and it does seem a bit odd. On a totally unrelated note, Healy’s Data Visualization: A Practical Introduction is one of the best books on the topic.

Running R Scripts In Power BI’s Query Editor

Published 2018-11-20 by Kevin Feasel

Brad Llewellyn walks us through the process of executing an R script against a table in Power Query:

If you aren’t able to open the R Script Editor, check out our previous post, Getting Started with R Scripts. While it’s possible to develop and test code using the built-in R Script Editor, it’s not great. Unfortunately, there doesn’t seem to be a way to develop this script using an external IDE like RStudio. So, we typically export the data to CSV files for development in RStudio. This is obviously not optimal and should be done with caution when data is extremely large or sensitive in some way. Fortunately, the write.csv() function is pretty easy to use. You can read more about it here.

It’s not a perfect experience, but Brad does show us how to get it done.

The Lesser-Known Apply Functions In R

Published 2018-11-14 by Kevin Feasel

Andrew Treadway covers a few of the lesser-known apply functions in R:

rapply

Let’s start with rapply. This function has a couple of different purposes. One is to recursively apply a function to a list. We’ll get to that in a moment. The other use of rapply is to apply a function to only those elements in a list (or columns in a data frame) that belong to a specified class. For example, let’s say we have a data frame with a mix of categorical and numeric variables, but we want to evaluate a function only on the numeric variables.

Click through for some examples of rapply as well as vapply and eapply. I’ve used rapply to get the cardinality of each feature in a data frame, but the other two are new to me. H/T R-bloggers
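As a rough illustration of the class-filtered, recursive behavior described above, here is a minimal Python analogue. It is a sketch under stated assumptions, not R's API: the name rapply_like is hypothetical, it works on individual leaves rather than whole data frame columns, and it only mimics the spirit of rapply's how = "replace" mode, where non-matching elements are left unchanged.

```python
from typing import Any, Callable, Tuple, Type


def rapply_like(obj: Any, func: Callable[[Any], Any],
                classes: Tuple[Type, ...]) -> Any:
    """Hypothetical analogue of R's rapply(..., how = "replace").

    Recursively walks dicts, lists and tuples, applies `func` to every
    leaf that is an instance of `classes`, and leaves other leaves as-is.
    """
    if isinstance(obj, dict):
        # Recurse into each value, keeping the keys
        return {key: rapply_like(val, func, classes) for key, val in obj.items()}
    if isinstance(obj, (list, tuple)):
        # Rebuild the container with the same type (list or tuple)
        return type(obj)(rapply_like(item, func, classes) for item in obj)
    # Leaf: transform it only if it belongs to one of the requested classes
    return func(obj) if isinstance(obj, classes) else obj


nested = [1, "a", [2.5, "b", [3]]]
print(rapply_like(nested, lambda x: x * 10, (int, float)))
# → [10, 'a', [25.0, 'b', [30]]]
```

Because each container is rebuilt with its own type, lists stay lists and tuples stay tuples, while strings pass through untouched. One Python-specific caveat: bool is a subclass of int, so True and False would also match (int, float).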

Controlling Azure Services In R With AzureR

Published 2018-11-12 by Kevin Feasel

Hong Ooi announces a new set of packages called AzureR:

As background, some of you may remember the AzureSMR package, which was written a few years back as an R interface to Azure. AzureSMR was very successful and gained a significant number of users, but it was never meant to be maintainable in the long term. As more features were added it became more unwieldy until its design limitations became impossible to ignore.

The AzureR family is a refactoring/rewrite of AzureSMR that aims to fix the earlier package’s shortcomings.

The core package of the family is AzureRMR, which provides a lightweight yet powerful interface to Azure Resource Manager. It handles authentication (including automatically renewing when a session token expires), managing resource groups, and working with individual resources and templates. It also calls the Resource Manager REST API directly, so you don’t need to have PowerShell or Python installed; it depends only on commonly used R packages like httr, jsonlite and R6.

This won’t replace the PowerShell libraries, but it looks like it would be useful in scenarios such as setting up a VM, training a model, and then shutting down the VM.

Explaining Neural Networks With H2O

Published 2018-11-07 by Kevin Feasel

Shirin Glander explains some of the concepts behind neural networks using H2O as a guide:

Before, when describing the simple perceptron, I said that a result is calculated in a neuron, e.g. by summing up all the incoming data multiplied by weights. However, this has one big disadvantage: such an approach would only enable our neural net to learn linear relationships between data. In order to be able to learn (you can also say approximate) any mathematical problem, no matter how complex, we use activation functions. Activation functions normalize the output of a neuron, e.g. to values between -1 and 1 (Tanh), between 0 and 1 (Sigmoid), or by setting negative values to 0 (Rectified Linear Units, ReLU). In H2O we can choose between Tanh, Tanh with Dropout, Rectifier (default), Rectifier with Dropout, Maxout and Maxout with Dropout. Let’s choose Rectifier with Dropout. Dropout is used to improve the generalizability of neural nets by randomly setting a given proportion of nodes to 0. The dropout rate in H2O is specified with two arguments: hidden_dropout_ratios, which by default sets 50% of hidden (more on that in a minute) nodes to 0. Here, I want to reduce that proportion to 20%, but let’s talk about hidden layers and hidden nodes first. In addition to hidden dropout, H2O lets us specify a dropout for the input layer with input_dropout_ratio. This argument is deactivated by default, and this is how we will leave it.

Read the whole thing and, if you understand German, check out the video as well.
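To make the quoted description concrete, here is a minimal Python sketch of a single neuron (weighted sum of inputs plus a bias) passed through the three activation functions named above, plus a training-time dropout step that zeroes each activation with a given probability. This is a generic illustration, not H2O's implementation: all function names are hypothetical, and unlike many frameworks, which rescale surviving activations by 1 / (1 - ratio) ("inverted dropout"), this sketch applies no rescaling.

```python
import math
import random
from typing import Callable, List, Sequence


def tanh(x: float) -> float:
    # Squashes any real input into the range (-1, 1)
    return math.tanh(x)


def sigmoid(x: float) -> float:
    # Squashes any real input into [0, 1]; split by sign to avoid overflow in exp
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def relu(x: float) -> float:
    # Rectified linear unit: negative inputs become 0, positive ones pass through
    return max(0.0, x)


def neuron(inputs: Sequence[float], weights: Sequence[float], bias: float,
           activation: Callable[[float], float]) -> float:
    # Weighted sum of the inputs plus bias, followed by the nonlinearity;
    # without the activation, stacking neurons could only model linear relationships
    z = sum(i * w for i, w in zip(inputs, weights)) + bias
    return activation(z)


def dropout(activations: Sequence[float], ratio: float,
            rng: random.Random) -> List[float]:
    # Training-time only: zero each activation independently with probability `ratio`
    return [0.0 if rng.random() < ratio else a for a in activations]


rng = random.Random(42)
hidden = [
    neuron([1.0, -2.0], [0.5, 0.25], 0.1, relu),   # z = 0.1  -> 0.1
    neuron([1.0, -2.0], [-0.5, 0.75], 0.0, relu),  # z = -2.0 -> 0.0
]
print(hidden)                     # → [0.1, 0.0]
print(dropout(hidden, 0.2, rng))  # which entries are zeroed depends on the seed
```

With ratio=0.2, roughly one activation in five is zeroed per call, which corresponds to the 20% hidden dropout the quoted post chooses; in H2O itself that setting is made through hidden_dropout_ratios rather than code like this.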

Detecting Redirects With httr

Published 2018-11-07 by Kevin Feasel

Peter Meissner shows us how we can find redirects when using the httr package:

I am the creator and maintainer of the robotstxt package, an R package that enables users to retrieve and parse robots.txt files and is ultimately designed to do access permission checking for web resources.

Recently, a discussion came up about how to interpret permissions in the case of subdomains and HTTP redirects. Long story short: in the case of robots.txt files, redirects are suspicious, and users should at least be informed that one happened so they can take appropriate action.

So, I set out to find a way to check whether a robots.txt file requested via the httr package has gone through one or more redirects prior to its retrieval.

Click through for the solution.

Coalesce In R With wrapr

Published 2018-11-06 by Kevin Feasel

John Mount shows off an infix operator for coalescing data in R:

coalesce is a classic, useful SQL operator that picks the first non-NULL value in a sequence of values.

We thought we would share a nice version of it for picking non-NA values in R, with convenient infix operator notation: wrapr::coalesce().

Click through for an example.
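As a language-neutral illustration of what coalescing does, here is a minimal Python sketch. Python has no user-defined infix operators, so plain functions stand in for wrapr's infix notation; treating None and float NaN as "missing" is an assumption meant to approximate SQL's NULL and R's NA, and the function names are hypothetical.

```python
import math
from typing import Any, List, Optional, Sequence


def is_missing(value: Any) -> bool:
    # None and float NaN play the roles of SQL NULL / R NA in this sketch
    return value is None or (isinstance(value, float) and math.isnan(value))


def coalesce(*values: Any) -> Optional[Any]:
    # Scalar version: return the first non-missing argument, else None
    for value in values:
        if not is_missing(value):
            return value
    return None


def coalesce_vec(primary: Sequence[Any], fallback: Sequence[Any]) -> List[Any]:
    # Elementwise version, closer to how R vectors are coalesced:
    # keep primary[i] unless it is missing, otherwise use fallback[i]
    if len(primary) != len(fallback):
        raise ValueError("primary and fallback must have the same length")
    return [f if is_missing(p) else p for p, f in zip(primary, fallback)]


print(coalesce(None, float("nan"), 3, 4))                 # → 3
print(coalesce_vec([1, None, float("nan")], [9, 8, 7]))  # → [1, 8, 7]
```

Note that 0 and empty strings are not treated as missing, which matches SQL COALESCE semantics: only NULL is skipped.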
