Category: R

Visualizing Earthquake Data

Giorgio Garziano continues a series on analyzing earthquake data:

This is the third part of our post series about the exploratory analysis of a publicly available dataset reporting earthquakes and similar events within a specific 30-day time span. In this post, we are going to show static, interactive and animated earthquake maps of different flavors by using the functionalities provided by a pool of R packages, as explained below.

Giorgio looks at 9 separate R mapping packages, so you get your money’s worth here.

Changes in R 3.6.0

David Smith lays out the major changes in R 3.6.0:

A major update to the open-source R language, R 3.6.0, was released on April 26 and is now available for download for Windows, Mac and Linux. As a major update, it has many new features, user-visible changes and bug fixes. You can read the details in the release announcement, and in this blog post I’ll highlight the most significant ones.

There are some good changes in here.

Horizontal Labels with ggplot

Michael Toth shows us how to ensure we use horizontal text labels in ggplot:

There are several things we could do to improve this graph, but in this guide let’s focus on rotating the y-axis label. This simple change will make your graph so much better. That way, people won’t have to tilt their heads like me to understand what’s going on in your graph:

It may not seem like much when you’re creating the visual, but it can make a difference for a viewer.
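As a rough illustration of the idea (not Michael's own code), here is a minimal sketch that sets the y-axis title horizontal with ggplot2's `theme()`. The built-in `mtcars` data, the label text, and the variable names are assumptions chosen only for the example.

```r
library(ggplot2)

# Theme tweak: angle = 0 makes the y-axis title read horizontally,
# and vjust = 0.5 centers it vertically alongside the axis.
horizontal_y_title <- theme(
  axis.title.y = element_text(angle = 0, vjust = 0.5)
)

# Example plot on the built-in mtcars data (stand-in data, not from the post).
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(x = "Weight (1000 lbs)", y = "Miles per\ngallon") +
  horizontal_y_title

print(p)
```

Breaking a long label across lines with `\n`, as above, keeps the horizontal title from eating too much of the plot width.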

Repeated Cross-Validation in R

Ludvig Olsen walks us through a couple of nice R packages:

The benefits of using groupdata2 to create the folds are 1) that it allows us to balance the ratios of our output classes (or simply a categorical column, if we are working with linear regression instead of classification), and 2) that it allows us to keep all observations with a specific ID (e.g. participant/user ID) in the same fold to avoid leakage between the folds.

The benefit of cvms is that it trains all the models and outputs a tibble (data frame) with results, predictions, model coefficients, and other sweet stuff, which is easy to add to a report or do further analyses on. It even allows us to cross-validate multiple model formulas at once to quickly compare them and select the best model.

Ludvig also gives us some examples of how both packages can help you out. H/T R-Bloggers
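To make the ID-grouping idea concrete, here is a base R sketch of fold assignment that keeps every row for a given ID in the same fold, repeated twice to mimic repeated cross-validation. This is not how groupdata2 implements it, and it does not balance output classes. The helper `make_id_folds()` and the toy data are hypothetical and exist only for illustration.

```r
set.seed(42)

# Toy data: 6 participants with 4 observations each (made-up values).
df <- data.frame(
  participant = rep(paste0("p", 1:6), each = 4),
  score = rnorm(24)
)

# Hypothetical helper: assign whole IDs (not rows) to k folds.
make_id_folds <- function(id, k) {
  ids <- unique(as.character(id))
  # Shuffle the IDs, then deal them round-robin into folds 1..k.
  assignment <- setNames(rep_len(seq_len(k), length(ids)), sample(ids))
  # Look up each row's fold through its ID.
  unname(assignment[as.character(id)])
}

# Two independent fold columns, one per repetition.
df$fold_1 <- make_id_folds(df$participant, k = 3)
df$fold_2 <- make_id_folds(df$participant, k = 3)

# Every participant should appear in exactly one fold per repetition.
folds_per_id <- tapply(df$fold_1, df$participant, function(f) length(unique(f)))
print(folds_per_id)
```

Because folds are assigned per ID, a model trained on two folds never sees rows from a participant held out in the third, which is the leakage the quoted passage warns about.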

Microsoft R Open 3.5.2 and 3.5.3

David Smith announces Microsoft R Open 3.5.2 and reveals when 3.5.3 comes out:

It’s taken a little bit longer than usual, but Microsoft R Open 3.5.2 (MRO) is now available for download for Windows and Linux. This update is based on R 3.5.2, and accordingly fixes a few minor bugs compared to MRO 3.5.1. The main change you will note is that new CRAN packages released since R 3.5.1 can now be used with this version of MRO.

David also lets us know that they’re working on 3.6.0’s release.

Exploratory Data Analysis on Categorical Variables

Giorgio Garziano continues digging into earthquake data:

To understand relationships or dependencies among categorical variables, we take advantage of various types of tables and graphical methods. Stratifying variables can also be included to highlight whether the relationship between two primary variables is the same or different across all levels of the stratifying variable under consideration.

Contingency tables are said to be one-way when they involve just one categorical variable. They are said to be two-way when they involve two categorical variables, and so on (N-way).

Read on for various techniques for data analysis against categorical variables.
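For a quick feel of one-way, two-way, and stratified tables, here is a small base R sketch. It uses the built-in `mtcars` data as a stand-in rather than the earthquake dataset from the post, and treats `cyl`, `gear`, and `am` as the categorical variables.

```r
# One-way table: counts for a single categorical variable.
one_way <- table(cyl = mtcars$cyl)

# Two-way table: cross-tabulation of two categorical variables.
two_way <- table(cyl = mtcars$cyl, gear = mtcars$gear)

# Three-way table: cyl by gear, stratified by am (transmission type).
three_way <- xtabs(~ cyl + gear + am, data = mtcars)

print(one_way)
print(two_way)

# ftable() flattens the N-way table so each stratum is easy to compare.
print(ftable(three_way, row.vars = c("am", "cyl")))
```

Comparing the `cyl` by `gear` pattern within each level of `am` is the kind of check the stratifying variable is meant to support.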

UpSet Plots for Set Analysis

Laura Ellis digs into the UpSetR package:

UpSet plots have a very cool parameter called queries. Queries can be used to define a subset of the data that you would like to highlight in your graph. The queries property takes in a list of query lists which means that you can pass multiple queries into the same graph. Each query list allows you to set a number of properties about how the query should function.

In this example we are viewing the Cycle and Walk set intersection (query and params). We want the query to be highlighted in a nice pink (color). We want to display the query as a highlighted overlap (active) and we will give it a name that we add to the chart legend (query.name).

I’ve not seen an UpSet plot before but it dumps a lot of information into a relatively small space. I’ll have to spend some time learning more about these plots.

AzureGraph: Microsoft Graph in R

Hong Ooi takes us through AzureGraph:

Microsoft Graph is a comprehensive framework for accessing data in various online Microsoft services, including Azure Active Directory (AAD), Office 365, OneDrive, Teams, and more. AzureGraph is an R package that provides a simple R6-based interface to the Graph REST API, and is the companion package to AzureRMR and AzureAuth.

Currently, AzureGraph aims to provide an R interface only to the AAD part, with a view to supporting R interoperability with Azure: registered apps and service principals, users and groups. Like AzureRMR, it could potentially be extended to support other services.

Just to clarify, this is like the Facebook Graph API for Azure components, not a graph database that you can store your own data in.

Data Layout in R with cdata

John Mount takes us through a few sample problems and how to reshape data with cdata:

This may seem like a lot of steps, but it is only because we are taking the problems very slowly. The important point is that we want to minimize additional problem solving when applying the cdata methodology. Usually when you need to transform data you are in the middle of some other more important task, so you want to delegate the details of how the layout transform is implemented. With cdata the user is not asked to perform additional puzzle solving to guess a sequence of operators that may implement the desired data layout transform. The cdata solution pattern is always the same, which can help in mastering it.

With cdata, record layout transforms are simple R objects with detailed print() methods, so they are convenient to alter, save, and re-use later. The record layout transform also documents the expected columns and constants of the incoming data.

Check it out.
