Press "Enter" to skip to content

Category: R

Matching Order In R

John Mount shows off the match_order function in wrapr:

Often we wish to work with such data aligned so each row in d2 has the same idx value as the same row (by row order) as d1. This is an important data wrangling task, so there are many ways to achieve it in R, such as base::merge()dplyr::left_join(), or by sorting both tables into the same order and then using base::cbind().

However if you wish to preserve the order of the first table (which may not be sorted), you need one more trick.

Click through to see that one additional trick.

Comments closed

Faceting With R And SQL Server ML Services

Marlon Ribunal has a quick example showing how to build faceted plots with SQL Server ML Services and ggplot2:

In my previous post, I have demonstrated how easy it is to create a bar graph in SQL Server 2017 In-Database Machine Learning using  R.

We’re going to build upon that basic graph.

Sometimes doing data analysis would require us to look at an overview of our data across specific partitions, say a year. For example, we want to see how our product groups fare on month-to-month basis across the last 4 years.

In a data analytics perspective, there are quite a handful of data points in this requirement – data aggregate (quantity), monthly periods, and year partitions.

One of the approaches to handle such requirement is by using a facet. Faceting is a way of plotting subsets of data into a matrix of panels based on one or more variables – or facets.

Click through for the example and code.  Facets are quite useful, but they run the risk of misleading if you squeeze too many onto the screen.  The same line can look quite different with a “tall” facet versus a “wide” facet, and that can change how people interpret your visual.

Comments closed

Building Forest Plots With ggplot2

Faisal Atakora shows how to build a forest plot using ggplot2:

To build a Forest Plot often the forestplot package is used in R. However, I find the ggplot2 to have more advantages in making Forest Plots, such as enable inclusion of several variables with many categories in a lattice form. You can also use any scale of your choice such as log scale etc. In this post, I will introduce how to plot Risk Ratios and their Confidence Intervals of several conditions.

Click through for the script.  You might also want to compare it to the forestplot package to see how these differ.

Comments closed

Accessing BigQuery Data From Python And R

Eleni Markou shows how to connect to Google’s BigQuery service using Python and then R:

Some time ago we discussed how you can access data that are stored in Amazon Redshift and PostgreSQL with Python and R. Let’s say you did find an easy way to store a pile of data in your BigQuery data warehouse and keep them in sync. Now you want to start messing with it using statistical techniques, maybe build a model of your customers’ behavior, or try to predict your churn rate.

To do that, you will need to extract your data from BigQuery and use a framework or language that is best suited for data analysis and the most popular so far are Python and R. In this small tutorial we will see how we can extract data that is stored in Google BigQuery to load it with Python or R, and then use the numerous analytic libraries and algorithms that exist for these two languages.

Read on to see how easy it is for either language.

Comments closed

Learning About Spatial Data In R

Steph Locke has a compendium of resources for people wishing to learn more about working with spatial data in R:

I recently met up with someone who does geospatial stuff but uses the more traditional GIS software to do it. I showed him a few things in R but not being a person who does a lot of geospatial analysis I thought I’d ask the lovely #rspatial crowd what they’d recommend. Here are the compiled recommendations. Happy learning spatial R!

Feel free to comment or tweet your recommendations to get them added to this list.

There’s a lot of reading, watching, and doing there, so thanks to Steph for putting it together.

Comments closed

Visualizing Logistic Regression In Action

Sebastian Sauer shows using ggplot2 visuals what happens when there are interaction effects in a logistic regression:

Of course, probabilities greater 1 do not make sense. That’s the reason why we prefer a “bended” graph, such as the s-type ogive in logistic regression. Let’s plot that instead.

First, we need to get the survival probabilities:

d %>% 
  mutate(pred_prob = predict(glm1, type = "response")) -> d

Notice that type = "response gives you the probabilities of survival (ie., of the modeled event).

Read the whole thing.

Comments closed

Line Drawing And The Traveling Salesman Problem

Antonio Sanchez Chinchon builds a shortest-path portrait generator:

In this experiment I apply an heuristic algorithm to solve the TSP to draw a portrait. The idea is pretty simple:

  • Load a photo

  • Convert it to black and white

  • Choose a sample of black points

  • Solve the TSP to calculate a route among the points

  • Plot the route

Click through for the code.  This is an interesting application of the traveling salesman problem.

Comments closed

Plotting From SQL Server Machine Learning Services

Marlon Ribunal has a quick demo showing how to generate a ggplot2 plot using SQL Server Machine Learning Services:

Let’s install the package if it hasn’t been installed yet. The easiest way to do that is to run RGUI.exe that came with your SQL Server 2017 In-Database Machine Learning installation. You can find it here:

C:\Program Files\MSSQL14.MSSQLSERVER\R_SERVICES\bin\x64

Take note that you need to run the executable as Administrator. Also, if you’ve installed the R engine prior to your SQL Server 2017 In-Database Machine Learning with R, you have to explicitly tell the R package installer where you want your package installed.

> install.packages("ggplot2", lib="C:\\Program Files\\Microsoft SQL Server\\MSSQL14.MSSQLSERVER\\R_SERVICES\\library", dep = TRUE)

dep = TRUE tells the installer to install dependencies. ggplot2 depends on a lot of other packages. You can check dependencies using MiniCRAN.

Another option for installation is to bootstrap install via T-SQL:  you can execute external scripts which run install.packages() directly rather than using RGUI, if that makes more sense with your deployment process.

Comments closed

Visualizing Geo-Spatial Data In R

Carson Sievert shows off the plotly library:

You might be wondering, “What can plotly offer over other interactive mapping packages such as leafletmapviewmapedit, etc?”. One big feature is the linked brushing framework, which works best when linking plotly together with other plotly graphs (i.e., only a subset of brushing features are supported when linking to other crosstalk-compatible htmlwidgets). Another is the ability to leverage the plotly.js API to make efficient updates in shiny apps via plotlyProxy(). Speaking of efficiency, plotly.js keeps on improving the performance of their WebGL-based rendering, so I recommend trying plot_ly() (with toWebGL()) and/or plot_mapbox() if you have lots of graphical elements to render. Also, by having a consistent interface between these various mapping approaches, it’s much quicker and easier to switch from one approach to another when you need to leverage a different set of strengths and weaknesses.

Plotly’s on my list of things I’ll eventually get to one of these days.  H/T R-Bloggers

Comments closed

Using Python Within R

David Smith points out new reticulate package:

With reticulate, you can:

  • Import objects from Python, automatically converted into their equivalent R types. (For example, Pandas data frames become R data.frame objects, and NumPy arrays become R matrix objects.)

  • Import Python modules, and call their functions from R

  • Source Python scripts from R

  • Interactively run Python commands from the R command line

  • Combine R code and Python code (and output) in R Markdown documents, as shown in the snippet below

The first thing that came to mind when reading this was the implementation of the keras package in R and how it calls out to TensorFlow (written in Python).  The ability to make R vs Python an “and” instead of an “or” proposition is quite powerful.

Comments closed