Press "Enter" to skip to content

Category: R

Building Forest Plots With ggplot2

Faisal Atakora shows how to build a forest plot using ggplot2:

To build a Forest Plot often the forestplot package is used in R. However, I find the ggplot2 to have more advantages in making Forest Plots, such as enable inclusion of several variables with many categories in a lattice form. You can also use any scale of your choice such as log scale etc. In this post, I will introduce how to plot Risk Ratios and their Confidence Intervals of several conditions.

Click through for the script.  You might also want to compare it to the forestplot package to see how these differ.

Comments closed

Accessing BigQuery Data From Python And R

Eleni Markou shows how to connect to Google’s BigQuery service using Python and then R:

Some time ago we discussed how you can access data that are stored in Amazon Redshift and PostgreSQL with Python and R. Let’s say you did find an easy way to store a pile of data in your BigQuery data warehouse and keep them in sync. Now you want to start messing with it using statistical techniques, maybe build a model of your customers’ behavior, or try to predict your churn rate.

To do that, you will need to extract your data from BigQuery and use a framework or language that is best suited for data analysis and the most popular so far are Python and R. In this small tutorial we will see how we can extract data that is stored in Google BigQuery to load it with Python or R, and then use the numerous analytic libraries and algorithms that exist for these two languages.

Read on to see how easy it is for either language.

Comments closed

Learning About Spatial Data In R

Steph Locke has a compendium of resources for people wishing to learn more about working with spatial data in R:

I recently met up with someone who does geospatial stuff but uses the more traditional GIS software to do it. I showed him a few things in R but not being a person who does a lot of geospatial analysis I thought I’d ask the lovely #rspatial crowd what they’d recommend. Here are the compiled recommendations. Happy learning spatial R!

Feel free to comment or tweet your recommendations to get them added to this list.

There’s a lot of reading, watching, and doing there, so thanks to Steph for putting it together.

Comments closed

Visualizing Logistic Regression In Action

Sebastian Sauer shows using ggplot2 visuals what happens when there are interaction effects in a logistic regression:

Of course, probabilities greater 1 do not make sense. That’s the reason why we prefer a “bended” graph, such as the s-type ogive in logistic regression. Let’s plot that instead.

First, we need to get the survival probabilities:

d %>% 
  mutate(pred_prob = predict(glm1, type = "response")) -> d

Notice that type = "response gives you the probabilities of survival (ie., of the modeled event).

Read the whole thing.

Comments closed

Line Drawing And The Traveling Salesman Problem

Antonio Sanchez Chinchon builds a shortest-path portrait generator:

In this experiment I apply an heuristic algorithm to solve the TSP to draw a portrait. The idea is pretty simple:

  • Load a photo

  • Convert it to black and white

  • Choose a sample of black points

  • Solve the TSP to calculate a route among the points

  • Plot the route

Click through for the code.  This is an interesting application of the traveling salesman problem.

Comments closed

Plotting From SQL Server Machine Learning Services

Marlon Ribunal has a quick demo showing how to generate a ggplot2 plot using SQL Server Machine Learning Services:

Let’s install the package if it hasn’t been installed yet. The easiest way to do that is to run RGUI.exe that came with your SQL Server 2017 In-Database Machine Learning installation. You can find it here:

C:\Program Files\MSSQL14.MSSQLSERVER\R_SERVICES\bin\x64

Take note that you need to run the executable as Administrator. Also, if you’ve installed the R engine prior to your SQL Server 2017 In-Database Machine Learning with R, you have to explicitly tell the R package installer where you want your package installed.

> install.packages("ggplot2", lib="C:\\Program Files\\Microsoft SQL Server\\MSSQL14.MSSQLSERVER\\R_SERVICES\\library", dep = TRUE)

dep = TRUE tells the installer to install dependencies. ggplot2 depends on a lot of other packages. You can check dependencies using MiniCRAN.

Another option for installation is to bootstrap install via T-SQL:  you can execute external scripts which run install.packages() directly rather than using RGUI, if that makes more sense with your deployment process.

Comments closed

Visualizing Geo-Spatial Data In R

Carson Sievert shows off the plotly library:

You might be wondering, “What can plotly offer over other interactive mapping packages such as leafletmapviewmapedit, etc?”. One big feature is the linked brushing framework, which works best when linking plotly together with other plotly graphs (i.e., only a subset of brushing features are supported when linking to other crosstalk-compatible htmlwidgets). Another is the ability to leverage the plotly.js API to make efficient updates in shiny apps via plotlyProxy(). Speaking of efficiency, plotly.js keeps on improving the performance of their WebGL-based rendering, so I recommend trying plot_ly() (with toWebGL()) and/or plot_mapbox() if you have lots of graphical elements to render. Also, by having a consistent interface between these various mapping approaches, it’s much quicker and easier to switch from one approach to another when you need to leverage a different set of strengths and weaknesses.

Plotly’s on my list of things I’ll eventually get to one of these days.  H/T R-Bloggers

Comments closed

Using Python Within R

David Smith points out new reticulate package:

With reticulate, you can:

  • Import objects from Python, automatically converted into their equivalent R types. (For example, Pandas data frames become R data.frame objects, and NumPy arrays become R matrix objects.)

  • Import Python modules, and call their functions from R

  • Source Python scripts from R

  • Interactively run Python commands from the R command line

  • Combine R code and Python code (and output) in R Markdown documents, as shown in the snippet below

The first thing that came to mind when reading this was the implementation of the keras package in R and how it calls out to TensorFlow (written in Python).  The ability to make R vs Python an “and” instead of an “or” proposition is quite powerful.

Comments closed

Working With forcats

S. Richter-Walsh demonstrates what the forcats R package can do:

Synonymous factor levels

Sometimes a categorical variable may have two or more factor levels that refer to the same group. There may be subtle differences in syntax such as upper case leading letter versus lower case leading letter (GroupA vs. groupA), for example. In this situation, one can use forcats::fct_collapse() to collapse the synonymous levels into one. In our test data, let’s assume that Web and Online refer to the same sales channel and we want to combine both into a factor level called Online….

df$sales <- fct_collapse(df$sales, Online = c("Online", "Web"))

I don’t use forcats that often, but when I do, I definitely appreciate it being here.  H/T R-Bloggers

Comments closed

Plotting In R Using ggplot2

The folks at Sharp Sight Labs have another nice demo of ggplot2:

You’ve heard me say it a thousand times: to master data science, you need to practice.

You need to “practice small” by practicing individual techniques and functions. But you also need to “practice big” by working on larger projects.

To get some practice, my recommendation is to find reasonably sized datasets online and plot them.

Wikipedia is a nearly-endless source of good datasets. The great thing about Wikipedia is that many of the datasets are small and well contained. They are also fairly clean, with just enough messiness to make them a bit of a challenge.

As a quick example, this week, we’ll plot some economic data.

The code is deceptively easy considering the scope of the problem.

Comments closed