Press "Enter" to skip to content

Category: R

Line Drawing And The Traveling Salesman Problem

Antonio Sanchez Chinchon builds a shortest-path portrait generator:

In this experiment I apply an heuristic algorithm to solve the TSP to draw a portrait. The idea is pretty simple:

  • Load a photo

  • Convert it to black and white

  • Choose a sample of black points

  • Solve the TSP to calculate a route among the points

  • Plot the route

Click through for the code.  This is an interesting application of the traveling salesman problem.

Comments closed

Plotting From SQL Server Machine Learning Services

Marlon Ribunal has a quick demo showing how to generate a ggplot2 plot using SQL Server Machine Learning Services:

Let’s install the package if it hasn’t been installed yet. The easiest way to do that is to run RGUI.exe that came with your SQL Server 2017 In-Database Machine Learning installation. You can find it here:

C:\Program Files\MSSQL14.MSSQLSERVER\R_SERVICES\bin\x64

Take note that you need to run the executable as Administrator. Also, if you’ve installed the R engine prior to your SQL Server 2017 In-Database Machine Learning with R, you have to explicitly tell the R package installer where you want your package installed.

> install.packages("ggplot2", lib="C:\\Program Files\\Microsoft SQL Server\\MSSQL14.MSSQLSERVER\\R_SERVICES\\library", dep = TRUE)

dep = TRUE tells the installer to install dependencies. ggplot2 depends on a lot of other packages. You can check dependencies using MiniCRAN.

Another option for installation is to bootstrap install via T-SQL:  you can execute external scripts which run install.packages() directly rather than using RGUI, if that makes more sense with your deployment process.

Comments closed

Visualizing Geo-Spatial Data In R

Carson Sievert shows off the plotly library:

You might be wondering, “What can plotly offer over other interactive mapping packages such as leafletmapviewmapedit, etc?”. One big feature is the linked brushing framework, which works best when linking plotly together with other plotly graphs (i.e., only a subset of brushing features are supported when linking to other crosstalk-compatible htmlwidgets). Another is the ability to leverage the plotly.js API to make efficient updates in shiny apps via plotlyProxy(). Speaking of efficiency, plotly.js keeps on improving the performance of their WebGL-based rendering, so I recommend trying plot_ly() (with toWebGL()) and/or plot_mapbox() if you have lots of graphical elements to render. Also, by having a consistent interface between these various mapping approaches, it’s much quicker and easier to switch from one approach to another when you need to leverage a different set of strengths and weaknesses.

Plotly’s on my list of things I’ll eventually get to one of these days.  H/T R-Bloggers

Comments closed

Using Python Within R

David Smith points out new reticulate package:

With reticulate, you can:

  • Import objects from Python, automatically converted into their equivalent R types. (For example, Pandas data frames become R data.frame objects, and NumPy arrays become R matrix objects.)

  • Import Python modules, and call their functions from R

  • Source Python scripts from R

  • Interactively run Python commands from the R command line

  • Combine R code and Python code (and output) in R Markdown documents, as shown in the snippet below

The first thing that came to mind when reading this was the implementation of the keras package in R and how it calls out to TensorFlow (written in Python).  The ability to make R vs Python an “and” instead of an “or” proposition is quite powerful.

Comments closed

Working With forcats

S. Richter-Walsh demonstrates what the forcats R package can do:

Synonymous factor levels

Sometimes a categorical variable may have two or more factor levels that refer to the same group. There may be subtle differences in syntax such as upper case leading letter versus lower case leading letter (GroupA vs. groupA), for example. In this situation, one can use forcats::fct_collapse() to collapse the synonymous levels into one. In our test data, let’s assume that Web and Online refer to the same sales channel and we want to combine both into a factor level called Online….

df$sales <- fct_collapse(df$sales, Online = c("Online", "Web"))

I don’t use forcats that often, but when I do, I definitely appreciate it being here.  H/T R-Bloggers

Comments closed

Plotting In R Using ggplot2

The folks at Sharp Sight Labs have another nice demo of ggplot2:

You’ve heard me say it a thousand times: to master data science, you need to practice.

You need to “practice small” by practicing individual techniques and functions. But you also need to “practice big” by working on larger projects.

To get some practice, my recommendation is to find reasonably sized datasets online and plot them.

Wikipedia is a nearly-endless source of good datasets. The great thing about Wikipedia is that many of the datasets are small and well contained. They are also fairly clean, with just enough messiness to make them a bit of a challenge.

As a quick example, this week, we’ll plot some economic data.

The code is deceptively easy considering the scope of the problem.

Comments closed

Why Does Empirical Variance Use n-1 Instead Of n?

Sebastian Sauer gives us a simulation showing why we use n-1 instead of n as the denominator when calculating the variance of a sample:

Our results show that the variance of the sample is smaller than the empirical variance; however even the empirical variance too is a little too small compared with the population variance (which is 1). Note that sample size was n=10 in each draw of the simulation. With sample size increasing, both should get closer to the “real” (population) sample size (although the bias is negligible for the empirical variance). Let’s check that.

This is an R-heavy post and does a great job of showing that it’s necessary, and ends with  recommended reading if you want to understand the why.

Comments closed

ggplot2 Geoms And Aesthetics

Tyler Rinker digs into ggplot2’s geoms and aesthetics:

I thought it my be fun to use the geoms aesthetics to see if we could cluster aesthetically similar geoms closer together. The heatmap below uses cosine similarity and heirarchical clustering to reorder the matrix that will allow for like geoms to be found closer to one another (note that today I learned from “R for Data Science” about the seriation package [https://cran.r-project.org/web/packages/seriation/index.html] that may make this matrix reordering task much easier).

It’s an interesting analysis of what’s available within ggplot2 and a detailed look at how different geoms fit together with respect to aesthetic options.

Comments closed

Legible Function Chaining In R

John Mount shows a few techniques for legible function chaining with R:

The dot intermediate convention is very succinct, and we can use it with base R transforms to get a correct (and performant) result. Like all conventions: it is just a matter of teaching, learning, and repetition to make this seem natural, familiar and legible.

My preference is to use dplyr + magrittr because I really do like that pipe operator.  John’s point is well-taken, however:  you don’t need to use the tidyverse to write clean R code, and there can be value in using the base functionality.

Comments closed