Press "Enter" to skip to content

Category: R

Solving the Spelling Bee Honeycomb Puzzle

David Robinson has fun with puzzle-solving:

Solving this puzzle in R is interesting enough, but it’s particularly challenging to do so in a computationally efficient way. As much as I love the tidyverse, this, like the “lost boarding pass” puzzle and Emily Robinson’s evaluation of the best Pokémon team, serves as a great example of using R’s matrix operations to work efficiently with data.

I’ve done a lot of puzzles recently, and I realized that showing the end result isn’t a representation of my thought process. I don’t show all the dead ends and bugs, or explain why I ended up choosing a particular path. So in the same spirit as my Tidy Tuesday screencasts, I recorded myself solving this puzzle (though not the process of turning it into a blog post).

Most of the post is analysis around the problem, but you do get a viable solution as well.

Comments closed

Solving Sudoku with R

Tomaz Kastrun builds a validation function for Sudoku:

Function validater will validate for the sudoku board a particular solution at a particular position:

validater(sudoku, 1, c(1,4))

In matrix, at position x=1, y=4, where there is 0, it will test if number 1 is valid or not. If the number is valid, it returns TRUE (number) to outer function for finding complete solution.

This function iterates through all the possible 0-positions and iterates through solutions that are still available based on the rules:

Click through for that validation function.

Comments closed

Working with R and the Windows Command Line

Tomaz Kastrun takes us through calling CMD commands from R:

From time to time, when developing in R, working and wrangling data , preparing for machine learning projects, it comes the time, one would still need to access the operating system commands from/in R.

In this blog post, let’s take a look at some most useful cmd commands when using R.  Please note, that the cmd commands apply only to windows environment, for Linux/MacOS, the system commands should be slightly changed, but the wrapper R code should remains the same.

The need does come up, so it’s good to have that knowledge at hand.

Comments closed

Mapping in R with mapply

Andrew Treadway shows how to use the Map() and mapply() functions in R:

An older post on this blog talked about several alternative base apply functions. This post will talk about how to apply a function across multiple vectors or lists with Map and mapply in R. These functions are generalizations of sapply and lapply, which allow you to more easily loop over multiple vectors or lists simultaneously.

The idea of Map in functional programming takes a bit of time to really wrap your head around, but once you do, it becomes extremely powerful. H/T R-bloggers

Comments closed

Finding the Largest Profit or Loss with R

David Robinson takes us through a brainteaser:

I recently came across an interview problem from A Cool SQL Problem: Avoiding For-Loops . Avoiding loops is a topic I always enjoy reading about, and the blog post didn’t disappoint. I’ll quote that post’s description of the problem:

You have a table of trading days (with no gaps) and close prices for a stock.

Find the highest and lowest profits (or losses) you could have made if you bought the stock at one close price and sold it at another close price, i.e. a total of exactly two transactions.

You cannot sell a stock before it has been purchased. Your solution can allow buying and selling on the same trading_date (i.e. profit or loss of $0 is always, by definition, an available option); however, for some bonus points, you may write a more general solution for this problem that requires you to hold the stock for at least N days.

The SQL solution is another scenario where window functions save the day. In R, there are a couple straightforward options (if you happen to know about the functions!) which have radically different performance profiles.

Comments closed

Data Visualization in R and Python

Michelle Golchert contrasts libraries for visualizing data in R and Python:

Unlike R, Python – as a “general-purpose” programming language – does not include data visualization tools by default. However, Python also provides many libraries for this purpose, such as Matplotlib and Seaborn.

Python now also offers numerous packages (like plotnine and ggpy) which are equivalents of ggplot2 in R, and allow you to create plots in Python according to the same “Grammar of Graphics” principle.

This is an area where I think R has the upper hand at most levels: it’s easier to get started plotting with R (thanks to the built-in plots), it’s easier to do “intermediate-quality” plots (stuff you would use in an internal presentation), and you tend to have more control when building professional-quality plots. You can certainly create beautiful visuals in both languages, though.

Comments closed

A Preview of R 4.0

David Smith takes a look at what’s coming in R 4.0:

Normalization of matrix and array types. Conceptually, a matrix is just a 2-dimensional array. But current versions of R handle matrix and 2-D array objects differently in some cases. In R 4.0.0, matrix objects will formally inherit from the array class, eliminating such inconsistencies.

A refreshed color palette for charts. The base graphics palette for current versions of R (shown as R3 below) features saturated colors that vary considerably in brightness (for example, yellow doesn’t display as prominently as red). In R 4.0.0, the palette R4 below will be used, with colors of consistent luminance that are easier to distinguish, especially for viewers with color deficiencies. Additional palettes will make it easy to make base graphics charts that match the color scheme of ggplot2 and other graphics systems.

Read on for more, as well as a link to the rest of the changes.

Comments closed

NFL Kicker Quality

Jacob Long has an outstanding pair of posts on evaluating kickers in the NFL. FIrst up is the analysis itself:

Justin Tucker is so great that, quite frankly, it doesn’t matter which metric you use. PAA, FG% – eFG%, or just plain old FG%, he’s unlike anyone else in the past 10 years. Given the well-documented trend of increasing kicker accuracy in the NFL, I think Tucker has a solid claim on being the greatest kicker of all time.

Even with fewer seasons than many of his competitors, his PAA are double all the others who kicked in the past 10 years. He had a slightly more difficult than average set of attempts but made a higher percentage of his attempts than anyone who has had more than 22 tries. Good luck trying to find any defect in Tucker’s record.

Jacob then covers the method in detail:

Pasteur and Cunningham-Rhoads — I’ll refer to them as PC-R for short — gathered more data than most predecessors, particularly in terms of auxiliary environmental info. They have wind, temperature, and presence/absence of precipitation. They show fairly convincingly that while modeling kick distance is the most important thing, these other factors are important as well. PC-R also find the cardinal direction of every NFL stadium (i.e., does it run north-south, east-west, etc.) and use this information along with wind direction data to assess the presence of cross-winds, which are perhaps the trickiest for kickers to deal with. They can’t know about headwinds/tailwinds because as far as they (and I) can tell, nobody bothers to record which end zone teams defend at the game’s coin toss, so we don’t know without looking at video which direction the kick is going. They ultimately combine the total wind and the cross wind, suggesting they have some meaningful measurement error that makes them not accurately capture all the cross-winds. Using their logistic regressions that factor for these several factors, they calculate an eFG% and use it and its derivatives to rank the kickers.

Those wind factors make certain stadiums like New Era Field (where Buffalo plays) tricky: it’s fun to see two flags right next to each other pointing in opposite directions, or the flags on the field goal posts pointing hard right, then switching to hard left, then switching back to hard right over the course of a field goal try. H/T R-Bloggers

Comments closed

Combining SAS + R for ML

Sophia Rowland has a demo of working with SAS Cloud Analytic Service from R:

The Scripting Wrapper for Analytics Transfer, also known as SWAT, is a package that allows R users to access the power of the SAS Cloud Analytic Service (CAS) from a familiar R interface. The SWAT package is available to SAS Visual Analytics (VA), SAS Visual Statistics (VS), and SAS Visual Data Mining and Machine Learning (VDMML) users. To begin working with SWAT, download and install the package from the SAS Software GitHub page.

Read on for the demo.

Comments closed

Counting Tidyverse Package Arguments

Theo Roe has fun figuring out which tidyverse packages have the greatest number of available arguments in functions:

Before we start anything, I’d like to mention that most of the hard work came from nsaunders and his great blog post Idle thoughts lead to R internals: how to count function arguments.

Let’s get started.

The aim of this blog is to capture the number of arguments present in each function with packages of the tidyverse

Click through to see the code, as well as some methods of visualizing the results (methods which you can use in other situations).

Comments closed