R – Page 66 – Curated SQL

Web-Optimized ggplot2 Themes

Published 2020-08-13 by Kevin Feasel

Petr Baranovskiy shares a few new themes:

This will be a very short post compared to the detailed stuff I usually write. Just what it says on the tin – I made some tweaks to my three favorite {ggplot2} themes – theme_bw(), theme_classic(), and theme_void() – to make the graphics more readable and generally look better when posted online, particularly in blog posts. Please feel free to borrow and use.
Also, I will be frequently using these themes in subsequent posts, and I’d like to be able to point readers here with a hyperlink instead of repeatedly posting the whole theme_web_…() code every time I am writing a post.

Click through for the definition of each theme. H/T R-Bloggers

EXTPTR_PTR Error with Rcpp

Published 2020-08-10 by Kevin Feasel

Rick Pack walks us through an error in R:

I experienced a need to update Rcpp when I attempted to install the readxlsb R package, which promised to enable me to read .xlsb files in R.
What happened next has been forgotten: Did the attempted update of Rcpp appear to succeed or fail? I did record that my attempted installation of readxlsb still failed and I now experienced an unfamiliar error when I opened and closed RStudio:
“The procedure entry point EXTPTR_PTR could not be located in the dynamic link library”

Read on to see how Rick solved this problem.

Credential and Secrets Management in R

Published 2020-08-03 by Kevin Feasel

Bernardo Lares walks us through some good practices around managing credentials and secrets in R:

I have several functions that live in my public lares library that use get_creds() to fetch my secrets. Some of them are used as credentials to query databases, send emails with API services such as Mailgun, ping notifications using Slack’s webhook, interact with Google Sheets programmatically, fetch Facebook and Twitter’s API stuff, Typeform, GitHub, HubSpot… I even have a portfolio performance report for my personal investments. If you check the code underneath, you won’t find credentials written anywhere but the code will actually work (for me and for anyone that uses the library). So, how can we accomplish this?

Read on to learn how.
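
To make the general idea concrete, here is a minimal sketch of one common pattern: read each secret from the environment at run time so it never appears in source control. This is an illustration in Python, not the get_creds() function from lares, and the post's own mechanism may differ. The get_secret helper and the API_TOKEN name are hypothetical.

```python
import os


def get_secret(name, env=None):
    """Return the secret stored under `name`, failing loudly if it is absent.

    `env` defaults to the process environment; passing a plain dict makes
    the function easy to test. Both names are hypothetical.
    """
    env = os.environ if env is None else env
    value = env.get(name)
    if not value:
        # Never fall back to a hard-coded default for a credential.
        raise KeyError(f"Secret {name!r} is not set")
    return value


if __name__ == "__main__":
    # A stand-in environment; real code would rely on os.environ.
    fake_env = {"API_TOKEN": "not-a-real-token"}
    token = get_secret("API_TOKEN", fake_env)
    print(f"Loaded a token of length {len(token)}")
```

The code that uses the secret can be public, because the value itself lives only on the machine that runs it.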

Choroplethr 3.6.4 on CRAN

Published 2020-08-03 by Kevin Feasel

Ari Lamstein announces that Choroplethr version 3.6.4 is now on CRAN:

Choroplethr v3.6.4 is now on CRAN. This is the first update to the package in two years, and was necessary because of a recent change to the tigris package, which choroplethr uses to make Census Tract maps. I also took this opportunity to add new example demographic data for Census Tracts.

Read on for a listing of the updates, examples, and a request from Ari to help keep the project up to date by finding a suitable sponsor. H/T R-Bloggers

Optimizing a Poisson Survival Model

Published 2020-07-24 by Kevin Feasel

Joshua Entrop shows off optimx() in R to perform a survival analysis:

In this blog post, we will fit a Poisson regression model by maximising its likelihood function using optimx() in R. As an example we will use the lung cancer data set included in the {survival} package. The data set includes information on 228 lung cancer patients from the North Central Cancer Treatment Group (NCCTG). Specifically, we will estimate the survival of lung cancer patients by sex and age using a simple Poisson regression model. You can download the code that I will use throughout this post here.

Read the whole thing. H/T R-bloggers
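
For a feel of what maximising a Poisson likelihood involves, here is a rough sketch in Python rather than R. It is not Joshua's code and does not use optimx(). It assumes a model with an intercept, one binary covariate, and person-time as an offset, and it maximises the log-likelihood with hand-written Newton steps. The toy_rows data are made up for illustration and have nothing to do with the NCCTG data.

```python
import math


def poisson_loglik(beta, rows):
    """Log-likelihood (up to a constant) of events ~ Poisson(time * exp(b0 + b1*x))."""
    b0, b1 = beta
    total = 0.0
    for events, time, x in rows:
        log_mu = math.log(time) + b0 + b1 * x
        total += events * log_mu - math.exp(log_mu)
    return total


def fit_poisson(rows, iterations=25):
    """Maximise the log-likelihood with Newton-Raphson steps."""
    # Start from the overall event rate with no covariate effect.
    b0 = math.log(sum(r[0] for r in rows) / sum(r[1] for r in rows))
    b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = i00 = i01 = i11 = 0.0
        for events, time, x in rows:
            mu = time * math.exp(b0 + b1 * x)
            g0 += events - mu          # score for the intercept
            g1 += (events - mu) * x    # score for the covariate
            i00 += mu                  # information matrix entries
            i01 += mu * x
            i11 += mu * x * x
        det = i00 * i11 - i01 * i01
        # Newton step: beta <- beta + information^{-1} * score
        b0 += (i11 * g0 - i01 * g1) / det
        b1 += (i00 * g1 - i01 * g0) / det
    return b0, b1


# Hypothetical aggregated data: (events, person-time, group indicator).
toy_rows = [(30, 300.0, 0), (15, 250.0, 1)]

if __name__ == "__main__":
    b0, b1 = fit_poisson(toy_rows)
    print("baseline rate:", math.exp(b0))
    print("rate ratio:", math.exp(b1))
```

With a single binary covariate the maximum has a closed form: the fitted baseline rate is the event rate in the reference group, and the exponentiated slope is the ratio of the two group rates. That makes it easy to check the optimiser against a calculation by hand.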

The Basics of Randomized Response

Published 2020-07-22 by Kevin Feasel

Holger von Jouanne-Diedrich explains how randomized response can protect any single person’s opinion from a pollster while providing insight on the whole population:

So, is there a method to find the respective proportion of people without putting them on the spot? Actually, there is! If you want to learn about randomized response (and how to create flowcharts in R along the way) read on!
The question is how can you get a truthful result overall without being able to attribute a certain answer to any single individual. As it turns out, there is a very elegant and ingenious method, called randomized response. The big idea is to, as the name suggests, add noise to every answer without compromising the overall proportion too much, i.e. add noise to every answer so that it cancels out overall!

Click through for the process. It’s definitely a clever idea.
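
Here is a small simulation of the idea, sketched in Python. It assumes one common coin-flip design, which may not be the exact one in Holger's post: each respondent flips a coin and answers truthfully on heads; on tails, a second flip decides the answer at random. The expected share of "yes" answers is then 0.5 * p + 0.25, so the true share p can be recovered as 2 * (observed share - 0.25). All function names are hypothetical.

```python
import random


def randomized_answer(truth, rng):
    """Return the respondent's reported answer under the coin-flip design."""
    if rng.random() < 0.5:
        return truth              # first flip heads: answer truthfully
    return rng.random() < 0.5     # first flip tails: second flip decides


def estimate_proportion(answers):
    """Undo the noise: observed = 0.5 * p + 0.25, so p = 2 * (observed - 0.25)."""
    yes_share = sum(answers) / len(answers)
    return 2.0 * (yes_share - 0.25)


def simulate(true_share, n, seed=42):
    """Simulate n respondents and return the estimated population share."""
    rng = random.Random(seed)
    truths = [rng.random() < true_share for _ in range(n)]
    answers = [randomized_answer(t, rng) for t in truths]
    return estimate_proportion(answers)


if __name__ == "__main__":
    print("estimated share:", simulate(true_share=0.3, n=100_000))
```

No single reported answer reveals what that person really thinks, yet the estimate lands close to the true share once the sample is large. The price is extra variance, so this design needs more respondents than a direct question would.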

Sparklyr 1.3 Released

Published 2020-07-17 by Kevin Feasel

Yitao Li announces sparklyr 1.3:

sparklyr 1.3 is now available on CRAN, with the following major new features:
– Higher-order Functions to easily manipulate arrays and structs
– Support for Apache Avro, a row-oriented data serialization framework
– Custom Serialization using R functions to read and write any data format
– Other Improvements such as compatibility with EMR 6.0 & Spark 3.0, and initial support for Flint time series library

Between this and the work from the Spark side, we are seeing some nice quality of life improvements for Spark and R.

More Scraping Web Pages

Published 2020-07-15 by Kevin Feasel

Dave Mason continues scraping web pages for fun and profit:

In the last post, we looked at a way to scrape HTML table data from web pages, and save the data to a table in SQL Server. One of the drawbacks is the need to know the schema of the data that gets scraped–you need a SQL Server table to store the data, after all. Another shortcoming is if there are multiple HTML tables, you need to identify which one(s) you want to save.
For this post, we’ll revisit web scraping with Machine Learning Services and R. This time, we’ll take a schema-less approach that returns JSON data. As before, this web page will be scraped: Boston Celtics 2016-2017. It shows two HTML tables (grids) of data for the Boston Celtics, a professional basketball team. The first grid lists the roster of players, the second is a listing of games played during the regular season.

Click through to see how Dave manages this feat.
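
To show what a schema-less approach can look like, here is a rough sketch in Python using only the standard library. It is not Dave's code and does not involve Machine Learning Services or R. It collects every HTML table on a page as a list of rows and returns the lot as JSON, so the caller needs no knowledge of column names or table count up front. The TableCollector class, the tables_to_json function, and the sample markup are hypothetical, and the sketch assumes reasonably well-formed tables with no nesting.

```python
import json
from html.parser import HTMLParser


class TableCollector(HTMLParser):
    """Collect every <table> as a list of rows, each row a list of cell strings."""

    def __init__(self):
        super().__init__()
        self.tables = []
        self._row = None    # cells of the row being read, or None
        self._cell = None   # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None
            self._cell = None


def tables_to_json(html):
    """Return all tables found in `html` as a JSON array of arrays of rows."""
    parser = TableCollector()
    parser.feed(html)
    parser.close()
    return json.dumps(parser.tables)


if __name__ == "__main__":
    # Placeholder markup, not real roster or schedule data.
    sample = (
        "<table><tr><th>Player</th><th>No</th></tr>"
        "<tr><td>Player A</td><td>1</td></tr></table>"
        "<table><tr><td>Game 1</td></tr></table>"
    )
    print(tables_to_json(sample))
```

Because the result is plain JSON, the consumer can decide later which table it wants and how to type the columns.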

Understanding the Bayesian Nature of Kalman Filters

Published 2020-07-14 by Kevin Feasel

Holger von Jouanne-Diedrich gives us an interesting interpretation of Kalman filters:

The Kalman filter is a very powerful algorithm to optimally include uncertain information from a dynamically changing system to come up with the best educated guess about the current state of the system. Applications include (car) navigation and stock forecasting. If you want to understand how a Kalman filter works and build a toy example in R, read on!
The following post is based on the post “Das Kalman-Filter einfach erklärt” which is written in German and uses Matlab code (so basically two languages nobody is interested in any more 😉 ). This post is itself based on an online course “Artificial Intelligence for Robotics” by my colleague Professor Sebastian Thrun of Stanford University.

In fairness, I regret only one thing about learning German: that I’ve forgotten so much over the years.
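
The Bayesian reading is easiest to see in one dimension, so here is a toy sketch in Python (not Holger's R code). It assumes the state and all noise are Gaussian. The measurement step is Bayes' rule, multiplying the prior by the measurement likelihood; the motion step adds the movement and its uncertainty. Function names and the numbers in the demo are made up for illustration.

```python
def update(mean, var, z, z_var):
    """Measurement step: combine prior N(mean, var) with measurement N(z, z_var)."""
    new_mean = (z_var * mean + var * z) / (var + z_var)
    new_var = 1.0 / (1.0 / var + 1.0 / z_var)
    return new_mean, new_var


def predict(mean, var, motion, motion_var):
    """Motion step: shift the estimate and add the motion uncertainty."""
    return mean + motion, var + motion_var


def kalman_1d(measurements, motions, z_var, motion_var, mean=0.0, var=1000.0):
    """Alternate measurement and motion steps, starting from a vague prior."""
    for z, u in zip(measurements, motions):
        mean, var = update(mean, var, z, z_var)
        mean, var = predict(mean, var, u, motion_var)
    return mean, var


if __name__ == "__main__":
    # Hypothetical position readings and commanded movements.
    mean, var = kalman_1d(
        measurements=[5.0, 6.0, 7.0, 9.0, 10.0],
        motions=[1.0, 1.0, 2.0, 1.0, 1.0],
        z_var=4.0,
        motion_var=2.0,
    )
    print("estimate:", mean, "variance:", var)
```

Each measurement shrinks the variance below both the prior and the measurement variance, while each motion step grows it again. The filter's estimate is the running balance between those two effects.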

R and the TIOBE Index

Published 2020-07-13 by Kevin Feasel

Alex Woodie notices a change in fortunes for R:

Don’t look now, but R, which some had written off as a language in terminal decline in light of Python’s immense and growing popularity, appears to be staging a furious comeback the likes of which IT has rarely seen.
According to the TIOBE Index, which tracks the popularity of programming languages (as expressed in Web searches), R has risen an unprecedented 12 spots, up from number 20 in the summer of 2019 to number 8 on its list today.

I’m happy to see this, as frankly, I think R’s a better language for statistical analysis and data visualization than Python and it’s not close. That’s the advantage of being a DSL: you get to focus on doing one or two things really well, and for R that’s statistical analysis and data visualization.

Category: R