Press "Enter" to skip to content

Category: R

Understanding the Bayesian Nature of Kalman Filters

Holger von Jouanne-Diedrich gives us an interesting interpretation of Kalman filters:

The Kalman filter is a very powerful algorithm to optimally include uncertain information from a dynamically changing system to come up with the best educated guess about the current state of the system. Applications include (car) navigation and stock forecasting. If you want to understand how a Kalman filter works and build a toy example in R, read on!

The following post is based on the post “Das Kalman-Filter einfach erklärt” which is written in German and uses Matlab code (so basically two languages nobody is interested in any more 😉 ). This post is itself based on an online course “Artificial Intelligence for Robotics” by my colleague Professor Sebastian Thrun of Standford University.

In fairness, I regret only one thing about learning German: that I’ve forgotten so much over the years.

Comments closed

R and the TIOBE Index

Alex Woodie notices a change in fortunes for R:

Don’t look now, but R, which some had written off as a language in terminal decline in lieu of Python’s immense and growing popularity, appears to be staging a furious comeback the likes of which IT has rarely seen.

According to the TIOBE Index, which tracks the popularity of programming languages (as expressed in Web searches), R has risen an unprecedented 12 spots, up from number 20 in the summer of 2019 to number 8 on its list today.

I’m happy to see this, as frankly, I think R’s a better language for statistical analysis and data visualization than Python and it’s not close. That’s the advantage of being a DSL: you get to focus on doing one or two things really well, and for R that’s statistical analysis and data visualization.

Comments closed

Web Page Scraping with R and ML Services

Dave Mason shows how you can scrape webpages with R and pull the resulting data into SQL Server using Machine Learning Services:

For this post, it might make more sense to skip ahead to the end result, and then work our way backwards. Here is a web page with some data: Boston Celtics 2016-2017. It shows two HTML tables (grids) of data for the Boston Celtics, a professional basketball team. The first grid lists the roster of players. We will scrape the web page, and write the data from the “roster” grid to a SQL Server table.

Read on for a demonstration of the process.

Comments closed

The Basics of Autoregressive Models

Holger von Jouanne-Diedrich explains some of the principels of autoregressive models through a demonstration:

Well, this seems to be good news for the sales team: rising sales! Yet, how does this model arrive at those numbers? To understand what is going on we will now rebuild the model. Basically, everything is in the name already: auto-regressive, i.e. a (linear) regression on (a delayed copy of) itself (auto from Ancient Greek self)!

So, what we are going to do is create a delayed copy of the time series and run a linear regression on it. We will use the lm() function from base R for that (see also Learning Data Science: Modelling Basics).

Read on for some additional understanding.

Comments closed

Using Specific R Package Versions in Docker Images

Roman Lustrik shares how to fix package versions in Docker images:

Using package in R is easy. You install from CRAN using install.packages("packagename"), it resolves dependencies and you’re good to go. What R natively doesn’t handle so well is installing a particular package version without jumping through hoops. Technically you need the source file of the package version you want to install AND all source files of the dependencies (in the correct version, of course). This has been made almost seamless with packages packrat and recently, renv.

This comes handy when you are constructing a Docker file to run in production. Usually you want to run this defensively and do not want things to change from one image build to another. To get there, you can save all your package names and version into a file (renv.lock) and use that to reconstruct the now defined package structure with predictable versions (see renv vignette here).

This is quite useful as R package developers tend not to covet backwards compatibility, and one of the key benefits of containers is to have the option to keep the same code base and configuration in all environments.

Comments closed

Using INLA for Spatial Regression in R

Lionel Hertzog continues a series on spatial regression:

INLA is a package that allows to fit a broad range of model, it uses Laplace approximation to fit Bayesian models much, much faster than algorithms such as MCMC. INLA allows for fitting geostatistical models via stochastic partial differential equation (SPDE), a good place for more background informations on this are these two gitbooks: spde-gitbook and inla-gitbook.

This is not the gentlest introduction, so if you’re new to the concept go back and read part 1.

Comments closed

Data Visualization in R

Dan Fitton provides an introductory overview to several visualization tools in R:

The other way to communicate data with R is to produce an interactive dashboard or web application within R using Shiny. Whereas Markdown reports are most useful for explanatory analysis; Shiny, in my opinion, is useful for exploratory data analysis. This is when you want to display information for investigative purposes, allowing the user to gain greater familiarity by having the ability to interact with data, filter it, and dig deeper into the underlying details.

Shiny is incredibly flexible, providing the user the capability of turning their R code and objects, including tables, plots, and analysis, into a comprehensive and interactive web page or app, without requiring a fully-fledged web development skillset. Although there is a steep learning curve, the freedom and precision Shiny brings means that for the most part you are limited only by your skillset rather than the tool itself.

I’ve seen some really useful Shiny dashboards. Dan is right that there can be a lot of work put into getting them right, but if you do, the results can be outstanding.

Comments closed

Text Customization with ggtext

Abdul Majed Raja shows an example of using the ggtext library:

ggplot2 is go-to R package for anyone who wants to make beautiful static visualizations in R. But most ggplot2 gplots look almost the same and little many data analysts or data scientists care about customizing it, primarily because it’s not very intuitive to do so. That’s where ggplot2 extensions come in very handy. ggtext is an R package (by Claus O. Wilke) that helps in customizing the text present in ggplot2 plots. It could be the text outside the plot canvas or the text (annotation) within the plot canvas.

Click through for the code sample and video. H/T R-Bloggers.

Comments closed