Working With Missing Values In R

Kevin Feasel



Anisa Dhana has a few examples of ways we can work with data containing missing values in R:

Imputation is a complex process that requires a good knowledge of your data. For example, it is crucial to know whether the missing is at random or not before you impute the data. I have read a nice tutorial which visualize the missing data and help to understand the type of missing, and another post showing how to impute the data with MICE package.

In this short post, I will focus on management of the missing data using the tidyverse package. Specifically, I will show how to manage missings in the long data format (i.e., more than one observation for id).

Anisa shows a few different techniques, depending upon what you need to do with the data.  I’d caution about using mean in the second example and instead typically prefer median, as replacing missing values with the median won’t alter the distribution in the way that it can with mean.

Related Posts

Biases in Tree-Based Models

Nina Zumel looks at tree-based ensembling models like random forest and gradient boost and shows that they can be biased: In our previous article , we showed that generalized linear models are unbiased, or calibrated: they preserve the conditional expectations and rollups of the training data. A calibrated model is important in many applications, particularly when financial data […]

Read More

R 3.6.1 Available

David Smith notes a new version of R is available: On July 5, the R Core Group released the source code for the latest update to R, R 3.6.1, and binaries are now available to download for Windows, Linux and Mac from your local CRAN mirror. R 3.6.1 is a minor update to R that fixes a few bugs. […]

Read More


December 2018
« Nov Jan »