Imputation is a complex process that requires good knowledge of your data. For example, it is crucial to know whether or not the data are missing at random before you impute them. I have read a nice tutorial that visualizes the missing data and helps you understand the type of missingness, and another post showing how to impute the data with
In this short post, I will focus on managing missing data using the tidyverse package. Specifically, I will show how to handle missing values in long-format data (i.e., more than one observation per id).
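In long data, the key point is that missing values must be filled within each id, not down the whole column. Otherwise a value from one person can leak into the next person's rows. Below is a minimal sketch in plain Python (standard library only; the post itself works in R). It assumes a hypothetical `rows` table in which `sex` is constant within an id and `None` marks a missing value. The helpers `fill_down_up` and `fill_within_id` are illustrative names, not part of any library. In the tidyverse, the same idea roughly corresponds to `group_by(id)` followed by tidyr's `fill(sex, .direction = "downup")`.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical long-format data: several rows (visits) per id.
# None marks a missing value; "sex" is assumed constant within an id.
rows = [
    {"id": 1, "visit": 1, "sex": "F"},
    {"id": 1, "visit": 2, "sex": None},
    {"id": 1, "visit": 3, "sex": None},
    {"id": 2, "visit": 1, "sex": None},
    {"id": 2, "visit": 2, "sex": "M"},
]


def fill_down_up(values):
    """Carry the last observed value forward, then fill any leading gaps backward."""
    out = list(values)
    last = None
    for i, v in enumerate(out):  # downward pass
        if v is None:
            out[i] = last
        else:
            last = v
    nxt = None
    for i in range(len(out) - 1, -1, -1):  # upward pass for leading gaps
        if out[i] is None:
            out[i] = nxt
        else:
            nxt = out[i]
    return out


def fill_within_id(rows, column):
    """Fill `column` separately within each id, so values never leak across ids."""
    # groupby only groups consecutive keys, so sort by id (and visit order) first.
    ordered = sorted(rows, key=itemgetter("id", "visit"))
    result = []
    for _, group in groupby(ordered, key=itemgetter("id")):
        group = [dict(r) for r in group]  # copy rows so the input is not mutated
        filled = fill_down_up([r[column] for r in group])
        for r, v in zip(group, filled):
            r[column] = v
        result.extend(group)
    return result


for r in fill_within_id(rows, "sex"):
    print(r)
```

Id 2's first visit receives "M" from its own later row, not "F" from id 1. Filling a time-invariant variable like this is fairly safe. Carrying a time-varying measurement forward (last observation carried forward) is itself an imputation choice and deserves the same scrutiny as any other method.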
Anisa shows a few different techniques, depending on what you need to do with the data. I'd caution against using the mean in the second example and would typically prefer the median instead. Because the median is less sensitive to outliers and skew, replacing missing values with it won't distort a skewed distribution the way the mean can.
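The sketch below makes the mean-versus-median point concrete. It uses plain Python with the standard `statistics` module, not the tidyverse code from the post. The variable `income` and the helper `impute` are hypothetical, and the data are chosen to be right-skewed on purpose.

```python
import statistics


def impute(values, how="median"):
    """Replace None with the median or mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to impute from")
    if how == "median":
        fill = statistics.median(observed)
    elif how == "mean":
        fill = statistics.mean(observed)
    else:
        raise ValueError("how must be 'median' or 'mean'")
    return [fill if v is None else v for v in values]


# Hypothetical right-skewed variable: one extreme value and two gaps.
income = [20, 22, 25, 30, None, 400, None]

by_median = impute(income, "median")  # gaps become 25
by_mean = impute(income, "mean")      # gaps become 99.4, pulled up by the 400

print("observed median:", statistics.median([v for v in income if v is not None]))
print("median after median imputation:", statistics.median(by_median))
print("median after mean imputation:", statistics.median(by_mean))
```

Median imputation keeps the observed median of 25. Mean imputation fills both gaps with 99.4, a value larger than every observation except the outlier, and shifts the median to 30. Neither choice preserves the spread of the data, because every gap receives the same value.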