Peter Ellis finds interesting results with sampling in R:
A week ago I was surprised to read on Thomas Lumley’s Biased and Inefficient blog that when using R’s sample() function without replacement and with unequal probabilities of individual units being sampled:
“What R currently has is sequential sampling: if you give it a set of priorities w it will sample an element with probability proportional to w from the population, remove it from the population, then sample with probability proportional to w from the remaining elements, and so on. This is useful, but a lot of people don’t realise that the probability of element i being sampled is not proportional to w_i”
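To make the quoted point concrete, here is a minimal sketch of the sequential procedure as Lumley describes it. It is written in Python rather than R and does not call R's sample(). It enumerates every possible ordered draw to get exact inclusion probabilities. The function name inclusion_probabilities, the weights 1 to 4, and the sample size of 2 are illustrative assumptions, not taken from either post.

```python
from fractions import Fraction
from itertools import permutations


def inclusion_probabilities(weights, n):
    """Exact chance that each unit appears in a sequential sample of size n.

    Hypothetical helper for illustration: draw one unit with probability
    proportional to its weight, remove it, and repeat on what is left.
    """
    w = [Fraction(x) for x in weights]
    incl = [Fraction(0)] * len(w)
    # Enumerate every ordered draw of n distinct units (fine for tiny inputs).
    for draw in permutations(range(len(w)), n):
        remaining = sum(w)
        p = Fraction(1)
        for i in draw:
            p *= w[i] / remaining  # pick unit i from the remaining population
            remaining -= w[i]      # then remove it before the next draw
        for i in draw:
            incl[i] += p           # every unit in this draw was included
    return incl


weights = [1, 2, 3, 4]  # assumed example weights
n = 2                   # assumed sample size
incl = inclusion_probabilities(weights, n)
# What inclusion probabilities would be if they were proportional to the weights.
target = [Fraction(n * x, sum(weights)) for x in weights]

for x, got, want in zip(weights, incl, target):
    print(f"w={x}: sequential={float(got):.4f}  proportional={float(want):.4f}")
```

In this small example the heaviest unit is included less often than strict proportionality would imply and the lighter units more often, even though the inclusion probabilities still sum to the sample size.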
Read on for a demonstration. H/T R-Bloggers.