John Cook has an interesting solution:
This post will look at the problem of updating an average grade as a very simple special case of Bayesian statistics and of Kalman filtering.
Suppose you’re keeping up with your average grade in a class, and you know your average after n tests, all weighted equally.
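Cook's point is that the running average can be updated from just the current average and the count, with no need to store the individual grades. A minimal sketch of that standard incremental-mean update (function name is mine):

```python
def update_average(avg, n, new_grade):
    """Fold one more equally weighted grade into a running average.

    avg is the average of the first n grades; returns the average
    over n + 1 grades without storing any of them.
    """
    return avg + (new_grade - avg) / (n + 1)

avg = 0.0
for n, grade in enumerate([90, 80, 100, 85]):
    avg = update_average(avg, n, grade)
print(avg)  # 88.75, the mean of the four grades
```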
Click through for the walkthrough. This is similar to something I tried to puzzle out before ultimately admitting defeat: is there a way to calculate updates to the median without needing to know the entire set? In practical terms: how many pieces of information do I need in order to guarantee that I can maintain a median over time?
The best I could come up with was built on the premise of tracking new data points less than the median versus those greater than it, where each greater-lesser pair cancels out. If new data points arrive in roughly equal numbers on each side, your “elements of the median” array can stay pretty small. The problem is that, for any sufficiently small k, where k is the number of elements you keep in memory, a localized run of (without loss of generality) lower-than-median data points can come in and completely wash out your memory. For example, if you keep 3 points in memory and four values below the median arrive, you no longer know what the median is.
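The wash-out can be made concrete with a toy example. The retention scheme below (keep the k values nearest the current median) is my own illustration of the idea, not a specific published algorithm:

```python
import statistics

# Hypothetical bounded-memory scheme: retain only the k values closest
# to the current median, discarding the rest of the stream.
seen = [10, 20, 30, 40, 50]
k = 3
median = statistics.median(seen)                          # 30
buffer = sorted(seen, key=lambda x: abs(x - median))[:k]  # keeps 20, 30, 40

# Four new values arrive, all below everything retained: more one-sided
# arrivals than k, so the buffer is washed out.
new_points = [1, 2, 3, 4]

true_median = statistics.median(seen + new_points)    # 10 -- a discarded value
best_guess = statistics.median(buffer + new_points)   # 4  -- all we can compute
print(true_median, best_guess)
```

The true median of the full stream is 10, a value the scheme already threw away, so no answer computed from the surviving buffer can be correct.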
Trying to solve this without knowing the shape of the distribution or making any sequencing assumptions is something I failed to do.
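For contrast, the classic exact approach to a running median, two balanced heaps, retains every element ever seen, which is precisely the cost the question above hopes to avoid. A sketch:

```python
import heapq

class RunningMedian:
    """Two-heap exact running median: O(log n) per insert, but it
    keeps all n elements in memory."""

    def __init__(self):
        self.lo = []  # max-heap (values negated) holding the smaller half
        self.hi = []  # min-heap holding the larger half

    def add(self, x):
        # Push through lo into hi, then rebalance so lo is never smaller.
        heapq.heappush(self.lo, -x)
        heapq.heappush(self.hi, -heapq.heappop(self.lo))
        if len(self.hi) > len(self.lo):
            heapq.heappush(self.lo, -heapq.heappop(self.hi))

    def median(self):
        if len(self.lo) > len(self.hi):
            return -self.lo[0]
        return (-self.lo[0] + self.hi[0]) / 2

rm = RunningMedian()
for x in [30, 10, 50, 20, 40]:
    rm.add(x)
print(rm.median())  # 30
```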