Estimating Probabilities from Unevenly Collected Data

Nina Zumel answers an important question:

In this article, we look at the problem of estimating and comparing probabilities about a population of subjects from unevenly collected observations. Some examples might include:

The perceived quality of a movie (how often a movie is positively reviewed) when some movies have far more reviews than others.

The effectiveness of various ad campaigns, when some campaigns have had more exposure than others.

The efficacy of a certain medical procedure by hospital, when some hospitals have had more cases than others.

For our specific task, we’ll try to estimate the “innate” batting ability (the probability of making a hit when at bat) of major league baseball players in 2023. For the sake of this article, we will take this single season of data as everything that we know about these players and their batting statistics.

It’s an interesting problem because she’s treating the 2023 data as a stand-in for each player’s entire career, with the goal of estimating how a player will perform overall from a reasonably sized sample collected during one relatively short period of that career. H/T John Mount.
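
To make the "unevenly collected" part concrete, here is a minimal Python sketch of why raw hit rates are hard to compare when at-bat counts differ. It is not the method from Nina's article. It only computes a raw rate plus a Wilson score interval, a standard interval for a proportion. The function name `hit_rate_with_interval`, the player labels, and the hit and at-bat counts are all hypothetical, made up for illustration.

```python
import math


def hit_rate_with_interval(hits, at_bats, z=1.96):
    """Return (raw_rate, low, high) using the Wilson score interval.

    z=1.96 corresponds to an approximate 95% interval.
    """
    if at_bats <= 0:
        raise ValueError("at_bats must be positive")
    p = hits / at_bats
    denom = 1 + z**2 / at_bats
    center = (p + z**2 / (2 * at_bats)) / denom
    half = z * math.sqrt(p * (1 - p) / at_bats + z**2 / (4 * at_bats**2)) / denom
    return p, center - half, center + half


# Hypothetical players: same raw rate, very different amounts of evidence.
players = {"player_a": (3, 10), "player_b": (150, 500)}

for name, (hits, at_bats) in players.items():
    rate, low, high = hit_rate_with_interval(hits, at_bats)
    print(f"{name}: rate={rate:.3f}, approx. 95% interval=({low:.3f}, {high:.3f})")
```

Both hypothetical players have the same raw rate, but the interval for the player with only 10 at-bats is far wider, so the raw numbers alone say little about which player is the better hitter.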
