Is the posterior always a compromise between the prior and the data?
NormmodulxEE
Answered question
2022-11-22
Is the posterior always a compromise between the prior and the data? Suppose that we are interested in learning the proportion θ of the population with a particular property (for instance, the fraction of the population who are male). Suppose that we randomly sample n members of this population (with replacement, to make things easier) and observe that y of them have the property (so the fraction of the sample with the property is y/n). We start with a continuous prior p(θ) with full support on [0, 1] and update this using Bayes' rule. Question: does the expected value of the posterior always lie between the prior expectation and the sample fraction y/n?
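For context, in the conjugate Beta case the answer is yes: with a Beta(a, b) prior (a, b > 0), the posterior mean is a convex combination of the prior mean a/(a+b) and the sample fraction y/n, so it always lies between them. A quick check in Python (the particular numbers are just illustrative):

```python
# With a Beta(a, b) prior (a, b > 0) and y successes out of n draws, the
# posterior is Beta(a + y, b + n - y); its mean (a + y)/(a + b + n) is a
# convex combination of the prior mean a/(a+b) (weight (a+b)/(a+b+n)) and
# the sample fraction y/n (weight n/(a+b+n)).
a, b = 2.0, 5.0      # illustrative prior parameters
n, y = 10, 7         # illustrative sample: 7 of 10 have the property

prior_mean = a / (a + b)              # 2/7
sample_frac = y / n                   # 0.7
post_mean = (a + y) / (a + b + n)     # 9/17

# The posterior mean is sandwiched between the prior mean and y/n.
assert min(prior_mean, sample_frac) <= post_mean <= max(prior_mean, sample_frac)
print(prior_mean, post_mean, sample_frac)
```

The answer below shows that this sandwiching can fail once the prior is not of this conjugate form.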
Answer & Explanation
Lucas Contreras
Beginner · 2022-11-23 · Added 11 answers
Step 1
Nice question! Sadly it is not true. I'll talk in terms of using coin flips to determine the bias of a weighted coin, where θ is the probability that the coin flips heads. Here's the idea: consider a prior in which we assign very high probability to values of θ very close to either 0 or 1 and very low probability otherwise, such that the prior expectation is 1/2. Now suppose we see, say, 2 heads in a sample of 3 flips. This is very unlikely if θ is close to 0 and much more likely if it is close to 1, so the posterior distribution is concentrated on values of θ close to 1, and in particular the posterior mean is close to 1, which makes it potentially larger than either the prior expectation 1/2 or the sample fraction 2/3. You can see hints of this already in your beta distribution calculation: the inequalities you want assume that the parameters α and β are positive, and are false for, say, α = β = −1/2. Of course the beta integral does not converge in this case, but this gives us an idea of what to look for. Formally, take H (for "height") to be a large positive constant and w (for "width") and ε to be small positive constants, and consider the "triangular" prior with probability density function

p(θ) = H(1 − θ/w)          for 0 ≤ θ ≤ w,
p(θ) = ε                   for w < θ < 1 − w,
p(θ) = H(1 − (1 − θ)/w)    for 1 − w ≤ θ ≤ 1.
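Here is a quick numerical sanity check of this prior in Python (NumPy), implementing the triangular spikes at 0 and 1 with floor ε in between; the particular values of w and ε are just for illustration:

```python
import numpy as np

w = 0.01                                   # width of the spikes at 0 and 1
eps = w**2                                 # small floor density in between
H = (1.0 - eps * (1.0 - 2.0 * w)) / w      # chosen so the density integrates to 1

def p(theta):
    """Triangular prior: spikes of height H at theta = 0 and theta = 1, floor eps."""
    theta = np.asarray(theta, dtype=float)
    left  = H * (1.0 - theta / w) * (theta <= w)
    right = H * (1.0 - (1.0 - theta) / w) * (theta >= 1.0 - w)
    mid   = eps * ((theta > w) & (theta < 1.0 - w))
    return left + right + mid

theta = np.linspace(0.0, 1.0, 2_000_001)
dx = theta[1] - theta[0]

def integrate(y):                          # trapezoidal rule on the grid above
    return dx * (0.5 * y[0] + y[1:-1].sum() + 0.5 * y[-1])

total = integrate(p(theta))                # ≈ 1: it is a pdf
prior_mean = integrate(theta * p(theta))   # ≈ 1/2 by symmetry
print(total, prior_mean)
```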
We have ∫₀¹ p(θ) dθ = Hw + ε(1 − 2w), so this is a pdf as long as Hw + ε(1 − 2w) = 1. It's symmetric about θ = 1/2, so the prior expectation is 1/2.

Step 2
Now suppose we flip 3 coins and 2 of them are heads. Then the posterior density is the normalization of θ²(1 − θ)p(θ), and so the posterior expectation is

E[θ | 2 heads, 1 tail] = ∫₀¹ θ³(1 − θ)p(θ) dθ / ∫₀¹ θ²(1 − θ)p(θ) dθ.

This is a slightly tedious but doable calculation which I will punt to WolframAlpha; we get
∫₀¹ θ²(1 − θ)p(θ) dθ = H(w²/6 − w³/12) + ε(1/12 − w²/2 + w³/3),
∫₀¹ θ³(1 − θ)p(θ) dθ = H(w²/6 − w³/4 + w⁴/5 − w⁵/15) + ε(1/20 − w²/2 + w³ − w⁴),

so the posterior expectation is their quotient. This is a bit annoying to write out in full, so let's just talk about how it behaves asymptotically. If w is small then the polynomials in w above are dominated by their terms with smallest exponent, namely Hw²/6, which in both cases comes from the portion of the integral corresponding to [1 − w, 1] and hence θ ≈ 1; importantly, this portion of the integral approximately does not depend on the exponent n of θⁿ. We have Hw ≈ 1 (from the normalization, for ε small), which gives, for both w and ε small,
E[θ | 2 heads, 1 tail] ≈ (w/6 + ε/20) / (w/6 + ε/12),

so we see that by taking w to be small but ε to be much smaller we can arrange for the posterior expectation to be arbitrarily close to 1, and in particular not in the interval [1/2, 2/3], as expected. To be concrete we can take, say, ε = w². If you're looking for a conceptual upshot, one upshot here is that when the prior is very "lopsided" like this, the prior mean is not a good summary of it, and when the prior is concentrated on two very different hypotheses, apparently small amounts of evidence can tilt the balance between them dramatically.
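To see this happen concretely, here is a sketch in Python computing the posterior mean by numerical integration for the illustrative choice w = 0.01 and ε = w² (so H ≈ 1/w); the posterior mean comes out close to 1, well outside [1/2, 2/3]:

```python
import numpy as np

w = 0.01
eps = w**2                                 # much smaller than w
H = (1.0 - eps * (1.0 - 2.0 * w)) / w      # enforces H*w + eps*(1 - 2w) = 1

def p(theta):
    """Triangular prior: spikes of height H at 0 and 1, floor eps in between."""
    theta = np.asarray(theta, dtype=float)
    left  = H * (1.0 - theta / w) * (theta <= w)
    right = H * (1.0 - (1.0 - theta) / w) * (theta >= 1.0 - w)
    mid   = eps * ((theta > w) & (theta < 1.0 - w))
    return left + right + mid

theta = np.linspace(0.0, 1.0, 2_000_001)
dx = theta[1] - theta[0]

def integrate(y):                          # trapezoidal rule
    return dx * (0.5 * y[0] + y[1:-1].sum() + 0.5 * y[-1])

# Posterior after 2 heads in 3 flips is proportional to theta^2 (1 - theta) p(theta).
num = integrate(theta**3 * (1 - theta) * p(theta))
den = integrate(theta**2 * (1 - theta) * p(theta))
post_mean = num / den
print(post_mean)   # close to 1, in particular above both 1/2 and 2/3
```

Shrinking w further (keeping ε = w²) pushes the posterior mean as close to 1 as desired.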