Is the posterior always a compromise between the prior and the data?
NormmodulxEE
Answered question
2022-11-22
Is the posterior always a compromise between the prior and the data? Suppose that we are interested in learning the proportion θ of the population with a particular property (for instance, the fraction of the population who are male). Suppose that we randomly sample n members of this population (with replacement, to make things easier) and observe that y of them have the property (so the fraction of the sample with the property is y/n). We start with a continuous prior p(θ) with full support on [0, 1] and update this using Bayes' rule. Question: does the expected value of the posterior always lie between the prior expectation and the sample fraction y/n?
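For context, in the conjugate Beta case the answer is yes: with a Beta(a, b) prior (a, b > 0), the posterior mean is a convex combination of the prior mean a/(a+b) and the sample fraction y/n, so it always lies between them. A quick check in Python (the particular numbers are just illustrative):

```python
# With a Beta(a, b) prior (a, b > 0) and y successes out of n draws, the
# posterior is Beta(a + y, b + n - y); its mean (a + y)/(a + b + n) is a
# convex combination of the prior mean a/(a+b) (weight (a+b)/(a+b+n)) and
# the sample fraction y/n (weight n/(a+b+n)).
a, b = 2.0, 5.0      # illustrative prior parameters
n, y = 10, 7         # illustrative sample: 7 of 10 have the property

prior_mean = a / (a + b)              # 2/7
sample_frac = y / n                   # 0.7
post_mean = (a + y) / (a + b + n)     # 9/17

# The posterior mean is sandwiched between the prior mean and y/n.
assert min(prior_mean, sample_frac) <= post_mean <= max(prior_mean, sample_frac)
print(prior_mean, post_mean, sample_frac)
```

The answer below shows that this sandwiching can fail once the prior is not of this conjugate form.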
Answer & Explanation
Lucas Contreras
Beginner · 2022-11-23 · Added 11 answers
Step 1
Nice question! Sadly it is not true. I'll talk in terms of using coin flips to determine the bias of a weighted coin, where θ is the probability that the coin flips heads. Here's the idea: consider a prior in which we assign very high probability to values of θ very close to either 0 or 1 and very low probability otherwise, such that the prior expectation is 1/2. Now suppose we see, say, 2 heads in a sample of 3 flips. This is very unlikely if θ is close to 0 and much more likely if it is close to 1, so the posterior distribution is concentrated on values of θ close to 1, and in particular the posterior mean is close to 1, which makes it potentially larger than either the prior expectation 1/2 or the sample fraction 2/3. You can see hints of this already in your beta distribution calculation: the inequalities you want assume that the parameters α and β are positive, and are false for, say, α = β = −1/2. Of course the beta integral does not converge in this case, but this gives us an idea of what to look for. Formally, take H (for "height") to be a large positive constant and w (for "width") and ε to be small positive constants, and consider the "triangular" prior with probability density function

p(θ) = H(1 − θ/w)          for 0 ≤ θ ≤ w,
p(θ) = ε                   for w < θ < 1 − w,
p(θ) = H(1 − (1 − θ)/w)    for 1 − w ≤ θ ≤ 1.
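Here is a quick numerical sanity check of this prior in Python (NumPy), implementing the triangular spikes at 0 and 1 with floor ε in between; the particular values of w and ε are just for illustration:

```python
import numpy as np

w = 0.01                                   # width of the spikes at 0 and 1
eps = w**2                                 # small floor density in between
H = (1.0 - eps * (1.0 - 2.0 * w)) / w      # chosen so the density integrates to 1

def p(theta):
    """Triangular prior: spikes of height H at theta = 0 and theta = 1, floor eps."""
    theta = np.asarray(theta, dtype=float)
    left  = H * (1.0 - theta / w) * (theta <= w)
    right = H * (1.0 - (1.0 - theta) / w) * (theta >= 1.0 - w)
    mid   = eps * ((theta > w) & (theta < 1.0 - w))
    return left + right + mid

theta = np.linspace(0.0, 1.0, 2_000_001)
dx = theta[1] - theta[0]

def integrate(y):                          # trapezoidal rule on the grid above
    return dx * (0.5 * y[0] + y[1:-1].sum() + 0.5 * y[-1])

total = integrate(p(theta))                # ≈ 1: it is a pdf
prior_mean = integrate(theta * p(theta))   # ≈ 1/2 by symmetry
print(total, prior_mean)
```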
We have ∫₀¹ p(θ) dθ = Hw + ε(1 − 2w), so this is a pdf as long as Hw + ε(1 − 2w) = 1. It's symmetric about θ = 1/2, so the prior expectation is 1/2.

Step 2
Now suppose we flip 3 coins and 2 of them are heads. Then the posterior density is the normalization of θ²(1 − θ)p(θ), and so the posterior expectation is

E[θ | 2 heads, 1 tail] = ∫₀¹ θ³(1 − θ)p(θ) dθ / ∫₀¹ θ²(1 − θ)p(θ) dθ.

This is a slightly tedious but doable calculation which I will punt to WolframAlpha; we get
∫₀¹ θ²(1 − θ)p(θ) dθ = H(w²/6 − w³/12) + ε(1/12 − w²/2 + w³/3),
∫₀¹ θ³(1 − θ)p(θ) dθ = H(w²/6 − w³/4 + w⁴/5 − w⁵/15) + ε(1/20 − w²/2 + w³ − w⁴),

so the posterior expectation is their quotient. This is a bit annoying to write out in full, so let's just talk about how it behaves asymptotically. If w is small then the polynomials in w above are dominated by their terms with smallest exponent, namely Hw²/6, which in both cases comes from the portion of the integral corresponding to [1 − w, 1] and hence θ ≈ 1; importantly, this portion of the integral approximately does not depend on the exponent n of θⁿ. We have Hw ≈ 1 (from the normalization, for ε small), which gives, for both w and ε small,
E[θ | 2 heads, 1 tail] ≈ (w/6 + ε/20) / (w/6 + ε/12),

so we see that by taking w to be small but ε to be much smaller we can arrange for the posterior expectation to be arbitrarily close to 1, and in particular not in the interval [1/2, 2/3], as expected. To be concrete we can take, say, ε = w². If you're looking for a conceptual upshot, one upshot here is that when the prior is very "lopsided" like this, the prior mean is not a good summary of it, and when the prior is concentrated on two very different hypotheses, apparently small amounts of evidence can tilt the balance between them dramatically.
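To see this happen concretely, here is a sketch in Python computing the posterior mean by numerical integration for the illustrative choice w = 0.01 and ε = w² (so H ≈ 1/w); the posterior mean comes out close to 1, well outside [1/2, 2/3]:

```python
import numpy as np

w = 0.01
eps = w**2                                 # much smaller than w
H = (1.0 - eps * (1.0 - 2.0 * w)) / w      # enforces H*w + eps*(1 - 2w) = 1

def p(theta):
    """Triangular prior: spikes of height H at 0 and 1, floor eps in between."""
    theta = np.asarray(theta, dtype=float)
    left  = H * (1.0 - theta / w) * (theta <= w)
    right = H * (1.0 - (1.0 - theta) / w) * (theta >= 1.0 - w)
    mid   = eps * ((theta > w) & (theta < 1.0 - w))
    return left + right + mid

theta = np.linspace(0.0, 1.0, 2_000_001)
dx = theta[1] - theta[0]

def integrate(y):                          # trapezoidal rule
    return dx * (0.5 * y[0] + y[1:-1].sum() + 0.5 * y[-1])

# Posterior after 2 heads in 3 flips is proportional to theta^2 (1 - theta) p(theta).
num = integrate(theta**3 * (1 - theta) * p(theta))
den = integrate(theta**2 * (1 - theta) * p(theta))
post_mean = num / den
print(post_mean)   # close to 1, in particular above both 1/2 and 2/3
```

Shrinking w further (keeping ε = w²) pushes the posterior mean as close to 1 as desired.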