
rigliztetbf 2022-06-26 Answered
I'm reading Bayesian Data Analysis by Gelman et al., and I'm having big trouble interpreting the following part of the book (note: the rat tumor rate $\theta$ in the following text has $\theta \sim \mathrm{Beta}(\alpha, \beta)$):
Choosing a standard parameterization and setting up a ‘noninformative’ hyperprior distribution.
Because we have no immediately available information about the distribution of tumor rates in populations of rats, we seek a relatively diffuse hyperprior distribution for $(\alpha, \beta)$. Before assigning a hyperprior distribution, we reparameterize in terms of $\operatorname{logit}\left(\frac{\alpha}{\alpha+\beta}\right) = \log\left(\frac{\alpha}{\beta}\right)$ and $\log(\alpha+\beta)$, which are the logit of the mean and the logarithm of the ‘sample size’ in the beta population distribution for $\theta$. It would seem reasonable to assign independent hyperprior distributions to the prior mean and ‘sample size,’ and we use the logistic and logarithmic transformations to put each on a $(-\infty, \infty)$ scale. Unfortunately, a uniform prior density on these newly transformed parameters yields an improper posterior density, with an infinite integral in the limit $(\alpha+\beta) \to \infty$, and so this particular prior density cannot be used here.
In a problem such as this with a reasonably large amount of data, it is possible to set up a ‘noninformative’ hyperprior density that is dominated by the likelihood and yields a proper posterior distribution. One reasonable choice of diffuse hyperprior density is uniform on $\left(\frac{\alpha}{\alpha+\beta}, (\alpha+\beta)^{-1/2}\right)$, which when multiplied by the appropriate Jacobian yields the following densities on the original scale,
$$p(\alpha, \beta) \propto (\alpha+\beta)^{-5/2},$$
and on the natural transformed scale:
$$p\left(\log\left(\frac{\alpha}{\beta}\right), \log(\alpha+\beta)\right) \propto \alpha\beta(\alpha+\beta)^{-5/2}.$$
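As a quick sanity check of the reparameterization in the excerpt, here is a minimal Python sketch. The function names and the example values $\alpha = 2$, $\beta = 14$ are hypothetical, chosen only for illustration. It confirms that the logit of the mean $\alpha/(\alpha+\beta)$ equals $\log(\alpha/\beta)$ and that the map back to $(\alpha, \beta)$ recovers the original values.

```python
import math

def to_transformed(alpha: float, beta: float):
    """(alpha, beta) -> (logit of the Beta mean, log of the 'sample size')."""
    mean = alpha / (alpha + beta)
    logit_mean = math.log(mean / (1.0 - mean))
    return logit_mean, math.log(alpha + beta)

def from_transformed(u: float, v: float):
    """Inverse map for u = log(alpha/beta), v = log(alpha + beta)."""
    total = math.exp(v)                  # alpha + beta
    mean = 1.0 / (1.0 + math.exp(-u))    # inverse logit gives alpha/(alpha+beta)
    return mean * total, (1.0 - mean) * total

# Hypothetical example values, for illustration only.
alpha, beta = 2.0, 14.0
u, v = to_transformed(alpha, beta)
print(u, math.log(alpha / beta))   # logit(mean) and log(alpha/beta) agree
print(from_transformed(u, v))      # approximately (2.0, 14.0)
```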
I'm struggling especially with the following parts of the text.
Question (1): What exactly does the author mean by "is uniform on $\left(\frac{\alpha}{\alpha+\beta}, (\alpha+\beta)^{-1/2}\right)$"?
Question (2): What is the appropriate Jacobian?
Question (3): How does the author arrive at the priors on the original and transformed scales?
To me, the book hides many details under the hood, and its seemingly ambiguous text makes it difficult for a beginner in the subject to follow.
P.S. If you need more information or want me to clarify my questions, please let me know.

Answers (1)

Raven Higgins
Answered 2022-06-27
I figured out the solution myself, so I'm sharing it here in case anyone else bumps into the same part of Gelman's book (pages 110-111).
Answer (1): The author simply means that
$$p\left(\frac{\alpha}{\alpha+\beta}, (\alpha+\beta)^{-1/2}\right) = \text{constant} \propto 1.$$
Answer (2):
When the author talks about the "appropriate Jacobian", he's referring to the absolute value of the determinant of the Jacobian matrix in the change-of-variables formula for density functions:
$$p(\phi) = p(\theta)\left|\det\left(\frac{d\theta}{d\phi}\right)\right|$$
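To see the formula in action, here is a one-dimensional illustration that is not from the book: take $X \sim \mathrm{Uniform}(0, 1)$ and $Y = -\log X$, so $X = e^{-Y}$ plays the role of $\theta$ and $Y$ the role of $\phi$. The formula gives $p(y) = 1 \cdot \left|-e^{-y}\right| = e^{-y}$. The Python sketch below (the helper names and the midpoint-rule integrator are illustrative choices) checks that this density assigns the same probability to $\{Y \le 1\}$ as the original scale assigns to $\{X \ge e^{-1}\}$, namely $1 - e^{-1}$.

```python
import math

def p_x(x: float) -> float:
    """Density of Uniform(0, 1)."""
    return 1.0 if 0.0 < x < 1.0 else 0.0

def p_y(y: float) -> float:
    """p_Y(y) = p_X(x(y)) * |dx/dy| with x(y) = exp(-y)."""
    x = math.exp(-y)
    abs_jacobian = abs(-math.exp(-y))  # |dx/dy|
    return p_x(x) * abs_jacobian

def integrate(f, a: float, b: float, n: int = 100_000) -> float:
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# P(Y <= 1) from the transformed density...
prob_from_density = integrate(p_y, 0.0, 1.0)
# ...must equal P(X >= exp(-1)) computed on the original scale.
prob_direct = 1.0 - math.exp(-1.0)
print(prob_from_density, prob_direct)
```

Dropping the Jacobian factor (using $p_X(e^{-y})$ alone) would give $\int_0^1 1\,dy = 1$ instead, which is exactly the kind of error the "appropriate Jacobian" guards against.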
Answer (3):
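The author simply applies the change-of-variables formula twice. We know that
$$p(\gamma, \delta) = p(\gamma(\alpha,\beta), \delta(\alpha,\beta)) = p\left(\frac{\alpha}{\alpha+\beta}, (\alpha+\beta)^{-1/2}\right) = \text{constant} \propto 1,$$
where $\gamma(\alpha,\beta) = \frac{\alpha}{\alpha+\beta}$ and $\delta(\alpha,\beta) = (\alpha+\beta)^{-1/2}$.
If we denote $\theta = (\gamma, \delta)$ and $\phi = (\alpha, \beta)$, then:
$$\det\left(\frac{d\theta}{d\phi}\right) = \begin{vmatrix} \frac{\partial\gamma}{\partial\alpha} & \frac{\partial\gamma}{\partial\beta} \\ \frac{\partial\delta}{\partial\alpha} & \frac{\partial\delta}{\partial\beta} \end{vmatrix} = \begin{vmatrix} \frac{\beta}{(\alpha+\beta)^2} & -\frac{\alpha}{(\alpha+\beta)^2} \\ -\frac{1}{2}(\alpha+\beta)^{-3/2} & -\frac{1}{2}(\alpha+\beta)^{-3/2} \end{vmatrix} = -\frac{1}{2}(\alpha+\beta)^{-5/2}.$$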
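From the change-of-variables formula we get:
$$p(\alpha, \beta) = p\left(\frac{\alpha}{\alpha+\beta}, (\alpha+\beta)^{-1/2}\right)\left|\det\left(\frac{d\theta}{d\phi}\right)\right| \propto 1 \cdot \frac{1}{2}(\alpha+\beta)^{-5/2} \propto (\alpha+\beta)^{-5/2},$$
and there it is: the prior on the original scale.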
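If you want to double-check the determinant without redoing the algebra, a central finite-difference approximation at a few points works. A minimal Python sketch, where `gamma`, `delta`, `jacobian_det`, the step size and the test points are hypothetical names and values chosen for illustration:

```python
import math

def gamma(a: float, b: float) -> float:
    """Prior mean of the Beta(alpha, beta) population: alpha / (alpha + beta)."""
    return a / (a + b)

def delta(a: float, b: float) -> float:
    """Inverse square root of the 'sample size': (alpha + beta)^(-1/2)."""
    return (a + b) ** -0.5

def jacobian_det(f, g, a: float, b: float, h: float = 1e-6) -> float:
    """Central-difference approximation of det d(f, g)/d(a, b)."""
    df_da = (f(a + h, b) - f(a - h, b)) / (2 * h)
    df_db = (f(a, b + h) - f(a, b - h)) / (2 * h)
    dg_da = (g(a + h, b) - g(a - h, b)) / (2 * h)
    dg_db = (g(a, b + h) - g(a, b - h)) / (2 * h)
    return df_da * dg_db - df_db * dg_da

def closed_form(a: float, b: float) -> float:
    """Determinant derived in the answer: -1/2 * (alpha + beta)^(-5/2)."""
    return -0.5 * (a + b) ** -2.5

for a, b in [(1.0, 1.0), (2.0, 14.0), (0.5, 3.0)]:
    print(a, b, jacobian_det(gamma, delta, a, b), closed_form(a, b))
```

Only the absolute value of the determinant enters the density, so the negative sign here does not affect $p(\alpha, \beta) \propto (\alpha+\beta)^{-5/2}$.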
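For the alternative scale, we apply the change-of-variables formula in exactly the same manner:
$$p(\alpha, \beta) = p\left(\log\left(\frac{\alpha}{\beta}\right), \log(\alpha+\beta)\right)\left|\det\left(\frac{d\theta}{d\phi}\right)\right|,$$
where this time $\gamma(\alpha,\beta) = \log\left(\frac{\alpha}{\beta}\right)$ and $\delta(\alpha,\beta) = \log(\alpha+\beta)$. For the Jacobian determinant we get:
$$\det\left(\frac{d\theta}{d\phi}\right) = \begin{vmatrix} \frac{\partial\gamma}{\partial\alpha} & \frac{\partial\gamma}{\partial\beta} \\ \frac{\partial\delta}{\partial\alpha} & \frac{\partial\delta}{\partial\beta} \end{vmatrix} = \begin{vmatrix} 1/\alpha & -1/\beta \\ (\alpha+\beta)^{-1} & (\alpha+\beta)^{-1} \end{vmatrix} = \frac{1}{\alpha\beta},$$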
so we get:
$$(\alpha+\beta)^{-5/2} \propto p(\alpha, \beta) = p\left(\log\left(\frac{\alpha}{\beta}\right), \log(\alpha+\beta)\right)\frac{1}{\alpha\beta},$$
or
$$p\left(\log\left(\frac{\alpha}{\beta}\right), \log(\alpha+\beta)\right) \propto \alpha\beta(\alpha+\beta)^{-5/2}.$$
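Equivalently, one can go the other way: with $u = \log(\alpha/\beta)$ and $v = \log(\alpha+\beta)$, the map is invertible, so $p(u, v) = p(\alpha, \beta)\left|\det\left(\frac{d(\alpha,\beta)}{d(u,v)}\right)\right|$, and the inverse-map determinant should be $\alpha\beta$, the reciprocal of $1/(\alpha\beta)$. The sketch below (plain Python; the function names, the finite-difference step and the test points are all made up for illustration) compares the closed-form log density with the one obtained from a numerically differentiated inverse map. Both are unnormalized.

```python
import math

def alpha_beta(u, v):
    """Invert u = log(alpha/beta), v = log(alpha + beta)."""
    total = math.exp(v)                  # alpha + beta
    mean = 1.0 / (1.0 + math.exp(-u))    # alpha / (alpha + beta)
    return mean * total, (1.0 - mean) * total

def log_prior_original(a, b):
    """Unnormalized log p(alpha, beta) = -5/2 * log(alpha + beta)."""
    return -2.5 * math.log(a + b)

def log_prior_transformed(u, v):
    """Closed form from the answer: log(alpha * beta) - 5/2 * log(alpha + beta)."""
    a, b = alpha_beta(u, v)
    return math.log(a) + math.log(b) - 2.5 * math.log(a + b)

def inverse_jacobian_det(u, v, h=1e-6):
    """Central-difference det of d(alpha, beta)/d(u, v)."""
    a_up, b_up = alpha_beta(u + h, v)
    a_um, b_um = alpha_beta(u - h, v)
    a_vp, b_vp = alpha_beta(u, v + h)
    a_vm, b_vm = alpha_beta(u, v - h)
    da_du, db_du = (a_up - a_um) / (2 * h), (b_up - b_um) / (2 * h)
    da_dv, db_dv = (a_vp - a_vm) / (2 * h), (b_vp - b_vm) / (2 * h)
    return da_du * db_dv - da_dv * db_du

def log_prior_via_change_of_variables(u, v):
    """log p(u, v) = log p(alpha, beta) + log|det d(alpha, beta)/d(u, v)|."""
    a, b = alpha_beta(u, v)
    return log_prior_original(a, b) + math.log(abs(inverse_jacobian_det(u, v)))

for u, v in [(0.0, 0.0), (-1.8, 2.8), (1.0, 1.5)]:
    print(log_prior_transformed(u, v), log_prior_via_change_of_variables(u, v))
```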

You might be interested in

asked 2021-02-23
Interpreting z-scores: Complete the following statements using your knowledge about z-scores.
a. If the data is weight, the z-score for someone who is overweight would be
-positive
-negative
-zero
b. If the data is IQ test scores, an individual with a negative z-score would have a
-high IQ
-low IQ
-average IQ
c. If the data is time spent watching TV, an individual with a z-score of zero would
-watch very little TV
-watch a lot of TV
-watch the average amount of TV
d. If the data is annual salary in the U.S and the population is all legally employed people in the U.S., the z-scores of people who make minimum wage would be
-positive
-negative
-zero
asked 2022-06-16
Interpreting Entropy
All you data scientists will probably know the entropy equation:
$$H(p) = -\sum_{i=1}^{n} p_i \log_2 p_i$$
And, using this, I was messing around with some compression and calculated the entropy for a set of probabilities $\{0.3, 0.2, 0.2, 0.1\}$, which came out as about 2.246.
This doesn't make sense to me, because if entropy $\propto$ 1/compression, then I've done the impossible by compressing data with these proportions.
I find myself confused as to how to interpret this value any other way. Is it bits per arbitrary unit? Am I simply wrong?
asked 2022-07-06
What does x% colder mean?
I see people using percentage increases to talk about temperature; for example
"Two weather predictions were presented with somewhat conflicting data, with the Energy Information Administration (EIA) predicting the 2007-08 winter to be 4 percent colder than 2006-07, but still 2 percent warmer than the 30-year average."
Is there any meaningful way to interpret this? I don't know what it means for one year to be 4 percent colder than another.
asked 2021-10-20
Simplify the following expression:
(2-3y)+(6+8y)
asked 2022-05-27
Descriptive statistical analysis
I have some data obtained from a questionnaire. The question asks me to perform a descriptive statistical analysis of the above data and hence interpret the results. What does that mean? Does it mean I have to find the mean, median, mode, standard deviation and variance?
asked 2022-04-10
Interpreting Normal Quantile Plots. In Exercises 5–8, examine the normal quantile plot and determine whether the sample data appear to be from a population with a normal distribution.
Dunkin’ Donuts Service Times The normal quantile plot represents service times during the dinner hours at Dunkin’ Donuts (from Data Set 25 “Fast Food” in Appendix B).
asked 2022-05-26
Meaningful statistic measure of data pairs
I have a dilemma.
I have pairwise data, (a,b), that represents some form of speed, whether it's miles/hour or megabits/second. Let's say that we have the following set of data from measuring the "speed" of the same "car" on two different "courses" A and B under different conditions (say, weather), ignoring the units for now, and with apologies for my notation.
{(a,b)} = {(1, 4), (1, 2), (1, 1), (2, 1), (4, 1)}
There are a few ways to interpret the data:
1) If one believes that the "car" runs faster on "course" A and calculates the average of speed-up (a/b):
average speedup = (1/4 + 1/2 + 1/1 + 2/1 + 4/1)/5 = 1.55
Voila, "course" A makes the "car" go 55% faster than "course" B!
2) If one believes that the "car" runs faster on "course" B and calculates the average of speed-up (b/a):
average speedup = (4/1 + 2/1 + 1/1 + 1/2 + 1/4)/5 = 1.55
Voila, "course" B makes the "car" go 55% faster than "course" A!
3) If one believes that, say, comparing CDFs is the right way to compare the data sets, then it will be something like:
A: 1, 1, 1, 2, 4
B: 1, 1, 1, 2, 4
Voila, "courses" A and B are more or less identical.
Obviously, all three interpretations mask out important details, but at the same time, I am not able to come up with a good statistical measure to describe this set of data. This is clearly just a toy example, but I am facing similar dilemmas interpreting real data.
Thanks!
