Improve Your Algebra Skills with Practice Problems

Recent questions in Pre-Algebra
Pre-Algebra · Answered question
Gabriella Sellers 2022-06-13

Log predictive density asymptotics in predictive information criteria for Bayesian models
I am reading this paper, Andrew Gelman's Understanding predictive information criteria for Bayesian models, and I will give a screenshot as below:
Under standard conditions, the posterior distribution, $p(\theta \mid y)$, approaches a normal distribution in the limit of increasing sample size (see, e.g., DeGroot, 1970). In this asymptotic limit, the posterior is dominated by the likelihood (the prior contributes only one factor, while the likelihood contributes $n$ factors, one for each data point) and so the likelihood function also approaches the same normal distribution.
As sample size $n \to \infty$, we can label the limiting posterior distribution as $\theta \mid y \to \mathrm{N}(\theta_0, V_0/n)$. In this limit the log predictive density is
$\log p(y \mid \theta) = c(y) - \tfrac{1}{2}\left(k \log(2\pi) + \log\lvert V_0/n \rvert + (\theta - \theta_0)^{T}(V_0/n)^{-1}(\theta - \theta_0)\right),$
where c(y) is a constant that only depends on the data y and the model class but not on the parameters θ.
The limiting multivariate normal distribution for $\theta$ induces a posterior distribution for the log predictive density that ends up being a constant (equal to $c(y) - \tfrac{1}{2}(k \log(2\pi) + \log\lvert V_0/n \rvert)$) minus $\tfrac{1}{2}$ times a $\chi^2_k$ random variable, where $k$ is the dimension of $\theta$, that is, the number of parameters in the model. The maximum of this distribution of the log predictive density is attained when $\theta$ equals the maximum likelihood estimate (of course), and its posterior mean is at a value $k/2$ lower.
For actual posterior distributions, this asymptotic result is only an approximation, but it will be useful as a benchmark for interpreting the log predictive density as a measure of fit.
With singular models (e.g., mixture models and overparameterized complex models more generally), a set of different parameters can map to a single data model, the Fisher information matrix is not positive definite, plug-in estimates are not representative of the posterior, and the distribution of the deviance does not converge to a $\chi^2$ distribution. The asymptotic behavior of such models can be analyzed using singular learning theory (Watanabe, 2009, 2010).
Sorry for the long paragraph. The things that confuse me are:
1. Why does it seem like we know the posterior distribution $f(\theta \mid y)$ first, and then use it to find $\log p(y \mid \theta)$? Shouldn't we get the model, $\log p(y \mid \theta)$, first?
2. What does the green line "its posterior mean is at a value $k/2$ lower" mean? My understanding is that since there is a $\tfrac{1}{2}\chi^2_k$ term in the expression and the expectation of $\chi^2_k$ is $k$, this leads to a value $k/2$ lower. But $k/2$ lower than what? (See the sketch after this list.)
3. How does $\log p(y \mid \theta)$ serve as a measure of fit? I can see that there is a mean squared error (MSE)-like term in this expression, but it is an MSE of the parameter $\theta$, not of the data $y$.
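To make my reading of question 2 concrete, here is a sketch assuming the normal approximation above holds exactly (which may be exactly where I go wrong): under $\theta \mid y \sim \mathrm{N}(\theta_0, V_0/n)$, the quadratic form in the log predictive density satisfies
$(\theta - \theta_0)^{T}(V_0/n)^{-1}(\theta - \theta_0) \sim \chi^2_k,$
so
$\log p(y \mid \theta) = \Big(c(y) - \tfrac{1}{2}\big(k \log(2\pi) + \log\lvert V_0/n \rvert\big)\Big) - \tfrac{1}{2}\chi^2_k.$
The first bracket is the maximum, attained at $\theta = \theta_0$ (essentially the MLE), and since $\mathrm{E}[\chi^2_k] = k$, the posterior mean of $\log p(y \mid \theta)$ sits $k/2$ below that maximum.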
Thanks for any help!

Pre-Algebra · Answered question
Dale Tate 2022-06-13

How do you solve 5(5x - 2) < 15?
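One way to solve it (a quick sketch, not the posted answer): distribute the 5 and isolate $x$:
$5(5x - 2) < 15 \;\Rightarrow\; 25x - 10 < 15 \;\Rightarrow\; 25x < 25 \;\Rightarrow\; x < 1.$
Dividing both sides by 5 first ($5x - 2 < 3$) reaches the same answer with smaller numbers.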

Pre-Algebra · Answered question
dourtuntellorvl 2022-06-13

How do you solve 36 > -(1/2)y?
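Reading the inequality as $36 > -\tfrac{1}{2}y$ (the original rendering is ambiguous), one way to solve it is to multiply both sides by $-2$ and reverse the inequality sign:
$36 > -\tfrac{1}{2}y \;\Rightarrow\; -72 < y \;\Rightarrow\; y > -72.$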

Pre-Algebra · Answered question
taghdh9 2022-06-13

How do you solve 27 > x + 18 ?
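A quick sketch of one way to solve it: subtract 18 from both sides:
$27 > x + 18 \;\Rightarrow\; 9 > x \;\Rightarrow\; x < 9.$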

Pre-Algebra · Answered question
Leah Pope 2022-06-13

How do you solve and graph x - 8 > 4 ?
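A quick sketch of one way to solve and graph it: add 8 to both sides to get $x - 8 > 4 \;\Rightarrow\; x > 12$; on a number line, this is an open circle at 12 with shading to the right.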

Getting your pre-algebra problems solved becomes much easier when you have answers to your questions and can take a closer look at worked examples covering Pre-Algebra topics. We have collected this list of pre-algebra equations and solving-equations-with-decimals problems to help you see examples that show how certain solutions are found. If something still seems unclear, or you are concerned about a solution that has been provided, approach the problem by reverse engineering it (working backwards from the answer).