"I am studying a Tutorial on Maximum Likelihood Estimation in Linear Regression and I have a question. When we have more than one regressor (a.k.a. multiple linear regression1), the model comes in its matrix form y=Xbeta+in, (1)where y is the response vector, X is the design matrix with each its row specifying under what design or conditions the corresponding response is observed (hence the name), beta is the vector of regression coefficients, and ϵ is the residual vector distributing as a zero-mean multivariable Gaussian with a diagonal covariance matrix

Pranav Ward

Answered question

2022-09-13

I am studying a Tutorial on Maximum Likelihood Estimation in Linear Regression and I have a question.
When we have more than one regressor (a.k.a. multiple linear regression), the model comes in its matrix form
$$y = X\beta + \epsilon, \tag{1}$$
where $y$ is the response vector, $X$ is the design matrix with each of its rows specifying under what design or conditions the corresponding response is observed (hence the name), $\beta$ is the vector of regression coefficients, and $\epsilon$ is the residual vector, distributed as a zero-mean multivariate Gaussian with a diagonal covariance matrix, $\epsilon \sim \mathcal{N}(0, \sigma^2 I_N)$, where $I_N$ is the $N \times N$ identity matrix. Therefore
$$y \sim \mathcal{N}(X\beta, \sigma^2 I_N), \tag{2}$$
meaning that the linear combination $X\beta$ explains (or predicts) the response $y$ with uncertainty characterized by a variance of $\sigma^2$.
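For concreteness, a minimal numpy sketch of this generative model (illustrative names and values only, not code from the tutorial):

```python
import numpy as np

rng = np.random.default_rng(0)

N, p = 100, 3                      # N observations, p regressors
X = rng.normal(size=(N, p))        # design matrix: one row per observation
beta = np.array([2.0, -1.0, 0.5])  # true regression coefficients
sigma2 = 0.25                      # true noise variance

# epsilon ~ N(0, sigma^2 I_N): independent zero-mean Gaussian residuals
epsilon = rng.normal(scale=np.sqrt(sigma2), size=N)
y = X @ beta + epsilon             # model (1): y = X beta + epsilon
```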
Assume $y$, $\beta$, and $\epsilon \in \mathbb{R}^n$. Under the model assumptions, we aim to estimate the unknown parameters ($\beta$ and $\sigma^2$) from the data available ($X$ and $y$).
Maximum likelihood (ML) estimation is the most common approach. We maximize the log-likelihood w.r.t. $\beta$ and $\sigma^2$:
$$\mathcal{L}(\beta, \sigma^2 \mid y, X) = -\frac{N}{2}\log 2\pi - \frac{N}{2}\log \sigma^2 - \frac{1}{2\sigma^2}(y - X\beta)^T (y - X\beta)$$
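A sketch that evaluates this log-likelihood numerically (assuming the simulated `X`, `y`, `beta`, `sigma2` from the snippet above) and cross-checks it against scipy's multivariate-normal density per equation (2), then computes the standard closed-form ML estimates:

```python
import numpy as np
from scipy.stats import multivariate_normal

def log_likelihood(beta, sigma2, y, X):
    """L(beta, sigma^2 | y, X) exactly as written above."""
    N = len(y)
    resid = y - X @ beta
    return (-N / 2 * np.log(2 * np.pi)
            - N / 2 * np.log(sigma2)
            - resid @ resid / (2 * sigma2))

# Equation (2): y ~ N(X beta, sigma^2 I_N), so the two values must agree.
ll_direct = log_likelihood(beta, sigma2, y, X)
ll_mvn = multivariate_normal(mean=X @ beta, cov=sigma2 * np.eye(len(y))).logpdf(y)
assert np.isclose(ll_direct, ll_mvn)

# Maximizing L gives the familiar closed-form ML estimates:
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # (X^T X)^{-1} X^T y
sigma2_hat = np.mean((y - X @ beta_hat) ** 2)  # RSS / N
```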
I am trying to understand how the log-likelihood, $\mathcal{L}(\beta, \sigma^2 \mid y, X)$, is formed. Normally, I have seen these problems when we have $x_i$ as a vector of size $d$ ($d$ being the number of parameters for each data point). Specifically, when $x_i$ is a vector, I wrote the log-likelihood as
$$\ln \prod_{i=1}^{N} \frac{1}{\sqrt{(2\pi)^d \sigma^2}} \exp\left(-\frac{1}{2\sigma^2}(x_i - \mu)^T (x_i - \mu)\right) = \sum_i \ln \frac{1}{\sqrt{(2\pi)^d \sigma^2}} \exp\left(-\frac{1}{2\sigma^2}(x_i - \mu)^T (x_i - \mu)\right).$$
But in the case shown in this tutorial, there is no index $i$ over which to apply the summation.

Answer & Explanation

Nelson Santana

Beginner · 2022-09-14 · Added 13 answers

I think it's relatively easy to get mixed up here due to notation. In the case you present from the textbook, they're considering a product of one-dimensional Gaussians which are independent of each other, and then writing it in the form of a multi-dimensional Gaussian (since the covariance matrix of this multi-dimensional Gaussian is then exactly $\sigma^2 I$). E.g., note that each sample satisfies $y_i - (X\beta)_i \sim \mathcal{N}(0, \sigma^2)$. Taking the product of these distributions yields the multi-dimensional Gaussian above.
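Spelled out (a standard expansion, not a quote from the textbook): each $y_i$ is an independent univariate Gaussian with mean $(X\beta)_i$, so
$$\prod_{i=1}^{N} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(y_i - (X\beta)_i)^2}{2\sigma^2}\right) = \frac{1}{(2\pi\sigma^2)^{N/2}} \exp\!\left(-\frac{1}{2\sigma^2} \sum_{i=1}^{N} \left(y_i - (X\beta)_i\right)^2\right),$$
and since $\sum_{i=1}^{N} (y_i - (X\beta)_i)^2 = (y - X\beta)^T (y - X\beta)$, taking the logarithm gives exactly $\mathcal{L}(\beta, \sigma^2 \mid y, X)$ above. The summation over $i$ has not disappeared; it is hidden inside the quadratic form.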
In your exposition, on the other hand, you're writing a multi-dimensional Gaussian which is i.i.d.; this is different from what the textbook is referring to, since the mean should change between distributions (i.e., the samples are independent, but not identically drawn, since we're observing different data points with some additive noise, $\epsilon \sim \mathcal{N}(0, \sigma^2)$).
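A quick numerical check of that equivalence (a sketch assuming the simulated `X`, `y`, `beta`, `sigma2` from the question's example above):

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

# Sum of N univariate log-densities, one per sample: y_i ~ N((X beta)_i, sigma^2)
ll_univariate = norm(loc=X @ beta, scale=np.sqrt(sigma2)).logpdf(y).sum()

# Single multivariate log-density: y ~ N(X beta, sigma^2 I_N)
ll_multivariate = multivariate_normal(mean=X @ beta,
                                      cov=sigma2 * np.eye(len(y))).logpdf(y)

assert np.isclose(ll_univariate, ll_multivariate)  # the two forms coincide
```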
