I am studying a Tutorial on Maximum Likelihood Estimation in Linear Regression and I have a question.
When we have more than one regressor (a.k.a. multiple linear regression), the model comes in its matrix form $y = X\beta + \epsilon$, (1) where $y$ is the response vector, $X$ is the design matrix with each of its rows specifying under what design or conditions the corresponding response is observed (hence the name), $\beta$ is the vector of regression coefficients, and $\epsilon$ is the residual vector, distributed as a zero-mean multivariate Gaussian with a diagonal covariance matrix, $\epsilon \sim \mathcal{N}\left(0, \sigma^2 I_N\right)$, where $I_N$ is the $N \times N$ identity matrix. Therefore $y \sim \mathcal{N}\left(X\beta, \sigma^2 I_N\right)$, (2) meaning that the linear combination $X\beta$ explains (or predicts) the response $y$ with uncertainty characterized by a variance of $\sigma^2$.
Assume $y, \epsilon \in \mathbb{R}^N$ and $\beta \in \mathbb{R}^p$, where $X$ is an $N \times p$ design matrix. Under the model assumptions, we aim to estimate the unknown parameters ($\beta$ and $\sigma^2$) from the data available ($X$ and $y$).
Maximum likelihood (ML) estimation is the most common approach. We maximize the log-likelihood w.r.t. $\beta$ and $\sigma^2$:
$$\mathcal{L}\left(\beta, \sigma^2 \mid y, X\right) = -\frac{N}{2}\log 2\pi - \frac{N}{2}\log \sigma^2 - \frac{1}{2\sigma^2}\left(y - X\beta\right)^T \left(y - X\beta\right)$$
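As a concrete check, the expression above can be evaluated directly. This is a minimal NumPy sketch on simulated data (the variable names and simulation setup are mine, not from the tutorial); it also confirms that the least-squares solution maximizes $\mathcal{L}$ over $\beta$ for any fixed $\sigma^2$:

```python
import numpy as np

# Simulate data from the model y = X beta + eps, eps ~ N(0, sigma^2 I_N)
rng = np.random.default_rng(0)
N, p = 50, 3
X = rng.normal(size=(N, p))                 # design matrix
beta_true = np.array([1.0, -2.0, 0.5])
sigma2 = 0.25
y = X @ beta_true + rng.normal(scale=np.sqrt(sigma2), size=N)

def log_likelihood(beta, sigma2, y, X):
    """L(beta, sigma^2 | y, X) in the closed form above."""
    N = len(y)
    r = y - X @ beta                        # residual vector
    return (-N / 2 * np.log(2 * np.pi)
            - N / 2 * np.log(sigma2)
            - (r @ r) / (2 * sigma2))

# The OLS solution minimizes the residual sum of squares, and hence
# maximizes L over beta for fixed sigma^2 -- no beta can score higher.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
assert log_likelihood(beta_hat, sigma2, y, X) >= log_likelihood(beta_true, sigma2, y, X)
```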
I am trying to understand how the log-likelihood, $\mathcal{L}\left(\beta, \sigma^2 \mid y, X\right)$, is formed. Normally, I have seen these problems when we have $\mathbf{x}_i$ as a vector of size $d$ ($d$ being the dimension of each data point). Specifically, when $\mathbf{x}_i$ is a vector, I wrote it as
$$\ln \prod_{i=1}^{N} \frac{1}{\sqrt{(2\pi)^d \sigma^2}} \exp\left(-\frac{1}{2\sigma^2}\left(\mathbf{x}_i - \mu\right)^T \left(\mathbf{x}_i - \mu\right)\right) = \sum_{i=1}^{N} \ln \frac{1}{\sqrt{(2\pi)^d \sigma^2}} \exp\left(-\frac{1}{2\sigma^2}\left(\mathbf{x}_i - \mu\right)^T \left(\mathbf{x}_i - \mu\right)\right).$$ But in the case shown in this tutorial, there is no index $i$ over which to apply the summation.
Nelson Santana
I think it's relatively easy to get mixed up here due to notation. In the case you present from the textbook, they're considering a product of one-dimensional Gaussians that are independent of each other, and then writing the result as a multi-dimensional Gaussian (the covariance matrix of this multi-dimensional Gaussian is then exactly $\sigma^2 I$). E.g., note that each sample satisfies $y_i - (X\beta)_i \sim \mathcal{N}\left(0, \sigma^2\right)$. Taking the product of these densities yields the multi-dimensional Gaussian above.
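A quick numerical check makes this concrete (a NumPy sketch with simulated data; the setup is mine): summing the $N$ one-dimensional log-densities of the residuals gives exactly the matrix-form log-likelihood, so the "missing" index $i$ is hidden inside the inner product $(y - X\beta)^T(y - X\beta) = \sum_i (y_i - (X\beta)_i)^2$.

```python
import numpy as np

rng = np.random.default_rng(1)
N, p = 40, 2
X = rng.normal(size=(N, p))
beta = np.array([0.7, -1.3])
sigma2 = 0.5
y = X @ beta + rng.normal(scale=np.sqrt(sigma2), size=N)

r = y - X @ beta  # residuals; each r_i ~ N(0, sigma^2), independent

# "Product of 1-D Gaussians" view: sum over i of log N(r_i | 0, sigma^2)
per_sample = np.sum(-0.5 * np.log(2 * np.pi * sigma2) - r**2 / (2 * sigma2))

# Matrix form: log N(y | X beta, sigma^2 I_N) as in the tutorial
matrix_form = (-N / 2 * np.log(2 * np.pi)
               - N / 2 * np.log(sigma2)
               - (r @ r) / (2 * sigma2))

assert np.isclose(per_sample, matrix_form)  # identical up to float rounding
```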
In your exposition, on the other hand, you're writing a multi-dimensional Gaussian whose samples are i.i.d.; this is different from what the textbook is describing, since there the mean changes between observations (i.e. the samples are independent but not identically distributed, because we observe different data points with some additive noise, $\epsilon \sim \mathcal{N}\left(0, \sigma^2\right)$).
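For completeness, the ML estimates themselves follow directly from maximizing the matrix-form log-likelihood: $\hat{\beta}$ solves the normal equations $X^T X \beta = X^T y$, and $\hat{\sigma}^2 = \mathrm{RSS}/N$. A hedged sketch on simulated data (setup and names are mine):

```python
import numpy as np

rng = np.random.default_rng(2)
N, p = 200, 3
X = rng.normal(size=(N, p))
beta_true = np.array([2.0, 0.0, -1.0])
y = X @ beta_true + rng.normal(scale=0.3, size=N)   # eps_i ~ N(0, 0.09)

# ML estimate of beta: solve the normal equations X^T X beta = X^T y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# ML estimate of sigma^2: RSS / N (note: N, not the unbiased N - p)
rss = np.sum((y - X @ beta_hat) ** 2)
sigma2_hat = rss / N

# With N = 200 and noise sd 0.3, the estimates land near the truth
assert np.allclose(beta_hat, beta_true, atol=0.1)
assert 0.05 < sigma2_hat < 0.15
```

In practice one would use `np.linalg.lstsq` rather than forming $X^T X$ explicitly, since it is numerically more stable for ill-conditioned designs.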