I am studying a Tutorial on Maximum Likelihood Estimation in Linear Regression and I have a question.

When we have more than one regressor (a.k.a. multiple linear regression), the model comes in its matrix form

$$y = X\beta + \epsilon, \tag{1}$$

where $y$ is the response vector, $X$ is the design matrix with each of its rows specifying under what design or conditions the corresponding response is observed (hence the name), $\beta$ is the vector of regression coefficients, and $\epsilon$ is the residual vector, distributed as a zero-mean multivariate Gaussian with a diagonal covariance matrix, $\epsilon \sim \mathcal{N}(0, \sigma^2 I_N)$, where $I_N$ is the $N \times N$ identity matrix. Therefore

$$y \sim \mathcal{N}(X\beta, \sigma^2 I_N), \tag{2}$$

meaning that the linear combination $X\beta$ explains (or predicts) the response $y$ with uncertainty characterized by the variance $\sigma^2$.
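To make the model concrete, here is a minimal simulation sketch of Eq. (1). The sizes $N$, $p$, the coefficient values, and the noise level are my own illustrative choices, not from the tutorial:

```python
import numpy as np

rng = np.random.default_rng(0)
N, p = 100, 3                        # illustrative sample size and number of regressors
X = rng.normal(size=(N, p))          # design matrix: one row per observation
beta = np.array([2.0, -1.0, 0.5])    # assumed true regression coefficients
sigma = 0.3                          # assumed noise standard deviation

eps = rng.normal(0.0, sigma, size=N)  # eps ~ N(0, sigma^2 I_N): i.i.d. residuals
y = X @ beta + eps                    # the model y = X beta + eps, so y ~ N(X beta, sigma^2 I_N)
```

Because the covariance is $\sigma^2 I_N$, the residuals are independent across observations, which is what will later let the joint density factor into a product over $i$.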

Assume $y, \epsilon \in \mathbb{R}^N$ and $\beta \in \mathbb{R}^p$, so $X \in \mathbb{R}^{N \times p}$. Under the model assumptions, we aim to estimate the unknown parameters ($\beta$ and $\sigma^2$) from the available data ($X$ and $y$).

Maximum likelihood (ML) estimation is the most common estimator. We maximize the log-likelihood with respect to $\beta$ and $\sigma^2$:

$$\mathcal{L}(\beta, \sigma^2 \mid y, X) = -\frac{N}{2}\log 2\pi - \frac{N}{2}\log \sigma^2 - \frac{1}{2\sigma^2}(y - X\beta)^T (y - X\beta)$$
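As a numerical sanity check of this expression, the sketch below (with made-up sizes and parameter values) evaluates the log-likelihood formula directly and compares it against the log-density of $\mathcal{N}(X\beta, \sigma^2 I_N)$ computed by SciPy:

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(1)
N, p = 50, 2                         # illustrative sizes (assumptions)
X = rng.normal(size=(N, p))
beta = np.array([1.0, -2.0])         # assumed coefficients
sigma2 = 0.25                        # assumed noise variance
y = X @ beta + rng.normal(0.0, np.sqrt(sigma2), size=N)

def log_likelihood(beta, sigma2, y, X):
    """The tutorial's L(beta, sigma^2 | y, X)."""
    N = len(y)
    resid = y - X @ beta
    return (-N / 2 * np.log(2 * np.pi)
            - N / 2 * np.log(sigma2)
            - resid @ resid / (2 * sigma2))

# Cross-check: log-pdf of y under N(X beta, sigma^2 I_N)
direct = multivariate_normal(mean=X @ beta, cov=sigma2 * np.eye(N)).logpdf(y)
print(np.isclose(log_likelihood(beta, sigma2, y, X), direct))  # True
```

The agreement reflects that $\log\lvert\sigma^2 I_N\rvert = N \log \sigma^2$ and $(y - X\beta)^T (\sigma^2 I_N)^{-1} (y - X\beta) = \frac{1}{\sigma^2}(y - X\beta)^T (y - X\beta)$.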

I am trying to understand how the log-likelihood, $\mathcal{L}(\beta, \sigma^2 \mid y, X)$, is formed. Normally, I have seen these problems posed with $\mathbf{x}_i$ as a vector of size $d$ ($d$ being the number of features per data point). Specifically, when $\mathbf{x}_i$ is a vector, I wrote the log-likelihood as

$$\ln \prod_{i=1}^{N} \frac{1}{\sqrt{(2\pi)^d \sigma^2}} \exp\!\left(-\frac{1}{2\sigma^2}(\mathbf{x}_i - \mu)^T(\mathbf{x}_i - \mu)\right) = \sum_{i=1}^{N} \ln \frac{1}{\sqrt{(2\pi)^d \sigma^2}} \exp\!\left(-\frac{1}{2\sigma^2}(\mathbf{x}_i - \mu)^T(\mathbf{x}_i - \mu)\right).$$

But in the case shown in this tutorial, there is no index $i$ over which to apply the summation.
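One quick numerical check of where the summation goes (my own sketch, with arbitrary sizes): the inner product $(y - X\beta)^T (y - X\beta)$ in the matrix form is itself the sum over $i$ of the squared per-observation residuals, so the sum over observations has not disappeared:

```python
import numpy as np

rng = np.random.default_rng(2)
N, p = 20, 3                        # illustrative sizes (assumptions)
X = rng.normal(size=(N, p))
beta = rng.normal(size=p)
y = X @ beta + rng.normal(size=N)

resid = y - X @ beta
# Matrix form: (y - X beta)^T (y - X beta)
quad_matrix = resid @ resid
# Per-observation form: sum_i (y_i - x_i^T beta)^2, one term per row of X
quad_sum = sum((y[i] - X[i] @ beta) ** 2 for i in range(N))

print(np.isclose(quad_matrix, quad_sum))  # True
```

Likewise, the $-\frac{N}{2}\log 2\pi - \frac{N}{2}\log \sigma^2$ terms are the $N$ identical normalizing constants, one per observation, already summed.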
