\( \newcommand{\bm}[1]{\boldsymbol{\mathbf{#1}}} \DeclareMathOperator{\tr}{tr} \DeclareMathOperator{\var}{var} \DeclareMathOperator{\cov}{cov} \DeclareMathOperator{\corr}{corr} \newcommand{\indep}{\perp\!\!\!\perp} \newcommand{\nindep}{\perp\!\!\!\perp\!\!\!\!\!\!/\;\;} \)

1.3 Estimation of \(\sigma^2\)

In addition to the linear coefficients \(\beta_0, \ldots, \beta_p\) estimated using least squares, we also need to estimate the error variance \(\sigma^2\), representing the variability of observations about their mean.

We can estimate \(\sigma^2\) using maximum likelihood. Maximising (1.4) with respect to \(\beta\) and \(\sigma^2\) gives \[ \hat{\sigma}^2={D\over n}={1\over n}\sum_{i=1}^n e_i^2. \]

If the model is correct, then \(D\) is independent of \(\hat\beta\) and \[ {D\over\sigma^2}\sim\chi^2_{n-p-1} \] \[ \Rightarrow E(\hat{\sigma}^2)={{n-p-1}\over n}\sigma^2, \] since the expectation of a \(\chi^2_{n-p-1}\) random variable is \(n-p-1\). The maximum likelihood estimator is therefore biased for \(\sigma^2\) (although it is asymptotically unbiased, as \({{n-p-1}\over n}\to 1\) when \(n\to\infty\)). We usually prefer the unbiased estimator of \(\sigma^2\) \[ s^2={D\over {n-p-1}}={1\over {n-p-1}}\sum_{i=1}^n e_i^2. \]

The denominator \(n-p-1\), the number of observations minus the number of coefficients in the model, is called the degrees of freedom of the model. Therefore, we estimate the error variance by the deviance divided by the degrees of freedom.
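To make the bias concrete, here is a minimal simulation sketch in Python with numpy (the sample size, number of predictors, coefficient values, and true \(\sigma^2\) are illustrative assumptions, not taken from these notes). It repeatedly fits a linear model by least squares, computes the deviance \(D\) from the residuals, and compares the average of \(D/n\) with the average of \(D/(n-p-1)\) against the true error variance.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 30, 3            # illustrative: n observations, p explanatory variables
sigma2 = 4.0            # illustrative true error variance
beta = np.array([1.0, 2.0, -1.0, 0.5])   # intercept plus p slopes (p + 1 coefficients)

# Design matrix with an intercept column
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])

mle_estimates, unbiased_estimates = [], []
for _ in range(5000):
    y = X @ beta + rng.normal(scale=np.sqrt(sigma2), size=n)
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)   # least squares fit
    e = y - X @ beta_hat                               # residuals
    D = np.sum(e**2)                                    # deviance (residual sum of squares)
    mle_estimates.append(D / n)                         # maximum likelihood estimator
    unbiased_estimates.append(D / (n - p - 1))          # unbiased estimator s^2

print("true sigma^2:          ", sigma2)
print("mean of D/n:           ", np.mean(mle_estimates))        # close to (n-p-1)/n * sigma^2
print("mean of D/(n-p-1):     ", np.mean(unbiased_estimates))   # close to sigma^2
print("(n-p-1)/n * sigma^2:   ", (n - p - 1) / n * sigma2)
```

In a run of this sketch the average of \(D/n\) sits noticeably below the true \(\sigma^2\), close to \({(n-p-1)\over n}\sigma^2\), while the average of \(D/(n-p-1)\) is close to \(\sigma^2\), illustrating why the degrees-of-freedom denominator is preferred.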