\( \newcommand{\bm}[1]{\boldsymbol{\mathbf{#1}}} \DeclareMathOperator{\tr}{tr} \DeclareMathOperator{\var}{var} \DeclareMathOperator{\cov}{cov} \DeclareMathOperator{\corr}{corr} \newcommand{\indep}{\perp\!\!\!\perp} \newcommand{\nindep}{\perp\!\!\!\perp\!\!\!\!\!\!/\;\;} \)

1.6 Comparing linear models

This section describes just one method for comparing models. General principles and other methods will be discussed in detail in the APTS module itself.

A pair of nested linear models can be compared using a generalised likelihood ratio test. Nesting implies that the simpler model (\(H_0\)) is a special case of the more complex model (\(H_1\)). In practice, this usually means that the explanatory variables present in \(H_0\) are a subset of those present in \(H_1\). Let \(\Theta^{(1)}\) be the unrestricted parameter space under \(H_1\) and \(\Theta^{(0)}\) be the parameter space corresponding to model \(H_0\), i.e. with the appropriate coefficients constrained to zero.

Without loss of generality, we can think of \(H_1\) as the model \[ E(Y_i)=\sum_{j=0}^p x_{ij} \beta_j \qquad i = 1, \ldots, n \] with \(H_0\) being the same model with \(\beta_{q+1}=\beta_{q+2}=\cdots=\beta_p=0\), so that \(H_0\) has \(q+1\) unconstrained regression coefficients and \(H_1\) has \(p+1\).

Now, a generalised likelihood ratio test of \(H_0\) against \(H_1\) has a test statistic of the form \[ T={{\max_{(\beta,\sigma^2)\in \Theta^{(1)}}f_{Y}(y;\beta,\sigma^2)}\over {\max_{(\beta,\sigma^2)\in \Theta^{(0)}}f_{Y}(y;\beta,\sigma^2)}} \] and rejects \(H_0\) in favour of \(H_1\) when \(T>k\), where \(k\) is determined by \(\alpha\), the size of the test.

For a linear model, \[ f_{Y}(y;\beta,\sigma^2)=\left(2\pi\sigma^2\right)^{-{n\over 2}} \exp\left(-{1\over{2\sigma^2}} \sum_{i=1}^n (y_i-x_i^T\beta)^2\right). \] This is maximised with respect to \((\beta,\sigma^2)\) at \(\beta=\hat{\beta}\) and \(\sigma^2=\hat{\sigma}^2=D/n\), where \(D=\sum_{i=1}^n (y_i-x_i^T\hat{\beta})^2\) is the deviance. Therefore \[\begin{align*} \max_{\beta,\sigma^2} f_{Y}(y;\beta,\sigma^2)&=(2\pi D/n)^{-{n\over 2}} \exp\left(-{n\over{2D}} \sum_{i=1}^n (y_i-x_i^T\hat{\beta})^2\right) \\ &=(2\pi D/n)^{-{n\over 2}} \exp\left(-{n\over2}\right). \end{align*}\]
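As a quick numerical check (a sketch using numpy; the design matrix, coefficients and response below are illustrative assumptions, not part of the notes), the log-likelihood evaluated at \((\hat{\beta},\hat{\sigma}^2)\) agrees with the logarithm of the closed form \((2\pi D/n)^{-n/2}e^{-n/2}\):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3

# Simulated design matrix (intercept plus p covariates) and response.
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
y = X @ np.array([1.0, 0.5, -0.3, 0.2]) + rng.normal(size=n)

# Least-squares fit; the deviance D is the residual sum of squares.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
D = np.sum((y - X @ beta_hat) ** 2)
sigma2_hat = D / n

# Log-likelihood at the maximising values (beta_hat, sigma2_hat) ...
loglik = -(n / 2) * np.log(2 * np.pi * sigma2_hat) - D / (2 * sigma2_hat)

# ... equals log of (2 pi D / n)^(-n/2) * exp(-n/2).
closed_form = -(n / 2) * np.log(2 * np.pi * D / n) - n / 2
print(np.isclose(loglik, closed_form))  # True
```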

Exercise 3: Let the deviances under models \(H_0\) and \(H_1\) be denoted \(D_0\) and \(D_1\) respectively. Show that the likelihood ratio test statistic \(T\) above can be written as \[T=\left(1+{{p-q}\over{n-p-1}}F\right)^{n\over 2}, \] where \[ F={{(D_0-D_1)/(p-q)}\over{D_1/(n-p-1)}}. \] Hence, the simpler model \(H_0\) is rejected in favour of the more complex model \(H_1\) if \(F\) is ‘too large’.
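Before attempting the algebra, the identity in Exercise 3 can be sanity-checked numerically (a numpy sketch; the nested designs and the choices \(q=1\), \(p=3\) are illustrative assumptions). From the maximised likelihood above, \(T=(D_0/D_1)^{n/2}\), which the code compares with the right-hand side:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, q = 40, 3, 1

# H1 design: intercept plus p covariates; H0 keeps only the first q.
X1 = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
X0 = X1[:, : q + 1]
y = X1 @ np.array([1.0, 0.8, 0.0, 0.0]) + rng.normal(size=n)

def deviance(X, y):
    """Residual sum of squares at the least-squares fit."""
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.sum((y - X @ beta_hat) ** 2)

D0, D1 = deviance(X0, y), deviance(X1, y)

# Likelihood ratio T = (D0/D1)^(n/2), from the maximised likelihoods.
T = (D0 / D1) ** (n / 2)

# The F statistic of Exercise 3, and the claimed identity.
F = ((D0 - D1) / (p - q)) / (D1 / (n - p - 1))
print(np.isclose(T, (1 + (p - q) / (n - p - 1) * F) ** (n / 2)))  # True
```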

As \(H_0\) is nested in \(H_1\), under \(H_0\) the statistic \(F\) has an F distribution with \(p-q\) degrees of freedom in the numerator and \(n-p-1\) degrees of freedom in the denominator. To see this, note the analysis of variance decomposition \[ {{D_0}\over\sigma^2}={{D_0-D_1}\over\sigma^2}+{{D_1}\over\sigma^2}. \] We know (1.3) that, under \(H_0\), \(D_1/\sigma^2\) has a \(\chi^2_{n-p-1}\) distribution and \(D_0/\sigma^2\) has a \(\chi^2_{n-q-1}\) distribution. It is also true (although we do not show it here) that, under \(H_0\), \((D_0-D_1)/\sigma^2\) and \(D_1/\sigma^2\) are independent. Therefore, from the properties of the chi-squared distribution, it follows that under \(H_0\), \((D_0-D_1)/\sigma^2\) has a \(\chi^2_{p-q}\) distribution, and hence \(F\) has an \(F_{p-q,\,n-p-1}\) distribution.
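The distributional claim can be illustrated by simulation (a sketch assuming numpy and scipy are available; the design, the coefficients satisfying \(H_0\), and the replication count are all illustrative assumptions): simulate data repeatedly under \(H_0\) and compare the empirical distribution of \(F\) with \(F_{p-q,\,n-p-1}\).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, p, q, reps = 30, 3, 1, 2000

X1 = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # H1 design
X0 = X1[:, : q + 1]                                          # H0 design
beta_true = np.array([1.0, 0.5, 0.0, 0.0])                   # satisfies H0

def deviance(X, y):
    """Residual sum of squares at the least-squares fit."""
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.sum((y - X @ beta_hat) ** 2)

F_stats = np.empty(reps)
for r in range(reps):
    y = X1 @ beta_true + rng.normal(size=n)
    D0, D1 = deviance(X0, y), deviance(X1, y)
    F_stats[r] = ((D0 - D1) / (p - q)) / (D1 / (n - p - 1))

# Kolmogorov-Smirnov comparison with the claimed F_{p-q, n-p-1} law;
# a tiny p-value here would signal a mismatch.
ks = stats.kstest(F_stats, stats.f(p - q, n - p - 1).cdf)
print(f"KS p-value: {ks.pvalue:.3f}")
```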

Therefore, \(H_0\) is rejected in favour of \(H_1\) when \(F>k\), where \(k\) is the \(100(1-\alpha)\%\) point of the \(F_{p-q,\,n-p-1}\) distribution.
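Putting the pieces together, the whole test can be sketched in a few lines (assuming numpy and scipy; the simulated data, for which \(H_0\) happens to be true, are an illustrative assumption):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, p, q = 60, 4, 2
alpha = 0.05

# Simulated data; the last p - q true coefficients are zero, so H0 holds.
X1 = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
X0 = X1[:, : q + 1]
y = X1 @ np.array([2.0, 1.0, -0.5, 0.0, 0.0]) + rng.normal(size=n)

def deviance(X, y):
    """Residual sum of squares at the least-squares fit."""
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.sum((y - X @ beta_hat) ** 2)

D0, D1 = deviance(X0, y), deviance(X1, y)
F = ((D0 - D1) / (p - q)) / (D1 / (n - p - 1))

# k is the 100(1 - alpha)% point of F_{p-q, n-p-1}.
k = stats.f.ppf(1 - alpha, p - q, n - p - 1)
p_value = stats.f.sf(F, p - q, n - p - 1)

print(f"F = {F:.3f}, critical value = {k:.3f}, p-value = {p_value:.3f}")
print("reject H0" if F > k else "retain H0")
```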