1.6 Comparing linear models
It should be noted that this section describes just one method for comparing models. General principles and other methods will be discussed in detail in the APTS module itself.
A pair of nested linear models can be compared using a generalised likelihood ratio test. Nesting implies that the simpler model (\(H_0\)) is a special case of the more complex model (\(H_1\)). In practice, this usually means that the explanatory variables present in \(H_0\) are a subset of those present in \(H_1\). Let \(\Theta^{(1)}\) be the unrestricted parameter space under \(H_1\) and \(\Theta^{(0)}\) be the parameter space corresponding to model \(H_0\), i.e. with the appropriate coefficients constrained to zero.
Without loss of generality, we can think of \(H_1\) as the model \[ E(Y_i)=\sum_{j=0}^p x_{ij} \beta_j \qquad i = 1, \ldots, n \] with \(H_0\) being the same model with \(\beta_{q+1}=\cdots=\beta_p=0\), so that \(H_0\) retains only the intercept and the first \(q\) explanatory variables. Now, a generalised likelihood ratio test of \(H_0\) against \(H_1\) has a test statistic of the form \[ T={{\max_{(\beta,\sigma^2)\in \Theta^{(1)}}f_{Y}(y;\beta,\sigma^2)}\over {\max_{(\beta,\sigma^2)\in \Theta^{(0)}}f_{Y}(y;\beta,\sigma^2)}} \] and rejects \(H_0\) in favour of \(H_1\) when \(T>k\), where \(k\) is determined by \(\alpha\), the size of the test.
For a linear model, \[ f_{Y}(y;\beta,\sigma^2)=\left(2\pi\sigma^2\right)^{-{n\over 2}} \exp\left(-{1\over{2\sigma^2}} \sum_{i=1}^n (y_i-x_i^T\beta)^2\right). \] This is maximised with respect to \((\beta,\sigma^2)\) at \(\beta=\hat{\beta}\) and \(\sigma^2=\hat{\sigma}^2=D/n\). Therefore \[\begin{align*} \max_{\beta,\sigma^2} f_{Y}(y;\beta,\sigma^2)&=(2\pi D/n)^{-{n\over 2}} \exp\left(-{n\over{2D}} \sum_{i=1}^n (y_i-x_i^T\hat{\beta})^2\right) \\ &=(2\pi D/n)^{-{n\over 2}} \exp\left(-{n\over2}\right) \end{align*}\]
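The closed form above can be checked numerically. The sketch below (a minimal illustration in Python; the simulated data, sample size and seed are assumptions, not part of the notes) fits a linear model by least squares and compares the log-likelihood evaluated at \((\hat{\beta},\hat{\sigma}^2)\) with \(-\tfrac{n}{2}\log(2\pi D/n)-\tfrac{n}{2}\).

```python
# Minimal numerical check of the maximised likelihood formula, on simulated data.
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])   # intercept + p covariates
y = X @ np.array([1.0, 2.0, 0.0, -1.0]) + rng.normal(size=n) # illustrative true coefficients

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)             # least-squares estimate
D = np.sum((y - X @ beta_hat) ** 2)                          # deviance (residual sum of squares)
sigma2_hat = D / n                                           # MLE of sigma^2

# Log-likelihood at (beta_hat, sigma2_hat) ...
loglik_direct = -0.5 * n * np.log(2 * np.pi * sigma2_hat) - D / (2 * sigma2_hat)
# ... and the closed form -(n/2) log(2 pi D / n) - n/2 derived above.
loglik_formula = -0.5 * n * np.log(2 * np.pi * D / n) - 0.5 * n

print(np.isclose(loglik_direct, loglik_formula))             # True
```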
Exercise 3: Let the deviances under models \(H_0\) and \(H_1\) be denoted \(D_0\) and \(D_1\) respectively. Show that the likelihood ratio test statistic \(T\) above can be written as \[T=\left(1+{{p-q}\over{n-p-1}}F\right)^{n\over 2}, \] where \[ F={{(D_0-D_1)/(p-q)}\over{D_1/(n-p-1)}}. \] Hence, the simpler model \(H_0\) is rejected in favour of the more complex model \(H_1\) if \(F\) is 'too large'.
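The identity in Exercise 3 can also be verified numerically without doing the algebra. The sketch below is only an illustration: the nesting structure (\(q=2\) of \(p=5\) explanatory variables), sample size and simulated data are assumptions. It computes \(T\) directly from the two maximised log-likelihoods and \(F\) from the two deviances, and confirms that the two expressions agree.

```python
# Numerical check that T = (1 + (p-q)/(n-p-1) F)^{n/2} on simulated data.
import numpy as np

rng = np.random.default_rng(2)
n, q, p = 40, 2, 5
X1 = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # design under H1: intercept + p covariates
X0 = X1[:, :q + 1]                                           # design under H0: intercept + first q covariates
y = X0 @ rng.normal(size=q + 1) + rng.normal(size=n)         # simulated response (generated under H0)

def deviance(X, y):
    """Residual sum of squares after a least-squares fit."""
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.sum((y - X @ beta_hat) ** 2)

def max_loglik(X, y):
    """Maximised log-likelihood -(n/2) log(2 pi D / n) - n/2, from the display above."""
    D = deviance(X, y)
    return -0.5 * n * np.log(2 * np.pi * D / n) - 0.5 * n

D0, D1 = deviance(X0, y), deviance(X1, y)
T = np.exp(max_loglik(X1, y) - max_loglik(X0, y))            # likelihood ratio statistic
F = ((D0 - D1) / (p - q)) / (D1 / (n - p - 1))               # F statistic from the exercise

print(np.isclose(T, (1 + (p - q) / (n - p - 1) * F) ** (n / 2)))  # True
```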
As we have required \(H_0\) to be nested in \(H_1\), it follows that, under \(H_0\), \(F\) has an F distribution with \(p-q\) degrees of freedom in the numerator and \(n-p-1\) degrees of freedom in the denominator. To see this, note the analysis of variance decomposition \[ {{D_0}\over\sigma^2}={{D_0-D_1}\over\sigma^2}+{{D_1}\over\sigma^2}. \] We know from (1.3) that, under \(H_0\), \(D_1/\sigma^2\) has a \(\chi^2_{n-p-1}\) distribution and \(D_0/\sigma^2\) has a \(\chi^2_{n-q-1}\) distribution. It is also true (although we do not show it here) that, under \(H_0\), \((D_0-D_1)/\sigma^2\) and \(D_1/\sigma^2\) are independent. Therefore, from the properties of the chi-squared distribution, it follows that under \(H_0\), \((D_0-D_1)/\sigma^2\) has a \(\chi^2_{p-q}\) distribution and, as \(F\) is then a ratio of independent chi-squared variables each divided by its degrees of freedom, \(F\) has an \(F_{p-q,\,n-p-1}\) distribution.
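A small simulation can illustrate this distributional result. In the sketch below the design matrix, true coefficients, sample size and number of replicates are all arbitrary illustrative choices: data are repeatedly simulated under \(H_0\), the \(F\) statistic is computed each time, and the simulated values are compared with the \(F_{p-q,\,n-p-1}\) distribution via a Kolmogorov-Smirnov test.

```python
# Simulation check of the null distribution of F under H0.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, q, p, n_sims = 40, 2, 5, 2000
X1 = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # fixed design under H1
X0 = X1[:, :q + 1]                                           # nested design under H0
beta0 = np.array([1.0, -0.5, 2.0])                           # true coefficients under H0 (q + 1 of them)

def f_statistic(y):
    """Compute F from the deviances of the two least-squares fits."""
    D0 = np.sum((y - X0 @ np.linalg.lstsq(X0, y, rcond=None)[0]) ** 2)
    D1 = np.sum((y - X1 @ np.linalg.lstsq(X1, y, rcond=None)[0]) ** 2)
    return ((D0 - D1) / (p - q)) / (D1 / (n - p - 1))

F_sims = np.array([f_statistic(X0 @ beta0 + rng.normal(size=n)) for _ in range(n_sims)])

# Under H0 the simulated values should be indistinguishable from F_{p-q, n-p-1}.
print(stats.kstest(F_sims, stats.f(p - q, n - p - 1).cdf))
```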
Therefore, \(H_0\) is rejected in favour of \(H_1\) when \(F>k\), where \(k\) is the \(100(1-\alpha)\%\) point of the \(F_{p-q,\,n-p-1}\) distribution.
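In practice, \(k\) and the equivalent p-value can be obtained from the quantile and survivor functions of the F distribution. The sketch below is illustrative only: the values of \(n\), \(p\), \(q\), \(\alpha\) and the observed \(F\) statistic are placeholders, not results from the notes.

```python
# Minimal sketch of the rejection rule, with placeholder values.
from scipy import stats

n, q, p, alpha = 40, 2, 5, 0.05                  # illustrative values
F_obs = 3.1                                      # placeholder observed F statistic

k = stats.f.ppf(1 - alpha, p - q, n - p - 1)     # 100(1 - alpha)% point of F_{p-q, n-p-1}
p_value = stats.f.sf(F_obs, p - q, n - p - 1)    # P(F > F_obs) under H0

reject_H0 = F_obs > k                            # equivalently, p_value < alpha
print(k, p_value, reject_H0)
```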