2.1 Generalised Linear Models
In a generalised linear model (GLM), \(y_1,\ldots,y_n\) are observations of response variables \(Y_1,\ldots,Y_n\), assumed to be generated independently from distributions of the same exponential family form, with means \(\mu_i\equiv E(Y_i)\) linked to explanatory variables \(X_1,X_2,\ldots,X_p\) through
\[g(\mu_i)=\eta_i\equiv\beta_0+\sum_{r=1}^p \beta_r x_{ir}\equiv x_i^T\beta,\]
where \(g\) is the link function, \(\eta_i\) is the linear predictor, and \(x_i=(1,x_{i1},\ldots,x_{ip})^T\), so that the intercept \(\beta_0\) is absorbed into \(\beta=(\beta_0,\beta_1,\ldots,\beta_p)^T\).

GLMs have proved remarkably effective at modelling real-world variation in a wide range of application areas. However, situations frequently arise where GLMs do not adequately describe the observed data. Reasons for this include:
- The mean model cannot be appropriately specified as there is dependence on an unobserved (or unobservable) explanatory variable.
- There is excess variability between experimental units beyond that implied by the mean/variance relationship of the chosen response distribution.
- The assumption of independence is not appropriate.
- Complex multivariate structure in the data requires a more flexible model class.
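
To make the definition and the first two failure modes concrete, the following is a minimal sketch, not a reference implementation. It assumes a Poisson response with log link \(g(\mu)=\log\mu\), fits \(\beta\) by iteratively reweighted least squares, and compares the Pearson dispersion estimate for data that follow the GLM with data in which an unobserved gamma-distributed multiplier inflates the variance. The function names `fit_poisson_glm` and `pearson_dispersion`, the simulated coefficients, and the gamma parameters are all illustrative assumptions.

```python
import numpy as np


def fit_poisson_glm(X, y, n_iter=25, tol=1e-10):
    """Fit a Poisson GLM with log link by iteratively reweighted least squares.

    X holds the explanatory variables without an intercept column;
    the intercept beta_0 is added here. Returns (beta_0, beta_1, ..., beta_p).
    """
    Xd = np.column_stack([np.ones(len(y)), X])  # rows are x_i^T = (1, x_i1, ..., x_ip)
    beta = np.zeros(Xd.shape[1])
    beta[0] = np.log(y.mean())  # start from the intercept-only fit
    for _ in range(n_iter):
        eta = Xd @ beta              # linear predictor eta_i = x_i^T beta
        mu = np.exp(eta)             # inverse link: mu_i = exp(eta_i)
        z = eta + (y - mu) / mu      # working response
        w = mu                       # working weights for Poisson with log link
        XtW = Xd.T * w
        beta_new = np.linalg.solve(XtW @ Xd, XtW @ z)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta


def pearson_dispersion(X, y, beta):
    """Pearson chi-squared statistic divided by residual degrees of freedom.

    Values near 1 are consistent with the Poisson mean/variance relationship;
    values well above 1 suggest excess variability.
    """
    Xd = np.column_stack([np.ones(len(y)), X])
    mu = np.exp(Xd @ beta)
    return np.sum((y - mu) ** 2 / mu) / (len(y) - Xd.shape[1])


# Simulated data (all values below are assumptions chosen for illustration).
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
mu_true = np.exp(0.5 + 0.3 * x)

# Case 1: data generated exactly as the GLM assumes.
y_pois = rng.poisson(mu_true)

# Case 2: an unobserved unit-level multiplier with mean 1 and variance 0.5
# leaves the mean unchanged but adds variability beyond the Poisson variance.
u = rng.gamma(shape=2.0, scale=0.5, size=n)
y_od = rng.poisson(mu_true * u)

beta_hat = fit_poisson_glm(x, y_pois)
phi_pois = pearson_dispersion(x, y_pois, beta_hat)

beta_hat_od = fit_poisson_glm(x, y_od)
phi_od = pearson_dispersion(x, y_od, beta_hat_od)

print("beta_hat (Poisson data):      ", beta_hat)
print("dispersion (Poisson data):    ", phi_pois)
print("dispersion (overdispersed):   ", phi_od)
```

In this sketch the mean model is correctly specified in both cases, so the coefficient estimates remain close to the simulated values; what changes is the dispersion estimate, which is the kind of symptom that the second bullet above describes.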