\( \newcommand{\bm}[1]{\boldsymbol{\mathbf{#1}}} \DeclareMathOperator{\tr}{tr} \DeclareMathOperator{\var}{var} \DeclareMathOperator{\cov}{cov} \DeclareMathOperator{\corr}{corr} \newcommand{\indep}{\perp\!\!\!\perp} \newcommand{\nindep}{\perp\!\!\!\perp\!\!\!\!\!\!/\;\;} \)

2.1 Generalised Linear Models

In a generalised linear model (GLM), \(y_1,\ldots ,y_n\) are observations of response variables \(Y_1,\ldots ,Y_n\), assumed to be generated independently from distributions of the same exponential family form, with means \(\mu_i\equiv E(Y_i)\) linked to explanatory variables \(X_1,X_2,\ldots ,X_p\) through \[g(\mu_i)=\eta_i\equiv\beta_0+\sum_{r=1}^p \beta_r x_{ir}\equiv x_i^T \beta,\] where \(g\) is a monotone, differentiable link function, \(x_i=(1,x_{i1},\ldots ,x_{ip})^T\) and \(\beta=(\beta_0,\beta_1,\ldots ,\beta_p)^T\).

GLMs have proved remarkably effective at modelling real-world variation in a wide range of application areas. However, situations frequently arise in which GLMs do not adequately describe the observed data. This can happen for a number of reasons, including:

  • The mean model cannot be appropriately specified as there is dependence on an unobserved (or unobservable) explanatory variable.
  • There is excess variability between experimental units beyond that implied by the mean/variance relationship of the chosen response distribution.
  • The assumption of independence is not appropriate.
  • Complex multivariate structure in the data requires a more flexible class of models.
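The first two points can be made concrete with a small simulation. What follows is a minimal sketch in Python with NumPy, assuming a Poisson response with the log link \(g(\mu)=\log\mu\). The model is fitted by iteratively reweighted least squares (IRLS), a standard way of computing GLM maximum likelihood estimates. The helper names `fit_poisson_glm` and `pearson_dispersion`, the simulated covariate, the true coefficients and the gamma shape parameter are all hypothetical choices made for illustration. Counts are generated twice: once exactly from the Poisson GLM, and once with an unobserved gamma-distributed multiplier on each unit's mean. That multiplier plays the role of an unobserved explanatory variable and inflates the variance above \(\mu_i\). The Pearson dispersion estimate should then be close to 1 only in the first case.

```python
import numpy as np


def fit_poisson_glm(X, y, tol=1e-10, max_iter=100):
    """Fit a Poisson GLM with log link, log(mu_i) = x_i^T beta, by IRLS.

    X is an (n, p + 1) design matrix whose first column is all ones,
    so beta[0] plays the role of the intercept beta_0.
    """
    beta = np.zeros(X.shape[1])
    eta = np.log(y + 0.5)  # starting linear predictor; the 0.5 avoids log(0)
    for _ in range(max_iter):
        mu = np.exp(eta)
        # Log link: d(eta)/d(mu) = 1/mu, and the Poisson variance is V(mu) = mu,
        # so the working response is z = eta + (y - mu)/mu and the weights are w = mu.
        z = eta + (y - mu) / mu
        XtW = X.T * mu
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)  # weighted least squares step
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        eta = X @ beta
        if converged:
            break
    return beta


def pearson_dispersion(X, y, beta):
    """Pearson chi-squared statistic divided by the residual degrees of freedom."""
    mu = np.exp(X @ beta)
    n, k = X.shape
    return np.sum((y - mu) ** 2 / mu) / (n - k)


# Hypothetical simulated data: one covariate, true beta = (0.5, 0.8).
rng = np.random.default_rng(42)
n = 2000
x = rng.uniform(-1.0, 1.0, size=n)
X = np.column_stack([np.ones(n), x])
mu_true = np.exp(X @ np.array([0.5, 0.8]))

# Case 1: the Poisson GLM is correctly specified.
y_pois = rng.poisson(mu_true)

# Case 2: each unit has an unobserved gamma multiplier with mean 1 and variance 0.5
# (shape 2 is an arbitrary choice), so var(Y_i) = mu_i + mu_i**2 / 2 > mu_i.
u = rng.gamma(shape=2.0, scale=0.5, size=n)
y_mixed = rng.poisson(mu_true * u)

beta_pois = fit_poisson_glm(X, y_pois)
beta_mixed = fit_poisson_glm(X, y_mixed)
phi_pois = pearson_dispersion(X, y_pois, beta_pois)
phi_mixed = pearson_dispersion(X, y_mixed, beta_mixed)

print("Poisson data:     beta =", beta_pois.round(3), " dispersion =", round(phi_pois, 2))
print("Gamma-mixed data: beta =", beta_mixed.round(3), " dispersion =", round(phi_mixed, 2))
```

A dispersion estimate well above 1, as expected for the gamma-mixed counts, is one informal signal of excess variability relative to the assumed mean/variance relationship. The GLM for the mean is still correct on the log scale here. Only the variance assumption fails.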