2.1 Introduction
The generalised linear model extends the linear model defined in Section 1.1 to allow a more flexible family of probability distributions.
In a generalised linear model (GLM) the \(n\) observations of the response \(y=(y_1,y_2,\ldots ,y_n)^T\) are assumed to be observations of independent random variables \(Y=(Y_1,Y_2,\ldots ,Y_n)^T\), each of which has a distribution from the same exponential family. Hence, \[\begin{equation} f_{Y}(y;\theta,\phi)=\exp\left(\sum_{i=1}^n{{y_i\theta_i-b(\theta_i)}\over{\phi_i}} +\sum_{i=1}^nc(y_i,\phi_i)\right) \tag{2.1} \end{equation}\] where \(\theta=(\theta_1,\ldots ,\theta_n)^T\) is the collection of canonical parameters and \(\phi=(\phi_1,\ldots ,\phi_n)^T\) is the collection of dispersion parameters (where they exist). Commonly, the dispersion parameters are known up to, at most, a single common unknown \(\sigma^2\), and we write \(\phi_i=\sigma^2/m_i\) where the \(m_i\) represent known weights.
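As a concrete check (a minimal sketch in Python; the numerical values are arbitrary), the Poisson distribution can be written in the form (2.1) with \(\theta_i=\log\mu_i\), \(b(\theta_i)=\exp\theta_i\), \(\phi_i=1\) and \(c(y_i,\phi_i)=-\log y_i!\):

```python
import numpy as np
from scipy.special import gammaln
from scipy.stats import poisson

# Poisson log-density in exponential family form (2.1):
# theta = log(mu), b(theta) = exp(theta), phi = 1, c(y, phi) = -log(y!)
def poisson_logpmf_expfam(y, mu):
    theta = np.log(mu)
    return y * theta - np.exp(theta) - gammaln(y + 1.0)

y, mu = np.arange(5), 2.3                       # arbitrary illustrative values
print(np.allclose(poisson_logpmf_expfam(y, mu),
                  poisson.logpmf(y, mu)))       # True
```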
The distribution of the response variable \(Y_i\) depends on the explanatory data \(x_i=(1,x_{i1},x_{i2},\ldots ,x_{ip})^T\) through the linear predictor \(\eta_i\) where \[\begin{align*} \eta_i&=\beta_0+\beta_1 x_{i1} +\beta_2 x_{i2} +\ldots + \beta_p x_{ip}\\ &=\sum_{j=0}^p x_{ij} \beta_j\\ &= x_i^T\beta\\ &=[X\beta]_i,\qquad i=1,\ldots ,n \end{align*}\] in an exactly analogous fashion to the linear model in Section 1.1.
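In matrix form the linear predictor is a single matrix–vector product; a minimal sketch (with hypothetical numbers) in Python:

```python
import numpy as np

# Hypothetical design matrix: n = 4 observations, p = 2 explanatory
# variables, first column of ones for the intercept beta_0.
X = np.array([[1.0, 0.5, 1.2],
              [1.0, 1.5, 0.7],
              [1.0, 2.0, 2.2],
              [1.0, 0.3, 0.9]])
beta = np.array([0.2, -0.4, 1.1])   # (beta_0, beta_1, beta_2)

eta = X @ beta                      # eta_i = x_i^T beta = [X beta]_i
```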
The link between the distribution of \(Y\) and the linear predictor \(\eta\) is provided by the link function \(g\), \[ \eta_i=g(\mu_i)\qquad i = 1, \ldots, n \] where \(\mu_i\equiv E(Y_i),\; i = 1, \ldots, n\). Hence, the dependence of the distribution of the response on the explanatory variables is established as \[ g(E[Y_i])=g(\mu_i)=\eta_i=x_i^T\beta\qquad i = 1, \ldots, n \]
In principle, the link function \(g\) can be any one-to-one differentiable function. However, \(\eta_i\) can take any value in \(\mathbb{R}\) (as we make no restriction on the possible values of the explanatory variables or the model parameters), whereas for some exponential family distributions \(\mu_i\) is restricted. For example, for the Poisson distribution \(\mu_i\in\mathbb{R}_+\); for the Bernoulli distribution \(\mu_i\in(0,1)\). If \(g\) is not chosen carefully, there may exist \(x_i\) and \(\beta\) for which no allowed value of \(\mu_i\) satisfies \(g(\mu_i)=\eta_i\). Therefore, most common choices of link function map the set of allowed values for \(\mu_i\) onto \(\mathbb{R}\).
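The log and logit links illustrate this: each maps the allowed values of \(\mu_i\) one-to-one onto \(\mathbb{R}\), so every real linear predictor corresponds to a valid mean. A short sketch:

```python
import numpy as np

def logit(mu):                       # Bernoulli link: maps (0, 1) onto R
    return np.log(mu / (1.0 - mu))

def logit_inv(eta):                  # inverse logit: maps R back into (0, 1)
    return 1.0 / (1.0 + np.exp(-eta))

eta = np.array([-20.0, 0.0, 20.0])   # any real linear predictor is legitimate
print(logit_inv(eta))                # means always land in (0, 1)
print(logit(logit_inv(eta)))         # one-to-one: recovers eta (up to rounding)
print(np.exp(eta))                   # inverse of the log link: always in (0, inf)
```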
Recall that for a random variable \(Y\) with a distribution from the exponential family, \(E(Y)=b'(\theta)\). Hence, for a generalised linear model \[ \mu_i=E(Y_i)=b'(\theta_i)\qquad i = 1, \ldots, n. \] Therefore \[ \theta_i=b'^{-1}(\mu_i)\qquad i = 1, \ldots, n \] and as \(g(\mu_i)=\eta_i=x_i^T\beta\), then \[\begin{equation} \theta_i=b'^{-1}(g^{-1}[x_i^T\beta])\qquad i = 1, \ldots, n. \tag{2.2} \end{equation}\] Hence, we can express the joint density (2.1) in terms of the coefficients \(\beta\), and for observed data \(y\), this is the likelihood \(f_{Y}(y;\beta,\phi)\) for \(\beta\).
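To make the composition in (2.2) explicit, the sketch below (Python; the choice of model is purely illustrative) builds the Bernoulli log-likelihood for a deliberately non-canonical choice, the probit link \(g(\mu)=\Phi^{-1}(\mu)\), using \(b(\theta)=\log(1+\exp\theta)\) and \(b'^{-1}(\mu)=\log\{\mu/(1-\mu)\}\) from Table 2.1 below:

```python
import numpy as np
from scipy.stats import norm

def loglik(beta, X, y):
    eta = X @ beta                    # eta_i = x_i^T beta
    mu = norm.cdf(eta)                # mu_i = g^{-1}(eta_i), inverse probit
    theta = np.log(mu / (1.0 - mu))   # theta_i = b'^{-1}(mu_i), as in (2.2)
    # (2.1) with b(theta) = log(1 + exp(theta)), phi_i = 1, c(y_i, phi_i) = 0
    return np.sum(y * theta - np.log1p(np.exp(theta)))
```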
Note that considerable simplification is obtained in (2.1) and (2.2) if the functions \(g\) and \(b'^{-1}\) are identical. Then \[ \theta_i=x_i^T\beta\qquad i = 1, \ldots, n. \] The link function \[ g(\mu)\equiv b'^{-1}(\mu) \] is called the canonical link function. Under the canonical link, the canonical parameter is equal to the linear predictor.
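For example (a sketch continuing the Bernoulli case above), under the canonical logit link the inversion of \(g\) and \(b'\) disappears and \(\theta_i=\eta_i\) directly:

```python
import numpy as np

def loglik_canonical(beta, X, y):
    eta = X @ beta                                  # theta_i = eta_i = x_i^T beta
    return np.sum(y * eta - np.log1p(np.exp(eta)))  # b(theta) = log(1 + exp(theta))
```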
Table 2.1: Exponential family quantities and canonical link functions for some common distributions.

Distribution | Normal | Poisson | Bernoulli/Binomial |
---|---|---|---|
\(b(\theta)\) | \({1\over 2}\theta^2\) | | \(\log(1+\exp\theta)\) |
\(b'(\theta)\equiv\mu\) | \(\theta\) | | \(\frac{\exp\theta}{1+\exp\theta}\) |
\(b'^{-1}(\mu)\equiv\theta\) | \(\mu\) | | \(\log{\frac{\mu}{1-\mu}}\) |
Link | \(g(\mu)=\mu\) | \(g(\mu)=\log\mu\) | \(g(\mu)=\log{\frac{\mu}{1-\mu}}\) |
Name of link | Identity link | Log link | Logit link |
Exercise 5: Complete Table 2.1.
Clearly the linear model considered in Section 1 is also a generalised linear model, in which \(Y_1, \ldots, Y_n\) are independent and normally distributed, the explanatory variables enter through the linear predictor \[ \eta_i=x_i^T\beta\qquad i = 1, \ldots, n, \] and the link between \(E(Y)=\mu\) and the linear predictor \(\eta\) is the (canonical) identity link function \[ \mu_i=\eta_i\qquad i = 1, \ldots, n. \]
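The following sketch (using the statsmodels package; the simulated data are arbitrary) confirms this numerically: fitting a Gaussian GLM with the identity link reproduces the least squares coefficients.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
X = sm.add_constant(rng.normal(size=(50, 2)))   # intercept plus 2 covariates
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=50)

ols = sm.OLS(y, X).fit()
glm = sm.GLM(y, X, family=sm.families.Gaussian()).fit()  # identity link by default
print(np.allclose(ols.params, glm.params))               # True: same coefficients
```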