\( \newcommand{\bm}[1]{\boldsymbol{\mathbf{#1}}} \DeclareMathOperator{\tr}{tr} \DeclareMathOperator{\var}{var} \DeclareMathOperator{\cov}{cov} \DeclareMathOperator{\corr}{corr} \newcommand{\indep}{\perp\!\!\!\perp} \newcommand{\nindep}{\perp\!\!\!\perp\!\!\!\!\!\!/\;\;} \)

1.1 Introduction

In practical applications, we often distinguish between a response variable and a group of explanatory variables. The aim is to determine the pattern of dependence of the response variable on the explanatory variables. We denote the \(n\) observations of the response variable by \(y=(y_1,y_2,\ldots ,y_n)^T\). In a statistical model, these are assumed to be observations of random variables \(Y=(Y_1,Y_2,\ldots ,Y_n)^T\). Associated with each \(y_i\) is a vector \(x_i=(1,x_{i1},x_{i2},\ldots ,x_{ip})^T\) of values of \(p\) explanatory variables.

Linear models are those for which the relationship between the response and explanatory variables is of the form \[\begin{align} E(Y_i)&=\beta_0+\beta_1 x_{i1} +\beta_2 x_{i2} +\ldots + \beta_p x_{ip} \notag \\ &=\sum_{j=0}^p x_{ij} \beta_j\quad \text{(where we define $x_{i0}\equiv 1$)} \notag \\ &= x_i^T\beta \notag \\ &=[X\beta]_i,\qquad i=1,\ldots ,n \tag{1.1} \end{align}\] where \[ E(Y)=\begin{pmatrix} E(Y_1) \\ \vdots \\ E(Y_n) \end{pmatrix}, \qquad X=\begin{pmatrix} x_1^T \\ \vdots \\ x_n^T \end{pmatrix} =\begin{pmatrix} 1&x_{11}&\cdots&x_{1p} \\ \vdots&\vdots&\ddots&\vdots \\ 1&x_{n1}&\cdots&x_{np} \end{pmatrix} \] and \(\beta=(\beta_0,\beta_1,\ldots ,\beta_p)^T\) is a vector of fixed but unknown parameters describing the dependence of \(Y_i\) on \(x_i\). The four ways of describing the linear model in (1.1) are equivalent, but the most economical is the matrix form \[\begin{equation} E(Y)=X\beta. \tag{1.2} \end{equation}\]

The \(n\times (p+1)\) matrix \(X\) consists of known (observed) constants and is called the design matrix. The \(i\)th row of \(X\) is \(x_i^T\), the explanatory data corresponding to the \(i\)th observation of the response. The first column of \(X\) is a column of ones corresponding to the intercept \(\beta_0\); for \(j = 1, \ldots, p\), the column corresponding to \(\beta_j\) contains the \(n\) observations of the \(j\)th explanatory variable.
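
To make the construction of \(X\) concrete, here is a minimal NumPy sketch (the data values and coefficients are invented for illustration and are not taken from these notes): a column of ones is stacked alongside the observed explanatory variables, and \(E(Y)=X\beta\) is computed as a matrix product.

```python
import numpy as np

# Illustrative data: n = 5 observations of p = 2 explanatory variables
# (all values below are made up for demonstration).
x1 = np.array([1.2, 0.7, 3.1, 2.4, 1.8])
x2 = np.array([10.0, 12.5, 9.3, 11.1, 10.8])

# Design matrix X: a leading column of ones for the intercept beta_0,
# then one column per explanatory variable.
X = np.column_stack([np.ones(len(x1)), x1, x2])   # shape (n, p + 1)

beta = np.array([0.5, 2.0, -0.3])                 # (beta_0, beta_1, beta_2)^T

# The mean vector E(Y) = X beta; its i-th entry is x_i^T beta.
EY = X @ beta
print(X)
print(EY)
```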

Example 1.1

The null model \[ E(Y_i)=\beta_0, \qquad i = 1, \ldots, n \] \[ X= \begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix}, \qquad \beta= \begin{pmatrix} \beta_0 \end{pmatrix}. \]

Example 1.2

Simple linear regression \[ E(Y_i)=\beta_0+\beta_1 x_i\qquad i = 1, \ldots, n \] \[ X=\begin{pmatrix}1&x_1\\ 1&x_2\\ \vdots&\vdots\\ 1&x_n \end{pmatrix}, \qquad \beta=\begin{pmatrix}\beta_0\\ \beta_1 \end{pmatrix}. \]

Example 1.3

Polynomial regression \[ E(Y_i)=\beta_0+\beta_1 x_i+\beta_2 x_i^2 +\ldots +\beta_p x_i^{p}\qquad i = 1, \ldots, n \] \[ X=\begin{pmatrix} 1&x_1&x_1^2&\cdots&x_1^{p}\\ 1&x_2&x_2^2&\cdots&x_2^{p}\\ \vdots&\vdots&\vdots&\ddots&\vdots\\ 1&x_n&x_n^2&\cdots&x_n^{p} \end{pmatrix}, \qquad \beta=\begin{pmatrix}\beta_0\\ \beta_1\\ \vdots\\ \beta_p \end{pmatrix}. \]

Example 1.4

Multiple regression \[ E(Y_i)=\beta_0+\beta_1 x_{i1}+\beta_2 x_{i2} +\ldots +\beta_p x_{i\,p}\qquad i = 1, \ldots, n \] \[ X=\begin{pmatrix} 1&x_{11}&x_{12}&\cdots&x_{1p}\\ 1&x_{21}&x_{22}&\cdots&x_{2p}\\ \vdots&\vdots&\vdots&\ddots&\vdots\\ 1&x_{n1}&x_{n2}&\cdots&x_{np} \end{pmatrix}, \qquad \beta=\begin{pmatrix}\beta_0\\ \beta_1\\ \vdots\\ \beta_p \end{pmatrix}. \]
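
The design matrices in Examples 1.1–1.4 can all be built from the same observed data. The sketch below (Python/NumPy, with invented illustrative values) constructs each of them in turn.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])    # illustrative single explanatory variable
z = np.array([0.2, 0.5, 0.1, 0.9, 0.4])    # a second illustrative variable
n = len(x)

# Example 1.1: null model -- a single column of ones.
X_null = np.ones((n, 1))

# Example 1.2: simple linear regression -- ones plus the observed x values.
X_simple = np.column_stack([np.ones(n), x])

# Example 1.3: polynomial regression of degree p -- columns 1, x, x^2, ..., x^p.
p = 3
X_poly = np.vander(x, N=p + 1, increasing=True)

# Example 1.4: multiple regression -- ones plus one column per explanatory variable.
X_multi = np.column_stack([np.ones(n), x, z])
```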

Strictly, the only requirement for a model to be linear is that the relationship between the response variables, \(Y\), and any explanatory variables can be written in the form (1.2). No further specification of the joint distribution of \(Y_1, \ldots, Y_n\) is required. However, the linear model is more useful for statistical analysis if we can make three further assumptions:

  1. \(Y_1, \ldots, Y_n\) are independent random variables.
  2. \(Y_1, \ldots, Y_n\) are normally distributed.
  3. \(\var(Y_1)=\var(Y_2)=\cdots =\var(Y_n)\) (\(Y_1, \ldots, Y_n\) are homoscedastic). We denote this common variance by \(\sigma^2\).

With these assumptions the linear model completely specifies the distribution of \(Y\), in that \(Y_1, \ldots, Y_n\) are independent and \[ Y_i\sim N\left(x_i^T\beta\; ,\;\sigma^2\right)\qquad i = 1, \ldots, n . \] Another way of writing this is \[ Y_i=x_i^T\beta\;+\;\epsilon_i\qquad i = 1, \ldots, n \] where \(\epsilon_1, \ldots, \epsilon_n\) are i.i.d. N\((0,\sigma^2)\) random variables.
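
The error formulation \(Y_i=x_i^T\beta+\epsilon_i\) translates directly into a simulation recipe. Here is a minimal sketch, assuming invented values for \(X\), \(\beta\) and \(\sigma\) (none of them from these notes):

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative simple linear regression design (values are made up).
n, sigma = 100, 0.5
x = rng.uniform(0, 10, size=n)
X = np.column_stack([np.ones(n), x])
beta = np.array([1.0, 2.0])

# Assumptions 1-3: errors are independent, normal, with common variance sigma^2.
eps = rng.normal(loc=0.0, scale=sigma, size=n)

# Y_i = x_i^T beta + eps_i
Y = X @ beta + eps
```

Under assumptions 1–3, each simulated \(Y_i\) is then an independent draw from N\((x_i^T\beta,\sigma^2)\).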

A linear model can now be expressed in matrix form as \[\begin{equation} Y=X\beta+\epsilon \tag{1.3} \end{equation}\] where \(\epsilon=(\epsilon_1, \ldots, \epsilon_n)^T\) has a multivariate normal distribution with mean vector \(0\) and variance-covariance matrix \(\sigma^2I_n\) (because \(\var(\epsilon_i)=\sigma^2\) for every \(i\), and the independence of \(\epsilon_1, \ldots, \epsilon_n\) implies \(\cov(\epsilon_i,\epsilon_j)=0\) for all \(i\neq j\)). It follows from (1.3) that the distribution of \(Y\) is multivariate normal with mean vector \(X\beta\) and variance-covariance matrix \(\sigma^2I_n\), i.e. \(Y\sim\) N\((X\beta,\sigma^2I_n)\).
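
Equivalently, \(Y\) can be drawn in one step from N\((X\beta,\sigma^2I_n)\). The sketch below (again with invented values) draws many replications and checks that the sample mean vector and sample variance-covariance matrix are close to \(X\beta\) and \(\sigma^2I_n\).

```python
import numpy as np

rng = np.random.default_rng(2)

# Small illustrative model (values are made up).
X = np.column_stack([np.ones(4), np.array([1.0, 2.0, 3.0, 4.0])])
beta = np.array([1.0, 2.0])
sigma = 0.5
n = X.shape[0]

mean = X @ beta
cov = sigma**2 * np.eye(n)               # variance-covariance matrix sigma^2 I_n

# Draw many replications of Y ~ N(X beta, sigma^2 I_n) ...
Y = rng.multivariate_normal(mean, cov, size=20000)

# ... and check that the sample moments agree with X beta and sigma^2 I_n.
print(Y.mean(axis=0))            # approximately X beta
print(np.cov(Y, rowvar=False))   # approximately sigma^2 I_n (off-diagonals near 0)
```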