\( \newcommand{\bm}[1]{\boldsymbol{\mathbf{#1}}} \DeclareMathOperator{\tr}{tr} \DeclareMathOperator{\var}{var} \DeclareMathOperator{\cov}{cov} \DeclareMathOperator{\corr}{corr} \newcommand{\indep}{\perp\!\!\!\perp} \newcommand{\nindep}{\perp\!\!\!\perp\!\!\!\!\!\!/\;\;} \)

1.1 Introduction

In practical applications, we often distinguish between a response variable and a group of explanatory variables. The aim is to determine the pattern of dependence of the response variable on the explanatory variables. We denote the \(n\) observations of the response variable by \(y=(y_1,y_2,\ldots ,y_n)^T\). In a statistical model, these are assumed to be observations of random variables \(Y=(Y_1,Y_2,\ldots ,Y_n)^T\). Associated with each \(y_i\) is a vector \(x_i=(1,x_{i1},x_{i2},\ldots ,x_{ip})^T\) of values of \(p\) explanatory variables.

Linear models are those for which the relationship between the response and explanatory variables is of the form \[\begin{align} E(Y_i)&=\beta_0+\beta_1 x_{i1} +\beta_2 x_{i2} +\ldots + \beta_p x_{ip} \notag \\ &=\sum_{j=0}^p x_{ij} \beta_j\quad \text{(where we define $x_{i0}\equiv 1$)} \notag \\ &= x_i^T\beta \notag \\ &=[X\beta]_i,\qquad i=1,\ldots ,n \tag{1.1} \end{align}\] where \[ E(Y)=\begin{pmatrix} E(Y_1) \\ \vdots \\ E(Y_n) \end{pmatrix}, \qquad X=\begin{pmatrix} x_1^T \\ \vdots \\ x_n^T \end{pmatrix} =\begin{pmatrix} 1&x_{11}&\cdots&x_{1p} \\ \vdots&\vdots&\ddots&\vdots \\ 1&x_{n1}&\cdots&x_{np} \end{pmatrix} \] and \(\beta=(\beta_0,\beta_1,\ldots ,\beta_p)^T\) is a vector of fixed but unknown parameters describing the dependence of \(Y_i\) on \(x_i\). The four ways of describing the linear model in (1.1) are equivalent, but the most economical is the matrix form \[\begin{equation} E(Y)=X\beta. \tag{1.2} \end{equation}\]

The \(n\times (p+1)\) matrix \(X\) consists of known (observed) constants and is called the design matrix. The \(i\)th row of \(X\) is \(x_i^T\), the explanatory data corresponding to the \(i\)th observation of the response. The first column of \(X\) is a column of ones corresponding to the intercept \(\beta_0\); for \(j = 1, \ldots, p\), the column corresponding to \(\beta_j\) contains the \(n\) observations of the \(j\)th explanatory variable.
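
To make the construction of \(X\) concrete, here is a minimal NumPy sketch (the data values and coefficients are invented for illustration and are not taken from these notes): a column of ones is stacked alongside the observed explanatory variables, and \(E(Y)=X\beta\) is computed as a matrix product.

```python
import numpy as np

# Illustrative data: n = 5 observations of p = 2 explanatory variables
# (all values below are made up for demonstration).
x1 = np.array([1.2, 0.7, 3.1, 2.4, 1.8])
x2 = np.array([10.0, 12.5, 9.3, 11.1, 10.8])

# Design matrix X: a leading column of ones for the intercept beta_0,
# then one column per explanatory variable.
X = np.column_stack([np.ones(len(x1)), x1, x2])   # shape (n, p + 1)

beta = np.array([0.5, 2.0, -0.3])                 # (beta_0, beta_1, beta_2)^T

# The mean vector E(Y) = X beta; its i-th entry is x_i^T beta.
EY = X @ beta
print(X)
print(EY)
```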

Example 1.1

The null model \[ E(Y_i)=\beta_0, \qquad i = 1, \ldots, n \] \[ X= \begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix}, \qquad \beta= \begin{pmatrix} \beta_0 \end{pmatrix}. \]

Example 1.2

Simple linear regression \[ E(Y_i)=\beta_0+\beta_1 x_i\qquad i = 1, \ldots, n \] \[ X=\begin{pmatrix}1&x_1\\ 1&x_2\\ \vdots&\vdots\\ 1&x_n \end{pmatrix}, \qquad \beta=\begin{pmatrix}\beta_0\\ \beta_1 \end{pmatrix}. \]

Example 1.3

Polynomial regression \[ E(Y_i)=\beta_0+\beta_1 x_i+\beta_2 x_i^2 +\ldots +\beta_p x_i^{p}\qquad i = 1, \ldots, n \] \[ X=\begin{pmatrix} 1&x_1&x_1^2&\cdots&x_1^{p}\\ 1&x_2&x_2^2&\cdots&x_2^{p}\\ \vdots&\vdots&\vdots&\ddots&\vdots\\ 1&x_n&x_n^2&\cdots&x_n^{p} \end{pmatrix}, \qquad \beta=\begin{pmatrix}\beta_0\\ \beta_1\\ \vdots\\ \beta_p \end{pmatrix}. \]

Example 1.4

Multiple regression \[ E(Y_i)=\beta_0+\beta_1 x_{i1}+\beta_2 x_{i2} +\ldots +\beta_p x_{i\,p}\qquad i = 1, \ldots, n \] \[ X=\begin{pmatrix} 1&x_{11}&x_{12}&\cdots&x_{1p}\\ 1&x_{21}&x_{22}&\cdots&x_{2p}\\ \vdots&\vdots&\vdots&\ddots&\vdots\\ 1&x_{n1}&x_{n2}&\cdots&x_{np} \end{pmatrix}, \qquad \beta=\begin{pmatrix}\beta_0\\ \beta_1\\ \vdots\\ \beta_p \end{pmatrix}. \]
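
The design matrices in Examples 1.1–1.4 can all be built from the same observed data. The sketch below (Python/NumPy, with invented illustrative values) constructs each of them in turn.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])    # illustrative single explanatory variable
z = np.array([0.2, 0.5, 0.1, 0.9, 0.4])    # a second illustrative variable
n = len(x)

# Example 1.1: null model -- a single column of ones.
X_null = np.ones((n, 1))

# Example 1.2: simple linear regression -- ones plus the observed x values.
X_simple = np.column_stack([np.ones(n), x])

# Example 1.3: polynomial regression of degree p -- columns 1, x, x^2, ..., x^p.
p = 3
X_poly = np.vander(x, N=p + 1, increasing=True)

# Example 1.4: multiple regression -- ones plus one column per explanatory variable.
X_multi = np.column_stack([np.ones(n), x, z])
```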

Strictly, the only requirement for a model to be linear is that the relationship between the response variables, \(Y\), and any explanatory variables can be written in the form (1.2). No further specification of the joint distribution of \(Y_1, \ldots, Y_n\) is required. However, the linear model is more useful for statistical analysis if we can make three further assumptions:

  1. \(Y_1, \ldots, Y_n\) are independent random variables.
  2. \(Y_1, \ldots, Y_n\) are normally distributed.
  3. \(\var(Y_1)=\var(Y_2)=\cdots =\var(Y_n)\) (\(Y_1, \ldots, Y_n\) are homoscedastic). We denote this common variance by \(\sigma^2\).

With these assumptions the linear model completely specifies the distribution of \(Y\), in that \(Y_1, \ldots, Y_n\) are independent and \[ Y_i\sim N\left(x_i^T\beta\; ,\;\sigma^2\right)\qquad i = 1, \ldots, n . \] Another way of writing this is \[ Y_i=x_i^T\beta\;+\;\epsilon_i\qquad i = 1, \ldots, n \] where \(\epsilon_1, \ldots, \epsilon_n\) are i.i.d. N\((0,\sigma^2)\) random variables.
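
The error formulation \(Y_i=x_i^T\beta+\epsilon_i\) translates directly into a simulation recipe. Here is a minimal sketch, assuming invented values for \(X\), \(\beta\) and \(\sigma\) (none of them from these notes):

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative simple linear regression design (values are made up).
n, sigma = 100, 0.5
x = rng.uniform(0, 10, size=n)
X = np.column_stack([np.ones(n), x])
beta = np.array([1.0, 2.0])

# Assumptions 1-3: errors are independent, normal, with common variance sigma^2.
eps = rng.normal(loc=0.0, scale=sigma, size=n)

# Y_i = x_i^T beta + eps_i
Y = X @ beta + eps
```

Under assumptions 1–3, each simulated \(Y_i\) is then an independent draw from N\((x_i^T\beta,\sigma^2)\).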

A linear model can now be expressed in matrix form as \[\begin{equation} Y=X\beta+\epsilon \tag{1.3} \end{equation}\] where \(\epsilon=(\epsilon_1, \ldots, \epsilon_n)^T\) has a multivariate normal distribution with mean vector \(0\) and variance-covariance matrix \(\sigma^2I_n\) (because \(\var(\epsilon_i)=\sigma^2\) for every \(i\), and the independence of \(\epsilon_1, \ldots, \epsilon_n\) implies \(\cov(\epsilon_i,\epsilon_j)=0\) for all \(i\neq j\)). It follows from (1.3) that the distribution of \(Y\) is multivariate normal with mean vector \(X\beta\) and variance-covariance matrix \(\sigma^2I_n\), i.e. \(Y\sim\) N\((X\beta,\sigma^2I_n)\).
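
Equivalently, \(Y\) can be drawn in one step from N\((X\beta,\sigma^2I_n)\). The sketch below (again with invented values) draws many replications and checks that the sample mean vector and sample variance-covariance matrix are close to \(X\beta\) and \(\sigma^2I_n\).

```python
import numpy as np

rng = np.random.default_rng(2)

# Small illustrative model (values are made up).
X = np.column_stack([np.ones(4), np.array([1.0, 2.0, 3.0, 4.0])])
beta = np.array([1.0, 2.0])
sigma = 0.5
n = X.shape[0]

mean = X @ beta
cov = sigma**2 * np.eye(n)               # variance-covariance matrix sigma^2 I_n

# Draw many replications of Y ~ N(X beta, sigma^2 I_n) ...
Y = rng.multivariate_normal(mean, cov, size=20000)

# ... and check that the sample moments agree with X beta and sigma^2 I_n.
print(Y.mean(axis=0))            # approximately X beta
print(np.cov(Y, rowvar=False))   # approximately sigma^2 I_n (off-diagonals near 0)
```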