
1.8 Bayesian inference for linear models

Bayesian inference for the parameters \((\beta,\sigma^2)\) of a linear model requires computation of the posterior density. Bayes' theorem gives \[ f(\beta,\sigma^2 \mid y) \propto f(y \mid \beta,\sigma^2)\,f(\beta,\sigma^2), \] where the likelihood \(f(y \mid \beta, \sigma^2)\) is given by (1.4) as \[ f(y \mid \beta,\sigma^2)=\left(2\pi\sigma^2\right)^{-{n\over 2}} \exp\left(-{1\over{2\sigma^2}} \sum_{i=1}^n (y_i-x_i^T\beta)^2\right). \]
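
Since everything that follows is driven by this density, it can be useful to have it in executable form. The following sketch (Python with NumPy; our own illustration, not part of the notes) evaluates the log-likelihood, taking `X` to be the \(n\times(p+1)\) design matrix with rows \(x_i^T\).

```python
import numpy as np

def log_likelihood(y, X, beta, sigma2):
    """Log of f(y | beta, sigma^2) for the normal linear model."""
    n = len(y)
    resid = y - X @ beta                      # residuals y_i - x_i^T beta
    return -0.5 * n * np.log(2 * np.pi * sigma2) - resid @ resid / (2 * sigma2)
```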

Posterior computation is straightforward if the prior density \(f(\beta,\sigma^2)\) is conjugate to the likelihood, which, for a linear model, is achieved by the prior decomposition \[ \sigma^{-2}\;\sim\; {\rm Gamma}(a_0,b_0),\qquad \beta\,|\,\sigma^2\;\sim\; N(\mu_0, \sigma^2V_0), \] where \({\rm Gamma}(a_0,b_0)\) denotes the gamma distribution with shape \(a_0\) and rate \(b_0\), and the hyperparameters \((a_0,b_0,\mu_0,V_0)\) are chosen to reflect prior uncertainty about the linear model parameters \((\beta,\sigma^2)\).
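
To make the prior concrete, here is a minimal sketch of sampling \((\beta,\sigma^2)\) from it, assuming the shape/rate parameterisation above; the helper name `sample_prior` is ours.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_prior(a0, b0, mu0, V0, size=1):
    """Draw (beta, sigma^2) from the conjugate normal-inverse-gamma prior."""
    tau = rng.gamma(shape=a0, scale=1.0 / b0, size=size)  # sigma^{-2} ~ Gamma(a0, b0)
    sigma2 = 1.0 / tau
    beta = np.array([rng.multivariate_normal(mu0, s2 * V0) for s2 in sigma2])
    return beta, sigma2
```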

With this prior structure, the corresponding posterior distributions are \[ \sigma^{-2}\,|\,y\;\sim\; {\rm Gamma}(a_0+n/2,\,b),\qquad \beta\,|\,\sigma^2,y\;\sim\; N(\mu, \sigma^2V) \] where \(V=(X^TX+V_0^{-1})^{-1}\), \(\mu=V(X^Ty+V_0^{-1}\mu_0)\) and

\[\begin{align*} b &= b_0+{1\over 2}\left( y^Ty + \mu_0^TV_0^{-1}\mu_0 - \mu^T V^{-1}\mu \right)\\ &= b_0+{1\over 2}\left\{ [n-p-1]s^2+\left[\mu_0-\hat\beta\right]^T\left [V_0+(X^TX)^{-1}\right]^{-1} \left[\mu_0-\hat\beta\right] \right\} \end{align*}\] if \(X^TX\) is non-singular, where \(\hat\beta\) and \(s^2\) are the classical unbiased estimators given above.
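
These updates translate almost line by line into code. The sketch below is our own (the prior is passed through its precision \(V_0^{-1}\) so that the vague specification of the next paragraph is covered as a special case) and returns the posterior hyperparameters.

```python
import numpy as np

def posterior(y, X, a0, b0, mu0, V0inv):
    """Conjugate posterior update: returns (a, b, mu, V) with
    sigma^{-2} | y ~ Gamma(a, b) and beta | sigma^2, y ~ N(mu, sigma^2 V)."""
    n = len(y)
    V = np.linalg.inv(X.T @ X + V0inv)
    mu = V @ (X.T @ y + V0inv @ mu0)
    a = a0 + n / 2
    b = b0 + 0.5 * (y @ y + mu0 @ V0inv @ mu0 - mu @ np.linalg.solve(V, mu))
    return a, b, mu, V
```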

In applications where prior information about the model parameters \((\beta,\sigma^2)\) is weak, it is conventional to use the vague prior specification given by the improper prior density \[ f(\beta,\sigma^2)\;\propto\;\sigma^{-2}. \] This corresponds to the conjugate prior above with \(a_0=-(p+1)/2\), \(b_0=0\) and \(V_0^{-1}=0\).
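
As a quick numerical check of this reduction (a sketch on simulated data, not part of the notes): with \(V_0^{-1}=0\), \(b_0=0\) and \(a_0=-(p+1)/2\), the posterior update above collapses to \(\mu=\hat\beta\) and \(b=(n-p-1)s^2/2\).

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # intercept + p covariates
y = X @ np.arange(1.0, p + 2) + rng.normal(size=n)

# Vague prior: V0^{-1} = 0 and b0 = 0, so the posterior update reduces to
V = np.linalg.inv(X.T @ X)
mu = V @ (X.T @ y)                              # posterior mean of beta
b = 0.5 * (y @ y - mu @ (X.T @ X) @ mu)         # posterior rate for sigma^{-2}

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
s2 = np.sum((y - X @ beta_hat) ** 2) / (n - p - 1)
print(np.allclose(mu, beta_hat))                # mu equals the LS estimator
print(np.isclose(2 * b, (n - p - 1) * s2))      # b equals (n-p-1) s^2 / 2
```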

Exercise 4: Show that, for this choice of hyperparameters, the posterior mean of \(\beta\) is given by the least squares estimator \(\hat\beta\). Show also that, a posteriori, \(1/\sigma^2\) has the distribution of \(X^2/[s^2(n-p-1)]\) where \(X^2\) has a \(\chi^2_{n-p-1}\) distribution. Hence show that posterior probability intervals for \(\sigma^2\) are equivalent to confidence intervals based on the sampling distribution of \(s^2\).
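
The algebra is left as the exercise, but the claimed equivalence of intervals is easy to confirm numerically; the check below uses arbitrary illustrative values of \(\nu=n-p-1\) and \(s^2\).

```python
import numpy as np
from scipy import stats

nu, s2 = 20, 1.7                                # illustrative df and s^2

# Posterior: sigma^{-2} | y ~ Gamma(nu/2, rate nu*s2/2), so a 95% probability
# interval for sigma^2 inverts the gamma quantiles.
lo_post = 1 / stats.gamma.ppf(0.975, a=nu / 2, scale=2 / (nu * s2))
hi_post = 1 / stats.gamma.ppf(0.025, a=nu / 2, scale=2 / (nu * s2))

# Classical CI: nu*s2 / sigma^2 ~ chi^2_nu under repeated sampling.
lo_ci = nu * s2 / stats.chi2.ppf(0.975, df=nu)
hi_ci = nu * s2 / stats.chi2.ppf(0.025, df=nu)

print(np.allclose([lo_post, hi_post], [lo_ci, hi_ci]))   # True
```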

For a longer exercise, show that \((\beta-\mu)/\sigma\) has a multivariate normal marginal posterior distribution, independent of \(\sigma^2\), and hence that posterior probability intervals for a coefficient \(\beta_k\) are equivalent to the confidence intervals based on the sampling distribution of \((\hat\beta_k-\beta_k)/{\rm s.e.}(\hat\beta_k)\) derived in Section 1.4 above.
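
Again as a sanity check rather than a proof (our own sketch, with made-up values of \(s^2\) and \([(X^TX)^{-1}]_{kk}\)): sampling \((\sigma^2,\beta_k)\) from the vague-prior posterior and studentising should reproduce the \(t_{n-p-1}\) quantiles on which the classical intervals are based.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
nu, s2, vkk, bhat_k = 20, 1.7, 0.04, 0.8            # illustrative values

tau = rng.gamma(shape=nu / 2, scale=2 / (nu * s2), size=200_000)  # sigma^{-2} | y
sigma2 = 1 / tau
beta_k = rng.normal(bhat_k, np.sqrt(sigma2 * vkk))  # beta_k | sigma^2, y

t_draws = (beta_k - bhat_k) / np.sqrt(s2 * vkk)     # studentised coefficient
print(np.quantile(t_draws, [0.025, 0.975]))         # approx [-2.086, 2.086]
print(stats.t.ppf([0.025, 0.975], df=nu))           # exact t_{nu} quantiles
```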