\( \newcommand{\bm}[1]{\boldsymbol{\mathbf{#1}}} \)

Chapter 11 Confidence intervals and hypothesis testing

11.1 Expressing uncertainty in parameter estimates

Suppose that \(Y_1, \ldots, Y_n\) are independent identically distributed random variables, each with a distribution depending on unknown parameter \(\theta\), and let \(y_1, \ldots, y_n\) be corresponding observed values.

In Chapter 10, we have seen how to find an estimate of \(\theta\) given \(y_1, \ldots, y_n\). However, an important part of statistical inference is to express our level of uncertainty in this estimate. This could be done by writing down an interval containing a range of values of \(\theta\) which could plausibly have generated the data. Alternatively, we might be interested in testing if some particular value of \(\theta\) could plausibly have generated the observed data.

11.2 Confidence intervals

Write \(\bm Y = (Y_1, \ldots, Y_n)^T\) and \(\bm y = (y_1, \ldots, y_n)^T\).

Definition 11.1 A \(100(1- \alpha)\%\) confidence interval for \(\theta\) is an interval \([L(\bm y), U(\bm y)]\) such that \[P\left(L(\bm Y) \leq \theta \leq U(\bm Y)\right) = 1 - \alpha.\]

Often, we take \(\alpha = 0.05\), and obtain a \(95\%\) confidence interval. This means that if we were to generate a large number of datasets from the model with some fixed value of \(\theta\), and find a \(95\%\) confidence interval for each dataset, approximately \(95\%\) of those intervals would contain the true value of \(\theta\).

Example 11.1 (Normal mean, known variance) Suppose that \(Y_1, \ldots, Y_n\) are independent and identically distributed, with each \(Y_i \sim N(\mu, \sigma^2)\), where \(\mu\) is an unknown parameter, but the value of \(\sigma^2\) is known. Suppose that we require a \(95 \%\) confidence interval for \(\mu\).

We know (by Proposition 4.1) that \(\bar Y \sim N(\mu, \sigma^2 / n)\), so \(\frac{\sqrt{n}(\bar Y - \mu)}{\sigma} \sim N(0, 1)\), so \[P\left(z_{0.025} \leq \frac{\sqrt{n}(\bar Y - \mu)}{\sigma} \leq z_{0.975}\right) = 0.95.\] where \(z_{p}\) is the \(p\)-quantile of the standard normal distribution, so that \(P(X \leq z_p) = p\) if \(X \sim N(0, 1)\). We can find these quantiles in R:

qnorm(0.025)

## [1] -1.959964

qnorm(0.975)

## [1] 1.959964

Notice that \(z_{p} = - z_{1 - p}\), because the standard normal distribution is symmetric about zero.

By “making \(\mu\) the subject of the inequality” we can rearrange this probability statement to give \[P\left(\bar Y - \frac{1.96 \sigma}{\sqrt{n}} \leq \mu \leq \bar Y + \frac{1.96 \sigma}{\sqrt{n}} \right) = 0.95.\] If we replace the end-points of the inequality with their sample equivalents, we get a \(95 \%\) confidence interval \[\left[\bar y - \frac{1.96 \sigma}{\sqrt{n}}, \bar y + \frac{1.96 \sigma}{\sqrt{n}} \right]\] for \(\mu\).

Example 11.2 (Normal mean, unknown variance) Suppose that \(Y_1, \ldots, Y_n\) are independent and identically distributed, with each \(Y_i \sim N(\mu, \sigma^2)\), where \(\mu\) and \(\sigma^2\) are both unknown parameters. Suppose that we require a \(95 \%\) confidence interval for \(\mu\).

We estimate \(\sigma^2\) by \(S^2 = \frac{1}{n-1} \sum_{i=1}^n (Y_i - \bar Y)^2\) and then proceed as in Example 11.1, taking care to replace the normal quantile with a corresponding quantile from the relevant \(t\) distribution, as we know (by Proposition 9.4) that \[\frac{\sqrt{n}(\bar Y - \mu)}{S} \sim t_{n-1}.\] We have \[P(t_{n-1, 0.025} \leq \frac{\sqrt{n}(\bar Y - \mu)}{\sigma} \leq t_{n-1, 0.975}) = 0.95.\] where \(t_{n-1, p}\) is the \(p\)-quantile of the \(t_{n-1}\) distribution, so that \(P(X \leq t_{n-1, p}) = p\) if \(X \sim t_{n-1}\). We can find these quantiles in R, e.g. if \(n = 10\):

qt(0.025, df = 9)

## [1] -2.262157

qt(0.975, df = 9)

## [1] 2.262157

Again, \(t_{p, n-1} = - t_{1 - p, n-1}\), because the \(t\) distribution is symmetric about zero. So we have \[P\left(-t_{n-1, 0.975} \leq \frac{\sqrt{n}(\bar Y - \mu)}{\sigma} \leq t_{n-1, 0.975}\right) = 0.95.\] and rearranging this to “make \(\mu\) the subject” gives \[P\left(\bar Y - \frac{t_{n-1, 0.975} S}{\sqrt{n}} \leq \mu \leq \bar Y + \frac{t_{n-1, 0.975} S}{\sqrt{n}} \right) = 0.95.\] Replacing the end points with their sample versions \[\left[\bar y - \frac{t_{n-1, 0.975} s}{\sqrt{n}}, \bar y + \frac{t_{n-1, 0.975} s}{\sqrt{n}} \right]\] is a \(95\%\) confidence interval for \(\mu\).

We always have \(t_{n-1, 0.975} > z_{0.975} = 1.96\), so the confidence interval when \(\sigma^2\) is unknown will be wider than it would be if \(\sigma^2\) was known: this makes sense, as we have to account for some additional uncertainty. For large \(n\), \(t_{n-1, 0.975} \approx 1.96\), as the \(t_{n-1}\) approaches a standard normal distribution as \(n\) increases.

Example 11.3 (Normal variance) Suppose that \(Y_1, \ldots, Y_n\) are independent and identically distributed, with each \(Y_i \sim N(\mu, \sigma^2)\), where \(\mu\) and \(\sigma^2\) are both unknown parameters. Suppose that we require a \(95 \%\) confidence interval for \(\sigma^2\).

We may estimate \(\sigma^2\) by \(S^2\). By Theorem 6.1, we know \[\frac{n-1}{\sigma^2} S^2 \sim \chi^2_{n-1},\] so \[P\left(c_{n-1, 0.025} \leq \frac{n-1}{\sigma^2} S^2 \leq c_{n-1, 0.975}\right) = 0.95,\] where \(c_{n-1, p}\) is the \(p\)-quantile of the \(\chi^2_{n-1}\) distribution, so that \(P(X \leq c_{n-1, p}) = p\) if \(X \sim \chi^2_{n-1}\). We can find these quantiles in R, e.g. if \(n = 10\):

qchisq(0.025, df = 9)

## [1] 2.700389

qchisq(0.975, df = 9)

## [1] 19.02277

Since the chi-squared distribution has positive domain, all quantiles are positive, and \(c_{p, n-1} \not = - c_{1 - p, n-1}\). Rearranging to “make \(\sigma^2\) the subject” gives \[P\left(\frac{(n-1) S^2}{c_{n-1, 0.975}} \leq \sigma^2 \leq \frac{(n-1) S^2}{c_{n-1, 0.025}}\right) = 0.95\] so \[\left[\frac{(n-1) s^2}{c_{n-1, 0.975}}, \frac{(n-1) s^2}{c_{n-1, 0.025}}\right]\] is a \(95\%\) confidence interval for \(\sigma^2\).

From this we can obtain a corresponding confidence interval for \(\sigma\) if we prefer, as \[P\left(\frac{\sqrt{n-1} S}{\sqrt{c_{n-1, 0.975}}} \leq \sigma \leq \frac{\sqrt{n-1} S}{\sqrt{c_{n-1, 0.025}}}\right) = 0.95\] so \[\left[\frac{\sqrt{n-1} s}{\sqrt{c_{n-1, 0.975}}}, \frac{\sqrt{n-1} s}{\sqrt{c_{n-1, 0.025}}}\right]\] is a \(95\%\) confidence interval for \(\sigma\).

All of our examples of confidence intervals have been for unknown parameters of a normal distribution. We have been able to find these confidence intervals because of the results we have proved earlier about the distributions of \(\bar Y\) and \(S^2\), which are natural estimators of \(\mu\) and \(\sigma^2\). For other distributions, the distribution of an estimator of \(\theta\) (such as the maximum likelihood estimator \(\hat \theta\)) may be more complicated, which makes constructing confidence intervals more challenging. It turns out that for large \(n\), the distribution of the maximum likelihood estimator is close to a normal distribution (see MATH3044), which is very useful in practice to construct confidence intervals for a wide range of models.

11.3 Hypothesis testing

In classical hypothesis testing we aim to choose between two competing hypotheses about a parameter of interest. The hypotheses are called the null hypothesis (\(H_0\)) and the alternative hypothesis (\(H_1\)). The null and alternative hypotheses are regarded somewhat differently – the null hypothesis will be rejected in favour of the alternative hypothesis only if there is strong evidence against it.

For now we will only consider a simple null hypothesis, which is one which specifies a single value for the parameter of interest.

We either reject \(H_0\) in favour of \(H_1\), or we do not reject \(H_0\). There are two types of error we can make:

We reject \(H_0\) when \(H_0\) is true (Type I error)
We do not reject \(H_0\) when \(H_1\) is true (Type II error)

In classical hypothesis testing we choose the Type I error probability we work at (the significance level) and design our test so that the Type II error probability is suitably small (i.e. choose a large enough sample size \(n\).)

In any given situation we need a test statistic – a quantity whose distribution is known when \(H_0\) is true. We reject \(H_0\) if the value of the test statistic is “extreme” relative to the distribution of the test statistic under \(H_0\), otherwise we do not reject \(H_0\). The threshold for what counts as “extreme” depends on the specified significance level of the test.

Example 11.4 (Normal mean, known variance) As in Example 11.1, suppose that \(Y_1, \ldots, Y_n\) are independent and identically distributed, with each \(Y_i \sim N(\mu, \sigma^2)\), where \(\mu\) is an unknown parameter, but the value of \(\sigma^2\) is known.

Suppose that we wish to test that null hypothesis \(H_0: \mu = \mu_0\), where \(\mu_0\) is a fixed value chosen in advance of collecting the data. We take \(H_1\) to be the complement of \(H_0\), so \(H_1: \mu \not = \mu_0\).

We construct the test statistic \[Z = \frac{\sqrt{n}(\bar Y - \mu_0)}{\sigma}.\] If \(H_0\) is true then \(Z \sim N(0, 1)\). For our observed data the observed value of \(Z\) is \[z = \frac{\sqrt{n}(\bar y - \mu_0)}{\sigma}.\] To construct a hypothesis test at significance level \(\alpha = 0.05\) we should only reject \(H_0\) if \(|z| > z_{0.975} = 1.96\). So if \(z > 1.96\) or \(z < -1.96\), we reject \(H_0\). Otherwise, we do not reject \(H_0\).

Example 11.5 (Normal mean, unknown variance) As in Example 11.2, suppose that \(Y_1, \ldots, Y_n\) are independent and identically distributed, with each \(Y_i \sim N(\mu, \sigma^2)\), where \(\mu\) and \(\sigma^2\) are both unknown parameters. Suppose we are interested in testing the null \(H_0: \mu = \mu_0\) against the alternative \(H_1: \mu \not = \mu_0\).

We estimate \(\sigma^2\) by \(S^2\), and construct the test statistic \[T = \frac{\sqrt{n}(\bar Y - \mu_0)}{S}.\] If \(H_0\) is true then \(T \sim t_{n-1}\). For our observed data the observed value of \(T\) is \[t = \frac{\sqrt{n}(\bar y - \mu_0)}{s}.\] To construct a hypothesis test at significance level \(\alpha = 0.05\) we should only reject \(H_0\) if \(|t| > t_{n-1, 0.975}\). So if \(t > t_{n-1, 0.975}\) or \(t < -t_{n-1, 0.975}\), we reject \(H_0\). Otherwise, we do not reject \(H_0\).

Example 11.6 (Normal variance) As in Example 11.3, suppose that \(Y_1, \ldots, Y_n\) are independent and identically distributed, with each \(Y_i \sim N(\mu, \sigma^2)\), where \(\mu\) and \(\sigma^2\) are both unknown parameters. Suppose we are interested in testing the null \(H_0: \sigma^2 = \sigma^2_0\) against the alternative \(H_1: \sigma^2 \not = \sigma^2_0\).

We estimate \(\sigma^2\) by \(S^2\), and construct the test statistic \[C = \frac{(n-1) S^2}{\sigma_0^2}.\] If \(H_0\) is true then \(C \sim \chi^2_{n-1}\). For our observed data the observed value of \(C\) is \[c = \frac{(n-1) s^2}{\sigma_0^2}.\] To construct a hypothesis test at significance level \(\alpha = 0.05\) we should only reject \(H_0\) if \(c < c_{n-1, 0.025}\) or \(c > c_{n-1, 0.975}\). Otherwise, we do not reject \(H_0\).

11.4 Two-sample hypothesis testing

In many practical situations, such as clinical trials, we have two independent groups of subjects under study and wish to understand the relative effects of two “treatments” on some response of interest. For instance, in a classical two-armed clinical trial, there are two treatments of interest, such as an active treatment and a placebo, or old and new treatments, and we wish to know whether there is a difference in responses between the two treatment groups.

Example 11.7 (Classical two-sample \(t\)-test) Suppose we have two sets of samples \[X_1, \ldots, X_{m} \sim N(\mu_1, \sigma^2) \quad \text{and} \quad Y_1, \ldots, Y_{n} \sim N(\mu_2, \sigma^2),\] where all the random variables are independent.

Write \(\delta = \mu_1 - \mu_2\). We would like to test the null \(H_0: \delta = \delta_0\), for some prespecified value of the difference between treatments, against the alternative \(H_1: \delta \not = \delta_0\)). Often \(\delta_0 = 0\), in which case we are testing if there is any difference in distribution of the response between the two treatment groups.

We know that \(\bar X \sim N(\mu_1, \sigma^2/m)\) and \(\bar Y \sim N(\mu_2, \sigma^2/n)\), and \(\bar X\) and \(\bar Y\) are independent, so \[\bar X - \bar Y \sim N\left(\mu_1 - \mu_2, \frac{\sigma^2}{m}+ \frac{\sigma^2}{n}\right).\] Under \(H_0\), \(\bar X - \bar Y \sim N(\delta_0, \sigma^2/m+ \sigma^2/n),\) or \[\begin{equation} \frac{\bar X - \bar Y - \delta_0}{\sqrt{\sigma^2/m+ \sigma^2/n}} \sim N(0, 1), \tag{11.1} \end{equation}\]

but we cannot use this as a test statistic, because it depends on the unknown \(\sigma^2\).

We estimate \(\sigma^2\) based on all the samples combined, by \[S_c^2 = \frac{\sum_{i=1}^{m} (X_i - \bar X)^2 + \sum_{i=1}^{n}(Y_i - \bar Y)^2}{m + n - 2}.\] We choose the denominator \(m + n - 2\) to make this an unbiased estimator of \(\sigma^2\): since \(\sum_{i=1}^{m} (X_i - \bar X)^2 \sim \sigma^2 \chi^2_{m - 1}\) and \(\sum_{i=1}^{n} (Y_i - \bar Y)^2 \sim \sigma^2 \chi^2_{n - 1}\), and these two quantities are independent, their sum has \(\sigma^2 \chi^2_{m + n - 2}\) distribution. So \[\frac{(m + n - 2)S_c^2}{\sigma^2} \sim \chi^2_{m + n - 2}.\]

Replacing \(\sigma^2\) by \(S_c^2\) in Equation (11.1), we get \[T = \frac{\bar X - \bar Y - \delta_0}{\sqrt{S_c^2 / m+ S_c^2 / n}} \sim t_{m + n - 2}\] under \(H_0\). For our observed data the observed value of \(T\) is \[t = \frac{\bar x - \bar y - \delta_0}{\sqrt{s_c^2 / m+ s_c^2 / n}}.\]

To construct a hypothesis test at significance level \(\alpha = 0.05\) we should only reject \(H_0\) if \(|t| > t_{m+n-2, 0.975}\). So if \(t > t_{m+n-2, 0.975}\) or \(t < -t_{m+n-2, 0.975}\), we reject \(H_0\). Otherwise, we do not reject \(H_0\).

Example 11.8 (\(F\)-test for equality of variances) In the classical two-sample \(t\)-test (Example 11.7), an assumption is made that the (population) variances of the two groups are equal. We might want to test whether this assumption appears to be reasonable, given the data. To do this, we now assume that each \(X_i \sim N(\mu_1, \sigma_1^2)\) and each \(Y_i \sim N(\mu_2, \sigma_2^2)\), and test the null hypothesis \(H_0: \sigma_1^2 = \sigma_2^2\) against the alternative \(H_1: \sigma_1^2 \not = \sigma_2^2\).

We estimate \(\sigma_1^2\) by \[S_1^2 = \frac{1}{m - 1} \sum_{i=1}^{m} (X_i - \bar X)^2\] and \(\sigma_2^2\) by \[S_2^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (Y_i - \bar Y)^2.\] Let \[F = \frac{S_1^2}{S_2^2}\] Under \(H_0\), by Proposition 9.6, we know that \(F \sim F_{m-1, n-1}.\) For our observed data the observed value of \(F\) is \[f = \frac{s_1^2}{s_2^2}.\] To construct a hypothesis test at significance level \(\alpha = 0.05\) we should only reject \(H_0\) if \(f < f_{m-1, n-1, 0.025}\) or \(f > f_{m-1, n-1, 0.975}\), where \(f_{m-1, n-1, p}\) is the \(p\)-quantile of an \(F_{m-1, n-1}\) distribution. Otherwise, we do not reject \(H_0\).