
Chapter 6 The gamma distribution

6.1 The gamma distribution

In this chapter we will introduce a family of distributions known as the gamma distribution. We will see that this family contains several sub-families of distributions that arise in practical problems, including the exponential, chi-squared and Erlang distributions. The gamma distribution is also related to the normal, \(t\) and \(F\) distributions.

In practice we are often interested in modelling positive data, such as the time to the occurrence of an event of interest. We have seen that the exponential distribution is one possibility for modelling such data, but it lacks flexibility: it has only a single parameter, and all exponential pdfs have the same shape (see Example 2.6). It is natural to try to generalise the exponential distribution to a more flexible family, and the gamma distribution is one way of achieving this.

Before we introduce the gamma distribution, we first need to introduce a function which is used in the pdf of the gamma distribution, to make sure that the pdf integrates to one.

Definition 6.1 (Gamma function) The Gamma function is defined, for \(\alpha > 0\), by \[\Gamma(\alpha) = \int_0^\infty x^{\alpha - 1} e^{-x} dx.\]

When \(\alpha = 1\), we have \[\Gamma(1) = \int_0^\infty e^{-x} dx = 1.\]

Next, let’s prove a useful property about the Gamma function.
Proposition 6.1 For any \(\alpha > 1\), \(\Gamma(\alpha) = (\alpha - 1) \Gamma(\alpha - 1).\)
Proof. We use integration by parts, taking \(u = x^{\alpha - 1}\) and \(dv = e^{-x} dx\): \[\begin{align*} \Gamma(\alpha) &= \int_0^\infty x^{\alpha - 1} e^{-x} dx \\ &= \left[-x^{\alpha - 1} e^{-x}\right]_0^\infty + (\alpha - 1) \int_0^\infty x^{\alpha - 2} e^{-x}dx \\ &= 0 + (\alpha - 1) \int_0^\infty x^{(\alpha - 1) - 1} e^{-x} dx \\ &= (\alpha - 1) \Gamma(\alpha - 1), \end{align*}\] where the boundary term vanishes because \(\alpha - 1 > 0\) (so \(x^{\alpha - 1} \to 0\) as \(x \to 0\)) and \(e^{-x}\) decays faster than any power of \(x\) grows as \(x \to \infty\).

As a consequence of Proposition 6.1 and the fact that \(\Gamma(1) = 1\), if \(\alpha\) is any positive integer then \[\Gamma(\alpha) = (\alpha - 1)!,\] so the Gamma function can be thought of as a continuous version of the factorial function.

It is also useful to know what happens to the Gamma function at half-integer points, \(\Gamma(1/2)\), \(\Gamma(3/2)\) and so on. It can be shown that \[\Gamma(1/2) = \sqrt{\pi},\] and this may be used along with the recurrence relationship (Proposition 6.1) to compute any other \(\Gamma(n/2)\).
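These properties are easy to check numerically. The following sketch uses Python's standard-library `math.gamma`, which implements the Gamma function; the particular values of \(\alpha\) chosen are illustrative only.

```python
import math

# Gamma(1/2) should equal sqrt(pi)
print(math.gamma(0.5), math.sqrt(math.pi))   # both approximately 1.7724538509

# Check the recurrence Gamma(alpha) = (alpha - 1) * Gamma(alpha - 1)
for alpha in [1.5, 2.0, 3.7, 5.0]:           # illustrative values only
    lhs = math.gamma(alpha)
    rhs = (alpha - 1) * math.gamma(alpha - 1)
    print(alpha, lhs, rhs)                   # lhs and rhs agree to floating-point accuracy

# For positive integers, Gamma(n) = (n - 1)!
print(math.gamma(6), math.factorial(5))      # both equal 120
```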

Definition 6.2 We say a random variable \(Y\) has gamma distribution with shape parameter \(\alpha > 0\) and rate parameter \(\beta > 0\) if it has pdf \[\begin{equation} f(y) = \frac{\beta^\alpha}{\Gamma(\alpha)} y^{\alpha - 1} e^{-\beta y}, \quad y > 0. \tag{6.1} \end{equation}\] We write this as \(Y \sim \text{gamma}(\alpha, \beta)\).

Note that the exponential distribution with rate parameter \(\beta\) is a special case of the gamma distribution, with \(\alpha = 1\).
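As a quick sanity check, the pdf in Equation (6.1) can be evaluated with `scipy.stats.gamma`. Note that SciPy parametrises the distribution by a shape `a` \(= \alpha\) and a `scale` \(= 1/\beta\) rather than a rate, so the rate must be inverted; the values of \(\alpha\), \(\beta\) and \(y\) below are arbitrary illustrations.

```python
import numpy as np
from math import gamma as gamma_fn      # the Gamma function
from scipy.stats import gamma, expon

alpha, beta = 2.5, 1.5                   # arbitrary shape and rate
y = np.linspace(0.1, 5, 5)

# pdf from Equation (6.1), written out directly
pdf_direct = beta**alpha / gamma_fn(alpha) * y**(alpha - 1) * np.exp(-beta * y)

# the same pdf via scipy (shape a = alpha, scale = 1/beta)
pdf_scipy = gamma.pdf(y, a=alpha, scale=1/beta)
print(np.allclose(pdf_direct, pdf_scipy))             # True

# alpha = 1 reduces to the exponential distribution with rate beta
print(np.allclose(gamma.pdf(y, a=1, scale=1/beta),
                  expon.pdf(y, scale=1/beta)))        # True
```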

Example 6.1 (Time until the \(k\)th event in a Poisson process) Suppose we have a Poisson process with events arriving at rate \(\beta\) per unit time, so that the number of events occurring in a time interval of length \(t\) has \(\text{Poisson}(\beta t)\) distribution. Let \(Y\) be the time until the \(k\)th event occurs. We claim that \(Y \sim \text{gamma}(k, \beta)\).

Let \(N_y\) be the number of events in the time interval \((0, y]\), so that \(N_y \sim \text{Poisson}(\beta y)\). The \(k\)th event occurs after time \(y\) if and only if at most \(k - 1\) events occur in \((0, y]\), so \[P(Y > y) = P(N_y \leq k - 1) = \sum_{n=0}^{k-1} \frac{(\beta y)^n e^{-\beta y}}{n!},\] and hence \[\begin{align*} F(y) = P(Y \leq y) &= 1- P(Y > y) \\ &= 1 - \sum_{n=0}^{k-1} \frac{(\beta y)^n e^{-\beta y}}{n!} \\ &= 1 - e^{-\beta y} - \sum_{n=1}^{k-1} \frac{(\beta y)^n e^{-\beta y}}{n!}. \end{align*}\] Differentiating, most terms cancel in pairs (a telescoping sum), leaving \[\begin{align*} f(y) &= \frac{dF}{dy} = \beta e^{-\beta y} - \sum_{n=1}^{k-1} \left\{ \frac{\beta^n y^{n-1} e^{-\beta y}}{(n-1)!} - \beta \frac{(\beta y)^n}{n!} e^{-\beta y} \right\} \\ &= \beta e^{-\beta y} - \beta e^{-\beta y} + \beta^2 y e^{-\beta y} - \beta^2 y e^{-\beta y} + \frac{\beta^3 y^2}{2!} e^{-\beta y} - \ldots + \frac{\beta (\beta y)^{k-1}}{(k-1)!} e^{-\beta y} \\ &= \frac{\beta^k}{\Gamma(k)} y^{k-1} e^{-\beta y}, \end{align*}\]

which is the pdf of a \(\text{gamma}(k, \beta)\) random variable.

If \(\alpha\) is an integer, as in this example, the \(\text{gamma}(\alpha, \beta)\) distribution is also known as the Erlang distribution. The exponential distribution (with \(\alpha= 1\)) is a special case of the Erlang distribution.
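We can check Example 6.1 by simulation. A standard fact about Poisson processes (used here only for the simulation, not in the derivation above) is that the times between successive events are independent exponential(\(\beta\)) random variables, so the time to the \(k\)th event can be simulated as a sum of \(k\) exponentials; the choices of \(k\), \(\beta\) and the simulation size below are arbitrary.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
k, beta = 4, 2.0                      # arbitrary choices
n_sims = 100_000

# time to the k-th event = sum of k independent exponential(beta) inter-arrival times
waits = rng.exponential(scale=1/beta, size=(n_sims, k)).sum(axis=1)

# compare with the gamma(k, beta) distribution via a Kolmogorov-Smirnov test
ks = stats.kstest(waits, stats.gamma(a=k, scale=1/beta).cdf)
print(waits.mean(), k / beta)         # sample mean vs. theoretical mean alpha / beta
print(ks.pvalue)                      # large p-value: no evidence against gamma(k, beta)
```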

6.2 Properties of the gamma distribution

Proposition 6.2 The moment generating function of the \(\text{gamma}(\alpha, \beta)\) distribution is \[M_Y(t) = \left(1 - \frac{t}{\beta} \right)^{-\alpha}, \quad \text{for $t < \beta$}\]
Proof. We have \[\begin{align*} M_Y(t) = E(e^{tY}) &= \int_0^\infty e^{ty} \frac{\beta^\alpha}{\Gamma(\alpha)} y^{\alpha - 1} e^{-\beta y} dy \\ &= \int_0^\infty \frac{\beta^\alpha}{\Gamma(\alpha)} y^{\alpha - 1} e^{-y(\beta - t)} dy \\ &= \frac{\beta^\alpha}{(\beta - t)^\alpha} \int_0^\infty \frac{(\beta - t)^\alpha}{\Gamma(\alpha)} y^{\alpha - 1} e^{-y(\beta - t)} dy \\ &= \left(1 - \frac{t}{\beta}\right)^{-\alpha} \times 1, \end{align*}\] as the final integrand is the pdf of a \(\text{gamma}(\alpha, \beta - t)\) distribution, which integrates to \(1\) provided that the new rate parameter \(\beta - t > 0\), i.e. if \(t < \beta\).
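Proposition 6.2 can also be checked by Monte Carlo: for \(t < \beta\), the sample average of \(e^{tY}\) should be close to \((1 - t/\beta)^{-\alpha}\). The parameter values below are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(2)
alpha, beta, t = 3.0, 2.0, 0.5        # arbitrary values, with t < beta

y = rng.gamma(shape=alpha, scale=1/beta, size=1_000_000)
mgf_mc = np.exp(t * y).mean()         # Monte Carlo estimate of E(e^{tY})
mgf_exact = (1 - t/beta) ** (-alpha)  # closed form from Proposition 6.2
print(mgf_mc, mgf_exact)              # the two should agree to a few decimal places
```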

The following proposition tells us that the gamma distribution is always positively skewed, but the skewness decreases as \(\alpha\) increases. The gamma distribution always has larger kurtosis than the normal distribution, but the kurtosis decreases towards the kurtosis of a normal distribution (\(\gamma_2 = 3\)) as \(\alpha\) increases.

Proposition 6.3 Suppose \(Y \sim \text{gamma}(\alpha, \beta)\). Then \(E(Y) = \alpha \beta^{-1}\) and \(\text{Var}(Y) = \alpha \beta^{-2}\). The skewness is \(\gamma_1 = 2 \alpha^{-1/2}\) and the kurtosis is \(\gamma_2 = 3 + 6 \alpha^{-1}\).

Proof. From Proposition 6.2, the mgf is \(M_Y(t) = (1 - t/\beta)^{-\alpha}\), so the cgf is \(K_Y(t) = \log M_Y(t) = -\alpha \log(1 - t/\beta)\). Differentiating, we have \[K_Y^{(1)}(t) = \frac{\alpha}{\beta\left(1 - \frac{t}{\beta}\right)} = \frac{\alpha}{\beta - t},\] so \(E(Y) = K_Y^{(1)}(0) = \alpha \beta^{-1}\).

Differentiating the cgf again, \[K_Y^{(2)}(t) = \frac{d}{dt} \left[\alpha (\beta - t)^{-1}\right] = \alpha(\beta - t)^{-2},\] so \(\mu_2 = \text{Var}(Y) = K_Y^{(2)}(0) = \alpha \beta^{-2}\).

Differentiating the cgf again, \[K_Y^{(3)}(t) = \frac{d}{dt} \left[\alpha (\beta - t)^{-2}\right] = 2 \alpha (\beta - t)^{-3},\] so the third moment about the mean is \(\mu_3 = K_Y^{(3)}(0) = 2 \alpha \beta^{-3}\). So the skewness is \[\gamma_1 = \frac{\mu_3}{\mu_2^{3/2}} = \frac{2 \alpha \beta^{-3}}{[\alpha \beta^{-2}]^{3/2}} = 2 \alpha^{-1/2}.\]

Differentiating the cgf again, \[K_Y^{(4)}(t) = \frac{d}{dt} \left[2 \alpha (\beta - t)^{-3}\right] = 6 \alpha (\beta - t)^{-4},\] so the fourth cumulant is \(\kappa_4 = 6 \alpha \beta^{-4}\). The fourth moment about the mean is \[\mu_4 = \kappa_4 + 3 \mu_2^2 = 6 \alpha \beta^{-4} + 3 \alpha^2 \beta^{-4},\] so the kurtosis is \[\gamma_2 = \frac{\mu_4}{\mu_2^2} = 3 + \frac{6 \alpha \beta^{-4}}{\alpha^2 \beta^{-4}} = 3 + 6 \alpha^{-1},\] as claimed.
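These moments can be confirmed against `scipy.stats.gamma`; note that SciPy's `stats(..., moments='mvsk')` reports *excess* kurtosis \(\gamma_2 - 3 = 6/\alpha\) rather than \(\gamma_2\) itself. The parameter values are arbitrary.

```python
from scipy.stats import gamma

alpha, beta = 4.0, 3.0                             # arbitrary shape and rate
mean, var, skew, ex_kurt = gamma.stats(a=alpha, scale=1/beta, moments='mvsk')

print(mean,    alpha / beta)           # E(Y)        = alpha / beta
print(var,     alpha / beta**2)        # Var(Y)      = alpha / beta^2
print(skew,    2 / alpha**0.5)         # gamma_1     = 2 / sqrt(alpha)
print(ex_kurt, 6 / alpha)              # gamma_2 - 3 = 6 / alpha (SciPy reports excess kurtosis)
```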

The gamma distribution is closed under scaling, and under summation of independent gamma random variables with a common rate parameter.

Proposition 6.4 Suppose \(Y \sim \text{gamma}(\alpha, \beta)\), and let \(Z = bY\), for some constant \(b > 0\). Then \(Z \sim \text{gamma}(\alpha, \beta / b)\).
Proof. By Theorem 3.2, we have \[M_Z(t) = M_Y(bt) = \left( 1 - \frac{bt}{\beta} \right)^{-\alpha} = \left(1 - \frac{t}{\beta/b}\right)^{-\alpha},\] which is the mgf of a \(\text{gamma}(\alpha, \beta / b)\) distribution. Since the mgf characterises the distribution, we have \(Z \sim \text{gamma}(\alpha, \beta / b)\).
Proposition 6.5 If \(Y_1, \ldots, Y_n\) are independent, with \(Y_i \sim \text{gamma}(\alpha_i, \beta)\) then \(Z = \sum_{i=1}^n Y_i \sim \text{gamma}(\sum_{i=1}^n \alpha_i, \beta)\).
Proof. By Theorem 4.1, we have \[\begin{align*} M_Z(t) &= \prod_{i=1}^n M_{Y_i}(t) \\ &= \prod_{i=1}^n \left( 1 - \frac{t}{\beta} \right)^{-\alpha_i} \\ &= \left( 1 - \frac{t}{\beta} \right)^{-\sum_{i=1}^n \alpha_i}, \end{align*}\] which is the mgf of a \(\text{gamma}(\sum_{i=1}^n \alpha_i, \beta)\) distribution. Since the mgf characterises the distribution, we have \(Z \sim \text{gamma}(\sum_{i=1}^n \alpha_i, \beta)\).
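Both closure properties are easy to check by simulation. The sketch below (with arbitrary parameter values) compares a scaled gamma sample and a sum of independent gamma samples with a common rate against their claimed distributions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n_sims = 100_000

# Proposition 6.4: if Y ~ gamma(alpha, beta) then bY ~ gamma(alpha, beta / b)
alpha, beta, b = 2.0, 3.0, 4.0                     # arbitrary values
z = b * rng.gamma(shape=alpha, scale=1/beta, size=n_sims)
print(stats.kstest(z, stats.gamma(a=alpha, scale=b/beta).cdf).pvalue)   # large p-value

# Proposition 6.5: sum of independent gamma(alpha_i, beta) is gamma(sum alpha_i, beta)
alphas = [0.5, 1.0, 2.5]                           # arbitrary shapes, common rate beta
total = sum(rng.gamma(shape=a, scale=1/beta, size=n_sims) for a in alphas)
print(stats.kstest(total, stats.gamma(a=sum(alphas), scale=1/beta).cdf).pvalue)
```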

6.3 The chi-squared distribution

The chi-squared distribution is a useful special case of the gamma distribution.

Definition 6.3 We say a random variable \(Y\) has chi-squared distribution with \(n\) degrees of freedom if \(Y \sim \text{gamma}(n/2, 1/2)\). We write \(Y \sim \chi^2_n\).

We can write down the properties of the chi-squared distribution by using results we have already proved about the gamma distribution.

If \(Y \sim \chi^2_n\), we have \[E(Y) = \frac{n/2}{1/2} = n; \qquad \text{Var}(Y) = \frac{n/2}{(1/2)^2} = 2n.\]
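In SciPy the chi-squared distribution is available directly as `scipy.stats.chi2`. As a check, its pdf agrees with the gamma(\(n/2\), \(1/2\)) pdf (SciPy scale \(= 2\)), and its mean and variance are \(n\) and \(2n\); the degrees of freedom chosen below are arbitrary.

```python
import numpy as np
from scipy.stats import chi2, gamma

n = 7                                               # arbitrary degrees of freedom
y = np.linspace(0.1, 20, 50)

# chi-squared with n degrees of freedom is gamma(n/2, rate 1/2), i.e. scale 2
print(np.allclose(chi2.pdf(y, df=n), gamma.pdf(y, a=n/2, scale=2)))   # True
print(chi2.mean(df=n), chi2.var(df=n))              # n and 2n
```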

By Proposition 6.5, if \(Y_1, \ldots, Y_n\) are independent random variables, with \(Y_i \sim \chi^2_{n_i}\) then \[\begin{equation} \sum_{i=1}^n Y_i \sim \chi^2_{\sum_{i=1}^n n_i}. \tag{6.2} \end{equation}\]

The sum of independent chi-squared random variables has chi-squared distribution with degrees of freedom given by the sum of the individual degrees of freedom.

The chi-squared distribution is also related to the normal distribution.
Proposition 6.6 Let \(Z \sim N(0, 1)\), and let \(Y = Z^2\). Then \(Y \sim \chi^2_1\).
Proof. Write \(\Phi(.)\) for the cumulative distribution function of \(Z\), and \(\phi(.)\) for the pdf of \(Z\), so that \[\phi(z) = \frac{1}{\sqrt{2 \pi}} \exp\left(-\frac{z^2}{2}\right).\] Let \(Y\) have cdf \(F(.)\) and pdf \(f(.)\). Then, for \(y > 0\), \[\begin{align*} F(y) &= P(Y \leq y) = P(Z^2 \leq y) \\ &= P\left(-\sqrt{y} \leq Z \leq \sqrt{y} \right) \\ &= \Phi\left(\sqrt{y}\right) - \Phi\left(-\sqrt{y}\right). \end{align*}\] So \[\begin{align*} f(y) &= \frac{d}{dy} F(y) = \frac{1}{2 \sqrt{y}} \phi \left(\sqrt{y}\right) + \frac{1}{2 \sqrt{y}} \phi \left(-\sqrt{y}\right) \\ &= \frac{1}{2 \sqrt{y}} \left\{\frac{1}{\sqrt{2 \pi}} \exp\left(-\frac{y}{2}\right) + \frac{1}{\sqrt{2 \pi}} \exp\left(-\frac{y}{2}\right) \right\} \\ &= \frac{1}{\sqrt{2}\sqrt{y}\sqrt{\pi}} \exp\left(-\frac{y}{2}\right) \\ &= \frac{y^{\frac{1}{2} - 1}}{\Gamma(1/2)} \left(\frac{1}{2}\right)^{1/2} \exp\left(-\frac{y}{2}\right), \end{align*}\] since \(\Gamma(1/2) = \sqrt{\pi}\). This is the pdf of a \(\text{gamma}(1/2, 1/2) \equiv \chi^2_1\) distribution, as claimed.
Putting together Proposition 6.6 and Equation (6.2) gives us a way to construct a random variable with any chi-squared distribution, given a supply of independent and identically distributed standard normal random variables:
Proposition 6.7 Let \(Z_1, Z_2, \ldots, Z_n\) be independent and identically distributed, with \(Z_i \sim N(0, 1)\), and let \(Y = \sum_{i=1}^n Z_i^2\). Then \(Y \sim \chi^2_n\).
Proof. By Proposition 6.6, each \(Z_i^2 \sim \chi^2_1\), and \(Z_1^2, \ldots, Z_n^2\) are independent. So by Equation (6.2) with \(n_i = 1\), \(Y \sim \chi^2_{\sum_{i=1}^n 1} \equiv \chi^2_n\), as claimed.
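Proposition 6.7 is also easy to verify by simulation: squaring and summing independent standard normal draws should produce a \(\chi^2_n\) sample. The values of \(n\) and the simulation size below are arbitrary.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n, n_sims = 5, 100_000                              # arbitrary choices

z = rng.standard_normal(size=(n_sims, n))
y = (z ** 2).sum(axis=1)                            # sum of n squared standard normals

print(y.mean(), y.var())                            # approximately n and 2n
print(stats.kstest(y, stats.chi2(df=n).cdf).pvalue) # consistent with chi-squared, n df
```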

6.4 Distribution of the sample variance

Suppose \(Y_1, Y_2, \ldots, Y_n\) are independent identically distributed \(N(\mu, \sigma^2)\) random variables. We can use observations \(y_1, y_2, \ldots, y_n\) of \(Y_1, \ldots, Y_n\) to estimate \(\mu\) and \(\sigma^2\). We can use the sample mean \[\bar y = \frac{1}{n} \sum_{i=1}^n y_i\] to estimate \(\mu\), and the sample variance \[s^2 = \frac{\sum_{i=1}^n (y_i - \bar y)^2}{n-1}\] to estimate \(\sigma^2\).
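In NumPy the sample variance with the \(n - 1\) divisor is obtained with `ddof=1` (the default `ddof=0` divides by \(n\) instead). The data below are just an arbitrary illustration.

```python
import numpy as np

y = np.array([4.2, 5.1, 3.8, 6.0, 4.9])   # arbitrary illustrative observations

ybar = y.mean()                            # sample mean
s2 = y.var(ddof=1)                         # sample variance with n - 1 divisor
print(ybar, s2)
print(np.isclose(s2, ((y - ybar) ** 2).sum() / (len(y) - 1)))   # True
```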

For any specific sample of observations, \(\bar y\) and \(s^2\) will take specific numerical values, but when they are defined in terms of the random variables \(Y_1 , Y_2 , \ldots, Y_n\), they too are random variables and will have distributions.

We have already seen (in Proposition 4.1) that \(\bar Y \sim N(\mu, \sigma^2/n)\). We would like to also know the distribution of the random version of the sample variance \[\begin{equation} S^2 = \frac{\sum_{i=1}^n (Y_i - \bar Y)^2}{n-1}. \tag{6.3} \end{equation}\]
Theorem 6.1 Suppose \(Y_1, Y_2, \ldots, Y_n\) are independent identically distributed \(N(\mu, \sigma^2)\) random variables, and let \(S^2\) be as defined in Equation (6.3). Then \[\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1},\] and \(S^2\) and \(\bar Y\) are independent.

In order to prove this important result, we will need to use Cochran’s theorem, which we state here without proof.

Theorem 6.2 (Cochran’s Theorem) Suppose that \(U \sim \text{gamma}(\alpha_1, \beta)\) and \(W \sim \text{gamma}(\alpha_1 + \alpha_2, \beta)\). If \(V = W - U\), then any one of the following implies the other two:

  (i) \(V \sim \text{gamma}(\alpha_2, \beta)\);
  (ii) \(V\) is independent of \(U\);
  (iii) \(V\) is non-negative.

We use Cochran’s theorem to prove Theorem 6.1:

Proof (Theorem 6.1). The key to the proof is to write \(\sum_{i=1}^n (Y_i - \bar Y)^2\) in terms of \(\sum_{i=1}^n (Y_i - \mu)^2\). We have \[\begin{align*} \sum_{i=1}^n (Y_i - \mu)^2 &= \sum_{i=1}^n (Y_i - \bar Y + \bar Y - \mu)^2 \\ &= \sum_{i=1}^n \left\{(Y_i - \bar Y)^2 + 2(Y_i - \bar Y)(\bar Y - \mu) + (\bar Y - \mu)^2 \right\} \\ &= \sum_{i=1}^n (Y_i - \bar Y)^2 + 2(\bar Y - \mu) \sum_{i=1}^n (Y_i - \bar Y) + n(\bar Y - \mu)^2 \\ &= \sum_{i=1}^n (Y_i - \bar Y)^2 + n(\bar Y - \mu)^2, \end{align*}\]

as \(\sum_{i=1}^n (Y_i - \bar Y) = 0\). Rearranging gives \[\sum_{i=1}^n (Y_i - \bar Y)^2 = \sum_{i=1}^n (Y_i - \mu)^2 - n(\bar Y - \mu)^2.\] Dividing through by \(\sigma^2\) gives \[\frac{1}{\sigma^2}\sum_{i=1}^n (Y_i - \bar Y)^2 = \frac{1}{\sigma^2}\sum_{i=1}^n (Y_i - \mu)^2 - \frac{n}{\sigma^2}(\bar Y - \mu)^2,\] so \[\frac{(n-1)S^2}{\sigma^2} = \sum_{i=1}^n \left(\frac{Y_i - \mu}{\sigma}\right)^2 - \left(\frac{\sqrt{n}(\bar Y - \mu)}{\sigma} \right)^2.\] But \((Y_i - \mu)/\sigma \sim N(0, 1)\), so \[\sum_{i=1}^n \left(\frac{Y_i - \mu}{\sigma}\right)^2 \sim \chi^2_n\] by Proposition 6.7. We have \(\sqrt{n}(\bar Y - \mu)/\sigma \sim N(0, 1)\) by Proposition 4.1, so \[\left(\frac{\sqrt{n}(\bar Y - \mu)}{\sigma} \right)^2 \sim \chi^2_1\] by Proposition 6.6.

Now we use Cochran’s Theorem (Theorem 6.2), with \[\begin{align*} V &= \frac{(n-1) S^2}{\sigma^2} \\ W &= \sum_{i=1}^n \left(\frac{Y_i - \mu}{\sigma}\right)^2 \sim \chi^2_n \equiv \text{gamma}(n/2, 1/2) \\ U &= \left(\frac{\sqrt{n}(\bar Y - \mu)}{\sigma} \right)^2 \sim \chi^2_1 \equiv \text{gamma}(1/2, 1/2), \end{align*}\]

so \(\alpha_1 = 1/2\) and \(\alpha_1 + \alpha_2 = n/2\), so \(\alpha_2 = (n-1)/2\). We have \(\sum_{i=1}^n (Y_i - \bar Y)^2 \geq 0\), so \(V \geq 0\), so condition (iii) in Cochran’s Theorem is met.

So by (ii) of Cochran’s Theorem, \(V = \frac{(n-1) S^2}{\sigma^2}\) is independent of \(U = \left(\frac{\sqrt{n}(\bar Y - \mu)}{\sigma} \right)^2\), and hence \(S^2\) is independent of \(\bar Y\), as claimed.

By (i) of Cochran’s Theorem, \[V = \frac{(n-1) S^2}{\sigma^2} \sim \text{gamma}(\alpha_2, 1/2) \equiv \text{gamma}((n-1)/2, 1/2) \equiv \chi^2_{n-1},\] as claimed.
A consequence of Theorem 6.1 is that \[\begin{align*} E(S^2) &= E\left(\frac{\sigma^2}{n-1} \frac{(n-1) S^2}{\sigma^2} \right) \\ &= \frac{\sigma^2}{n-1} E\left(\frac{(n-1) S^2}{\sigma^2} \right) \\ &= \frac{\sigma^2}{n-1} \cdot (n - 1) = \sigma^2, \end{align*}\]

where we have used that the expected value of a chi-squared random variable is its degrees of freedom. So \(S^2\) is an unbiased estimator of \(\sigma^2\): a concept which we will revisit in Chapter 10.
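Theorem 6.1 and the unbiasedness of \(S^2\) can both be illustrated by simulation: repeatedly drawing normal samples, the scaled sample variances \((n-1)S^2/\sigma^2\) should behave like \(\chi^2_{n-1}\) draws, the average of the \(S^2\) values should be close to \(\sigma^2\), and \(S^2\) and \(\bar Y\) should be empirically uncorrelated. All parameter choices below are arbitrary.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
mu, sigma, n, n_sims = 1.0, 2.0, 10, 50_000      # arbitrary choices

samples = rng.normal(mu, sigma, size=(n_sims, n))
ybar = samples.mean(axis=1)
s2 = samples.var(axis=1, ddof=1)                 # sample variances, n - 1 divisor

scaled = (n - 1) * s2 / sigma**2
print(stats.kstest(scaled, stats.chi2(df=n-1).cdf).pvalue)  # consistent with chi^2_{n-1}
print(s2.mean(), sigma**2)                       # E(S^2) approximately sigma^2 (unbiasedness)
print(np.corrcoef(ybar, s2)[0, 1])               # near zero, as S^2 and Ybar are independent
```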