Chapter 6 The gamma distribution
6.1 The gamma distribution
In this chapter we will introduce a family of distributions known as the gamma distribution. We will see that this family contains several sub-families of distributions that arise in practical problems, including the exponential, chi-squared and Erlang distributions. The gamma distribution is also related to the normal, \(t\) and \(F\) distributions.
In practice we are often interested in modelling data which is positive, such as the time to the occurrence of an event of interest. We have seen that an exponential distribution is one possibility for modelling such data, but it lacks flexibility: it has only a single parameter, and all exponential pdfs have the same shape (see Example 2.6). It is natural to try to generalise the exponential to a more flexible family of distributions, and the gamma distribution is one way of achieving this.
Before we introduce the gamma distribution, we first need to introduce a function which is used in the pdf of the gamma distribution, to make sure that the pdf integrates to one.
The Gamma function is defined, for \(\alpha > 0\), by \[\Gamma(\alpha) = \int_0^\infty x^{\alpha - 1} e^{-x} dx.\] When \(\alpha = 1\), we have \[\Gamma(1) = \int_0^\infty e^{-x} dx = 1.\]
Next, we note a useful property of the Gamma function: the recurrence relationship \[\Gamma(\alpha) = (\alpha - 1) \Gamma(\alpha - 1) \quad \text{for } \alpha > 1,\] which can be shown using integration by parts (Proposition 6.1).

As a consequence of Proposition 6.1 and the fact that \(\Gamma(1) = 1\), if \(\alpha\) is any positive integer then \[\Gamma(\alpha) = (\alpha - 1)!,\] so the Gamma function can be thought of as a continuous version of the factorial function.
It is also useful to know what happens to the Gamma function at half integer points, \(\Gamma(1/2)\), \(\Gamma(3/2)\) and so on. It can be shown that \[\Gamma(1/2) = \sqrt{\pi},\] and this may be used along with the recurrence relationship (Proposition 6.1) to compute any other \(\Gamma(n/2)\).
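These facts are easy to check numerically; the sketch below (assuming SciPy and its scipy.special.gamma function are available) compares the Gamma function with the factorial at integer arguments and verifies the half-integer values.

```python
# A quick numerical check of the facts above, assuming SciPy is available.
import math

from scipy.special import gamma as gamma_fn

# Gamma(alpha) = (alpha - 1)! for positive integer alpha
for alpha in range(1, 6):
    assert math.isclose(gamma_fn(alpha), math.factorial(alpha - 1))

# Gamma(1/2) = sqrt(pi), and Gamma(3/2) = (1/2) Gamma(1/2) by the recurrence
assert math.isclose(gamma_fn(0.5), math.sqrt(math.pi))
assert math.isclose(gamma_fn(1.5), 0.5 * math.sqrt(math.pi))
```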
Note that the exponential distribution with rate parameter \(\beta\) is a special case of the gamma distribution, with \(\alpha = 1\).
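To see this, substitute \(\alpha = 1\) into the gamma pdf and use \(\Gamma(1) = 1\): \[f(y) = \frac{\beta^{\alpha}}{\Gamma(\alpha)} y^{\alpha - 1} e^{-\beta y} \Big|_{\alpha = 1} = \beta e^{-\beta y}, \qquad y > 0,\] which is the pdf of an exponential distribution with rate \(\beta\).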
Example 6.1 (Time until the \(k\)th event in a Poisson process) Suppose we have a Poisson process with events arriving at rate \(\beta\) per unit time, so that the number of events occurring in a time interval of length \(t\) has \(\text{Poisson}(\beta t)\) distribution. Let \(Y\) be the time until the \(k\)th event occurs. We claim that \(Y \sim \text{gamma}(k, \beta)\).
Let \(N_y\) be the number of events in the time interval \((0, y]\). Then \(N_y \sim \text{Poisson}(\beta y)\). The \(k\)th event occurs after time \(y\) if and only if at most \(k - 1\) events occur in \((0, y]\), so \[P(Y > y) = P(N_y \leq k - 1) = \sum_{n=0}^{k-1} \frac{(\beta y)^n e^{-\beta y}}{n!},\] and \[\begin{align*} F(y) = P(Y \leq y) &= 1 - P(Y > y) \\ &= 1 - \sum_{n=0}^{k-1} \frac{(\beta y)^n e^{-\beta y}}{n!} \\ &= 1 - e^{-\beta y} - \sum_{n=1}^{k-1} \frac{(\beta y)^n e^{-\beta y}}{n!}. \end{align*}\] So \[\begin{align*} f(y) &= \frac{dF}{dy} = \beta e^{-\beta y} - \sum_{n=1}^{k-1} \left\{ \frac{\beta^n y^{n-1} e^{-\beta y}}{(n-1)!} - \beta \frac{(\beta y)^n}{n!} e^{-\beta y} \right\} \\ &= \beta e^{-\beta y} - \beta e^{-\beta y} + \beta^2 y e^{-\beta y} - \beta^2 y e^{-\beta y} + \frac{\beta^3 y^2}{2!} e^{-\beta y} - \ldots + \frac{\beta (\beta y)^{k-1}}{(k-1)!} e^{-\beta y} \\ &= \frac{\beta^k}{\Gamma(k)} y^{k-1} e^{-\beta y}, \end{align*}\] which is the pdf of a \(\text{gamma}(k, \beta)\) random variable.
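Since the inter-arrival times of a Poisson process are independent \(\text{exponential}(\beta)\) random variables, the time to the \(k\)th event is their sum, which suggests a simple simulation check. The sketch below uses illustrative parameter values; note that SciPy parameterises the gamma distribution by shape \(a = k\) and scale \(1/\beta\).

```python
# Simulation sketch: the time to the k-th event of a Poisson process with rate
# beta should follow a gamma(k, beta) distribution. Since inter-arrival times
# are independent exponential(beta), the k-th arrival time is their sum.
# Parameter values here are illustrative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
k, beta, n_sims = 3, 2.0, 100_000

# k-th arrival time = sum of k independent exponential(beta) inter-arrival times
arrival_times = rng.exponential(scale=1 / beta, size=(n_sims, k)).sum(axis=1)

# Kolmogorov-Smirnov comparison with the gamma(k, beta) cdf; SciPy uses shape
# a = k and scale = 1/beta (i.e. rate beta).
print(stats.kstest(arrival_times, stats.gamma(a=k, scale=1 / beta).cdf))
```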
If \(\alpha\) is an integer, as in this example, the \(\text{gamma}(\alpha, \beta)\) distribution is also known as the Erlang distribution. The exponential distribution (with \(\alpha = 1\)) is a special case of the Erlang distribution.

6.2 Properties of the gamma distribution
The following proposition tells us that the gamma distribution is always positively skewed, but the skewness decreases as \(\alpha\) increases. The gamma distribution always has larger kurtosis than the normal distribution, but the kurtosis decreases towards the kurtosis of a normal distribution (\(\gamma_2 = 3\)) as \(\alpha\) increases.
Proof. From Proposition 6.2, the mgf is \(M_Y(t) = (1 - t/\beta)^{-\alpha}\), so the cgf is \(K_Y(t) = \log M_Y(t) = -\alpha \log(1 - t/\beta)\). Differentiating, we have \[K_Y^{(1)}(t) = \frac{\alpha}{\beta\left(1 - \frac{t}{\beta}\right)} = \frac{\alpha}{\beta - t},\] so \(E(Y) = K_Y^{(1)}(0) = \alpha \beta^{-1}\).
Differentiating the cgf again, \[K_Y^{(2)}(t) = \frac{d}{dt} \left[\alpha (\beta - t)^{-1}\right] = \alpha(\beta - t)^{-2},\] so \(\mu_2 = \text{Var}(Y) = K_Y^{(2)}(0) = \alpha \beta^{-2}\).
Differentiating the cgf again, \[K_Y^{(3)}(t) = \frac{d}{dt} \left[\alpha (\beta - t)^{-2}\right] = 2 \alpha (\beta - t)^{-3},\] so the third moment about the mean is \(\mu_3 = K_Y^{(3)}(0) = 2 \alpha \beta^{-3}\). So the skewness is \[\gamma_1 = \frac{\mu_3}{\mu_2^{3/2}} = \frac{2 \alpha \beta^{-3}}{[\alpha \beta^{-2}]^{3/2}} = 2 \alpha^{-1/2}.\]
Differentiating the cgf again, \[K_Y^{(4)}(t) = \frac{d}{dt} \left[2 \alpha (\beta - t)^{-3}\right] = 6 \alpha (\beta - t)^{-4},\] so the fourth cumulant is \(\kappa_4 = 6 \alpha \beta^{-4}\). The fourth moment about the mean is \[\mu_4 = \kappa_4 + 3 \mu_2^2 = 6 \alpha \beta^{-4} + 3 \alpha^2 \beta^{-4},\] so the kurtosis is \[\gamma_2 = \frac{\mu_4}{\mu_2^2} = 3 + \frac{6 \alpha \beta^{-4}}{\alpha^2 \beta^{-4}} = 3 + 6 \alpha^{-1},\] as claimed.

The gamma distribution is closed under scaling, and under summation of independent gamma distributions with a common rate parameter.
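The cumulant calculations in the proof above can also be checked symbolically; the sketch below (assuming SymPy is available) differentiates the cgf and evaluates the derivatives at \(t = 0\).

```python
# Symbolic check of the cumulants of the gamma(alpha, beta) distribution,
# obtained by differentiating the cgf K(t) = -alpha * log(1 - t/beta).
import sympy as sp

t = sp.symbols("t")
alpha, beta = sp.symbols("alpha beta", positive=True)

K = -alpha * sp.log(1 - t / beta)  # cumulant generating function

# First four cumulants: K'(0), K''(0), K'''(0), K''''(0)
cumulants = [sp.simplify(sp.diff(K, t, r).subs(t, 0)) for r in range(1, 5)]
print(cumulants)  # [alpha/beta, alpha/beta**2, 2*alpha/beta**3, 6*alpha/beta**4]

# Skewness and excess kurtosis (gamma_2 - 3): should agree with 2/sqrt(alpha)
# and 6/alpha respectively
print(sp.simplify(cumulants[2] / cumulants[1] ** sp.Rational(3, 2)))
print(sp.simplify(cumulants[3] / cumulants[1] ** 2))
```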
6.3 The chi-squared distribution
The chi-squared distribution is a useful special case of the gamma distribution: the \(\text{gamma}(n/2, 1/2)\) distribution is known as the chi-squared distribution with \(n\) degrees of freedom, written \(\chi^2_n\).
We can write down the properties of the chi-squared distribution by using results we have already proved about the gamma distribution.
If \(Y \sim \chi^2_n\), we have \[E(Y) = \frac{n/2}{1/2} = n; \qquad \text{Var}(Y) = \frac{n/2}{(1/2)^2} = 2n.\]
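The sketch below (assuming SciPy is available) checks numerically that the \(\chi^2_n\) distribution coincides with the \(\text{gamma}(n/2, 1/2)\) distribution; note that SciPy parameterises the gamma distribution by a scale parameter, so rate \(\beta = 1/2\) corresponds to scale \(2\).

```python
# Numerical check that chi-squared(n) is the same distribution as
# gamma(n/2, 1/2); in SciPy's scale parameterisation, rate 1/2 means scale = 2.
import numpy as np
from scipy import stats

n = 7
y = np.linspace(0.1, 30.0, 200)

assert np.allclose(stats.chi2(df=n).pdf(y), stats.gamma(a=n / 2, scale=2).pdf(y))

# Mean n and variance 2n
print(stats.chi2(df=n).stats(moments="mv"))  # (7.0, 14.0)
```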
By Proposition 6.5, if \(Y_1, \ldots, Y_n\) are independent random variables, with \(Y_i \sim \chi^2_{n_i}\), then \[\begin{equation} \sum_{i=1}^n Y_i \sim \chi^2_{\sum_{i=1}^n n_i}. \tag{6.2} \end{equation}\]

The sum of independent chi-squared random variables has a chi-squared distribution with degrees of freedom given by the sum of the individual degrees of freedom.
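One way to see (6.2) directly is via moment generating functions. By Proposition 6.2, \(Y_i \sim \chi^2_{n_i} \equiv \text{gamma}(n_i/2, 1/2)\) has mgf \(M_{Y_i}(t) = (1 - 2t)^{-n_i/2}\) for \(t < 1/2\), so by independence \[M_{\sum_{i=1}^n Y_i}(t) = \prod_{i=1}^n M_{Y_i}(t) = \prod_{i=1}^n (1 - 2t)^{-n_i/2} = (1 - 2t)^{-\frac{1}{2}\sum_{i=1}^n n_i},\] which is the mgf of a \(\chi^2_{\sum_{i=1}^n n_i}\) random variable.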
The chi-squared distribution is also related to the normal distribution: if \(Z \sim N(0, 1)\) then \(Z^2 \sim \chi^2_1\) (Proposition 6.6), and if \(Z_1, \ldots, Z_n\) are independent \(N(0, 1)\) random variables then \(\sum_{i=1}^n Z_i^2 \sim \chi^2_n\) (Proposition 6.7).

6.4 Distribution of the sample variance
Suppose \(Y_1, Y_2, \ldots, Y_n\) are independent, identically distributed \(N(\mu, \sigma^2)\) random variables. We can use observations \(y_1, y_2, \ldots, y_n\) of \(Y_1, \ldots, Y_n\) to estimate \(\mu\) and \(\sigma^2\). We can use the sample mean \[\bar y = \frac{1}{n} \sum_{i=1}^n y_i\] to estimate \(\mu\), and the sample variance \[s^2 = \frac{\sum_{i=1}^n (y_i - \bar y)^2}{n-1}\] to estimate \(\sigma^2\).
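As a quick illustration, here is a minimal sketch of these calculations in Python (the data values are made up); note that NumPy's ddof=1 option gives the \(n - 1\) denominator used in \(s^2\).

```python
# A minimal sketch (illustrative, made-up data) of computing the sample mean
# and sample variance with NumPy.
import numpy as np

y = np.array([4.2, 5.1, 3.8, 4.9, 5.5, 4.4])

ybar = y.mean()        # sample mean
s2 = y.var(ddof=1)     # sample variance, with denominator n - 1

# Equivalent to the definition of s^2 above
s2_check = ((y - ybar) ** 2).sum() / (len(y) - 1)
print(ybar, s2, s2_check)
```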
For any specific sample of observations, \(\bar y\) and \(s^2\) will take specific numerical values, but when they are defined in terms of the random variables \(Y_1 , Y_2 , \ldots, Y_n\), they too are random variables and will have distributions.
We have already seen (in Proposition 4.1) that \(\bar Y \sim N(\mu, \sigma^2/n)\). We would also like to know the distribution of the random version of the sample variance \[\begin{equation} S^2 = \frac{\sum_{i=1}^n (Y_i - \bar Y)^2}{n-1}. \tag{6.3} \end{equation}\]

It turns out (Theorem 6.1) that \(\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}\), and that \(S^2\) is independent of \(\bar Y\). In order to prove this important result, we will need to use Cochran’s theorem, which we state here without proof.
Theorem 6.2 (Cochran’s Theorem) Suppose that \(U \sim \text{gamma}(\alpha_1, \beta)\) and \(W \sim \text{gamma}(\alpha_1 + \alpha_2, \beta)\). If \(V = W - U\), then any one of the following implies the other two:
- (i) \(V \sim \text{gamma}(\alpha_2, \beta)\);
- (ii) \(V\) is independent of \(U\);
- (iii) \(V\) is non-negative.
We use Cochran’s theorem to prove Theorem 6.1:
Writing \(Y_i - \mu = (Y_i - \bar Y) + (\bar Y - \mu)\) and expanding the square, \[\sum_{i=1}^n (Y_i - \mu)^2 = \sum_{i=1}^n (Y_i - \bar Y)^2 + n(\bar Y - \mu)^2 + 2(\bar Y - \mu)\sum_{i=1}^n (Y_i - \bar Y) = \sum_{i=1}^n (Y_i - \bar Y)^2 + n(\bar Y - \mu)^2,\] as \(\sum_{i=1}^n (Y_i - \bar Y) = 0\). Rearranging, \[\sum_{i=1}^n (Y_i - \bar Y)^2 = \sum_{i=1}^n (Y_i - \mu)^2 - n(\bar Y - \mu)^2.\] Multiplying through by \(\frac{1}{\sigma^2}\) gives \[\frac{1}{\sigma^2}\sum_{i=1}^n (Y_i - \bar Y)^2 = \frac{1}{\sigma^2}\sum_{i=1}^n (Y_i - \mu)^2 - \frac{n}{\sigma^2}(\bar Y - \mu)^2,\] so \[\frac{(n-1)S^2}{\sigma^2} = \sum_{i=1}^n \left(\frac{Y_i - \mu}{\sigma}\right)^2 - \left(\frac{\sqrt{n}(\bar Y - \mu)}{\sigma} \right)^2.\] But \((Y_i - \mu)/\sigma \sim N(0, 1)\), so \[\sum_{i=1}^n \left(\frac{Y_i - \mu}{\sigma}\right)^2 \sim \chi^2_n\] by Proposition 6.7. We have \(\sqrt{n}(\bar Y - \mu)/\sigma \sim N(0, 1)\) by Proposition 4.1, so \[\left(\frac{\sqrt{n}(\bar Y - \mu)}{\sigma} \right)^2 \sim \chi^2_1\] by Proposition 6.6.
Now we use Cochran’s Theorem (Theorem 6.2), with \[\begin{align*} V &= \frac{(n-1) S^2}{\sigma^2} \\ W &= \sum_{i=1}^n \left(\frac{Y_i - \mu}{\sigma}\right)^2 \sim \chi^2_n \equiv \text{gamma}(n/2, 1/2) \\ U &= \left(\frac{\sqrt{n}(\bar Y - \mu)}{\sigma} \right)^2 \sim \chi^2_1 \equiv \text{gamma}(1/2, 1/2), \end{align*}\] so \(\alpha_1 = 1/2\) and \(\alpha_1 + \alpha_2 = n/2\), so \(\alpha_2 = (n-1)/2\). We have \(\sum_{i=1}^n (Y_i - \bar Y)^2 \geq 0\), so \(V \geq 0\), so condition (iii) in Cochran’s Theorem is met.
So by (ii) of Cochran’s Theorem, \(V = \frac{(n-1) S^2}{\sigma^2}\) is independent of \(U = \left(\frac{\sqrt{n}(\bar Y - \mu)}{\sigma} \right)^2\), and hence \(S^2\) is independent of \(\bar Y\), as claimed.
By (i) of Cochran’s Theorem, \[V = \frac{(n-1) S^2}{\sigma^2} \sim \text{gamma}(\alpha_2, 1/2) \equiv \text{gamma}((n-1)/2, 1/2) \equiv \chi^2_{n-1},\] as claimed.

As a consequence, \[E(S^2) = \frac{\sigma^2}{n-1} E\left(\frac{(n-1)S^2}{\sigma^2}\right) = \frac{\sigma^2}{n-1} \times (n-1) = \sigma^2,\] where we have used that the expected value of a chi-squared random variable is its degrees of freedom. So \(S^2\) is an unbiased estimator of \(\sigma^2\): a concept which we will revisit in Chapter 10.
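As a final check, Theorem 6.1 can be explored by simulation; the sketch below (with illustrative parameter values, assuming NumPy and SciPy) compares \((n-1)S^2/\sigma^2\) with a \(\chi^2_{n-1}\) distribution, checks that \(\bar Y\) and \(S^2\) are essentially uncorrelated, and checks that \(E(S^2)\) is close to \(\sigma^2\).

```python
# Simulation sketch of Theorem 6.1 for normal samples; parameter values are
# illustrative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
mu, sigma, n, n_sims = 10.0, 2.0, 5, 100_000

samples = rng.normal(mu, sigma, size=(n_sims, n))
ybar = samples.mean(axis=1)          # sample means
s2 = samples.var(axis=1, ddof=1)     # sample variances

v = (n - 1) * s2 / sigma**2
print(stats.kstest(v, stats.chi2(df=n - 1).cdf))  # consistent with chi2(n - 1)
print(np.corrcoef(ybar, s2)[0, 1])                # near 0, consistent with independence
print(s2.mean(), sigma**2)                        # E(S^2) is close to sigma^2
```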