MATH2011: Statistical Distribution Theory
2019/20
Chapter 1 Probability distributions
1.1 Random variables
A random variable \(Y\) is described by its domain (or sample space) \(D\) together with the probabilities assigned to subsets of the domain. These define the probability distribution of the random variable. We distinguish between discrete and continuous random variables.
Discrete probability distributions are defined by a probability (mass) function \[p(y)\equiv P(Y=y), \quad \text{for $y \in D$}\] where \[\sum_{y\in D} p(y) =1.\] The distribution function \(F(\cdot)\) is defined for all \(y \in \mathbb{R}\) by \[F(y)\equiv P(Y \leq y) = \sum_{x\in D: \, x\leq y} p(x).\]
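As a quick illustration in R (a minimal sketch, using the binomial pmf from Section 1.2.2 with the hypothetical values \(n = 5\) and \(\theta = 0.4\)), the probabilities over the domain sum to one, and the distribution function is the cumulative sum of the pmf:
p <- dbinom(0:5, size = 5, prob = 0.4)  # pmf of Bin(5, 0.4) over its whole domain
sum(p)  # probabilities over the domain sum to 1
## [1] 1
cumsum(p)  # values of F(y) at y = 0, 1, ..., 5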
Continuous probability distributions are defined by a probability density function (pdf) \(f(\cdot)\) where \[P(y_1< Y \leq y_2) = \int_{y_1}^{y_2} f(y) dy.\] The domain of \(Y\) is the set \(D = \{y \in \mathbb{R}: f(y) > 0\}\). Hence \[\int_{-\infty}^\infty f(y) dy = \int_D f(y) dy = 1.\] The distribution function \(F(\cdot)\) is then given by \[F(y)\equiv P(Y \leq y) = \int_{-\infty}^y f(x) dx.\] Therefore \[P(y_1< Y \leq y_2) = F(y_2) - F(y_1)\] and \[f(y) = \frac{d}{dy} F(y).\]
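These relationships can be checked numerically in R; a minimal sketch using the standard normal pdf dnorm and distribution function pnorm (the limits \(\pm 1.96\) are arbitrary):
pnorm(1.96) - pnorm(-1.96)  # F(1.96) - F(-1.96)
## [1] 0.9500042
# the same probability by integrating the pdf numerically:
integrate(dnorm, lower = -1.96, upper = 1.96)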
1.2 Examples of discrete distributions
1.2.1 The Bernoulli distribution
A Bernoulli trial is an experiment with just two possible outcomes, ‘success’ and ‘failure’, which occur with probabilities \(\theta\) and \(1- \theta\) respectively, where \(\theta\) is the success probability. The indicator of success in a Bernoulli trial has a Bernoulli distribution: if \(Y = 1\) for a success and \(Y = 0\) for a failure, then \(p(1) = \theta\) and \(p(0) = 1 - \theta\).
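In R the Bernoulli distribution is handled as the binomial with size = 1; a sketch with the hypothetical value \(\theta = 0.3\):
dbinom(1, size = 1, prob = 0.3)  # P(Y = 1)
## [1] 0.3
rbinom(10, size = 1, prob = 0.3)  # simulate ten Bernoulli trials (output is random)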
1.2.2 The binomial distribution
Suppose we undertake a fixed number, \(n\), of independent Bernoulli trials, each with success probability \(\theta\). Let \(Y\) be the number of successes in these \(n\) trials. Then \(Y\) has a binomial distribution.
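For example, with the hypothetical values \(n = 10\) and \(\theta = 0.5\), the probability of exactly three successes is \(\binom{10}{3} 0.5^{10}\):
dbinom(3, size = 10, prob = 0.5)  # p(3) = P(Y = 3)
## [1] 0.1171875
pbinom(3, size = 10, prob = 0.5)  # F(3) = P(Y <= 3)
## [1] 0.171875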
1.2.3 The negative binomial distribution
Suppose we undertake a sequence of independent Bernoulli trials, each with success probability \(\theta\). Let \(Y\) be the number of failures that occur before the \(k\)th success. Then \(Y\) has a negative binomial distribution. The geometric distribution is the special case of the negative binomial distribution with \(k = 1\): the number of failures that occur before the first success.
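R's dnbinom uses the same convention, counting the failures before the size-th success; a sketch with the hypothetical values \(k = 2\) and \(\theta = 0.5\):
dnbinom(3, size = 2, prob = 0.5)  # P(3 failures before the 2nd success)
## [1] 0.125
dgeom(2, prob = 0.5)  # geometric case k = 1: P(2 failures before the 1st success)
## [1] 0.125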
1.2.4 The Poisson distribution
The Poisson distribution arises in a variety of practical situations where we are interested in modelling counts of how often an ‘event’ occurs.
In a Poisson process, events occur at random at a constant rate \(\theta\) per unit time, independently of all other events. If we define \(Y\) as the number of events of a Poisson process in an interval of fixed length \(t\), then \(Y \sim \text{Poisson}(t \theta)\).
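For instance, with a hypothetical rate of \(\theta = 2\) events per unit time and \(t = 1\):
dpois(0, lambda = 2)  # P(no events) = e^{-2}
## [1] 0.1353353
ppois(3, lambda = 2)  # P(at most 3 events)
## [1] 0.8571235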
1.3 Examples of continuous distributions
1.3.1 The exponential distribution
In Section 1.2.4, we considered the Poisson process, in which events occur at random, at a rate \(\theta\) per unit time. The actual number of events which take place in any given unit of time has a \(\text{Poisson}(\theta)\) distribution. The exponential distribution represents the time between consecutive events in this process.
Let \(Y\) represent the time interval between two events. Clearly this variable cannot be negative, but can take any positive value. The domain of \(Y\) is \((0, \infty)\). We have \[\begin{align*} P(Y > y) &= P(\text{no events in an interval of length $y$}) \\ &= \frac{e^{-\theta y} (\theta y)^0}{0!} \\ &= e^{-\theta y} \end{align*}\]so \[F(y) = P(Y \leq y) = 1 - e^{-\theta y}, \quad y > 0.\] Differentiating, we obtain the pdf \[f(y) = \frac{d}{dy} F(y) = \theta e^{-\theta y}, \quad y > 0.\]
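This derivation can be checked numerically: \(P(Y > y)\) from the exponential distribution function must equal the Poisson probability of zero events in an interval of length \(y\). A sketch with the hypothetical values \(\theta = 0.01\) and \(y = 150\):
1 - pexp(150, rate = 0.01)  # P(Y > 150) = e^{-1.5}
## [1] 0.2231302
dpois(0, lambda = 0.01 * 150)  # P(no events in an interval of length 150)
## [1] 0.2231302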
Example 1.1 Suppose the lifetime in hours of a certain type of electronic component is described by an Exponential random variable with rate parameter \(\theta = 0.01\). What is the probability such a component will have a lifetime of between \(100\) and \(200\) hours?
The probability is the area under the curve \(f(y) = 0.01 e^{-0.01 y}\) between \(y = 100\) and \(y = 200\), so \[\begin{align*} P(100 < Y \leq 200) &= \int_{100}^{200} 0.01e^{-0.01 y} dy \\ &= \left[-e^{-0.01 y}\right]_{100}^{200} \\ &= e^{-1} - e^{-2} = 0.37 - 0.14 = 0.23. \end{align*}\] We could find this in R with
pexp(200, rate = 0.01) - pexp(100, rate = 0.01)
## [1] 0.2325442
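As a check, the exact expression gives the same value:
exp(-1) - exp(-2)
## [1] 0.2325442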
1.3.2 The uniform distribution
The uniform distribution is one of the simplest probability distributions: it places a constant density over an interval \((a, b)\): \[f(y) = \frac{1}{b - a}, \quad a < y < b.\]
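A quick sketch in R with the hypothetical choice \(a = 0\), \(b = 4\): the density is constant at \(1/4\), so any subinterval of length one has probability \(1/4\).
dunif(2, min = 0, max = 4)  # constant density 1/(b - a)
## [1] 0.25
punif(3, min = 0, max = 4) - punif(2, min = 0, max = 4)  # P(2 < Y <= 3)
## [1] 0.25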
1.3.3 The normal distribution
The normal distribution is probably the single most important distribution in statistics. The main reason for its importance is the central limit theorem, which you have seen before in MATH1024, and which we will prove in Section 4.4.
Example 1.2 Suppose that daily water use at a factory varies about a mean of \(77500\) litres with standard deviation \(5700\) litres. If demand \(X\) is normally distributed:
- On what proportion of days does demand fall short of \(70000\) litres?
- On what proportion of days does demand exceed \(90000\) litres?
- What is your reaction to a demand of \(175000\) litres?
We can use R to compute the probability \(P(X < 70000) = F(70000)\):
pnorm(70000, mean = 77500, sd = 5700)
## [1] 0.09412236
so the daily water use will be less than \(70000\) litres about \(9.4\%\) of the time.
We can find \(P(X > 90000) = 1 - F(90000)\):
1 - pnorm(90000, mean = 77500, sd = 5700)
## [1] 0.01415432
so the daily water use will be more than \(90000\) litres about \(1.4\%\) of the time.
We can find \(P(X > 175000) = 1 - F(175000)\):
1 - pnorm(175000, mean = 77500, sd = 5700)
## [1] 0
In fact, the number is not exactly zero, but it is so small that the computer is rounding it to zero. Such an extreme water use is therefore surprising, and an explanation should be sought. It is possible that an error has occurred in recording the water use, such as two days' data being taken together. Alternatively, perhaps our model, which assumes that \(X \sim N(77500, 5700^2)\), is incorrect. This idea of surprise at an extreme result of low probability, as predicted by a statistical model, will be important later in this module and also in modules such as MATH2010 Statistical Modelling I.
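If the actual magnitude of such a tail probability is needed, R can report it on the log scale rather than letting it underflow to zero; a sketch using pnorm's lower.tail and log.p arguments:
# log of P(X > 175000), avoiding the rounding to zero
pnorm(175000, mean = 77500, sd = 5700, lower.tail = FALSE, log.p = TRUE)
The result is roughly \(-150\), so the probability is of order \(e^{-150}\), or about \(10^{-65}\): far too small to be consistent with the assumed model.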