
Chapter 1 Probability distributions

1.1 Random variables

A random variable \(Y\) is described by its domain (or sample space) \(D\) together with the probabilities assigned to subsets of the domain. These define the probability distribution of the random variable. We distinguish between discrete and continuous random variables.

Discrete probability distributions are defined by a probability (mass) function \[p(y)\equiv P(Y=y), \quad \text{for $y \in D$}\] where \[\sum_{y\in D} p(y) =1.\] The distribution function \(F(\cdot)\) is defined for all \(y \in \mathbb{R}\) by \[F(y)\equiv P(Y \leq y) = \sum_{x\in D: \, x\leq y} p(x).\]
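
As a quick numerical sketch of these definitions in R, using the binomial(\(10, 0.3\)) distribution of Section 1.2.2 with values chosen purely for illustration, we can check that the probabilities sum to one and that the distribution function is a cumulative sum of the probability function:

p <- dbinom(0:10, size = 10, prob = 0.3)  # p(0), ..., p(10) for a binomial(10, 0.3)
sum(p)  # the probabilities sum to one over the whole domain
## [1] 1
cumsum(p)[4]  # F(3) = p(0) + p(1) + p(2) + p(3)
## [1] 0.6496107
pbinom(3, size = 10, prob = 0.3)  # R's built-in distribution function agrees
## [1] 0.6496107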

Continuous probability distributions are defined by a probability density function (pdf) \(f(\cdot)\) where \[P(y_1< Y \leq y_2) = \int_{y_1}^{y_2} f(y) dy.\] The domain of \(Y\) is the set \(D = \{y \in \mathbb{R}: f(y) > 0\}\). Hence \[\int_{-\infty}^\infty f(y) dy = \int_D f(y) dy = 1.\] The distribution function \(F(\cdot)\) is then given by \[F(y)\equiv P(Y \leq y) = \int_{-\infty}^y f(x) dx.\] Therefore \[P(y_1< Y \leq y_2) = F(y_2) - F(y_1)\] and \[f(y) = \frac{d}{dy} F(y).\]
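
Again as a sketch in R, this time using the standard normal distribution of Section 1.3.3, we can check that \(F(y_2) - F(y_1)\) matches a numerical integral of the pdf:

pnorm(2) - pnorm(1)  # F(2) - F(1) for a standard normal
## [1] 0.1359051
integrate(dnorm, lower = 1, upper = 2)$value  # integrating the pdf gives the same
## [1] 0.1359051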

1.2 Examples of discrete distributions

1.2.1 The Bernoulli distribution

A Bernoulli trial is an experiment with just two possible outcomes, ‘success’ and ‘failure’, which occur with probabilities \(\theta\) and \(1- \theta\) respectively; \(\theta\) is called the success probability. The indicator of success in a Bernoulli trial has Bernoulli distribution.

Definition 1.1 A discrete random variable \(Y\) has Bernoulli distribution if it has probability function of the form \[p(y) = \theta^y (1- \theta)^{1-y}, \quad y = 0, 1,\] for some \(0 < \theta < 1\). We write \(Y \sim \text{Bernoulli}(\theta)\).
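
R has no dedicated Bernoulli functions, but since a Bernoulli(\(\theta\)) random variable is a binomial with \(n = 1\) (Section 1.2.2), the binomial functions with size = 1 serve. For illustration, with \(\theta = 0.3\):

dbinom(1, size = 1, prob = 0.3)  # p(1) = theta
## [1] 0.3
dbinom(0, size = 1, prob = 0.3)  # p(0) = 1 - theta
## [1] 0.7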

1.2.2 The binomial distribution

Suppose we undertake a fixed number, \(n\), of independent Bernoulli trials, each with success probability \(\theta\). Let \(Y\) be the number of successes in these \(n\) trials. Then \(Y\) has binomial distribution.

Definition 1.2 A discrete random variable \(Y\) has binomial distribution if it has probability function of the form \[p(y) = \binom{n}{y} \theta^y (1 - \theta)^{n-y}, \quad y = 0, 1, \ldots, n,\] for some \(n \in \mathbb{N}\) and \(0 < \theta < 1\). We write \(Y \sim \text{binomial}(n, \theta)\).
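
For example, taking \(n = 10\) and \(\theta = 0.5\) (illustrative values), \(P(Y = 4) = \binom{10}{4} 0.5^4 (1 - 0.5)^6\), which we can evaluate in R:

dbinom(4, size = 10, prob = 0.5)  # P(Y = 4) when Y ~ binomial(10, 0.5)
## [1] 0.2050781
choose(10, 4) * 0.5^4 * (1 - 0.5)^6  # directly from the probability function
## [1] 0.2050781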

1.2.3 The negative binomial distribution

Suppose we undertake a sequence of independent Bernoulli trials, each with success probability \(\theta\). Let \(Y\) be the number of failures that occur before the \(k\)th success. Then \(Y\) has negative binomial distribution.

Definition 1.3 A discrete random variable \(Y\) has negative binomial distribution if it has probability function of the form \[p(y) = \binom{k + y - 1}{y} (1 - \theta)^y \theta^k, \quad y = 0, 1, \ldots, \] for some \(k \in \mathbb{N}\) and \(0 < \theta < 1\). We write \(Y \sim \text{negbin}(k, \theta)\).

The geometric distribution is the special case of the negative binomial distribution with \(k = 1\): the number of failures that occur before the first success.

Definition 1.4 A discrete random variable \(Y\) has geometric distribution if it has probability function of the form \[p(y) = (1 - \theta)^y \theta, \quad y = 0, 1, \ldots, \] for some \(0 < \theta < 1\). We write \(Y \sim \text{geometric}(\theta)\).
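
R's dnbinom uses exactly the parameterisation of Definition 1.3 (the number of failures before the size-th success), and dgeom is the \(k = 1\) special case. A quick check with illustrative values \(k = 3\) and \(\theta = 0.4\):

dnbinom(2, size = 3, prob = 0.4)  # P(Y = 2) when Y ~ negbin(3, 0.4)
## [1] 0.13824
dgeom(2, prob = 0.4)  # P(Y = 2) when Y ~ geometric(0.4)
## [1] 0.144
dnbinom(2, size = 1, prob = 0.4)  # the k = 1 special case agrees
## [1] 0.144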

1.2.4 The Poisson distribution

The Poisson distribution arises in a variety of practical situations where we are interested in modelling counts of how often an ‘event’ occurs.

Definition 1.5 A discrete random variable \(Y\) has Poisson distribution if it has probability function of the form \[p(y) = \frac{e^{-\theta} \theta^y}{y!}, \quad y = 0, 1, \ldots,\] for some rate parameter \(\theta > 0\). We write \(Y \sim \text{Poisson}(\theta)\).

In a Poisson process, events occur at random at a constant rate \(\theta\) per unit time, independently of all other events. If we define \(Y\) as the number of events of a Poisson process in an interval of fixed length \(t\), then \(Y \sim \text{Poisson}(t \theta)\).
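
For example, if events occur at rate \(\theta = 2\) per hour (an illustrative value), the number of events in a \(3\)-hour interval has Poisson(\(6\)) distribution:

dpois(4, lambda = 2 * 3)  # P(Y = 4) events in 3 hours at rate 2 per hour
## [1] 0.1338526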

1.3 Examples of continuous distributions

1.3.1 The exponential distribution

In Section 1.2.4, we considered the Poisson process, in which events occur at random, at a rate \(\theta\) per unit time. The actual number of events which take place in any given unit of time has \(\text{Poisson}(\theta)\) distribution. The exponential distribution represents the time between consecutive events in this process.

Let \(Y\) represent the time interval between two events. Clearly this variable cannot be negative, but can take any positive value. The domain of \(Y\) is \((0, \infty)\). We have \[\begin{align*} P(Y > y) &= P(\text{no events in an interval of length $y$}) \\ &= \frac{e^{-\theta y} (\theta y)^0}{0!} \\ &= e^{-\theta y} \end{align*}\]

so \[F(y) = P(Y \leq y) = 1 - e^{-\theta y}, \quad y > 0.\] Differentiating, we obtain the pdf \[f(y) = \frac{d}{dy} F(y) = \theta e^{-\theta y}, \quad y > 0.\]
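
We can confirm this link between the two distributions numerically in R: with illustrative values \(\theta = 2\) and \(y = 1.5\), the Poisson probability of no events in the interval equals the exponential tail probability \(P(Y > y)\).

dpois(0, lambda = 2 * 1.5)  # P(no events in an interval of length 1.5)
## [1] 0.04978707
pexp(1.5, rate = 2, lower.tail = FALSE)  # P(Y > 1.5) for the waiting time
## [1] 0.04978707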

Definition 1.6 A random variable \(Y\) has exponential distribution if it has pdf of the form \[f(y) = \theta e^{-\theta y}, \quad y > 0,\] for some rate parameter \(\theta > 0\). We write \(Y \sim \text{exponential}(\theta)\).

Example 1.1 Suppose the lifetime in hours of a certain type of electronic component is described by an exponential random variable with rate parameter \(\theta = 0.01\). What is the probability that such a component will have a lifetime between \(100\) and \(200\) hours?

The probability is the area under the curve \(f(y) = 0.01 e^{-0.01 y}\) between \(y = 100\) and \(y = 200\), so \[\begin{align*} P(100 < Y \leq 200) &= \int_{100}^{200} 0.01e^{-0.01 y} dy \\ &= e^{-1} - e^{-2} = 0.37 - 0.14 = 0.23. \end{align*}\]

We could find this in R with

pexp(200, rate = 0.01) - pexp(100, rate = 0.01)
## [1] 0.2325442

1.3.2 The uniform distribution

The uniform distribution is one of the simplest probability distributions: it places constant density on an interval \((a, b)\):

Definition 1.7 A random variable \(Y\) has uniform distribution if it has pdf of the form \[f(y) = \frac{1}{b-a}, \quad a < y < b,\] for some parameters \(a, b \in \mathbb{R}\), with \(b > a\). We write \(Y \sim U(a, b)\).
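
For example, if \(Y \sim U(0, 10)\) (illustrative values), then \(P(2 < Y \leq 5) = (5 - 2)/(10 - 0) = 0.3\), which we can confirm in R:

punif(5, min = 0, max = 10) - punif(2, min = 0, max = 10)  # F(5) - F(2)
## [1] 0.3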

1.3.3 The normal distribution

The normal distribution is probably the single most important distribution in statistics. The main reason for its importance is the central limit theorem, which you have seen before in MATH1024, and which we will prove in Section 4.4.

Definition 1.8 A random variable \(Y\) has normal distribution if it has pdf of the form \[f(y) = \frac{1}{\sqrt{2 \pi \sigma^2}} \exp\left\{-\frac{1}{2 \sigma^2} (y - \mu)^2 \right\},\] for some parameters \(\mu \in \mathbb{R}\) and \(\sigma^2 > 0\). We write \(Y \sim N(\mu, \sigma^2)\).
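
The normal distribution function has no closed form, but R evaluates it with pnorm. As a sketch with illustrative values \(\mu = 3\) and \(\sigma = 2\), computing \(P(Y \leq 5)\) directly agrees with standardising to \(\Phi\{(y - \mu)/\sigma\}\):

pnorm(5, mean = 3, sd = 2)  # P(Y <= 5) when Y ~ N(3, 4)
## [1] 0.8413447
pnorm((5 - 3) / 2)  # standardising gives the same value
## [1] 0.8413447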

Example 1.2 Suppose that daily water use at a factory varies about a mean of \(77500\) litres with standard deviation \(5700\) litres. If demand is normally distributed:

  1. What proportion of days does the demand fall short of \(70000\) litres?
  2. What proportion of days does demand exceed \(90000\) litres?
  3. What is your reaction to a demand of \(175000\) litres?

Writing \(X\) for the daily water use in litres, we have \(X \sim N(77500, 5700^2)\). We can use R to compute the probability \(P(X < 70000) = F(70000)\):

pnorm(70000, mean = 77500, sd = 5700)
## [1] 0.09412236

so the daily water use will be less than \(70000\) litres about \(9.4\%\) of the time.

We can find \(P(X > 90000) = 1 - F(90000)\):

1 - pnorm(90000, mean = 77500, sd = 5700)
## [1] 0.01415432

so the daily water use will be more than \(90000\) litres about \(1.4\%\) of the time.

We can find \(P(X > 175000) = 1 - F(175000)\):

1 - pnorm(175000, mean = 77500, sd = 5700)
## [1] 0

In fact, the number is not exactly zero, but it is so small that the computer rounds it to zero. Such an extreme water use is therefore surprising, and an explanation should be sought. It is possible that an error has occurred in recording the water use, such as two days' data being taken together. Alternatively, perhaps our model, which assumes that \(X \sim N(77500, 5700^2)\), is incorrect. This idea of surprise at an extreme result of low probability, as predicted by a statistical model, will be important later in this module and also in modules such as MATH2010 Statistical Modelling I.
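
Incidentally, if we want to see just how small this probability actually is, we can ask pnorm for the upper tail on the log scale, which avoids the underflow:

pnorm(175000, mean = 77500, sd = 5700, lower.tail = FALSE, log.p = TRUE)
## [1] -150.0567

The probability is therefore about \(e^{-150}\), of order \(10^{-65}\), which R's default output rounds to zero.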