Chapter 2 Introduction to Probability
2.1 Definitions of probability
We use probabilities to express how uncertain we are about whether certain events will happen. For example: (i) I believe there is more than a 60% chance that Manchester City will win the premiership title this season; (ii) the probability of a fair coin landing heads is 0.5.
As discussed in Chapter 1, uncertainty is everywhere, and probability is the language we use to measure it.
The football and coin examples show two different ways to interpret probability. The football example is based on the speaker’s personal belief, not on repeated experiments. This is called subjective probability.
Subjective probability measures how plausible someone thinks an event is, based on their experience and available evidence (for example, knowing the recent record of each football club in the premiership, and their investments in new players).
Other examples include: “I think there is a 70% chance the FTSE 100 will rise tomorrow,” or “The Met Office says there is a 40% chance of a white Christmas in Southampton this year.”
The second way to define probability is by looking at the long-term relative frequency of an event in a random experiment that can be repeated many times under similar conditions (like tossing a coin).
Suppose we repeat a random experiment many times under the same conditions and count how often event \(A\) happens. The relative frequency, \[\frac{\text{number of times $A$ occurs}}{\text{total number of repetitions}},\] approaches a fixed value as the number of repetitions increases. This value is defined as \(P\{A\}\).
For example, if we toss a coin 1000 times and count the number of heads, the proportion of heads is the relative frequency of event \(A\) (getting a head) in those 1000 tosses.
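This long-run stabilisation of the relative frequency is easy to see by simulation. The following minimal sketch (in Python, purely for illustration; the seed and toss counts are our own choices) tosses a fair coin repeatedly and prints the proportion of heads:

```python
import random

random.seed(1)  # fixed seed so the run is reproducible

# Toss a fair coin n times and print the relative frequency of heads.
for n in [10, 100, 1000, 10000, 100000]:
    heads = sum(random.random() < 0.5 for _ in range(n))
    print(f"{n:>6} tosses: relative frequency of heads = {heads / n:.4f}")
```

As \(n\) grows, the printed proportions settle ever closer to 0.5, the probability of a head for a fair coin.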
Sometimes, we can use basic facts about probability to find \(P\{A\}\) more easily. For example, if a coin is fair, then \(P\{\text{head}\} = P\{\text{tail}\}\) and \(P\{\text{head}\} + P\{\text{tail}\} = 1\), so both probabilities are 0.5.
These basic facts are formalised as the axioms of probability, which are the foundation of probability theory.
2.2 Some definitions
Before we can state and use the axioms of probability, we need to introduce some terminology.
A random experiment is an experiment where we cannot predict the exact outcome, but we know all the possible outcomes.
The sample space (\(S\)) is the set of all possible outcomes. For example, for a single coin toss, \(S = \{\text{head}, \text{tail}\}\). For two coins, \(S = \{\text{HH}, \text{HT}, \text{TH}, \text{TT}\}\), where H and T are head and tail.
An event is a specific result or set of results from the experiment. For example, getting ‘HH’ (two heads) is an event, and so is getting at least one head: \(\{\text{HH}, \text{HT}, \text{TH}\}\).
We use capital letters like \(A, B, C\) to represent events. A single outcome is called an elementary event (e.g., ‘HH’), while a set of outcomes is a composite event (e.g., at least one head).
Probability theory is about finding the probability \(P\{A\}\) of a given event \(A\).
Example 2.1 (Die throw) Roll a six-faced die and observe the score on the uppermost face. Here \(S = \{1, 2, 3, 4, 5, 6\}\), which is composed of six elementary events.
The union of two events \(A\) and \(B\), written as \(A \cup B\), is the set of outcomes that are in \(A\), \(B\), or both. “\(A \cup B\) occurs” means “either \(A\) or \(B\) or both occur”.
For example, in Example 2.1, let \(A\) be the event “an even number is observed”: \(A = \{2, 4, 6\}\). Let \(B\) be “a number larger than 3 is observed”: \(B = \{4, 5, 6\}\). Then \(A \cup B = \{2, 4, 5, 6\}\). If a \(6\) is observed, both \(A\) and \(B\) occur.
The intersection of two events \(A\) and \(B\), written as \(A \cap B\), is the set of outcomes common to both. “\(A \cap B\) occurs” means “both \(A\) and \(B\) occur”. In Example 2.1, \(A \cap B = \{4, 6\}\). If \(C = \{1, 2, 3, 4, 5\}\) (“a number less than 6”), then \(A \cap C = \{2, 4\}\) (“an even number less than 6”).
We can generalize union and intersection to more than two events.
Two events \(A\) and \(D\) are mutually exclusive if \(A \cap D = \emptyset\), meaning they have no outcomes in common. In other words, \(A\) and \(D\) cannot happen at the same time.
In Example 2.1, if \(D = \{1, 3, 5\}\) (“an odd number”), then \(A \cap D = \emptyset\), so \(A\) and \(D\) are mutually exclusive.
The complement of an event \(A\), written \(A^\prime\), is the set of all outcomes not in \(A\). Note that \(A \cup A^\prime = S\) and \(A \cap A^\prime = \emptyset\).
2.3 Axioms of probability
Probability is based on three main axioms:
A1 \(P\{S\}=1\).
A2 \(0 \leq P\{A\} \leq 1\) for any event \(A\).
A3 \(P\{A \cup B\}=P\{A\}+P\{B\}\) if \(A\) and \(B\) are mutually exclusive.
These axioms have several important consequences:
- For any event \(A\), \(P\{A\} = 1 - P\left\{A^{\prime}\right\}\).
- Combining this with Axiom A1, \(P\{\emptyset\} = 1 - P\{S\} = 0\). So if \(A\) and \(B\) are mutually exclusive, \(P\{A \cap B\} = P\{\emptyset\} = 0\).
- If \(D\) is a subset of \(E\), \(D \subset E\), then \(P\left\{E \cap D^{\prime}\right\} = P\{E\} - P\{D\}\). For any events \(A\) and \(B\), \(P\left\{A \cap B^{\prime}\right\} = P\{A\} - P\{A \cap B\}\).
- Axiom A3 extends to more than two mutually exclusive events: \[ P\left\{A_{1} \cup A_{2} \cup \cdots \cup A_{k}\right\}=P\left\{A_{1}\right\}+P\left\{A_{2}\right\}+\ldots+P\left\{A_{k}\right\} \] if \(A_{1}, \ldots, A_{k}\) are mutually exclusive. So, the probability of an event \(A\) is the sum of the probabilities of the individual outcomes in \(A\).
- For any two events \(A\) and \(B\), the general addition rule is: \[ P\{A \cup B\}=P\{A\}+P\{B\}-P\{A \cap B\} . \]
Proof. We can write \(A \cup B=\left(A \cap B^{\prime}\right) \cup(A \cap B) \cup\left(A^{\prime} \cap B\right)\). All three of these are mutually exclusive events. Hence, \[ \begin{aligned} P\{A \cup B\} &=P\left\{A \cap B^{\prime}\right\}+P\{A \cap B\}+P\left\{A^{\prime} \cap B\right\} \\ &=P\{A\}-P\{A \cap B\}+P\{A \cap B\}+P\{B\}-P\{A \cap B\} \\ &=P\{A\}+P\{B\}-P\{A \cap B\}. \end{aligned} \]
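As a quick numerical check (our own illustration, anticipating the equally likely outcomes rule of Section 2.4), the following Python sketch verifies the addition rule on the die events of Example 2.1 by direct enumeration:

```python
from fractions import Fraction as F

A = {2, 4, 6}   # an even number is observed
B = {4, 5, 6}   # a number larger than 3 is observed

def p(event):
    """Probability of an event under six equally likely outcomes."""
    return F(len(event), 6)

print(p(A | B))                # 2/3
print(p(A) + p(B) - p(A & B))  # 1/2 + 1/2 - 1/3 = 2/3, as the rule states
```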
- The sum of the probabilities of all the outcomes in the sample space \(S\) is 1.
2.4 Using combinatorics to find probabilities
2.4.1 Experiments with equally likely outcomes
If all outcomes are equally likely, we can find the probability of an event by counting how many outcomes are in the event and dividing by the total number of possible outcomes.
For an experiment with \(N\) equally likely outcomes, each outcome has probability \(1/N\).
For any event \(A\), \[P \{A\} = \frac{\text{number of outcomes in $A$}} {\text{total number of possible outcomes of the experiment}}.\]
So, to calculate the probability of an event, we just count the number of outcomes in the event and the total number of possible outcomes. In the next sections, we use combinatorics (the mathematics of counting) to help with this.
Return to Example 2.1 where a six-faced die is rolled. Suppose that one wins a bet if a 6 is rolled. Then the probability of winning the bet is \(1/6\) as there are six possible outcomes in the sample space and exactly one of those, 6, wins the bet. Suppose \(A\) denotes the event that an even-numbered face is rolled. Then \(P\{A\} = 3/6 = 1/2\), as we would expect.
Example 2.2 (Dice throw) Roll 2 distinguishable dice and observe the scores. Here \[S = \{(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6), \ldots, (6, 1), (6, 2), (6, 3), (6, 4), (6, 5), (6, 6)\},\] which consists of 36 possible outcomes or elementary events, \(A_1, \ldots , A_{36}\). What is the probability of a \(6\) on both dice? The required probability is \(1/36\). What is the probability that the sum of the two dice is greater than 6? How about the probability that the sum is less than a given number, e.g. 8?
Hint: Write down the sum for each of the 36 outcomes and then find the probabilities asked just by inspection. Remember, each of the 36 outcomes has equal probability \(1/36\).
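Rather than tabulating the 36 sums by hand, one can enumerate them; here is a minimal Python sketch of that count (the variable names are ours):

```python
from itertools import product

# All 36 equally likely outcomes for two distinguishable dice.
outcomes = list(product(range(1, 7), repeat=2))
n = len(outcomes)  # 36

print(sum(1 for a, b in outcomes if a == b == 6) / n)  # P{double six} = 1/36
print(sum(1 for a, b in outcomes if a + b > 6) / n)    # P{sum > 6} = 21/36
print(sum(1 for a, b in outcomes if a + b < 8) / n)    # P{sum < 8} = 21/36
```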
2.4.2 Multiplication rule of counting
Suppose you have to complete \(k\) tasks in sequence. If there are \(n_i\) ways to do the \(i\)th task, then there are \(n_1 \times n_2 \times \dots \times n_k\) ways to do all \(k\) tasks.
Example 2.3 (Bus routes) Suppose there are 7 routes from Southampton to London and then 5 routes from London to Cambridge. How many ways can I travel from Southampton to Cambridge via London? By the multiplication rule the answer is \(7 \times 5 = 35\).
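The rule amounts to forming the Cartesian product of the choices, as this small sketch shows (the route labels are hypothetical):

```python
from itertools import product

to_london = [f"L{i}" for i in range(1, 8)]     # 7 hypothetical route labels
to_cambridge = [f"C{j}" for j in range(1, 6)]  # 5 hypothetical route labels

# Each journey pairs one route to London with one route onward to Cambridge.
journeys = list(product(to_london, to_cambridge))
print(len(journeys))  # 7 * 5 = 35
```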
2.4.3 The number of permutations of \(k\) from \(n\): \(P(n, k)\)
Suppose we want to select \(k\) people from \(n\) and arrange them in \(k\) different chairs. How many ways can we do this?
Think of the \(i\)th task as choosing who sits in the \(i\)th chair. By the multiplication rule, there are \(n(n-1)\ldots(n-[k-1])\) ways.
This number is called the number of permutations of \(k\) from \(n\), written \(P(n, k)\): \[P(n, k) = n(n - 1) \ldots (n - [k - 1]).\] When \(k = n\), \(P(n, n) = n!\) (\(n\) factorial). By definition, \(0! = 1\). We can also write \[P(n, k) = \frac{n!}{(n-k)!}.\]
Example 2.4 (Football) How many possible rankings are there for the 20 football teams in the Premier League at the end of a season? This number is given by \(P(20, 20) = 20!\), which is a huge number! How many possible permutations are there for the top 4 positions, which qualify to play in Europe in the next season? This number is given by \(P(20, 4) = 20 \times 19 \times 18 \times 17\).
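Python’s standard library computes these counts directly via `math.perm`; a minimal check of the example above:

```python
import math

print(math.perm(20, 20))  # 20! = 2432902008176640000 possible full rankings
print(math.perm(20, 4))   # 20 * 19 * 18 * 17 = 116280 top-4 permutations
```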
2.4.3.1 The number of combinations of \(k\) from \(n\): \(\binom{n}{k}\)
Now suppose we want to select \(k\) people from \(n\), but we do not care about the order. How many ways can we do this? This is called a combination, written \(\binom{n}{k}\).
To see this, recall that arranging \(k\) people in \(k\) chairs is \(P(n, k)\) ways. But for each group of \(k\) people, there are \(k!\) ways to arrange them. So, \[\binom{n}{k} = \frac{P(n, k)}{k!} = \frac{n!}{(n-k)!k!}.\]
Example 2.5 (Football) How many possible ways are there to choose 3 teams for the bottom positions of the Premier League table at the end of a season? This number is given by \(\binom{20}{3} = 20 \times 19 \times 18/3!\), which does not take into consideration the rankings of the three bottom teams.
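Again the standard library has this built in as `math.comb`; the sketch below also confirms the relation \(\binom{n}{k} = P(n, k)/k!\):

```python
import math

print(math.comb(20, 3))                       # 1140 unordered choices
print(math.perm(20, 3) // math.factorial(3))  # same count via P(20, 3) / 3!
```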
Example 2.6 (Microchips) A box contains 12 microchips of which 4 are faulty. A sample of size 3 is drawn from the box without replacement.
- How many selections of 3 can be made? \(\binom{12}{3}\)
- How many samples have all 3 chips faulty? \(\binom{4}{3}\).
- How many selections have exactly 2 faulty chips? \(\binom{8}{1} \binom{4}{2}\)
- How many samples of 3 have 2 or more faulty chips? \(\binom{8}{1} \binom{4}{2} + \binom{4}{3}\)
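These counts can be checked with `math.comb`; a minimal sketch (the variable names are ours):

```python
import math

total       = math.comb(12, 3)                   # all samples of size 3
all_faulty  = math.comb(4, 3)                    # 3 faulty from the 4 faulty
two_faulty  = math.comb(4, 2) * math.comb(8, 1)  # 2 faulty and 1 good
two_or_more = two_faulty + all_faulty

print(total, all_faulty, two_faulty, two_or_more)  # 220 4 48 52
```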
2.4.4 Calculation of probabilities of events under sampling ‘at random’
When we select a sample of size \(n\) from a box of \(N\) items without replacement, we say the sample is chosen “at random” if every possible sample of size \(n\) is equally likely. In this case, each possible sample is an equally likely outcome, and we assign equal probabilities.
Example 2.7 (Microchips continued) In Example 2.6 assume that 3 microchips are selected at random without replacement. Then
- each outcome (sample of size \(3\)) has probability \(1/\binom{12}{3}\).
- \(P\{\text{all 3 selected microchips are faulty}\} = \binom{4}{3}/ \binom{12}{3}\).
- \(P\{\text{2 chips are faulty}\} = \binom{8}{1} \binom{4}{2}/ \binom{12}{3}\).
- \(P\{\text{2 or more chips are faulty}\} = \big(\binom{8}{1} \binom{4}{2} + \binom{4}{3}\big)/\binom{12}{3}.\)
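Continuing the counting sketch from Example 2.6, each probability is just the corresponding count divided by the total number of equally likely samples:

```python
# Divide each count from the previous sketch by the total number of samples.
print(all_faulty / total)   # 4/220  ≈ 0.018
print(two_faulty / total)   # 48/220 ≈ 0.218
print(two_or_more / total)  # 52/220 ≈ 0.236
```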
2.4.5 A general ‘urn problem’
Example 2.6 is one particular case of the following general urn problem which can be solved by the same technique. A sample of size \(n\) is drawn at random without replacement from a box of \(N\) items containing a proportion \(p\) of defective items.
- How many defective items are in the box? \(N p\). How many good items are there? \(N (1 - p)\). Assume these to be integers.
- The probability of exactly \(x\) defective items in the sample of \(n\) items is \[\frac{\binom{Np}{x} \binom{N(1-p)}{n-x}}{\binom{N}{n}}.\]
- Which values of \(x\) (in terms of \(N\), \(n\) and \(p\)) make this expression well defined? We’ll see later that these values of \(x\) and the corresponding probabilities make up what is called the hypergeometric distribution.
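The general formula translates directly into a small function; the sketch below (the function name is ours; `scipy.stats.hypergeom` implements the same distribution) reproduces the microchips probability of exactly 2 faulty chips:

```python
import math

def urn_prob(N, n, p, x):
    """P{exactly x defectives} when n items are drawn at random without
    replacement from N items of which N*p are defective."""
    d = round(N * p)  # number of defective items, assumed an integer
    return math.comb(d, x) * math.comb(N - d, n - x) / math.comb(N, n)

# Example 2.7 again: N = 12 chips, a proportion p = 4/12 faulty, n = 3 drawn.
print(urn_prob(12, 3, 4 / 12, 2))  # 48/220 ≈ 0.218
```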
Example 2.8 (Selecting a committee) There are 10 students available for a committee, of which 4 are men and 6 are women. A random sample of 3 students is chosen to form the committee — what is the probability that exactly one is a man? The total number of possible outcomes of the experiment is equal to the number of ways of selecting 3 students from 10 and is given by \(\binom{10}{3}\). The number of outcomes in the event ‘exactly one is a man’ is equal to the number of ways of selecting 3 students from 10 with exactly one man, and is given by \(\binom{4}{1} \binom{6}{2}\). Hence \[\begin{align*} P \{\text{exactly one man}\} &= \frac{\text{number of ways of selecting one man and two women}} {\text{number of ways of selecting 3 students}} \\ &= \frac{\binom{4}{1} \binom{6}{2}}{\binom{10}{3}} = \frac{4 \times 15}{120} = \frac{1}{2}. \end{align*}\]
Example 2.9 (The National Lottery) In Lotto, a winning ticket has six numbers from 1 to 59 matching those on the balls drawn on a Wednesday or Saturday evening. The ‘experiment’ consists of drawing the balls from a box containing 59 balls. The ‘randomness’, the equal chance of any set of six numbers being drawn, is ensured by the spinning machine, which rotates the balls during the selection process. What is the probability of winning the jackpot? The total number of possible selections of six balls/numbers is \(\binom{59}{6}\). There is only 1 selection for winning the jackpot. Hence \[P\{\text{jackpot}\} = \frac{1}{\binom{59}{6}} = 2.22 \times 10^{-8},\] which is roughly 1 in 45 million.
There is one other way of winning a very large prize, of £1 million, by using the bonus ball — matching 5 of the selected 6 balls plus matching the bonus ball. The probability of this is given by \[P \{\text{$5$ matches + bonus}\} = \frac{6}{\binom{59}{6}} = 1.33 \times 10^{-7} .\]
Other smaller prizes are given for fewer matches. The corresponding probabilities are: \[\begin{align*} P \{\text{$5$ matches}\} &= \frac{\binom{6}{5} \binom{53}{1}}{\binom{59}{6}} = 7.06 \times 10^{-6}. \\ P \{\text{$4$ matches}\} &= \frac{\binom{6}{4} \binom{53}{2}}{\binom{59}{6}} = 0.000459.\\ P \{\text{$3$ matches}\} &=\frac{\binom{6}{3} \binom{53}{3}}{\binom{59}{6}} = 0.0104. \end{align*}\]
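All of these lottery probabilities follow the same counting pattern, so they can be checked in a few lines (a sketch using the figures above):

```python
import math

total = math.comb(59, 6)  # 45,057,474 possible draws of six balls

print(1 / total)                                   # jackpot ≈ 2.22e-08
print(6 / total)                                   # 5 matches + bonus ≈ 1.33e-07
print(math.comb(6, 5) * math.comb(53, 1) / total)  # 5 matches ≈ 7.06e-06
print(math.comb(6, 4) * math.comb(53, 2) / total)  # 4 matches ≈ 0.000459
print(math.comb(6, 3) * math.comb(53, 3) / total)  # 3 matches ≈ 0.0104
```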
2.5 Conditional probability and Bayes’ Theorem
How do we update probabilities when we have new information? For example, a person may or may not have a certain disease, and may or may not show a particular symptom. If we know that a randomly selected person has the symptom, what is the probability that they have the disease? Having the symptom does not guarantee the disease.
Conditional probability is common in fields like actuarial science and medicine. For example, “What is the probability a person will survive another 20 years, given they are alive at age 40?” In many real problems, we want to find the probability of event \(A\) when we already know that event \(B\) has happened. For this, we use conditional probability.
Example 2.10 (Die throw continued) Return to the rolling of a fair die (Example 2.1). Let \[\begin{align*} A = \{\text{a number greater than $3$}\} = \{4, 5, 6\}, B = \{\text{an even number}\} = \{2, 4, 6\}. \end{align*}\] It is clear that \(P \{B\} = 3/6 = 1/2\). This is the unconditional probability of the event \(B\). It is sometimes called the prior probability of \(B\).
However, suppose that we are told that the event \(A\) has already occurred. What is the probability of \(B\) now given that \(A\) has already happened?
The sample space of the experiment is \(S = \{1, 2, 3, 4, 5, 6\}\), which contains \(n = 6\) equally likely outcomes.
Given the partial knowledge that event \(A\) has occurred, only the \(n_A = 3\) outcomes in \(A = \{4, 5, 6\}\) could have occurred. Of these \(n_A\) outcomes, only those that also lie in \(B\) will make event \(B\) occur; the number of such outcomes is \(n_{A \cap B}\), the number of outcomes in \(A \cap B\), which equals \(2\). Hence the probability of \(B\), given the partial knowledge that event \(A\) has occurred, is equal to \[\frac{2}{3} = \frac{n_{A \cap B}}{n_A} = \frac{n_{A\cap B} / n}{n_A / n} = \frac{P\{A \cap B\}}{P\{A\}}.\] Hence we say that \(P \{B|A\} = 2/3\), which is often interpreted as the posterior probability of \(B\) given \(A\). The additional knowledge that \(A\) has already occurred has helped us to revise the prior probability of \(1/2\) to \(2/3\).
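The same revision can be read off by enumeration; a minimal sketch of the calculation above:

```python
from fractions import Fraction as F

S = {1, 2, 3, 4, 5, 6}
A = {4, 5, 6}  # a number greater than 3
B = {2, 4, 6}  # an even number

p_A = F(len(A), len(S))
p_A_and_B = F(len(A & B), len(S))

print(p_A_and_B / p_A)  # P{B|A} = (2/6) / (3/6) = 2/3
```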
This simple example leads to the following general definition of conditional probability.
2.5.1 Definition of conditional probability
For events \(A\) and \(B\) with \(P \{A\} > 0\), the conditional probability of event \(B\), given that event \(A\) has occurred, is \[P \{B|A\} = \frac{P \{A \cap B\}}{P \{A\}}.\]
Example 2.11 Of all individuals buying a mobile phone, 60% include a 64 GB hard disk in their purchase, 40% include a 16 MP camera and 30% include both. If a randomly selected purchase includes a 16 MP camera, what is the probability that a 64 GB hard disk is also included? The conditional probability is given by \[P \{\text{$64$ GB}|\text{$16$ MP}\} = \frac{P \{\text{$64$ GB} \cap \text{$16$ MP}\}}{P\{\text{$16$ MP}\}} = \frac{0.3}{0.4} = 0.75.\]
2.5.2 Multiplication rule of conditional probability
By rearranging the conditional probability definition, we obtain the multiplication rule of conditional probability: \[P\{A \cap B\} = P\{A\} P\{B|A\}.\] Clearly the roles of \(A\) and \(B\) could be interchanged: \[P\{A \cap B\} = P\{B\} P\{A|B\}.\] Hence the multiplication rule of conditional probability for two events is \[P\{A \cap B\} = P\{B\}P\{A|B\} = P\{A\}P\{B|A\}.\]
It is straightforward to show by mathematical induction the following multiplication rule of conditional probability for \(k(\geq 2)\) events \(A_1 , A_2 , \ldots , A_k\): \[P \{A_1 \cap A_2 \cap \ldots \cap A_k\} = P \{A_1\}P \{A_2 |A_1\} P\{A_3 |A_1 \cap A_2\} \ldots P\{A_k |A_1 \cap A_2 \cap \ldots \cap A_{k-1}\}.\]
Example 2.12 (Selecting a committee continued) Return to the committee selection example (Example 2.8), where there are 4 men (M) and 6 women (W). We want to select a 2-person committee. Find:
- (i) the probability that both are men,
- (ii) the probability that one is a man and the other is a woman.
We have already dealt with this type of urn problem by using the combinatorial method. Here, the multiplication rule is used instead. Let \(M_i\) be the event that the \(i\)th person is a man, and \(W_i\) be the event that the \(i\)th person is a woman, \(i = 1, 2\). Then \[\text{Prob in (i)} = P \{M_1 \cap M_2 \} = P \{M_1\}P \{M_2 |M_1\} = \frac{4}{10} \times \frac{3}{9},\] \[\begin{align*} \text{Prob in (ii)} &= P \{M_1 \cap W_2 \}+P \{W_1 \cap M_2 \} \\ &= P \{M_1\}P \{W_2 |M_1 \}+P \{W_1\}P \{M_2 |W_1\} \\ &= \frac{4}{10} \times \frac{6}{9} + \frac{6}{10} \times \frac{4}{9}. \end{align*}\]
You can find the probability that ‘both are women’ in a similar way.
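Exact fractions make the multiplication-rule calculation transparent; the sketch below also computes the ‘both women’ case and checks that the three probabilities sum to 1:

```python
from fractions import Fraction as F

# Multiplication rule: P{first pick} * P{second pick | first pick}.
p_mm = F(4, 10) * F(3, 9)                       # both men
p_mw = F(4, 10) * F(6, 9) + F(6, 10) * F(4, 9)  # one man and one woman
p_ww = F(6, 10) * F(5, 9)                       # both women

print(p_mm, p_mw, p_ww)    # 2/15 8/15 1/3
print(p_mm + p_mw + p_ww)  # 1, since the three events partition the outcomes
```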
2.5.3 Total probability formula
Example 2.13 (Phones) Suppose that in our world there are only three phone manufacturing companies: A Pale, B Sung and C Windows, and their market shares are respectively 30, 40 and 30 percent. Suppose also that respectively 5, 8, and 10 percent of their phones become faulty within one year. If I buy a phone randomly (ignoring the manufacturer), what is the probability that my phone will develop a fault within one year? After finding the probability, suppose that my phone developed a fault in the first year — what is the probability that it was made by A Pale?
| Company   | Market share | Percent defective |
|-----------|--------------|-------------------|
| A Pale    | 30%          | 5%                |
| B Sung    | 40%          | 8%                |
| C Windows | 30%          | 10%               |
To answer this type of question, we derive two of the most useful results in probability theory: the total probability formula and Bayes’ theorem. First, let us derive the total probability formula.
Let \(B_1, B_2, \ldots , B_k\) be a set of events which are mutually exclusive, i.e. \[B_i \cap B_j = \emptyset \text{ for all $1 \leq i \neq j \leq k$,}\] and exhaustive, i.e. \[B_1 \cup B_2 \cup \ldots \cup B_k = S.\] Now any event \(A\) can be represented by \[A = A \cap S = (A \cap B_1) \cup (A \cap B_2) \cup \ldots \cup (A \cap B_k),\] where \((A \cap B_1), (A \cap B_2), \ldots , (A \cap B_k)\) are mutually exclusive events. Hence Axiom A3, followed by the multiplication rule of conditional probability, gives \[\begin{align*} P \{A\} &= P \{A \cap B_1 \} + P \{A \cap B_2 \} + \ldots + P \{A \cap B_k \} \\ &= P \{B_1 \}P \{A|B_1 \} + P \{B_2 \}P \{A|B_2 \} + \ldots + P \{B_k \}P \{A|B_k \}. \end{align*}\] This last expression is called the total probability formula for \(P \{A\}\).
Example 2.14 (Phones continued) We can now find the probability of the event, say \(A\), that a randomly selected phone develops a fault within one year. Let \(B_1, B_2, B_3\) be the events that the phone is manufactured respectively by companies A Pale, B Sung and C Windows. Then we have: \[\begin{align*} P \{A\} &= P \{B_1 \}P \{A|B_1 \} + P \{B_2 \}P \{A|B_2 \} + P \{B_3 \}P \{A|B_3 \} \\ &= 0.30 \times 0.05 + 0.40 \times 0.08 + 0.30 \times 0.10 \\ &= 0.077. \end{align*}\]
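In code, the total probability formula is a single sum over the partition; a sketch using the market-share table above (the dictionary keys are our own labels):

```python
shares = {"A Pale": 0.30, "B Sung": 0.40, "C Windows": 0.30}  # P{B_i}
faulty = {"A Pale": 0.05, "B Sung": 0.08, "C Windows": 0.10}  # P{A | B_i}

# Total probability formula: P{A} = sum over i of P{B_i} * P{A | B_i}.
p_fault = sum(shares[c] * faulty[c] for c in shares)
print(p_fault)  # 0.077
```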
Now suppose that my phone has developed a fault within one year. What is the probability that it was manufactured by A Pale? To answer this we need to introduce Bayes’ Theorem.
2.5.4 Bayes’ theorem
Theorem 2.1 (Bayes' Theorem) Let \(A\) and \(B\) be events. Then \[P \{B |A\} = \frac{P \{B \}P \{A|B \}}{P \{A\}}.\]
Proof. From the definition of conditional probability, we have \[P \{B |A\} = \frac{P \{B \cap A\}}{P \{A\}} = \frac{P \{B \}P \{A|B \}}{P \{A\}}.\]
The probability \(P \{B |A\}\) is called the posterior probability of \(B\) given \(A\) and \(P \{B \}\) is called the prior probability. Bayes’ theorem is the rule that converts the prior probability into the posterior probability by using the additional information that some other event, \(A\) above, has already occurred.
Example 2.15 (Phones continued) The probability that my faulty phone was manufactured by A Pale is \[P \{B_1 |A\} = \frac{P \{B_1 \}P \{A|B_1 \}}{P \{A\}} = \frac{0.30 \times 0.05}{0.077} = 0.1948.\] Similarly, the probability that the faulty phone was manufactured by B Sung is \(0.4156\), and the probability that it was manufactured by C Windows is \(1-0.1948-0.4156 = 0.3896\).
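Continuing the sketch above, Bayes’ theorem turns each prior market share into a posterior probability given that the phone is faulty:

```python
# Posterior P{B_i | A} = P{B_i} * P{A | B_i} / P{A} for each manufacturer.
posterior = {c: shares[c] * faulty[c] / p_fault for c in shares}
print(posterior)
# {'A Pale': 0.1948..., 'B Sung': 0.4155..., 'C Windows': 0.3896...}
```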
As in this example, we usually need to use the total probability formula to calculate the denominator \(P\{A\}\) in Bayes’ theorem.
2.6 Independent events
2.6.1 Introduction and definition of independence
Sometimes, knowing that one event has happened changes the probability of another event. But in many cases, it does not. When the probability does not change, we say the events are independent.
Intuitively, events \(A\) and \(B\) are independent if knowing that one occurs does not affect the probability of the other. In other words, \(P \{B|A\} = P \{B\}\) (if \(P \{A\} > 0\)), and \(P \{A|B\} = P \{A\}\) (if \(P \{B\} > 0\)).
Formally, \(A\) and \(B\) are independent if \(P \{A \cap B\} = P \{A\}P \{B\}\).
Example 2.16 (Die throw) Throw a fair die. Let \(A\) be the event that “the result is even” and \(B\) be the event that “the result is greater than 3”. We want to show that \(A\) and \(B\) are not independent.
For this, we have \(P \{A \cap B\} = P \{\text{either a $4$ or $6$ thrown}\} = 1/3\), but \(P \{A\} = 1/2\) and \(P \{B\} = 1/2\), so that \(P \{A\}P \{B\} = 1/4 \not = 1/3 = P \{A \cap B\}\). Therefore \(A\) and \(B\) are not independent events.
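The definition reduces to comparing two numbers; a minimal enumeration of this die example:

```python
from fractions import Fraction as F

A = {2, 4, 6}  # result is even
B = {4, 5, 6}  # result is greater than 3

def p(event):
    return F(len(event), 6)  # six equally likely faces

print(p(A & B))     # 1/3
print(p(A) * p(B))  # 1/4, which differs from 1/3: not independent
```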
Independence is often assumed for physical reasons, but sometimes incorrectly. Wrongly assuming independence can have serious consequences (for example, in the 2008 financial crisis). If events are independent, we can use the simple product formula for joint probability.
Example 2.17 (Dice throw) Two fair dice when thrown together are assumed to behave independently. Hence the probability of two sixes is \(1/6 \times 1/6 = 1/36\).
Example 2.18 (Assessing risk in legal cases) There have been some disastrous miscarriages of justice as a result of incorrect assumption of independence. Please read “Incorrect use of independence — Sally Clark Case” on Blackboard.
Theorem 2.2 (Independence of complementary events) If \(A\) and \(B\) are independent, so are \(A^\prime\) and \(B^\prime\).
Proof. Given that \(P \{A \cap B\} = P \{A\}P \{B\}\), we need to show that \(P \{A^\prime \cap B^\prime \} = P \{A^\prime \}P \{B^\prime \}\). Since \(A^\prime \cap B^\prime = (A \cup B)^\prime\) by De Morgan’s law, we have \[\begin{align*} P \{A^\prime \cap B^\prime \} &= 1 - P \{A \cup B\} \\ &= 1 - [P \{A\} + P \{B\} - P \{A \cap B\}] \\ &= 1 - [P \{A\} + P \{B\} - P \{A\}P \{B\}] \\ &= [1 - P \{A\}] - P \{B\}[1 - P \{A\}] \\ & = [1 - P \{A\}][1 - P \{B\}] \\ &= P \{A^\prime \}P \{B^\prime \}. \end{align*}\]
2.6.2 Independence with three events
We can extend the ideas of conditional probability and independence to more than two events.
Three events \(A\), \(B\), and \(C\) are independent if: \[\begin{equation} P \{A \cap B\} = P \{A\}P \{B\}, \; P \{A \cap C\} = P \{A\}P \{C\}, \; P \{B \cap C\} = P \{B\}P \{C\}, \tag{2.1} \end{equation}\] \[\begin{equation} P \{A \cap B \cap C\} = P \{A\}P \{B\}P \{C\}. \tag{2.2} \end{equation}\]
Note: Pairwise independence (2.1) does NOT imply three-way independence (2.2). To show \(A\), \(B\), and \(C\) are independent, both conditions must hold.
Example 2.19 A box contains eight tickets, each labelled with a binary number. Two are labelled with the binary number \(111\), two are labelled with \(100\), two with \(010\) and two with \(001\). An experiment consists of drawing one ticket at random from the box. Let \(A\) be the event “the first digit is 1”, \(B\) the event “the second digit is 1” and \(C\) be the event “the third digit is 1”. It is clear that \(P \{A\} = P \{B\} = P \{C\} = 4/8 = 1/2\) and \(P \{A \cap B\} = P \{A \cap C\} = P \{B \cap C\} = 1/4\), so the events are pairwise independent, i.e. (2.1) holds. However \(P \{A \cap B \cap C\} = 2/8 \not = P \{A\}P \{B\}P \{C\} = 1/8\). So (2.2) does not hold and \(A\), \(B\) and \(C\) are not independent.
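Enumerating the eight tickets makes the failure of (2.2) concrete; a minimal sketch:

```python
from fractions import Fraction as F

tickets = ["111", "111", "100", "100", "010", "010", "001", "001"]

def p(positions):
    """Probability that the drawn ticket has a 1 in all given digit positions."""
    hits = sum(all(t[i] == "1" for i in positions) for t in tickets)
    return F(hits, len(tickets))

print(p([0]), p([1]), p([2]))           # 1/2 1/2 1/2
print(p([0, 1]), p([0, 2]), p([1, 2]))  # each 1/4 = (1/2)*(1/2): (2.1) holds
print(p([0, 1, 2]))                     # 1/4, not (1/2)**3 = 1/8: (2.2) fails
```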