0. Logistical Info
- Section date: 9/20
- Associated lectures: 9/12, 9/14
- Associated pset: Pset 2, due 9/22
- Office hours on 9/20 from 7-9pm at Quincy Dining Hall
- Remember to fill out the attendance form
- Scroll to section 5 for a concise content summary.
0.1 Summary + Practice Problem PDFs
Summary + Practice Problems PDF
Practice Problem Solutions PDF
1. Brushing up on the definition of probability
We’ll restate the axioms for the general definition of probability:
Definition of probability:
There are just two axioms (rules that probabilities have to follow):
- $P(S) = 1, P(\emptyset) = 0.$
- If events $A_1, A_2, \ldots$ are disjoint, then $$ P\left( \bigcup_{j=1}^\infty A_j\right) = \sum_{j=1}^\infty P(A_j). $$
In other words, if $A_1, A_2, \ldots$ partition some event $B$, then $P(B) = \sum_{j=1}^\infty P(A_j)$.
Tips for calculating probabilities:
- Define events for every aspect of the problem (e.g., “$A$ = the event that it rains tomorrow, $B$ = the event that it rained today”)
- Write out the probabilities that you are given in the problem using notation (e.g., “$P(A|B) = 1/2$, $P(B) = 1/4$”).
- Write the probability that you want to calculate using notation (e.g., we want to calculate the unconditional probability that it rains tomorrow, $P(A)$).
- Figure out how the tools we have learned allow you to utilize the probabilities that you do know (step 2) to calculate the probabilities that you don’t know (step 3).
There are some important results that follow:
- Probability of a complement: If $A$ is an event in a sample space $S$, $$ P(A) = 1 - P(A^c). $$ Concisely, the probability of an event occurring is $1$ minus the probability of the event not occurring.
- Probability of a union: For events $A$, $B$, we have $$ P(A \cup B) = P(A) + P(B) - P(A \cap B). $$ It’s also useful to “disjointify” $A \cup B$ into a partition ($A \cap B^c$, $A \cap B$, $A^c \cap B$), which allows us to use the second axiom and get $$ P(A \cup B) = P(A \cap B^c) + P(A \cap B) + P(A^c \cap B). $$
- Principle of Inclusion-Exclusion (PIE): this is a general formula for the probability of the union of $n$ events
\begin{align*}
P(\bigcup_{i=1}^n A_i) &= \sum_i P(A_i) - \sum_{i < j} P(A_i \cap A_j)\\
&+ \sum_{i < j < k} P(A_i \cap A_j \cap A_k) - \cdots + (-1)^{n+1} P(\bigcap_{i=1}^n A_i).
\end{align*}
Note that the formula for the probability of the union of two events is the $n=2$ case of PIE.
A potential workflow (that you saw on Pset 1) for the probability of an intersection, $P(A_1 \cap \cdots \cap A_n)$, is to:
- Use complementary counting and De Morgan’s law (in that order) to turn the intersection into a union: \begin{align*} P(A_1 \cap \cdots \cap A_n) &= 1 - P((A_1 \cap \cdots \cap A_n)^c)\\ &= 1 - P(A_1^c \cup \cdots \cup A_n^c) \end{align*}
- Apply PIE to the union $P(A_1^c \cup \cdots \cup A_n^c)$ (see the sketch right after this list for a numerical check).
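Here is a quick numerical sanity check of this workflow in Python, using an example I made up (it is not from the pset): draw three cards from a standard deck and compute the probability that none of them is an ace, once via complement + De Morgan + PIE and once by direct counting.

```python
import itertools
from fractions import Fraction

# Made-up example: draw 3 cards without replacement, A_i = "card i is not an ace".
# We want P(A_1 ∩ A_2 ∩ A_3).  Workflow: complement, then De Morgan, then PIE on
# the union of the complements B_i = A_i^c = "card i is an ace".

p_Bi = Fraction(4, 52)                      # P(B_i)
p_Bij = Fraction(4 * 3, 52 * 51)            # P(B_i ∩ B_j) for i < j
p_Bijk = Fraction(4 * 3 * 2, 52 * 51 * 50)  # P(B_1 ∩ B_2 ∩ B_3)

p_union = 3 * p_Bi - 3 * p_Bij + p_Bijk     # PIE with n = 3
p_no_aces = 1 - p_union                     # complement + De Morgan

# Sanity check: count directly over all ordered 3-card draws.
deck = ["ace"] * 4 + ["other"] * 48
direct = Fraction(
    sum("ace" not in hand for hand in itertools.permutations(deck, 3)),
    52 * 51 * 50,
)

print(p_no_aces, direct, p_no_aces == direct)  # both are 4324/5525
```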
2. Conditional Probability
Notation note:
We will start writing $P(A \cap B)$ as $P(A, B)$ (i.e., commas between events and intersections are equivalent).
Conditional probability:
If $A$ and $B$ are two events, then the probability that $A$ occurs conditional on the fact that $B$ occurs (or given that $B$ occurs) is notated as $P(A|B)$ and equals $$ P(A|B) = \frac{P(A, B)}{P(B)}. $$ All conditions go to the right of the bar symbol $|$.
We read $P(A|B)$ as the “probability of $A$ given $B$” or the “probability of $A$ conditioned on $B$.” Intuitively, if we know $B$ occurs, $B$ basically becomes our new sample space, so we take the probability that both $A$ and $B$ occur, $P(A, B)$, and rescale it by the probability that $B$ occurs, $P(B)$.
We’re also quick to note that conditional probabilities are the same as “normal” probabilities; in fact, all probabilities can be considered conditional, we just treat some conditions more implicitly than others since they are more obvious/always involved in the problem. We’ll use extra conditioning to refer to problems where some conditions are always present (i.e., we never want to/don’t know how to calculate the probability of those conditions). For example, to calculate the probability that it rains tomorrow ($A$) given that it rained today ($B$), we would write $P(A|B)$. However, we are implicitly conditioning on a lot of things: that the world exists tomorrow ($W$), that I will be on Harvard campus when I check whether it rains ($H$), etc. So we could incorporate these extra conditions into our problem to write the definition of conditional probability with extra conditioning: $$ P(A|B, H, W) = \frac{P(A, B | H, W)}{P(B | H, W)} $$ As you can see, when we want certain events to be extra conditions, they are conditions in every related probability we calculate. Each time we apply a formula for conditional probability, we have to choose whether to treat each condition like $B$ (free to move around) or like $H$ (extra conditioning/always a condition).
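As a concrete check of the definition (the dice events below are my own choice, just for illustration), we can enumerate a small sample space and compute a conditional probability directly:

```python
from fractions import Fraction
from itertools import product

# Illustrative example: roll two fair dice.
# A = "the total is 8", B = "the first die shows at least 4".
outcomes = list(product(range(1, 7), repeat=2))   # 36 equally likely outcomes
A = {o for o in outcomes if sum(o) == 8}
B = {o for o in outcomes if o[0] >= 4}

p_B = Fraction(len(B), len(outcomes))             # P(B) = 18/36
p_AB = Fraction(len(A & B), len(outcomes))        # P(A, B) = 3/36

print(p_AB / p_B)                 # P(A | B) = 1/6
print(Fraction(len(A), 36))       # unconditional P(A) = 5/36, so conditioning on B mattered
```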
3. Tools using Conditional Probability
3.1 Probability of an Intersection
For events $A, B$, we can rearrange the definition of conditional probability to find the probability of their intersection: $$ P(A, B) = P(A) P(B | A) = P(B) P(A | B). $$ This works for intersections of $n$ events: $$ P(A_1, A_2, \ldots, A_n) = P(A_{i_1}) P(A_{i_2} | A_{i_1}) \cdots P(A_{i_n} | A_{i_1}, A_{i_2}, \ldots, A_{i_{n-1}}) $$ where $i_1, i_2, \ldots, i_n$ is a permutation of $1, 2, \ldots, n$. Note that we can chain the conditions in any order we want, picking whichever order is most convenient; for example, if you only know the unconditional probability of $A_8$, you should let $i_1 = 8$.
The probability of an intersection with extra conditioning is $$ P(A, B | C) = P(A | C) P(B | A, C) = P(B | C) P(A | B, C). $$
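Here is a small sketch checking the chain rule on a made-up example (three cards drawn without replacement, all hearts); it is only a numerical sanity check, not anything from lecture:

```python
import math
from fractions import Fraction

# Made-up example: draw 3 cards without replacement, A_i = "card i is a heart".
# Chain rule: P(A_1, A_2, A_3) = P(A_1) P(A_2 | A_1) P(A_3 | A_1, A_2).
chain = Fraction(13, 52) * Fraction(12, 51) * Fraction(11, 50)

# Cross-check with a direct count: choose 3 of the 13 hearts out of C(52, 3) hands.
direct = Fraction(math.comb(13, 3), math.comb(52, 3))

print(chain, direct, chain == direct)   # both are 11/850
```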
3.2 Law of Total Probability (LOTP)
The law of total probability (LOTP) is a clever rephrasing of the second axiom of probability (the probability of an event is the sum of the probabilities of the pieces that partition it): if $A_1, A_2, \ldots, A_n$ partition the sample space, then \begin{align*} P(B) &= P(B, A_1) + P(B, A_2) + \cdots + P(B, A_n). \end{align*} See Figure 2.3 in Blitzstein & Hwang for a visualization.
We usually break down each of the terms using the result for the probability of an intersection (section 3.1): $$ P(B) = P(B | A_1) P(A_1) + P(B | A_2) P(A_2) + \cdots + P(B | A_n) P(A_n) $$ When we only know $B$ in terms of conditional probabilities, we can make those conditional probabilities appear through LOTP. As Joe likes to say, condition on what you wish you knew.
LOTP with extra conditioning is $$ P(B | C) = P(B | A_1, C) P(A_1 | C) + P(B | A_2, C) P(A_2 | C) + \cdots + P(B | A_n, C) P(A_n| C) $$
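A minimal sketch of LOTP with invented numbers: a coin is chosen at random from a drawer containing a fair coin and a biased coin, and we condition on which coin was picked (the thing we wish we knew); a quick simulation confirms the answer.

```python
import random

# Invented numbers: the drawer holds a fair coin (P(heads) = 0.5) and a biased coin
# (P(heads) = 0.8); we pick one uniformly at random and flip it once.
# LOTP over the partition {fair, biased}:
# P(heads) = P(heads | fair) P(fair) + P(heads | biased) P(biased).
p_heads = 0.5 * 0.5 + 0.8 * 0.5      # = 0.65

# Monte Carlo check of the same quantity.
trials = 200_000
heads = sum(
    random.random() < (0.5 if random.random() < 0.5 else 0.8)
    for _ in range(trials)
)
print(p_heads, heads / trials)        # the simulated value should be close to 0.65
```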
3.3 Bayes’ Rule
Bayes’ Rule is also a result of the formula for the probability of an intersection: $$ P(B|A) = \frac{P(A|B) P(B)}{P(A)}. $$ The denominator often gets expanded out using LOTP (section 3.2), often with a partition containing $B$ like $$ P(B|A) = \frac{P(A|B) P(B)}{P(A|B) P(B) + P(A|B^c) P(B^c)}. $$ Bayes’ rule is used in situations where we don’t know how to calculate the probability of $B$ given $A$, $P(B|A)$, but know how to calculate the probability of $A$ given $B$, $P(A|B)$.
Bayes’ rule with extra conditioning is $$ P(B | A, C) = \frac{P(A | B, C) P(B | C)}{P(A | B, C) P(B|C) + P(A|B^c, C) P(B^c|C)} $$
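Here is a worked example of Bayes’ rule with made-up numbers (the classic rare-disease testing setup): we know the probability of a positive test given the disease, but want the probability of the disease given a positive test.

```python
# Invented numbers: 1% of the population has the disease (B), the test is positive (A)
# with probability 0.95 if you have the disease and with probability 0.05 if you don't.
p_B = 0.01
p_A_given_B = 0.95
p_A_given_Bc = 0.05

# Bayes' rule, with the denominator expanded via LOTP over the partition {B, B^c}.
numerator = p_A_given_B * p_B
denominator = p_A_given_B * p_B + p_A_given_Bc * (1 - p_B)
print(numerator / denominator)   # ≈ 0.161: even after a positive test, the disease is unlikely
```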
4. Independence
Independence for two events
Two events $A, B$ are independent if $$ P(A, B) = P(A) P(B) $$
For $P(A), P(B) > 0$, this is equivalent to either of the following as well: \begin{align*} P(A|B) &= P(A) \quad \text{or} \quad P(B|A) = P(B). \end{align*}
Intuitively, independence means that information about $A$ (e.g., knowing whether $A$ occurs) gives us no information about $B$. Some useful facts:
- Note that independence goes both ways — if $A$ is independent of $B$, then $B$ is independent of $A$.
- If $A$ is independent of $B$, then $A$ is independent of $B^c$, $A^c$ is independent of $B$, and $A^c$ is independent of $B^c$.
Independence for many events
A group of events $A_1, A_2, \ldots, A_n$ are independent if for any subset $A_{i_1}, A_{i_2}, \ldots, A_{i_k}$, $$ P(A_{i_1}, A_{i_2}, \ldots, A_{i_k}) = P(A_{i_1}) P(A_{i_2}) \cdots P(A_{i_k}) $$ Note that pairwise independence (e.g., showing that $P(A_i, A_j) = P(A_i) P(A_j)$ for all $i,j$) is required, but not enough, to show joint independence of all of the events; the classic counterexample is sketched below.
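The classic counterexample is two fair coin flips with $A$ = “first flip is heads”, $B$ = “second flip is heads”, and $C$ = “the two flips match”: the three events are pairwise independent but not jointly independent. Here is a short enumeration (my own, just to illustrate the point):

```python
from fractions import Fraction
from itertools import product

# Enumerate the 4 equally likely outcomes of two fair coin flips.
outcomes = list(product("HT", repeat=2))
A = {o for o in outcomes if o[0] == "H"}        # first flip is heads
B = {o for o in outcomes if o[1] == "H"}        # second flip is heads
C = {o for o in outcomes if o[0] == o[1]}       # the flips match

def p(event):
    return Fraction(len(event), len(outcomes))

print(p(A & B) == p(A) * p(B))              # True: pairwise independent
print(p(A & C) == p(A) * p(C))              # True
print(p(B & C) == p(B) * p(C))              # True
print(p(A & B & C) == p(A) * p(B) * p(C))   # False: 1/4 != 1/8, not jointly independent
```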
4.1 Conditional Independence
Conditional independence follows a similar formula: $A, B$ are conditionally independent given $C$ if $$ P(A, B | C) = P(A|C) P(B|C). $$ You can see how this is analogous to extra conditioning for the other results. Conditional independence of many events is similarly defined.
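To see that independence and conditional independence are genuinely different (this previews the warning in section 5.4), here is a sketch with invented numbers: two flips of a randomly chosen coin are conditionally independent given which coin was chosen, but they are not unconditionally independent, since the first flip carries information about the coin and hence about the second flip.

```python
# Invented numbers: pick a coin at random -- fair (P(heads) = 0.5) with probability 1/2,
# or biased (P(heads) = 0.9) otherwise -- and flip it twice.
# A1 = "first flip is heads", A2 = "second flip is heads", C = "the fair coin was picked".
p_fair = 0.5
p_h_fair, p_h_biased = 0.5, 0.9

# Conditional independence given the coin holds by construction:
# P(A1, A2 | C) = 0.5 * 0.5 and P(A1, A2 | C^c) = 0.9 * 0.9.

# Unconditional probabilities via LOTP:
p_A1 = p_fair * p_h_fair + (1 - p_fair) * p_h_biased            # = 0.7 (same for A2)
p_A1A2 = p_fair * p_h_fair**2 + (1 - p_fair) * p_h_biased**2    # = 0.53

print(p_A1 * p_A1, p_A1A2)   # 0.49 vs 0.53, so A1 and A2 are not unconditionally independent
```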
5. Summary
Notation note: see that we use commas and intersections interchangeably (i.e., $P(A, B, C) = P(A \cap B \cap C)$).
Tips for calculating probabilities:
- Define events for every aspect of the problem (e.g., “$A$ = the event that it rains tomorrow, $B$ = the event that it rained today”)
- Write out the probabilities that you are given in the problem using notation (e.g., “$P(A|B) = 1/2$, $P(B) = 1/4$”).
- Write the probability that you want to calculate using notation (e.g., we want to calculate the unconditional probability that it rains tomorrow, $P(A)$).
- Figure out how the tools we have learned allow you to utilize the probabilities that you do know (step 2) to calculate the probabilities that you don’t know (step 3).
5.1 Definition of Probability
Axioms of probability:
- With sample space $S$, \begin{align*} P(S) &= 1\\ P(\emptyset) &= 0. \end{align*}
- For $A_1, A_2, \ldots, $ that partition $B$ (this can be finite or infinite), \begin{align*} P(B) = \sum_{j=1}^\infty P(A_j) \end{align*}
Probability of a complement: For event $A$, $$ P(A) = 1 - P(A^c) $$
Probability of a union: For events $A$ and $B$, \begin{align*} P(A \cup B) &= P(A) + P(B) - P(A \cap B)\\ &= P(A \cap B^c) + P(B \cap A^c) + P(A \cap B) \end{align*}
Principle of Inclusion-Exclusion: For events $A_1, \ldots, A_n$, \begin{align*} P(\bigcup_{i=1}^n A_i) &= \sum_i P(A_i) - \sum_{i < j} P(A_i \cap A_j)\\ &+ \sum_{i < j < k} P(A_i \cap A_j \cap A_k) - \cdots + (-1)^{n+1} P(\bigcap_{i=1}^n A_i). \end{align*}
5.2 Conditional Probability
Conditional probability: For events $A$ and $B$, the probability of $A$ given $B$ (i.e., given that $B$ occured) is \begin{align*} P(A|B) = \frac{P(A \cap B)}{P(B)}. \end{align*} …with extra conditioning: \begin{align*} P(A|B, C) = \frac{P(A \cap B | C)}{P(B | C)}. \end{align*}
5.3 Conditional Probability Tools
First-step analysis: If you ever need to solve a problem involving a sequence of things (like a game with many turns, or a random walk, or so on) and are stuck, try first-step analysis: condition on what happens in the first step. You’ll often be able to get a recursive equation that is easier to solve.
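Here is a small sketch of first-step analysis on a toy problem with numbers I made up (a fair gambler’s-ruin walk): conditioning on the first step gives a simple recursion whose closed-form solution we can check by simulation.

```python
import random

# Toy problem: a gambler starts with `start` dollars, bets $1 on a fair coin each round,
# and stops upon reaching $0 or $N.  Let p_i = P(reach N before 0 | currently at i).
# First-step analysis: p_i = 0.5 * p_{i-1} + 0.5 * p_{i+1}, with p_0 = 0 and p_N = 1,
# which solves to p_i = i / N.
N, start = 10, 3

def reaches_N():
    x = start
    while 0 < x < N:
        x += 1 if random.random() < 0.5 else -1
    return x == N

trials = 100_000
estimate = sum(reaches_N() for _ in range(trials)) / trials
print(start / N, estimate)   # closed form 0.3 vs the simulated estimate
```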
Probability of an intersection: \begin{align*} P(A_1, A_2, \ldots, A_n) &= P(A_1) P(A_2 | A_1) \cdots P(A_n | A_1, \ldots, A_{n-1})\\ &= P(A_n) P(A_{n-1} | A_n) \cdots P(A_1 | A_2, \ldots, A_n), \\ &= [\text{chaining in any order that is convenient for you}]. \end{align*} …with extra conditioning: \begin{align*} P(A_1, A_2, \ldots, A_n | C) &= P(A_1 | C) P(A_2 | A_1, C) \cdots P(A_n | A_1, \ldots, A_{n-1}, C) \end{align*} Law of Total Probability (LOTP): for events $A_1, A_2, \ldots, A_n$ that partition $S$, we can find $P(B)$ by \begin{align*} P(B) &= P(B, A_1) + P(B, A_2) + \cdots + P(B, A_n) \\ &= P(B | A_1) P(A_1) + P(B | A_2) P(A_2) + \cdots + P(B | A_n) P(A_n). \end{align*} We pick $A_1, A_2, \ldots, A_n$ to “condition on what we wish we knew.” These are situations where you don’t know $P(B)$, but you know $P(B | A_1), P(B | A_2)$, etc.
…with extra conditioning: \begin{align*} P(B|C) &= P(B, A_1|C) + P(B, A_2|C) + \cdots + P(B, A_n|C) \\ &= P(B | A_1,C) P(A_1|C) + P(B | A_2,C) P(A_2|C) + \cdots + P(B | A_n,C) P(A_n|C). \end{align*} Bayes’ Rule: for events $A, B$, if we want to calculate $P(B|A)$ but only know how to calculate $P(A|B)$, \begin{align*} P(B|A) &= \frac{P(A|B) P(B)}{P(A)}\\ &= \frac{P(A|B) P(B)}{P(A|B) P(B) + P(A|B^c) P(B^c)}, \end{align*} where we commonly expand the denominator using the Law of Total Probability (LOTP).
…with extra conditioning: \begin{align*} P(B|A, C) &= \frac{P(A|B, C) P(B|C)}{P(A|C)} \\ &= \frac{P(A|B, C) P(B|C)}{P(A|B, C) P(B|C) + P(A|B^c, C) P(B^c |C)} \end{align*}
5.4 Independence
Independence: $A, B$ are defined to be independent if \begin{align*} P(A, B) = P(A) P(B). \end{align*} Note that if $A, B$ are independent, then so are $A, B^c$ and $A^c, B$ and $A^c, B^c$; basically, any event determined only by $A$ is independent of any event determined only by $B$.
VERY IMPORTANT: Disjointness and independence are not the same thing; in fact, disjoint events are essentially never independent. If $A$ and $B$ are disjoint and both have positive probability, then knowing that $A$ occurred tells you that $B$ did not occur, so $P(A, B) = 0 \neq P(A) P(B)$.
A set of events $A_1, A_2, \ldots, A_n$ is independent if every subset of the events $A_{j_1}, \ldots, A_{j_k}$ satisfies \begin{align*} P(A_{j_1}, \ldots, A_{j_k}) &= P(A_{j_1}) \cdots P(A_{j_k}). \end{align*} Basically, for any combination of independent events, we should be able to factor the probability of the intersection into the product of the individual probabilities.
Conditional independence: \begin{align*} P(A, B | C) = P(A|C) P(B|C). \end{align*} VERY IMPORTANT: Independence and conditional independence are not the same/do not imply each other. There is no guarantee that independent events are conditionally independent, or vice versa.