0. Logistical Info
- Section date: 9/27
- Associated lectures: 9/19, 9/21
- Associated pset: Pset 3, due 9/29
- Office hours on 9/27 from 7-9pm at Quincy Dining Hall
- Remember to fill out the attendance form
- Scroll to section 4 for a concise content summary.
0.1 Summary + Practice Problem PDFs
- Summary + Practice Problems PDF
- Practice Problem Solutions PDF
1. Examples from class
The lecture on 9/19 was full of examples of conditional probability. Here are my takeaways for each. I did not rewrite the examples because solutions are quite well-written in the book and/or in lecture.
1.1 Winter girl (Examples 2.2.5-2.2.7 in the book)
- Define your events as specifically as possible, there are a lot of details that surprisingly can change probabilities.
- See if you can simplify your events (both mathematically and logically). For example, if $A$ is the event that there’s at least one girl and $B$ is the event that there are two girls, then $P(A \cap B) = P(B)$, since if there are two girls, there is automatically at least one girl.
1.2 Monty Hall (Example 2.7.1 and many practice problems in the book)
You can use the Law of Total Probability in some extreme ways! Condition on things that make your life much, much easier - in Monty Hall problems (and the variants that Joe likes to write), I very often condition on the location of the car or use Bayes’ rule to move information about the car’s location into the condition!
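If you want to convince yourself of the basic Monty Hall result numerically before tackling the variants, here is a quick simulation sketch (in Python; the function name and trial count are my own choices, not anything from lecture):

```python
import random

def monty_hall(switch, trials=100_000):
    """Estimate the win probability of a strategy by simulation."""
    wins = 0
    for _ in range(trials):
        car = random.randrange(3)    # car placed uniformly at random
        pick = random.randrange(3)   # contestant's initial pick
        # Monty opens a door that is neither the pick nor the car
        opened = random.choice([d for d in range(3) if d != pick and d != car])
        if switch:
            # switch to the one remaining unopened door
            pick = next(d for d in range(3) if d != pick and d != opened)
        wins += (pick == car)
    return wins / trials

print(monty_hall(switch=True))   # close to 2/3
print(monty_hall(switch=False))  # close to 1/3
```

Conditioning on the location of the car is exactly what the simulation does implicitly: once `car` is fixed, everything else is mechanical.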
1.3 Simpson’s paradox (Example 2.8.3 in the book)
I think it’s a good idea to develop the skill of coming up with similar paradoxes - at the very least, it can test your understanding of probability. My understanding of this phenomenon is that there are two tasks - a hard task and an easy task. Doctor A might have a better success rate in each task, but Doctor B can still have a higher overall success rate.
This happens because Doctor A does more of the harder task, which drags their average down, while Doctor B inflates their average by doing the easier task more often. There are some other intuitive corollaries - like how some students may learn a lot but have lower GPAs than other students because they take a higher proportion of challenging classes.
To construct these paradoxes, I think you need a hard task (where both doctors have a “low” success rate) and an easy task (where both doctors have a higher success rate). Then doctor A has to do more of the hard task, while doctor B needs to do more of the easy task, to weight their averages differently.
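The recipe above can be checked with made-up numbers (all counts here are hypothetical, chosen just to make the inequalities work):

```python
# (successes, attempts) for each doctor on each task.
# Doctor A does more of the hard task; Doctor B more of the easy task.
a_hard, a_easy = (70, 100), (19, 20)
b_hard, b_easy = (2, 5), (81, 95)

def rate(pair):
    successes, attempts = pair
    return successes / attempts

# A beats B on each task separately...
assert rate(a_hard) > rate(b_hard)   # 0.70 > 0.40
assert rate(a_easy) > rate(b_easy)   # 0.95 > ~0.85

# ...but B has the higher overall success rate.
overall_a = (a_hard[0] + a_easy[0]) / (a_hard[1] + a_easy[1])  # 89/120
overall_b = (b_hard[0] + b_easy[0]) / (b_hard[1] + b_easy[1])  # 83/100
assert overall_b > overall_a
```

Notice the weighting: A's overall rate is dominated by the hard task (100 of 120 attempts), while B's is dominated by the easy task (95 of 100 attempts).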
1.4 Gambler’s ruin (Example 2.7.3 in the book)
You will be assessed (in pset and/or exam) on your ability to apply the gambler’s ruin result in other contexts, but you will never have to re-derive it. So you can figure out how the variables in your problem correspond to gambler’s ruin (or even set up the difference equation), then jump straight to plugging into the given solution.
In the gambler’s ruin problem, gambler $A$ starts with $i$ dollars and gambler $B$ starts with $N-i$ dollars. They keep making 1 dollar bets (which gambler $A$ has a probability $p$ of winning) until someone runs out of money (either gambler $A$ has $0$ dollars or gambler $B$ has 0 dollars). We often define $q = 1-p$ for notational convenience.
If we define $p_i$ to be the probability that gambler $A$ wins if they start with $i$ dollars, then using first-step analysis we find that $$ p_i = p_{i+1} p + p_{i-1}q. $$ Solving this difference equation with the boundary conditions $p_0 = 0$ and $p_N = 1$ gives $$ p_i = \begin{cases} \frac{1 - (\frac{q}{p})^i}{1 - (\frac{q}{p})^N}& p \ne 1/2\\ \frac{i}{N}& p = 1/2 \end{cases} $$
To match a problem to this, you should
- Make sure that “bets” are worth 1 dollar each.
- Make sure that gambler $A$ loses if they hit $0$ dollars, and wins if they hit some fixed amount of dollars ($N$).
- Make sure there is a constant probability of winning each bet.
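Once a problem is matched to this setup, plugging in is mechanical. A small sketch (in Python; the helper name is mine) that evaluates the closed-form solution above and double-checks it against the first-step recurrence:

```python
from fractions import Fraction

def win_prob(i, N, p):
    """P(gambler A reaches N before 0) starting from i, via the closed form."""
    p = Fraction(p)
    if p == Fraction(1, 2):
        return Fraction(i, N)
    r = (1 - p) / p                 # q/p
    return (1 - r**i) / (1 - r**N)

# Boundary conditions: broke means lost, N dollars means won.
p, N = Fraction(3, 5), 10
assert win_prob(0, N, p) == 0
assert win_prob(N, N, p) == 1

# The closed form satisfies the first-step equation p_i = p*p_{i+1} + q*p_{i-1}.
for i in range(1, N):
    assert win_prob(i, N, p) == p * win_prob(i + 1, N, p) + (1 - p) * win_prob(i - 1, N, p)
```

Using `Fraction` keeps the arithmetic exact, so the recurrence check holds with equality rather than up to floating-point error.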
2. Random variables
2.1 Definition
A random variable summarizes some experiment. So if you have a sample space $S$, for each possible outcome $\omega \in S$, your random variable takes on a certain value (a real number). Here are some examples of random variables:
- Say we’re rolling a die, so $S$ is the set of possible rolls. Then we could have $X = 1$ if we roll a $1$, $X = 2$ if we roll a $2$, and so on.
- We could define another random variable $Y$ to be the square of the roll. So $Y = 1$ if we roll a $1$, $Y = 16$ if we roll a $4$, etc.
- Random variables don’t have to take on different values for every outcome. So we could have $Z = 2$ if we roll an even number and $Z = 1$ if we roll an odd number.
- They also don’t have to be discrete values - we could have a random variable $T$ represent the exact temperature in the room right now.
2.2 Defining discrete random variables
A random variable is discrete if it takes on a finite or countably infinite number of values (something like $1, 2, 3, 4, \ldots$ is countably infinite). If instead your random variable can take on any value in an interval - like any real number, or any real number between 2 and 4, etc. - then its set of possible values is uncountable, and the random variable is not discrete.
We can uniquely describe a random variable by its probability mass function. This basically tells us the probability that the random variable takes on each possible value. For example, the probability mass function for the first dice-roll example, $X$, is $$ P(X = x) = \begin{cases} \frac{1}{6} & x \in \{1, 2, 3, 4, 5, 6\}\\ 0 & \text{else}. \end{cases} $$ The little $x$ is a dummy variable - it’s just a general way of going through every possible value. For example, we can now tell that there is a $1/6$ probability that $X = 4$, which corresponds to a $1/6$ probability that we roll a $4$, which makes sense.
Some important facts:
- “$X = 1$” is an event, so we can take its probability. We cannot take the probability of a random variable, so $P(X)$ is a category error.
- When writing a PMF:
- Every probability should be between $0$ and $1$ (inclusive), just like any probability.
- The probabilities in the PMF should sum to $1$: $$ \sum_{x \in \mathbb{R}} P(X = x) = 1. $$ This comes from both axioms of probability, since the events for each possible value of $X$ partition the entire sample space.
- I always include the “$0$, else” statement, so the PMF is defined for every real number.
- The support of a random variable is the set of possible values it can take on. So the support for $X$ is $\{1, 2, 3, 4, 5, 6\}$, and the support for $Z$ is $\{1, 2\}$, and so on. You should always define the support, too.
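One way to make these facts concrete is to represent a PMF as a dictionary over the support (a Python sketch; the variable names are my own). This also shows how multiple outcomes can pool into one value, as with $Z$:

```python
from fractions import Fraction

# PMF of X (a fair die roll) as a dictionary over its support;
# any value outside the support implicitly gets probability 0.
pmf_X = {x: Fraction(1, 6) for x in range(1, 7)}

# Validity checks: every probability in [0, 1], and they sum to 1.
assert all(0 <= p <= 1 for p in pmf_X.values())
assert sum(pmf_X.values()) == 1

# PMF of Z (1 if the roll is odd, 2 if even), built by pooling the
# probabilities of outcomes that map to the same value.
pmf_Z = {}
for x, p in pmf_X.items():
    z = 2 if x % 2 == 0 else 1
    pmf_Z[z] = pmf_Z.get(z, 0) + p

assert pmf_Z == {1: Fraction(1, 2), 2: Fraction(1, 2)}
```

The dictionary's keys are exactly the support, which is why defining the support and writing the PMF go hand in hand.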
2.3 A more mathy definition of random variables
Random variables are functions. If notation makes more sense to you, maybe this will be useful.
Remember that in any random experiment, we have a sample space $S$, where each element of $S$ is a possible outcome. Exactly one of those outcomes will happen. You can think of a random variable $X$ as a function that maps outcomes to the real number line, so $X : S \to \mathbb{R}$. So if some outcome $\omega \in S$ happens, then the random variable gives us the real number $X(\omega)$. There’s nothing random about the function $X$ - each outcome always goes to the same real number - but instead the randomness comes from which outcome actually ends up happening.
So when we think about PMFs and any probabilities with random variables, we’re using a bit of shorthand. It’s not immediately obvious that $X = x$ is an event, so let’s translate (again, remember $x$ is just a fixed number. I could just as easily use $y$ or $a$ as a variable, or $0$ or $1.5$ or $-17$ if I want a specific value): \begin{align} P(X = x) = P(\{\omega \in S : X(\omega) = x\}) \end{align} That set, $\{\omega \in S : X(\omega) = x\}$, is a subset of the sample space and thus is definitely an event. We can take a probability of that. Note that there can be multiple outcomes in this subset, which is the same as saying that $X$ is not injective.
So when we start talking about fancier random variables - like $X^2$ - we can still dissect the probabilities \begin{align} P(X^2 = y) &= P(\{\omega \in S : X^2(\omega) = y\})\\ &= P(\{\omega \in S : X(\omega) = \sqrt{y} \text{ or } X(\omega) = -\sqrt{y}\})\\ &= P((X = \sqrt{y}) \cup (X = -\sqrt{y}))\\ &= P(X = \sqrt{y}) + P(X = -\sqrt{y}), \end{align} assuming $y$ is positive and splitting up the probability in the last step because the two events are disjoint (a random variable can’t equal two different values at the same time).
The support can now also be restated in terms of the function - basically, the support is $\{x \in \mathbb{R} : \text{ there exists } \omega \in S \text{ such that } X(\omega) = x\}$.
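The function view can be sketched directly in code (Python; the tiny sample space and all names here are hypothetical, chosen to make $X$ take both $1$ and $-1$):

```python
from fractions import Fraction

# A four-outcome sample space with equally likely outcomes,
# and X as a literal function (dictionary) from outcomes to reals.
S = ["w1", "w2", "w3", "w4"]
X = {"w1": -2, "w2": -1, "w3": 0, "w4": 1}
P_outcome = {w: Fraction(1, 4) for w in S}

def prob(event):
    """P of an event, i.e. of a subset of the sample space S."""
    return sum(P_outcome[w] for w in event)

# P(X^2 = 1) as the probability of the event {w in S : X(w)^2 = 1} ...
lhs = prob({w for w in S if X[w] ** 2 == 1})
# ... equals P(X = 1) + P(X = -1), since those two events are disjoint.
rhs = prob({w for w in S if X[w] == 1}) + prob({w for w in S if X[w] == -1})
assert lhs == rhs == Fraction(1, 2)

# The support, restated in terms of the function X.
support = {X[w] for w in S}
assert support == {-2, -1, 0, 1}
```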
3. Distributions
A distribution describes a pattern of probabilities that a random variable can follow. I think this makes the most sense through example. For discrete distributions (which correspond to discrete random variables), we usually motivate them with a story and maybe some counting. Stories are extremely important and should be internalized, not just the PMFs.
3.1 Bernoulli Distribution
You perform an experiment that consists of $1$ trial, where the possible outcomes are success or failure. There is a probability $p$ of success and probability $q = 1-p$ of failure. This is summarized by a random variable $X$, where $X = 1$ if the trial is a success and $X = 0$ if it is a failure. We then say $X$ is distributed Bernoulli with parameter $p$, or in notation, $X \sim \mathrm{Bern}(p)$.
The PMF is $$ P(X = x) = \begin{cases} p & x = 1\\ q & x = 0\\ 0 & \text{else} \end{cases} $$
For a concrete example, a rigged coin toss has a probability $p = 0.7$ of landing heads (success) and probability $q = 0.3$ of landing tails (failure). If $X = 1$ when the coin lands heads and $X = 0$ when it lands tails, then $X \sim \mathrm{Bern}(0.7)$.
Note that you CANNOT set a random variable equal to a distribution. You have to use the $\sim$ symbol and say “distributed as.”
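To see the story in action, here is a minimal simulation sketch (Python; `bern` is a name I made up, not standard notation):

```python
import random

def bern(p):
    """One draw of X ~ Bern(p): 1 with probability p, else 0."""
    return 1 if random.random() < p else 0

random.seed(0)
draws = [bern(0.7) for _ in range(100_000)]
print(sum(draws) / len(draws))  # sample proportion of successes, close to p = 0.7
```

Each call to `bern(0.7)` is one trial of the rigged coin toss; the long-run fraction of $1$s approaches $p$.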
3.2 Binomial Distribution
You perform an experiment with $n$ independent Bernoulli trials, each of which is a success with the same probability $p$. Then the random variable $Y$, the number of successful trials, is distributed Binomial with $n$ trials and success probability $p$. In notation, $Y \sim \mathrm{Bin}(n, p)$.
We can find the PMF with some counting: \begin{align} P(Y = y) &= \begin{cases} \binom{n}{y} p^y (1-p)^{n-y} & y \in \{0, 1, 2, \ldots, n\}\\ 0 & \text{else}. \end{cases} \end{align}
NOTE: a Binomial random variable $Y$ can be represented as the sum of $n$ independent Bernoulli random variables, each with success probability $p$.
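A sketch checking both the counting-based PMF and the sum-of-Bernoullis representation (Python; the helper name and the particular $n$, $p$ are my own choices):

```python
import math
import random

def binom_pmf(n, p, y):
    """P(Y = y) for Y ~ Bin(n, p), from the counting formula."""
    if not 0 <= y <= n:
        return 0.0
    return math.comb(n, y) * p**y * (1 - p) ** (n - y)

n, p = 10, 0.3

# The PMF sums to 1 over the support {0, 1, ..., n}.
assert abs(sum(binom_pmf(n, p, y) for y in range(n + 1)) - 1) < 1e-12

# Y as a sum of n independent Bern(p) trials, checked by simulation.
random.seed(0)
trials = 100_000
counts = [0] * (n + 1)
for _ in range(trials):
    y = sum(random.random() < p for _ in range(n))  # sum of Bernoulli indicators
    counts[y] += 1

# Empirical frequencies should track the PMF.
assert all(abs(counts[y] / trials - binom_pmf(n, p, y)) < 0.01 for y in range(n + 1))
```

The line `sum(random.random() < p for _ in range(n))` is the note above made literal: each comparison is one Bernoulli trial, and $Y$ is their sum.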
4. Summary
4.1 Examples from class
Some takeaways:
- Define events very specifically (winter girl)
- See if you can simplify your problems with logic, not just relying on grinding through math (winter girl)
- Use the Law of Total Probability to condition on anything/everything you wish you knew
- In Monty Hall, you can often condition on the location of the car!
- To mimic Simpson’s paradox, set up some “hard” and “easy” tasks and make the better doctor do more of the hard tasks
- Learn how to turn a problem into the gambler’s ruin problem:
- Make losing happen at $0$, and winning happen at some fixed $N$
- Make sure each bet/step is only one dollar in either direction
- Make sure the probability of winning each individual bet is constant
4.2 Random variables
Random variables are a numerical (real number) summary of the outcome of your experiment. So for each possible outcome, the random variable takes on a certain value. Multiple outcomes can lead to the same value of the random variable. The support of a random variable is the set of possible values it can take on.
We define discrete random variables by their probability mass function. You should define the probability $P(X = x)$ for each $x$ in the random variable’s support, and always write probability $0$ for any value of $x$ that is not in the support. The PMF should always give valid probabilities, and should sum to $1$.
4.3 Distributions
- Bernoulli distribution: you conduct a single trial that succeeds with probability $p$. $X = 1$ if the trial succeeds and $X = 0$ if it fails. Then $X \sim \mathrm{Bern}(p)$ and has a PMF $$ P(X = x) = \begin{cases} p & x = 1\\ 1 - p & x = 0\\ 0 & \text{else} \end{cases} $$
- Binomial distribution: you conduct $n$ independent trials that each succeed with probability $p$. $Y$ is the total number of successes among the $n$ trials. Then $Y \sim \mathrm{Bin}(n, p)$ and the PMF is $$ P(Y = y) = \begin{cases} \binom{n}{y} p^y (1-p)^{n-y} & y \in \{0, 1, \ldots, n\}\\ 0 & \text{else} \end{cases} $$