Week 4: Discrete Distributions and Expectation

0. Logistical Info

  • Section date: 10/4
  • Associated lectures: 9/24, 9/26, 10/3
  • Associated pset: Pset 4, due 10/6
  • Midterm: 10/10
  • Office hours on 10/4 from 7-9pm at Quincy Dining Hall
  • Exam office hours on 10/7 and 10/9 from 8-10pm at Quincy Dining Hall
  • Remember to fill out the attendance form
  • Given the structure of my section, I’m shifting away from putting a lot of explanation on this webpage. I may come back in the future and add more examples, but it doesn’t make much sense right now since we’re mainly doing practice problems in section. So this week there’s no concise summary section on the webpage, since the handout already covers everything pretty tightly; check out the PDF below if you want the summary.

0.1 Summary + Practice Problem PDFs

Summary + Practice Problems PDF

Practice Problem Solutions PDF

1. Random variables

Let’s use the precise mathematical definition from last time: random variables assign real numbers to possible outcomes of an experiment. In other words, they map the sample space to the real line. So for a random variable $X$, for every outcome in the sample space, $\omega \in S$, there is a corresponding real number, $X(\omega) \in \mathbb{R}$.

Here’s the terminology of a random variable that we’ve talked about thus far, where we continue using $X$ as an example of a random variable.

  • The support: what is the set of values that a random variable can take on? This is equivalent to the image/range of $X$ on $S$, $X(S)$.
  • For a named distribution like a Binomial, we say $X$ is distributed Binomial using $X \sim \mathrm{Bin}(n, p)$, where we have to set possible values of the parameters $n$ and $p$ for our specific problem. You CANNOT set $X = \mathrm{Bin}(n, p)$: a named distribution cannot equal a random variable; it is just a blueprint for what the random variable looks like.
  • The probability mass function (PMF): for any real number $x$ (or $t$ or $y$, it’s just a filler variable), what is the probability that $X$ takes on this value? This is notated $P(X = x)$.
    • You should address every possible value of $x$: $P(X = x) = 0$ if $x$ is not in the support of $X$, $\sum_{x \in \text{ support of } X } P(X = x) = 1$, and every probability should be valid (between $0$ and $1$ inclusive).
  • NEW: the cumulative distribution function (CDF): for any real number $x$, what is the probability that $X$ takes on a value that is less than or equal to $x$? This is notated $P(X \le x)$.
    • You should again address every possible value of $x$, both in and outside of the support.
    • Here, the requirements for a valid CDF are that $P(X \le x) = 0$ if $x$ is less than the smallest value in the support and $P(X \le x) = 1$ if $x$ is greater than the biggest value in the support. For an infinite support, we should have $P(X \le x) \to 0$ as $x \to -\infty$ and $P(X \le x) \to 1$ as $x \to \infty$.
    • Additionally, a CDF should be non-decreasing (i.e., as $x$ increases, the CDF either increases or stays flat, but never decreases). A small numeric check of these PMF and CDF requirements appears right after this list.
  • When a collection of random variables are independent and all have the same distribution, we abbreviate this as independent and identically distributed (i.i.d.).
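Here’s a minimal sketch, in Python, of checking the PMF and CDF requirements above numerically. The random variable and its PMF are made up purely for illustration (they’re not from the course materials), and numpy is assumed to be available.

```python
import numpy as np

# A made-up discrete r.v. X with support {1, 2, 5} and PMF chosen for illustration
support = np.array([1, 2, 5])
pmf = np.array([0.2, 0.5, 0.3])  # P(X = 1), P(X = 2), P(X = 5)

# PMF requirements: every probability is valid, and they sum to 1 over the support
assert np.all((pmf >= 0) & (pmf <= 1)) and np.isclose(pmf.sum(), 1.0)

def cdf(x):
    """P(X <= x) for any real x: add up the PMF over support values <= x."""
    return pmf[support <= x].sum()

# CDF requirements: 0 below the smallest support value, 1 at and above the largest,
# and non-decreasing everywhere
assert cdf(0.99) == 0.0 and np.isclose(cdf(5), 1.0)
grid = np.linspace(-1, 7, 200)
assert np.all(np.diff([cdf(x) for x in grid]) >= -1e-12)

print([(x, round(cdf(x), 2)) for x in [0, 1, 1.5, 2, 5, 6]])
```

Note how the CDF is defined for every real $x$, not just values in the support: it jumps at each support value and is flat in between.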

Here’s a general approach for defining the distribution of a random variable (r.v.). You can give the distribution using either the PMF, the CDF, or a named distribution with its parameters defined. A short worked example follows the list.

  1. Define the support of your r.v.
  2. See if the random variable matches the story of any of the named distributions we have discussed. To see if an r.v. matches a distribution, some things to check are
    • For which named distributions is the support of your r.v. possible?
    • Are there draws/samples/trials? If so, are they independent?
    • If there is sampling, is it done with or without replacement?
  3. If you can match a named distribution, what are the parameters? Are those parameters allowed for that named distribution?
  4. If you can’t match a named distribution, how can you calculate the PMF using the information you checked about sampling and your counting skills?
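As a quick worked example of this approach (my own toy example, not from the pset): let $X$ be the number of heads in $3$ independent fair coin flips. The support is $\{0, 1, 2, 3\}$, and there are independent trials with the same success probability, so the Binomial story matches with parameters $n = 3$ and $p = 1/2$, both of which are allowed. The sketch below, assuming scipy is available, compares the PMF computed from counting with the named distribution’s PMF.

```python
from math import comb
from scipy.stats import binom

n, p = 3, 0.5  # X = number of heads in 3 fair flips, so X ~ Bin(3, 1/2)

for k in range(n + 1):
    from_counting = comb(n, k) * p**k * (1 - p)**(n - k)  # choose which flips are heads
    from_named = binom.pmf(k, n, p)                        # Binomial PMF via scipy
    print(k, from_counting, from_named)
```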

2. Discrete distributions

You can find details like the support, PMF, CDF, expectation, and variance in the table of distributions on page 605 of the textbook or page 3 of the midterm handout. We’ll focus on the stories and connections between distributions. For these discrete random variables (except for the Poisson), you should develop comfort with calculating their PMFs from scratch.

2.1 Bernoulli

Story: We run a trial with probability $p$ of success. Let the random variable $X$ be $1$ if the trial succeeds or $0$ if the trial fails. Then $X \sim \mathrm{Bern}(p)$.

Connections:

  • For $X \sim \mathrm{Bern}(p)$, $1-X \sim \mathrm{Bern}(1-p)$.
  • For $X \sim \mathrm{Bern}(p)$, $X^2 = X$, so $X^2 \sim \mathrm{Bern}(p)$. If you’re wondering why, check the support!

2.2 Binomial

Story: We run $n$ independent trials, each with an equal probability $p$ of success. Let $X$ be the number of successful trials. Then $X \sim \mathrm{Bin}(n,p)$.

Connections:

  • For $n$ independent and identically distributed Bernoulli random variables $X_1, \ldots, X_n \stackrel{i.i.d.}{\sim} \mathrm{Bern}(p)$, $$\sum_{i=1}^n X_i \sim \mathrm{Bin}(n, p).$$
    • This means $\mathrm{Bern}(p)$ is equivalent to $\mathrm{Bin}(1,p)$.
  • For independent random variables $X \sim \mathrm{Bin}(n, p)$ and $Y \sim \mathrm{Bin}(m, p)$, $$X+Y \sim \mathrm{Bin}(n+m,p).$$
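Here’s a quick simulation check of the first connection above, with parameters and a random seed of my own choosing (numpy and scipy assumed): simulate $n$ i.i.d. $\mathrm{Bern}(p)$ draws many times, sum each batch, and compare the empirical frequencies to the $\mathrm{Bin}(n, p)$ PMF.

```python
import numpy as np
from scipy.stats import binom

rng = np.random.default_rng(0)
n, p, reps = 10, 0.3, 100_000

# Each row is one experiment: n i.i.d. Bern(p) trials; sum across each row
trials = rng.random((reps, n)) < p
sums = trials.sum(axis=1)

empirical = np.bincount(sums, minlength=n + 1) / reps
theoretical = binom.pmf(np.arange(n + 1), n, p)
print(np.round(empirical, 3))
print(np.round(theoretical, 3))
```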

2.3 Hypergeometric

Story:

  • Capture/recapture elk: There are $N$ elk in the forest. In the past, we captured and tagged $m$ of the elk. We now recapture $n$ of the elk, where every set of $n$ is equally likely and elk are sampled without replacement. Let $X$ be the number of tagged elk among our $n$ recaptured elk. Then $X \sim \mathrm{HGeom}(m, N-m, n)$.
  • White and black balls in an urn: There are $w$ white balls and $b$ black balls in an urn. We draw $n$ balls from the urn without replacement, where each set of $n$ balls is equally likely to be drawn. Let $X$ be the number of white balls in our sample. Then $X \sim \mathrm{HGeom}(w, b, n)$.

Connections:

  • Notice the comparison between the Binomial and the Hypergeometric: using the urn story, if we sampled with replacement our random variable would be distributed $\mathrm{Bin}(n, \frac{w}{w+b})$.
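To see the with/without-replacement contrast numerically, the sketch below (urn sizes are my own illustration; scipy assumed) compares the $\mathrm{HGeom}(w, b, n)$ and $\mathrm{Bin}(n, \frac{w}{w+b})$ PMFs; the two get close when the urn is much larger than the sample, since replacement then barely matters.

```python
import numpy as np
from scipy.stats import binom, hypergeom

w, b, n = 6, 4, 5  # 6 white balls, 4 black balls, draw 5
k = np.arange(n + 1)

# scipy parameterizes the hypergeometric as (population size, # white, # drawn)
without_replacement = hypergeom.pmf(k, w + b, w, n)
with_replacement = binom.pmf(k, n, w / (w + b))

print(np.round(without_replacement, 3))
print(np.round(with_replacement, 3))
```

Rescaling to $w = 600$, $b = 400$ with the same $n = 5$ makes the two rows nearly identical.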

2.4 Geometric/First Success

Story: Suppose we’re running independent Bernoulli trials with probability $p$ of success. We stop running trials once one succeeds. Let $X$ be the number of failed trials before (and not including) the first successful trial. Then $X \sim \mathrm{Geom}(p)$.

Connections:

  • The First Success distribution is essentially the same as the Geometric, but we include the first successful trial as part of our count. So it always holds that for $X \sim \mathrm{Geom}(p)$, we have $X+1 \sim \mathrm{FS}(p)$.
  • Note that the Geometric/First Success distributions have infinite supports, while the Binomial has a fixed number of trials. This is a quick way to tell them apart.
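One practical caveat (my own note, not from the handout): software libraries often use the First Success convention rather than this course’s Geometric convention, so check the support before relying on them. For instance, scipy’s geom has support $1, 2, \ldots$ (it counts the trials including the success), while its nbinom with $r = 1$ counts failures and matches this course’s $\mathrm{Geom}(p)$.

```python
from scipy.stats import geom, nbinom

p = 0.2

fs = geom(p)      # scipy's geom: trials up to and including the first success (FS convention)
g = nbinom(1, p)  # failures before the first success (this course's Geom convention)

print(fs.mean(), g.mean())  # 1/p = 5.0 versus (1-p)/p = 4.0
print(fs.pmf(3), g.pmf(2))  # both equal (1-p)^2 * p = 0.128
```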

2.5 Negative Binomial

Story: Suppose we’re running independent Bernoulli trials with probability $p$ of success. We stop running trials after the $r^{th}$ success. Let $X$ be the number of failed trials before the $r^{th}$ success (not including any of the successes in that count). Then $X \sim \mathrm{NBin}(r,p)$.

Connections:

  • For independent and identically distributed $X_1, X_2, \ldots, X_r \stackrel{i.i.d.}{\sim} \mathrm{Geom}(p)$, we get $\sum_{i=1}^r X_i \sim \mathrm{NBin}(r, p)$.
    • This means $\mathrm{NBin}(1,p)$ is equivalent to $\mathrm{Geom}(p)$.
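Here’s a simulation sanity check of the first connection above (the parameters and seed are my own; numpy and scipy assumed): add up $r$ i.i.d. $\mathrm{Geom}(p)$ draws and compare the result to $\mathrm{NBin}(r, p)$. Note that numpy’s geometric sampler counts trials including the success, so we subtract $1$ to get the course’s failure-counting convention.

```python
import numpy as np
from scipy.stats import nbinom

rng = np.random.default_rng(1)
r, p, reps = 3, 0.4, 100_000

# i.i.d. Geom(p) draws (# failures before the first success); numpy counts trials, so subtract 1
geoms = rng.geometric(p, size=(reps, r)) - 1
sums = geoms.sum(axis=1)

print(sums.mean(), nbinom.mean(r, p))  # both near r(1-p)/p = 4.5
ks = np.arange(5)
print(np.round(np.bincount(sums, minlength=5)[:5] / reps, 3))  # empirical P(sum = k), k = 0..4
print(np.round(nbinom.pmf(ks, r, p), 3))                       # NBin(r, p) PMF, k = 0..4
```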

2.6 Poisson

Story: There’s no exact story to derive a Poisson. The only situation in which you’ll have to come up with the Poisson on your own is in approximation, and that is quite rare.

Approximate story: Say there are many rare events $A_1, A_2, \ldots, A_n$ (so $n$ is large and each $P(A_i) \ll 1$, i.e., much smaller than $1$) which are nearly independent (a notion which doesn’t have a rigorous definition). Then if we let $\lambda = \sum_{i=1}^n P(A_i)$, the count $X = \sum_{i=1}^n I(A_i)$ is approximately distributed $\mathrm{Pois}(\lambda)$.
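Here’s a small simulation of the approximate story (the number of events, their probabilities, and the seed are all my own choices): generate many independent rare events, count how many occur, and compare to $\mathrm{Pois}(\lambda)$ with $\lambda = \sum_i P(A_i)$.

```python
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(2)
n, reps = 1000, 10_000

# Independent rare events A_1, ..., A_n with small (and unequal) probabilities
probs = rng.uniform(0.0005, 0.005, size=n)
lam = probs.sum()  # lambda = sum of P(A_i), roughly 2.75 here

# X = number of events that occur, in each of `reps` repetitions of the experiment
X = (rng.random((reps, n)) < probs).sum(axis=1)

ks = np.arange(8)
print(np.round(np.bincount(X, minlength=8)[:8] / reps, 3))  # empirical P(X = k)
print(np.round(poisson.pmf(ks, lam), 3))                    # Pois(lambda) PMF
```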

Connections:

  • As you can see in the approximate story, you can use the Poisson to count the number of independent/weakly-dependent rare events that occur.
  • Suppose $X \sim \mathrm{Pois}(\lambda)$ and $Y \sim \mathrm{Pois}(\mu)$ with $X, Y$ independent. Then $X + Y \sim \mathrm{Pois}(\lambda + \mu)$.
  • Chicken-Egg: suppose a chicken lays $N$ eggs, with $N \sim \mathrm{Pois}(\lambda)$. Suppose each egg hatches with probability $p$, independently of the other eggs, and let $X$ be the number of eggs that hatch and $Y$ be the number of eggs that don’t hatch. (A simulation sketch follows this list.)
    • $X$ and $Y$ are independent. However, $X$ and $Y$ are not conditionally independent given $N$: once you know $N$ and one of them, the other is determined, since $N = X+Y$.
    • $X \sim \mathrm{Pois}(\lambda p)$, $Y \sim \mathrm{Pois}(\lambda (1-p))$.
    • $X | N = n \sim \mathrm{Bin}(n, p)$.
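Here’s a simulation sketch of chicken-egg ($\lambda$, $p$, and the seed are my own choices; numpy assumed): simulate $N \sim \mathrm{Pois}(\lambda)$, hatch each egg independently with probability $p$, and check that $X$ behaves like $\mathrm{Pois}(\lambda p)$ and is (empirically) uncorrelated with $Y$.

```python
import numpy as np

rng = np.random.default_rng(3)
lam, p, reps = 5.0, 0.3, 200_000

N = rng.poisson(lam, size=reps)  # eggs laid
X = rng.binomial(N, p)           # eggs that hatch: X | N = n ~ Bin(n, p)
Y = N - X                        # eggs that don't hatch

print(X.mean(), X.var())        # both near lam * p = 1.5, as expected for Pois(1.5)
print(Y.mean(), Y.var())        # both near lam * (1 - p) = 3.5
print(np.corrcoef(X, Y)[0, 1])  # near 0, consistent with X and Y being independent
```

If instead you condition on $N$ (say, keep only the runs with $N = 10$), $X$ and $Y$ become perfectly dependent, since then $Y = 10 - X$.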

3. Expectation

The expectation of a random variable $X$ with support $A$ is the weighted average of its possible values, where we weight based on the probability of $X$ taking on each value in its support. It is formally defined as $$ E(X) = \sum_{x \in A} x P(X = x) $$

Linearity states that for any random variables $X, Y$ (which can be dependent!) and real number $c$, \begin{align*} E(X + Y) &= E(X) + E(Y),\\ E(cX) &= cE(X). \end{align*}

The law of the unconscious statistician (LOTUS) states that the expectation of any function of a random variable, $g(X)$, can be found by $$ E(g(X)) = \sum_{x \in A} g(x) P(X = x). $$ For example, if we want to find $E(X^2)$, we simply swap $x^2$ in for $x$ in the expectation formula to get $E(X^2) = \sum_{x \in A} x^2 P(X = x)$. Note that the probabilities here don’t change; only what goes in front of them does.
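Here’s a tiny numeric illustration of these formulas, reusing a made-up PMF (again, not from the course materials; numpy assumed):

```python
import numpy as np

# Made-up discrete r.v. X with support {1, 2, 5}
support = np.array([1, 2, 5])
pmf = np.array([0.2, 0.5, 0.3])

EX = (support * pmf).sum()      # E(X) = sum_x x P(X = x) = 2.7
EX2 = (support**2 * pmf).sum()  # LOTUS: E(X^2) = sum_x x^2 P(X = x) = 9.7
print(EX, EX2)

print((3 * support * pmf).sum(), 3 * EX)  # E(3X) = 3 E(X), a special case of linearity
print(EX2 == EX**2)                       # False: in general E(X^2) != (E(X))^2
```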

3.1 Indicator Random Variables

An indicator random variable converts an event into a Bernoulli random variable. For an event $A$ with $P(A) = p$, the corresponding indicator random variable $I(A) \sim \mathrm{Bern}(p)$. This random variable is defined such that $I(A) = 1$ if $A$ occurs and $I(A) = 0$ if $A^c$ occurs. You might see other equivalent notation like $I_A$ or $I$; just be clear about which event your indicator random variable corresponds to.

The fundamental bridge (vocab which is not used outside of Stat 110) gives that $$ E(I(A)) = P(A). $$ We use this result a lot to calculate expectations of random variables that can be expressed as sums of indicators. This is nice because the indicators can be dependent, but linearity allows us to break the expectation apart! A very common workflow for calculating an expectation (a worked sketch follows the list) is:

  1. Write the random variable as the sum of indicators, $X = \sum_i I(A_i)$, where each $A_i$ is an event.
  2. Apply linearity, $E(X) = \sum_i E(I(A_i))$.
  3. Use the fundamental bridge, $E(X) = \sum_i P(A_i)$.
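Here’s a worked sketch of this workflow on a classic example (the specific numbers are my own): shuffle $n$ cards labeled $1$ through $n$ and let $X$ be the number of cards that land in the position matching their label. With $A_i$ the event that card $i$ lands in position $i$, we have $X = \sum_{i=1}^n I(A_i)$ and $P(A_i) = 1/n$, so by linearity and the fundamental bridge, $E(X) = n \cdot \frac{1}{n} = 1$, no matter what $n$ is. The indicators here are dependent, but that’s fine. A quick simulation agrees:

```python
import numpy as np

rng = np.random.default_rng(4)
n, reps = 52, 20_000

# Each row is a uniformly random shuffle of 0, ..., n-1
perms = np.array([rng.permutation(n) for _ in range(reps)])

# X = number of matches = sum of indicators I(card i lands in position i)
X = (perms == np.arange(n)).sum(axis=1)

print(X.mean())  # close to 1 = n * (1/n), by linearity + the fundamental bridge
```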

3.2 Variance

The variance is a measure of spread, defined for a random variable $X$ as $$ Var(X) = E((X - E(X))^2). $$

Here are basically all of the facts you need to know about variance (a quick numeric check follows the list):

  • It’s usually calculated using an equivalent formula, $$Var(X) = E(X^2) - (E(X))^2.$$
  • It is always nonnegative. In fact, variance is only zero if $P(X = x) = 1$ for some $x \in \mathbb R$: in other words, $X$ takes on a certain value with probability $1$. If this is not the case, the variance will be positive.
  • For a scalar $c \in \mathbb R$ (a number, not random) and a random variable $X$, \begin{align*} Var(cX) &= c^2 Var(X)\\ Var(X + c) &= Var(X). \end{align*}
  • For independent random variables $X$ and $Y$, $$ Var(X + Y) = Var(X) + Var(Y). $$ For dependent random variables $X$ and $Y$, this need not hold: in general, $$ Var(X+Y) \ne Var(X) + Var(Y). $$
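Here’s a quick numeric check of these facts using simulated Binomials (the parameters and seed are my own; numpy assumed):

```python
import numpy as np

rng = np.random.default_rng(5)
reps = 200_000

X = rng.binomial(10, 0.3, size=reps)  # Var(X) should be near 10 * 0.3 * 0.7 = 2.1
Y = rng.binomial(5, 0.3, size=reps)   # generated independently of X; Var(Y) near 1.05

# The two variance formulas agree on the sample
print(((X - X.mean())**2).mean(), (X**2).mean() - X.mean()**2)

# Scaling and shifting
print(np.var(3 * X), 9 * np.var(X))  # Var(3X) = 9 Var(X)
print(np.var(X + 7), np.var(X))      # Var(X + 7) = Var(X)

# Independence: Var(X + Y) = Var(X) + Var(Y), up to simulation error
print(np.var(X + Y), np.var(X) + np.var(Y))
```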

4. Handy math facts

  • You are expected to know how to find the sum of an infinite geometric series: if $|x| < 1$, $$ \sum_{n=0}^\infty x^n = \frac{1}{1-x}; $$ otherwise the sum does not exist (it diverges). For a finite geometric series (and any $x \ne 1$), $$ \sum_{n=0}^{N-1} x^n = \frac{1-x^N}{1-x}. $$ A quick numeric check of these facts (and the $e^x$ facts below) follows the list.
  • You are also expected to be familiar with some $e^x$ approximations, but you usually won’t be asked to approximate without prompting. The Taylor series of $e^x$ is $$ e^x = \sum_{n=0}^\infty \frac{x^n}{n!}. $$ The compound interest formula also gives $$ e^x = \lim_{n \to \infty} (1+\frac{x}{n})^n. $$
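Here’s a quick numeric check of these facts (the value of $x$ and the cutoffs are arbitrary choices of mine):

```python
import math

x = 0.3

# Infinite geometric series: partial sums approach 1 / (1 - x) when |x| < 1
print(sum(x**n for n in range(100)), 1 / (1 - x))

# Finite geometric series: sum_{n=0}^{N-1} x^n = (1 - x^N) / (1 - x)
N = 7
print(sum(x**n for n in range(N)), (1 - x**N) / (1 - x))

# Taylor series and compound-interest limit for e^x
print(sum(x**n / math.factorial(n) for n in range(20)), math.exp(x))
print((1 + x / 1_000_000) ** 1_000_000, math.exp(x))
```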