Week 6: Universality of the Uniform, Normal, Expo, and Moments

0. Logistical Info

  • Section date: 10/25
  • Associated lectures: 10/17, 10/19
  • Associated pset: Pset 6, due 10/27
  • Office hours on 10/25 from 7-9pm at Quincy Dining Hall
  • Please reach out if you wanted to sign up for a midterm debrief but missed the chance
  • Remember to fill out the attendance form

0.1 Summary + Practice Problem PDFs

Summary + Practice Problems PDF

Practice Problem Solutions PDF

1. Universality of the Uniform

Recall that the standard uniform, $U \sim \mathrm{Unif}(0, 1)$, has support $(0, 1)$ with PDF $1$ in the support.

Universality of the Uniform (UoU): If $F$ is a valid CDF that is continuous and strictly increasing over the support, then

  1. Let $U \sim \mathrm{Unif}(0, 1)$. Then $F^{-1} (U)$ is a random variable with CDF $F$.
  2. Let $X$ have CDF $F$. Then $F(X) \sim \mathrm{Unif}(0,1)$.

The first result extends to discrete random variables as well, provided $F^{-1}$ is interpreted as the generalized inverse (quantile function) $F^{-1}(u) = \min\{x : F(x) \ge u\}$. The second result only works for continuous random variables.

This result is quite useful for simulation: if you have access to draws from a Uniform distribution, you can transform them into draws from any distribution with a known inverse CDF.
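As a concrete sketch of part 1, the snippet below turns standard uniform draws into $\mathrm{Expo}(\lambda)$ draws via the inverse CDF $F^{-1}(u) = -\ln(1-u)/\lambda$; the rate $\lambda = 2$ and sample size are arbitrary choices for illustration.

```python
import math
import random

random.seed(0)
lam = 2.0  # rate parameter; an arbitrary choice for this sketch

def expo_inverse_cdf(u, lam):
    # For Expo(lam): F(x) = 1 - e^{-lam * x}, so F^{-1}(u) = -ln(1 - u) / lam
    return -math.log(1.0 - u) / lam

# Transform Unif(0, 1) draws into Expo(lam) draws
draws = [expo_inverse_cdf(random.random(), lam) for _ in range(100_000)]
sample_mean = sum(draws) / len(draws)
print(round(sample_mean, 2))  # should be close to 1/lam = 0.5
```

A histogram of `draws` would match the $\mathrm{Expo}(2)$ PDF; the sample mean landing near $1/\lambda$ is a quick sanity check.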

We can prove UoU with the tools we’ve learned in class. For continuous random variables with $F$ as described in the theorem,

  1. For $x \in \mathbb{R}$, \begin{align*} P(F^{-1}(U) \le x) = P(F(F^{-1}(U)) \le F(x)) = P(U \le F(x)) = F(x). \end{align*} So $F^{-1}(U)$ has CDF $F$. Applying the strictly increasing function $F$ to both sides preserves the inequality, and we used the CDF of $U$ in the last step, since $F(x) \in [0, 1]$.
  2. For $u \in (0, 1)$, \begin{align*} P(F(X) \le u) &= P\left(F^{-1}(F(X)) \le F^{-1}(u)\right)\\ &= P(X \le F^{-1}(u)) = F(F^{-1}(u)) = u, \end{align*} so $F(X) \sim \mathrm{Unif}(0, 1)$, since this is the CDF of a standard uniform.

2. Normal distribution

2.1 Standard Normal

$Z \sim \mathcal{N}(0, 1)$ is a standard Normal random variable with support $\mathbb R$. We notate the CDF as $\Phi$ and PDF as $\phi$.

  • (Symmetry) The standard Normal is symmetric about $0$. In math, for $x \in \mathbb R$, $\phi(x) = \phi(-x)$.
    • This also implies that $\Phi(x) = 1 - \Phi(-x)$.
    • So $\Phi(0) = 0.5$.
    • For $Z \sim \mathcal{N}(0, 1)$, $-Z \sim \mathcal{N}(0, 1)$ as well.
  • (Empirical rule/68-95-99.7 rule) \begin{align*} P(-1 < Z < 1) &\approx 0.68,\\ P(-2 < Z < 2) &\approx 0.95,\\ P(-3 < Z < 3) &\approx 0.997. \end{align*}

In this class, you can give exact answers in terms of $\Phi$ and $\phi$. On psets, you should also use a calculator/programming language/the empirical rule to get numerical approximations of $\Phi$.
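For numerical values, you don't need a statistics package: the standard Normal CDF can be written in terms of the error function, $\Phi(x) = \frac{1}{2}\left(1 + \mathrm{erf}(x/\sqrt{2})\right)$. A small Python sketch:

```python
import math

def Phi(x):
    # Standard Normal CDF via the error function:
    # Phi(x) = (1 + erf(x / sqrt(2))) / 2
    return (1.0 + math.erf(x / math.sqrt(2.0))) / 2.0

print(Phi(0.0))                         # 0.5, by symmetry
print(round(Phi(1.0) - Phi(-1.0), 3))  # ~0.683, matching the empirical rule
print(round(Phi(2.0) - Phi(-2.0), 3))  # ~0.954
```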

2.2 Normal

$X \sim \mathcal{N}(\mu, \sigma^2)$ (with $\mu \in \mathbb R, \sigma > 0$) is a Normal random variable with mean $\mu$ and variance $\sigma^2$, and also has support $\mathbb R$.

  • (Location-scale) For $Z \sim \mathcal{N}(0, 1)$, $\mu + \sigma Z \sim \mathcal{N}(\mu, \sigma^2)$.

    More generally, for $X \sim \mathcal{N}(\mu_1, \sigma_1^2)$, $\mu_2 + \sigma_2 X \sim \mathcal{N}(\mu_2 + \mu_1 \sigma_2, \sigma_1^2 \sigma_2^2)$.

  • (Standardization) For $X \sim \mathcal{N}(\mu, \sigma^2)$, $\frac{X-\mu}{\sigma} \sim \mathcal{N}(0, 1)$. We often use this to get results in terms of $\Phi$: \begin{align*} P(X < x) = P\left(\frac{X-\mu}{\sigma} < \frac{x-\mu}{\sigma}\right) = \Phi\left(\frac{x-\mu}{\sigma}\right). \end{align*}

  • (Empirical rule) For $X \sim \mathcal{N}(\mu, \sigma^2)$, \begin{align*} P(\mu-\sigma < X < \mu+\sigma) &\approx 0.68\\ P(\mu-2\sigma < X < \mu+2\sigma) &\approx 0.95\\ P(\mu-3\sigma < X < \mu+3\sigma) &\approx 0.997 \end{align*}

  • (Sum of independent Normals) Let $X \sim \mathcal{N}(\mu_1, \sigma_1^2)$ and $Y \sim \mathcal{N}(\mu_2, \sigma_2^2)$ with $X, Y$ independent. Then \begin{align*} X + Y &\sim \mathcal{N}(\mu_1 + \mu_2, \sigma_1^2 + \sigma_2^2),\\ X - Y &\sim \mathcal{N}(\mu_1 - \mu_2, \sigma_1^2 + \sigma_2^2). \end{align*}

(Variance when subtracting) Note that we always add the variances above! This is a general rule: for any independent random variables $X$ and $Y$, \begin{align*} Var(X+Y) = Var(X - Y) = Var(X) + Var(Y). \end{align*} This is consistent with the fact that $Var(-Y) = (-1)^2 Var(Y) = Var(Y)$.
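The add-the-variances rule is easy to check by simulation. A hedged sketch, with the parameters $X \sim \mathcal{N}(1, 2^2)$ and $Y \sim \mathcal{N}(3, 1.5^2)$ chosen arbitrarily:

```python
import random
import statistics

random.seed(0)
n = 200_000
# Independent draws: X ~ N(1, 2^2), Y ~ N(3, 1.5^2)
xs = [random.gauss(1.0, 2.0) for _ in range(n)]
ys = [random.gauss(3.0, 1.5) for _ in range(n)]

var_sum = statistics.pvariance([x + y for x, y in zip(xs, ys)])
var_diff = statistics.pvariance([x - y for x, y in zip(xs, ys)])
# Both should be close to Var(X) + Var(Y) = 4 + 2.25 = 6.25
print(round(var_sum, 1), round(var_diff, 1))
```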

3. Exponential distribution

$X \sim \mathrm{Expo}(\lambda)$ is an Exponential random variable with support $(0, \infty)$, mean $\frac{1}{\lambda}$, and variance $\frac{1}{\lambda^2}$. $\lambda$ is called the rate parameter.

  • (Memorylessness) For $X \sim \mathrm{Expo}(\lambda)$ and any $s, t > 0$, the memoryless property of the Exponential distribution states the following (equivalent) results: \begin{align*} P(X > s + t \vert X > s) &= P(X > t)\\ (X - s \vert X > s) &\sim \mathrm{Expo}(\lambda). \end{align*} Note in particular that the distribution of $X-s \vert X>s$ does not depend on the value of $s$.

    The Exponential distribution is the only continuous distribution with this property. Additionally, the Geometric distribution is the only memoryless discrete distribution with support $\{0, 1, 2, \ldots\}$.

For most results we talk about, you can't put a random variable in the place of a constant. You might recall from last week's problem set that the sum of $N$ independent $\mathrm{Pois}(\lambda)$ r.v.s, with $N$ random, is not distributed $\mathrm{Pois}(N\lambda)$. However, with memorylessness, you can put a random variable in the place of the $s$ above: for a random variable $Y$ independent of $X$, $P(X > t + Y | X > Y) = P(X > t)$ and $(X-Y|X > Y) \sim \mathrm{Expo}(\lambda)$ still hold.

Proof

We can prove this using LOTP and the constant version of memorylessness. We'll assume $Y$ is discrete here, but the continuous case is analogous (swap sums for integrals and PMFs for PDFs). \begin{align*} P(X > t+Y | X > Y) &= \sum_{y} P(X > t+y | X > Y, Y = y) P(Y=y)\\ &= \sum_{y} P(X > t+y | X > y, Y = y) P(Y=y). \end{align*}

We'll take a brief sidebar to show that $P(X > t + y | X > y, Y = y) = P(X > t+y | X>y)$. (You can jump from the former to the latter using the independence of $X$ and $Y$, since the extra condition is a function of $X$ alone, but we'll be explicit here.) We will use the definition of conditional probability, the fact that $X>t+y$ implies $X>y$, and the independence of $X$ and $Y$. \begin{align*} P(X > t+y | X > y, Y = y) &= \frac{P(X > t+y, X > y, Y =y)}{P(X > y, Y = y)}\\ &= \frac{P(X > t+y, Y =y)}{P(X > y, Y =y)}\\ &= \frac{P(X > t+y) P(Y = y)}{P(X>y)P(Y=y)}\\ &= \frac{P(X>t+y)}{P(X>y)}\\ &= \frac{P(X>t+y, X>y)}{P(X>y)}\\ &= P(X>t+y | X>y). \end{align*}

With this information, \begin{align*} P(X > t+Y | X >Y) &= \sum_y P(X>t+y|X>y, Y=y) P(Y=y)\\ &= \sum_y P(X>t+y|X>y) P(Y=y)\\ &= \sum_y P(X>t) P(Y=y)\\ &= P(X>t) \sum_y P(Y=y)\\ &= P(X>t)(1) = P(X>t), \end{align*} where we use memorylessness to say $P(X>t+y | X>y) = P(X>t)$.
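The random-threshold version of memorylessness can also be checked numerically. In this sketch, the choices $\lambda = 1$, $t = 1$, and $Y \sim \mathrm{Unif}(0, 2)$ are arbitrary:

```python
import math
import random

random.seed(0)
lam, t, trials = 1.0, 1.0, 400_000

# X ~ Expo(lam) with an independent random threshold Y ~ Unif(0, 2)
pairs = [(random.expovariate(lam), random.uniform(0.0, 2.0)) for _ in range(trials)]

# Estimate P(X > t + Y | X > Y) and compare with P(X > t) = e^{-lam * t}
survivors = [(x, y) for x, y in pairs if x > y]
cond = sum(x > t + y for x, y in survivors) / len(survivors)
print(round(cond, 2), round(math.exp(-lam * t), 2))  # the two should agree
```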

  • (Example of Memorylessness) Suppose you’re waiting for a bus that will arrive in $X \sim \mathrm{Expo}(\lambda)$ minutes. If you wait for the bus for 10 minutes and it has not arrived, then the remaining time that you have to wait is still distributed $\mathrm{Expo}(\lambda)$: $X - 10 | X > 10 \sim \mathrm{Expo}(\lambda)$. So no matter how long you wait, the remaining time for you to wait has the same distribution.
  • (Minimum of Expos) The minimum of $n$ i.i.d. $\mathrm{Expo}(\lambda)$ random variables is distributed $\mathrm{Expo}(n\lambda)$. In notation, for $X_1, \ldots, X_n \overset{i.i.d.}{\sim} \mathrm{Expo}(\lambda)$, $\min(X_1, \ldots, X_n) \sim \mathrm{Expo}(n\lambda)$.
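The minimum-of-Expos result is another easy simulation check; here $\lambda = 1$ and $n = 5$ are arbitrary choices:

```python
import random

random.seed(0)
lam, n_vars, trials = 1.0, 5, 100_000

# The min of n i.i.d. Expo(lam) draws should behave like Expo(n * lam)
mins = [min(random.expovariate(lam) for _ in range(n_vars)) for _ in range(trials)]
mean_min = sum(mins) / trials
print(round(mean_min, 3))  # should be close to 1/(n_vars * lam) = 0.2
```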

Maximum of Expos

The maximum of $n$ i.i.d. Exponential random variables does not follow an Exponential distribution.

Finding the distribution of minimums/maximums

The results above can be found in the book, but they also illustrate a general template for finding the distributions of minimums and maximums.

Let $X_1, \ldots, X_n$ be any random variables. Then the events $\{\min(X_1, \ldots, X_n) > x\}$ and $(X_1 > x) \cap (X_2 > x) \cap \cdots \cap (X_n > x)$ are equivalent. To convince yourself of this, think about what this means in words: the minimum of a set of numbers is greater than $x$ if and only if each one of the numbers is greater than $x$.

To find the CDF of $\min(X_1, \ldots, X_n)$, a common workflow is \begin{align*} P(\min(X_1, \ldots, X_n) \le x) &= 1 - P(\min(X_1, \ldots, X_n) > x) = 1 - P(X_1 > x, X_2 > x, \ldots, X_n > x). \end{align*} If $X_1, \ldots, X_n$ are independent, then \begin{align*} P(X_1 > x, X_2 > x, \ldots, X_n > x) &= P(X_1 > x) P(X_2 > x) \cdots P(X_n > x). \end{align*} If $X_1, \ldots, X_n$ are also identically distributed, we conclude with \begin{align*} P(X_1 > x) P(X_2 > x) \cdots P(X_n > x) &= (P(X_1 > x))^n. \end{align*}

For maximums, we follow a similar workflow, except instead using the fact that $$\{\max(X_1, \ldots, X_n) \le x\} = \bigcap_{i=1}^n \{X_i \le x\}.$$
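The template can be verified numerically: for i.i.d. $\mathrm{Expo}(\lambda)$ r.v.s, it gives $P(\max \le x) = (1 - e^{-\lambda x})^n$. The parameters below ($\lambda = 1$, $n = 3$, $x = 1$) are arbitrary:

```python
import math
import random

random.seed(0)
lam, n_vars, trials, x = 1.0, 3, 200_000, 1.0

maxes = [max(random.expovariate(lam) for _ in range(n_vars)) for _ in range(trials)]
empirical = sum(m <= x for m in maxes) / trials
theoretical = (1.0 - math.exp(-lam * x)) ** n_vars  # (F(x))^n with the Expo CDF
print(round(empirical, 2), round(theoretical, 2))  # the two should agree
```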

4. Moments/Moment Generating Functions

For a random variable $X$, the $\mathbf{n^{th}}$ moment is $E(X^n)$.

Moment Generating Function

For a random variable $X$, the moment generating function (MGF) is $M_X(t) = E(e^{tX})$ for $t \in \mathbb{R}$. If the MGF exists (i.e., is finite on an interval around $0$), then \begin{align*} M_X(0) &= 1,\\ M_X^{(n)}(0) = \frac{d^n}{dt^n} M_X(t) \Big|_{t=0} &= E(X^n). \end{align*} You should sanity-check that $M_X(0) = 1$ whenever you calculate an MGF.
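As a sanity check on these properties, you can estimate an MGF by simulation: average $e^{tX}$ over draws, confirm the value $1$ at $t = 0$, and approximate $M_X'(0) = E(X)$ with a central difference. The sketch below uses $X \sim \mathrm{Expo}(2)$, where $E(X) = 1/2$ (an arbitrary choice):

```python
import math
import random

random.seed(0)
lam, trials, h = 2.0, 200_000, 1e-3
xs = [random.expovariate(lam) for _ in range(trials)]

def mgf(t):
    # Empirical MGF: sample average of e^{tX}
    return sum(math.exp(t * x) for x in xs) / trials

print(mgf(0.0))  # exactly 1: e^{0*X} = 1 for every draw
first_moment = (mgf(h) - mgf(-h)) / (2 * h)  # central difference ~ M_X'(0) = E(X)
print(round(first_moment, 2))  # should be close to 1/lam = 0.5
```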
