Week 6: Universality of the Uniform, Normal, Expo, and Moments

0. Logistical Info

  • Section date: 10/25
  • Associated lectures: 10/17, 10/19
  • Associated pset: Pset 6, due 10/27
  • Office hours on 10/25 from 7-9pm at Quincy Dining Hall
  • Please reach out if you wanted to sign up for a midterm debrief but missed the chance
  • Remember to fill out the attendance form

0.1 Summary + Practice Problem PDFs

Summary + Practice Problems PDF

Practice Problem Solutions PDF

1. Universality of the Uniform

Recall that the standard uniform, $U \sim \text{Unif}(0,1)$, has support $(0,1)$ with PDF equal to 1 on the support.

Universality of the Uniform (UoU): If $F$ is a valid CDF that is continuous and strictly increasing over the support, then

  1. Let $U \sim \text{Unif}(0,1)$. Then $F^{-1}(U)$ is a random variable with CDF $F$.
  2. Let $X$ have CDF $F$. Then $F(X) \sim \text{Unif}(0,1)$.

The first result applies to discrete random variables as well. The second result only works for continuous random variables.

This result is quite useful for simulation - if you have access to draws from a Uniform distribution, then you can transform them into draws from any distribution with a known (inverse) CDF.
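As a quick (unofficial) illustration, here is a minimal NumPy sketch of both directions of UoU using the $\text{Expo}(\lambda)$ distribution, whose CDF $F(x) = 1 - e^{-\lambda x}$ has the closed-form inverse $F^{-1}(u) = -\ln(1-u)/\lambda$; the seed and the choice $\lambda = 2$ are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(110)
lam = 2.0  # illustrative rate parameter

# Part 1: turn Unif(0,1) draws into Expo(lam) draws via F^{-1}(u) = -ln(1-u)/lam.
u = rng.uniform(size=100_000)
x = -np.log(1 - u) / lam
print(x.mean(), x.var())  # should be near 1/lam = 0.5 and 1/lam^2 = 0.25

# Part 2: plugging the draws back into their own CDF F(x) = 1 - e^{-lam*x}
# should recover Unif(0,1) draws.
v = 1 - np.exp(-lam * x)
print(v.mean(), v.var())  # should be near 1/2 and 1/12
```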

We can prove UoU with the tools we’ve learned in class. For continuous random variables with F as described in the theorem,

  1. For $x \in \mathbb{R}$, $P(F^{-1}(U) < x) = P(F(F^{-1}(U)) < F(x)) = P(U < F(x)) = F(x)$. So $F^{-1}(U)$ has CDF $F$. We used the CDF of $U$ in the last step, since $F(x) \in [0,1]$.
  2. For $u \in [0,1]$, $P(F(X) < u) = P(F^{-1}(F(X)) < F^{-1}(u)) = P(X < F^{-1}(u)) = F(F^{-1}(u)) = u$, so $F(X) \sim \text{Unif}(0,1)$ since it has the CDF of a standard uniform.

2. Normal distribution

2.1 Standard Normal

$Z \sim N(0,1)$ is a standard Normal random variable with support $\mathbb{R}$. We notate the CDF as $\Phi$ and the PDF as $\phi$.

  • (Symmetry) The standard Normal is symmetric about 0. In math, for $x \in \mathbb{R}$, $\phi(x) = \phi(-x)$.
    • This also implies that $\Phi(-x) = 1 - \Phi(x)$.
    • So $\Phi(0) = 0.5$.
    • For $Z \sim N(0,1)$, $-Z \sim N(0,1)$ as well.
  • (Empirical rule/68-95-99.7 rule) $P(-1 < Z < 1) \approx 0.68$, $P(-2 < Z < 2) \approx 0.95$, $P(-3 < Z < 3) \approx 0.997$.

In this class, you can give exact answers in terms of $\Phi$ and $\phi$. On psets, you should also use a calculator/programming language/the empirical rule to get numerical approximations of $\Phi$.
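For instance, in Python (one option among many; the inputs are arbitrary), `scipy.stats.norm.cdf` evaluates $\Phi$, and the identity $\Phi(x) = \frac{1}{2}\left(1 + \text{erf}(x/\sqrt{2})\right)$ gives the same value using only the standard library:

```python
from math import erf, sqrt
from scipy.stats import norm

print(norm.cdf(1.96))                   # ~0.975
print(0.5 * (1 + erf(1.96 / sqrt(2))))  # same value via the erf identity

# Empirical-rule check: P(-1 < Z < 1) ~ 0.68
print(norm.cdf(1) - norm.cdf(-1))
```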

2.2 Normal

$X \sim N(\mu, \sigma^2)$ (with $\mu \in \mathbb{R}$, $\sigma > 0$) is a Normal random variable with mean $\mu$ and variance $\sigma^2$, and also has support $\mathbb{R}$.

  • (Location-scale) For $Z \sim N(0,1)$, $\mu + \sigma Z \sim N(\mu, \sigma^2)$.

    More generally, for $X \sim N(\mu_1, \sigma_1^2)$, $\mu_2 + \sigma_2 X \sim N(\mu_2 + \mu_1\sigma_2, \sigma_1^2\sigma_2^2)$.

  • (Standardization) For $X \sim N(\mu, \sigma^2)$, $\frac{X - \mu}{\sigma} \sim N(0,1)$. We often use this to get results in terms of $\Phi$: $P(X < x) = P\left(\frac{X - \mu}{\sigma} < \frac{x - \mu}{\sigma}\right) = \Phi\left(\frac{x - \mu}{\sigma}\right)$.

  • (Empirical rule) For $X \sim N(\mu, \sigma^2)$, $P(\mu - \sigma < X < \mu + \sigma) \approx 0.68$, $P(\mu - 2\sigma < X < \mu + 2\sigma) \approx 0.95$, $P(\mu - 3\sigma < X < \mu + 3\sigma) \approx 0.997$.

  • (Sum of independent Normals) Let $X \sim N(\mu_1, \sigma_1^2)$ and $Y \sim N(\mu_2, \sigma_2^2)$ with $X, Y$ independent. Then $X + Y \sim N(\mu_1 + \mu_2, \sigma_1^2 + \sigma_2^2)$ and $X - Y \sim N(\mu_1 - \mu_2, \sigma_1^2 + \sigma_2^2)$.

(Variance when subtracting) See that we always add the variances above! This is also a general rule: for any independent random variables $X$ and $Y$, $\text{Var}(X + Y) = \text{Var}(X - Y) = \text{Var}(X) + \text{Var}(Y)$. See that this is consistent with the fact that $\text{Var}(-Y) = (-1)^2\,\text{Var}(Y) = \text{Var}(Y)$. A quick simulation check of these Normal facts appears below.
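Here is a minimal simulation sketch of the facts above (location-scale, standardization, and adding variances); the parameters $\mu = 3$, $\sigma = 2$, the second Normal, and the seed are all arbitrary illustrative choices.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
mu, sigma = 3.0, 2.0  # illustrative parameters

# Location-scale: mu + sigma * Z should be N(mu, sigma^2).
z = rng.standard_normal(1_000_000)
x = mu + sigma * z
print(x.mean(), x.var())  # near 3 and 4

# Standardization: P(X < 4) should equal Phi((4 - mu)/sigma) = Phi(0.5).
print((x < 4).mean(), norm.cdf(0.5))

# Sum/difference of independent Normals: variances add either way.
y = rng.normal(loc=1.0, scale=1.5, size=1_000_000)  # N(1, 2.25), independent of x
print((x + y).var(), (x - y).var())  # both near 4 + 2.25 = 6.25
```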

3. Exponential distribution

$X \sim \text{Expo}(\lambda)$ is an Exponential random variable with mean $\frac{1}{\lambda}$ and variance $\frac{1}{\lambda^2}$. $\lambda$ is called the rate parameter.

  • (Memorylessness) For $X \sim \text{Expo}(\lambda)$ and any $s, t > 0$, the memoryless property of the Exponential distribution states the following (equivalent) results: $P(X > s + t \mid X > s) = P(X > t)$ and $(X - s \mid X > s) \sim \text{Expo}(\lambda)$. See specifically that the distribution of $X - s \mid X > s$ does not depend on the value of $s$.

    The Exponential distribution is the only continuous distribution with this property. Additionally, the Geometric distribution is the only memoryless discrete distribution with support $\{0, 1, 2, \dots\}$.

For most results we talk about, you can't put a random variable in the place of a constant - you might recall from last week's problem set that we couldn't let the sum of $N$ independent $\text{Pois}(\lambda)$ r.v.s, with $N$ random, be distributed $\text{Pois}(N\lambda)$. However, with memorylessness, you can put a random variable in the place of the $s$ above - for a random variable $Y$ independent of $X$, $P(X > t + Y \mid X > Y) = P(X > t)$ and $(X - Y \mid X > Y) \sim \text{Expo}(\lambda)$ still.

Proof.

We can prove this by using LOTP and applying the constant version of memorylessness. We'll assume $Y$ is discrete here, but the continuous case is analogous (swap sums for integrals, PMFs for PDFs). By LOTP with extra conditioning on $X > Y$,
$$P(X > t + Y \mid X > Y) = \sum_y P(X > t + y \mid X > Y, Y = y)\,P(Y = y \mid X > Y) = \sum_y P(X > t + y \mid X > y, Y = y)\,P(Y = y \mid X > Y).$$
We'll take a brief sidebar to show that $P(X > t + y \mid X > y, Y = y) = P(X > t + y \mid X > y)$; you can jump from the former to the latter using the independence of $X$ and $Y$, since the remaining conditioning events are functions of $X$ alone, but we'll be explicit here. We will use the definition of conditional probability, the fact that $X > t + y$ implies $X > y$, and the independence of $X$ and $Y$:
$$P(X > t + y \mid X > y, Y = y) = \frac{P(X > t + y, X > y, Y = y)}{P(X > y, Y = y)} = \frac{P(X > t + y, Y = y)}{P(X > y, Y = y)} = \frac{P(X > t + y)P(Y = y)}{P(X > y)P(Y = y)} = \frac{P(X > t + y)}{P(X > y)} = \frac{P(X > t + y, X > y)}{P(X > y)} = P(X > t + y \mid X > y).$$
With this information,
$$P(X > t + Y \mid X > Y) = \sum_y P(X > t + y \mid X > y)\,P(Y = y \mid X > Y) = \sum_y P(X > t)\,P(Y = y \mid X > Y) = P(X > t)\sum_y P(Y = y \mid X > Y) = P(X > t) \cdot 1 = P(X > t),$$
where we use memorylessness to say $P(X > t + y \mid X > y) = P(X > t)$.

  • (Example of Memorylessness) Suppose you're waiting for a bus that will arrive in $X \sim \text{Expo}(\lambda)$ minutes. If you wait for the bus for 10 minutes and it has not arrived, then the remaining time that you have to wait is still distributed $\text{Expo}(\lambda)$: $X - 10 \mid X > 10 \sim \text{Expo}(\lambda)$. So no matter how long you wait, the remaining time for you to wait has the same distribution.
  • (Minimum of Expos) The minimum of $n$ i.i.d. $\text{Expo}(\lambda)$ random variables is distributed $\text{Expo}(n\lambda)$. In notation, for $X_1, \dots, X_n \overset{\text{i.i.d.}}{\sim} \text{Expo}(\lambda)$, $\min(X_1, \dots, X_n) \sim \text{Expo}(n\lambda)$. Both facts are checked in the simulation sketch after this list.
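Here is a minimal simulation sketch of memorylessness and of the minimum result; $\lambda = 0.5$, $s = 2$, $n = 5$, and the sample sizes are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(42)
lam, s = 0.5, 2.0  # illustrative rate and elapsed waiting time

# Memorylessness: given X > s, the leftover X - s is again Expo(lam).
x = rng.exponential(scale=1 / lam, size=1_000_000)  # NumPy uses scale = 1/lam
leftover = x[x > s] - s
print(leftover.mean(), leftover.var())  # near 1/lam = 2 and 1/lam^2 = 4

# Minimum of n = 5 i.i.d. Expo(lam) r.v.s should be Expo(5 * lam).
draws = rng.exponential(scale=1 / lam, size=(200_000, 5))
mins = draws.min(axis=1)
print(mins.mean())  # near 1/(5 * lam) = 0.4
```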

Maximum of Expos

The maximum of $n$ i.i.d. Exponential random variables does not follow an Exponential distribution.

Finding the distribution of minimums/maximums

The results above can be found in the book, but their derivations provide a general template for finding the distributions of minimums and maximums.

Let $X_1, \dots, X_n$ be any random variables. Then the events $\min(X_1, \dots, X_n) > x$ and $(X_1 > x) \cap (X_2 > x) \cap \dots \cap (X_n > x)$ are equivalent. To convince yourself of this, think about what this means in words: the minimum of a set of numbers is greater than $x$ if and only if each one of the numbers is greater than $x$.

To find the CDF of $\min(X_1, \dots, X_n)$, a common workflow is
$$P(\min(X_1, \dots, X_n) \le x) = 1 - P(\min(X_1, \dots, X_n) > x) = 1 - P(X_1 > x, X_2 > x, \dots, X_n > x).$$
If $X_1, \dots, X_n$ are independent, then we can get that
$$P(X_1 > x, X_2 > x, \dots, X_n > x) = P(X_1 > x)P(X_2 > x)\cdots P(X_n > x).$$
If $X_1, \dots, X_n$ are also identically distributed, we conclude with
$$P(X_1 > x)P(X_2 > x)\cdots P(X_n > x) = (P(X_1 > x))^n.$$

For maximums, we follow a similar workflow, except instead using the fact that $\{\max(X_1, \dots, X_n) < x\} = \bigcap_{i=1}^n \{X_i < x\}$.
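As a sanity check on both workflows, the sketch below compares empirical CDF values for the min and max of i.i.d. Exponentials against the formulas $1 - (P(X_1 > x))^n$ and $(P(X_1 \le x))^n$; the choices $\lambda = 1$, $n = 4$, and $x = 0.8$ are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(7)
lam, n, reps = 1.0, 4, 500_000
x0 = 0.8  # arbitrary point at which to evaluate the CDFs

draws = rng.exponential(scale=1 / lam, size=(reps, n))
survival = np.exp(-lam * x0)  # P(X_1 > x0) for Expo(lam)

# Minimum: P(min <= x0) = 1 - P(X_1 > x0)^n (this is also the Expo(n*lam) CDF).
print((draws.min(axis=1) <= x0).mean(), 1 - survival**n)

# Maximum: P(max <= x0) = P(X_1 <= x0)^n.
print((draws.max(axis=1) <= x0).mean(), (1 - survival)**n)
```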

4. Moments/Moment Generating Functions

For a random variable $X$, the $n$th moment is $E(X^n)$.

Moment Generating Function

For a random variable $X$, the moment generating function (MGF) is $M_X(t) = E(e^{tX})$ for $t \in \mathbb{R}$. If the MGF exists, then $M_X(0) = 1$ and
$$\frac{d^n}{dt^n} M_X(t)\Big|_{t=0} = M_X^{(n)}(0) = E(X^n).$$
You should sanity-check that $M_X(0) = 1$ whenever you calculate an MGF.
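As a small symbolic sketch, we can check this with SymPy using the MGF of $X \sim \text{Expo}(\lambda)$, which is $M_X(t) = \frac{\lambda}{\lambda - t}$ for $t < \lambda$ (a standard result); differentiating and evaluating at $t = 0$ recovers the moments.

```python
import sympy as sp

t, lam = sp.symbols("t lambda", positive=True)

# MGF of Expo(lam): M(t) = lam / (lam - t), valid for t < lam.
M = lam / (lam - t)

print(M.subs(t, 0))                 # sanity check: 1
print(sp.diff(M, t, 1).subs(t, 0))  # E(X)   = 1/lam
print(sp.diff(M, t, 2).subs(t, 0))  # E(X^2) = 2/lam^2, so Var(X) = 1/lam^2
```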
