0. Logistical Info
- Section date: 10/4
- Associated lectures: 9/24, 9/26, 10/3
- Associated pset: Pset 4, due 10/6
- Midterm: 10/10
- Office hours on 10/4 from 7-9pm at Quincy Dining Hall
- Exam office hours on 10/7 and 10/9 from 8-10pm at Quincy Dining Hall
- Remember to fill out the attendance form
- Given the structure of my section, I’m shifting away from a lot of explanation on this webpage. I may come back in the future and add more examples, but it doesn’t make much sense since we’re mainly doing practice problems in section. So this week, I have no concise summary section on the webpage because it’s all pretty tight - check out the handout below if you want it.
0.1 Summary + Practice Problem PDFs
Summary + Practice Problems PDF
Practice Problem Solutions PDF
1. Random variables
Let’s use the precise mathematical definition from last time: random variables assign real numbers to possible outcomes of an experiment. In other words, they map the sample space to the real line. So for a random variable $X$ with sample space $S$, we write $X: S \to \mathbb{R}$.
Here’s the terminology of a random variable that we’ve talked about thus far, where we continue using $X$ for our random variable:
- The support: what is the set of values that a random variable can take on? This is equivalent to the image/range of $X$ on $S$, written $X(S)$.
- For a named distribution like a Binomial, we say $X$ is distributed Binomial using $X \sim \mathrm{Bin}(n, p)$, where we have to set possible values of the parameters $n$ and $p$ for our specific problem. You CANNOT set $X = \mathrm{Bin}(n, p)$: named distributions cannot equal random variables, they are just a blueprint for what the random variable looks like.
- The probability mass function (PMF): for any real number $x$ (or $k$ or $t$, it’s just a filler variable), what is the probability that $X$ takes on this value? This is notated $P(X = x)$.
  - You should address every possible value of $x$: if $x$ is not in the support of $X$, then $P(X = x) = 0$, and every probability should be valid (nonnegative, between $0$ and $1$ inclusive).
- NEW: the cumulative distribution function (CDF): for any real number $x$, what is the probability that $X$ takes on a value that is less than or equal to $x$? This is notated $F(x) = P(X \le x)$.
  - You should again address every possible value of $x$, both in and outside of the support.
  - Here, the requirements for a valid CDF are that $F(x) = 0$ if $x$ is less than the smallest value in the support and $F(x) = 1$ if $x$ is greater than the biggest value in the support. For an infinite support, we should have $F(x) \to 0$ as $x \to -\infty$ and $F(x) \to 1$ as $x \to \infty$.
  - Additionally, a CDF should be non-decreasing (i.e., either increasing or a flat line).
- We often abbreviate to say random variables are independent and identically distributed (i.i.d.).
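To make the PMF and CDF requirements concrete, here’s a minimal sketch for a single fair die roll (my own example, not from the handout), written so that every real number $x$ is handled:

```python
# Hypothetical example: X = result of one fair die roll, support {1, ..., 6}.

def pmf(x):
    # P(X = x): 1/6 on the support, 0 for every other real number
    return 1/6 if x in {1, 2, 3, 4, 5, 6} else 0.0

def cdf(x):
    # F(x) = P(X <= x): 0 below the support, 1 above it, a step function between
    if x < 1:
        return 0.0
    if x >= 6:
        return 1.0
    return int(x) / 6  # int() floors positive x: counts support values <= x

# Validity checks: PMF sums to 1, CDF is non-decreasing from 0 to 1
assert abs(sum(pmf(k) for k in range(1, 7)) - 1) < 1e-12
values = [cdf(x) for x in [0, 1, 2.5, 3, 5.99, 6, 100]]
assert values == sorted(values)
print(values)  # [0.0, 0.1666..., 0.3333..., 0.5, 0.8333..., 1.0, 1.0]
```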
Here’s a general approach for defining the distribution of a random variable (r.v.). You can give the distribution using either the PMF, the CDF, or a named distribution with the parameters defined.
- Define the support of your r.v.
- See if the random variable matches the story of any of the named distributions we have discussed. To see if an r.v. matches a distribution, some things to check are
- For which named distributions is the support of your r.v. possible?
- Are there draws/samples/trials? If so, are they independent?
- If there is sampling, is it done with or without replacement?
- If you can match a named distribution, what are the parameters? Are those parameters allowed for that named distribution?
- If you can’t match a named distribution, how can you calculate the PMF using the information you checked about sampling and your counting skills?
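Here’s that checklist applied in code to a made-up example, the number of sixes in 10 fair die rolls; the `binom_pmf` helper below is mine, not from the course materials:

```python
from math import comb

# Checklist: support is {0, ..., 10} (Binomial-compatible); there are 10
# independent trials, effectively with replacement; so X ~ Bin(10, 1/6),
# with valid parameters n = 10 and p = 1/6.
n, p = 10, 1/6

def binom_pmf(k, n, p):
    # PMF from scratch: choose which k trials succeed, multiply probabilities
    return comb(n, k) * p**k * (1 - p)**(n - k)

print(binom_pmf(2, n, p))  # P(X = 2), roughly 0.29
print(sum(binom_pmf(k, n, p) for k in range(n + 1)))  # sums to 1.0
```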
2. Discrete distributions
You can find details like the support, PMF, CDF, expectation, and variance in the table of distributions on page 605 of the textbook or page 3 of the midterm handout. We’ll focus on the stories and connections between distributions. For these discrete random variables (except for the Poisson), you should develop comfort with calculating their PMFs from scratch.
2.1 Bernoulli
Story: We run a trial with probability $p$ of success. Let $X = 1$ if the trial is a success and $X = 0$ otherwise. Then $X \sim \mathrm{Bern}(p)$.
Connections:
- For $X \sim \mathrm{Bern}(p)$, $X \sim \mathrm{Bin}(1, p)$.
- For $X \sim \mathrm{Bern}(p)$, $X^2 = X$, so $E(X^2) = E(X) = p$. If you’re wondering why, check the support!
2.2 Binomial
Story: We run $n$ independent trials, each with probability $p$ of success. Let $X$ be the number of successes among the $n$ trials. Then $X \sim \mathrm{Bin}(n, p)$.
Connections:
- For $n$ independent and identically distributed Bernoulli random variables $X_1, \dots, X_n \sim \mathrm{Bern}(p)$, $X_1 + \dots + X_n \sim \mathrm{Bin}(n, p)$.
  - This means $X \sim \mathrm{Bin}(n, p)$ is equivalent to writing $X = X_1 + \dots + X_n$ for i.i.d. $X_i \sim \mathrm{Bern}(p)$.
- For independent random variables $X \sim \mathrm{Bin}(n, p)$ and $Y \sim \mathrm{Bin}(m, p)$, $X + Y \sim \mathrm{Bin}(n + m, p)$ (sanity-checked by simulation below).
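If you want to check that last connection numerically, here’s a quick Monte Carlo sketch with made-up parameters ($n = 5$, $m = 7$, $p = 0.3$):

```python
import numpy as np

# X ~ Bin(5, 0.3) and Y ~ Bin(7, 0.3) independent, so X + Y should match Bin(12, 0.3)
rng = np.random.default_rng(0)
x = rng.binomial(5, 0.3, size=100_000)
y = rng.binomial(7, 0.3, size=100_000)
direct = rng.binomial(12, 0.3, size=100_000)

print((x + y).mean(), direct.mean())  # both close to 12 * 0.3 = 3.6
print((x + y).var(), direct.var())    # both close to 12 * 0.3 * 0.7 = 2.52
```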
2.3 Hypergeometric
Story:
- Capture/recapture elk: There are $N$ elk in the forest. In the past, we captured and tagged $m$ of the elk. We now recapture $n$ of the elk, where every set of $n$ elk is equally likely and elk are sampled without replacement. Let $X$ be the number of tagged elk among our recaptured elk. Then $X \sim \mathrm{HGeom}(m, N - m, n)$.
- White and black balls in an urn: There are $w$ white balls and $b$ black balls in an urn. We draw $n$ balls from the urn without replacement, where each set of $n$ balls is equally likely to be drawn. Let $X$ be the number of white balls in our sample. Then $X \sim \mathrm{HGeom}(w, b, n)$.
Connections:
- Notice the comparison between the Binomial and the Hypergeometric: using the urn story, if we instead sampled with replacement, our random variable would be distributed $\mathrm{Bin}\left(n, \frac{w}{w + b}\right)$.
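And here’s a from-scratch Hypergeometric PMF following the elk story’s counting logic, with made-up numbers for $N$, $m$, and $n$:

```python
from math import comb

# N = 100 elk total, m = 20 tagged, recapture n = 15 without replacement.
# P(X = k): pick k of the m tagged and n - k of the N - m untagged elk.
N, m, n = 100, 20, 15

def hgeom_pmf(k):
    return comb(m, k) * comb(N - m, n - k) / comb(N, n)

print(hgeom_pmf(3))  # P(exactly 3 tagged elk in the recapture)
print(sum(hgeom_pmf(k) for k in range(min(m, n) + 1)))  # sums to 1.0
```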
2.4 Geometric/First Success
Story: Suppose we’re running independent Bernoulli trials, each with probability $p$ of success. Let $X$ be the number of failures before the first success; then $X \sim \mathrm{Geom}(p)$. If $Y$ instead counts the number of trials up to and including the first success, then $Y \sim \mathrm{FS}(p)$.
Connections:
- The First Success distribution is essentially the same as the Geometric, but we include the first successful trial as part of our count. So it always holds that for $Y \sim \mathrm{FS}(p)$, we have $Y - 1 \sim \mathrm{Geom}(p)$.
- Note that the Geometric/First Success distributions have infinite supports, while the Binomial has a fixed number of trials. This is a quick way to tell them apart.
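One practical caveat, sketched below: scipy’s `geom` uses the First Success convention (counting trials), not Stat 110’s Geometric (counting failures), so a shift of $1$ is needed when checking answers:

```python
from scipy.stats import geom

# scipy's geom.pmf(k, p) = (1 - p)**(k - 1) * p for k = 1, 2, ... (trials),
# which is Stat 110's FS(p); shifting by 1 recovers Stat 110's Geom(p).
p = 0.3
fs_pmf = [geom.pmf(k, p) for k in range(1, 6)]     # FS(p): P(Y = 1), ..., P(Y = 5)
geom_pmf = [geom.pmf(k + 1, p) for k in range(5)]  # Geom(p): P(X = 0), ..., P(X = 4)
print(fs_pmf == geom_pmf)  # True: Y = X + 1 is just a relabeling
print(geom.mean(p))        # FS mean is 1/p; the Geom mean is (1 - p)/p
```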
2.5 Negative Binomial
Story: Suppose we’re running independent Bernoulli trials, each with probability $p$ of success. Let $X$ be the number of failures before the $r$th success. Then $X \sim \mathrm{NBin}(r, p)$.
Connections:
- For independent and identically distributed $X_1, \dots, X_r \sim \mathrm{Geom}(p)$, we get $X_1 + \dots + X_r \sim \mathrm{NBin}(r, p)$.
  - This means $\mathrm{Geom}(p)$ is equivalent to $\mathrm{NBin}(1, p)$.
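Here’s a short simulation sketch of this sum-of-Geometrics connection, with assumed values $r = 3$ and $p = 0.4$; numpy’s `geometric` counts trials, so we subtract $1$ to get the failures convention:

```python
import numpy as np
from scipy.stats import nbinom

# Sum r i.i.d. Geom(p) draws (failures before a success) and compare to NBin(r, p)
rng = np.random.default_rng(42)
r, p = 3, 0.4
sums = (rng.geometric(p, size=(100_000, r)) - 1).sum(axis=1)

print((sums == 5).mean())   # empirical P(X = 5)
print(nbinom.pmf(5, r, p))  # exact NBin(3, 0.4) PMF (scipy also counts failures)
```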
2.6 Poisson
Story: There’s no exact story to derive a Poisson. The only situation in which you’ll have to come up with the Poisson on your own is in approximation, and that is quite rare.
Approximate story: Say there are many rare events $A_1, \dots, A_n$ with small probabilities $p_1, \dots, p_n$, and suppose the events are independent (or only weakly dependent). Let $X$ count how many of these events occur. Then $X$ is approximately distributed $\mathrm{Pois}(\lambda)$, where $\lambda = p_1 + \dots + p_n$.
Connections:
- As you can see in the approximate story, you can use the Poisson to count the number of independent/weakly-dependent rare events that occur.
- Suppose $X \sim \mathrm{Pois}(\lambda_1)$ and $Y \sim \mathrm{Pois}(\lambda_2)$ with $X$ and $Y$ independent. Then $X + Y \sim \mathrm{Pois}(\lambda_1 + \lambda_2)$.
- Chicken-Egg: suppose a chicken lays $N$ eggs, with $N \sim \mathrm{Pois}(\lambda)$. Suppose each egg has a probability $p$ of hatching, with each egg’s hatching being independent, and let $X$ be the number of eggs that hatch and $Y$ be the number of eggs that don’t hatch. Then $X$ and $Y$ are independent, even though they are very much dependent conditional on $N$, since $X + Y = N$. Marginally, $X \sim \mathrm{Pois}(\lambda p)$ and $Y \sim \mathrm{Pois}(\lambda(1 - p))$.
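Chicken-Egg is surprising enough that a quick simulation helps; here’s a minimal sketch with made-up values $\lambda = 10$ and $p = 0.3$:

```python
import numpy as np

# N ~ Pois(10) eggs; each hatches independently with probability 0.3
rng = np.random.default_rng(0)
N = rng.poisson(10, size=200_000)
X = rng.binomial(N, 0.3)  # hatched
Y = N - X                 # unhatched

print(X.mean(), Y.mean())       # ~3 and ~7: matches Pois(lambda*p), Pois(lambda*(1-p))
print(np.corrcoef(X, Y)[0, 1])  # ~0, consistent with X and Y being independent
```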
3. Expectation
Linearity states that for any random variables $X$ and $Y$, even dependent ones, and any constants $a$, $b$, and $c$, we have $E(aX + bY + c) = aE(X) + bE(Y) + c$.
3.1 Indicator Random Variables
An indicator random variable converts an event into a Bernoulli random variable. For an event $A$, the indicator $I_A$ equals $1$ if $A$ occurs and $0$ otherwise, so $I_A \sim \mathrm{Bern}(P(A))$.
The fundamental bridge (vocab which is not used outside of Stat 110) gives that $E(I_A) = P(A)$. This yields a three-step strategy for finding the expectation of a counting random variable $X$:
- Write the random variable as a sum of indicators, $X = I_{A_1} + \dots + I_{A_n}$, where each $A_i$ is an event.
- Apply linearity: $E(X) = E(I_{A_1}) + \dots + E(I_{A_n})$.
- Use the fundamental bridge: $E(X) = P(A_1) + \dots + P(A_n)$.
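To see the three steps pay off, here’s a sketch on the classic matching problem (my example): shuffle cards $1, \dots, n$ and let $X$ count the cards that land in their own position. With $A_j$ the event that card $j$ is in position $j$, $P(A_j) = \frac{1}{n}$, so $E(X) = n \cdot \frac{1}{n} = 1$ for any $n$:

```python
import numpy as np

# Monte Carlo check of the indicator answer E(X) = 1, with n = 52 cards
rng = np.random.default_rng(1)
n, trials = 52, 100_000
matches = np.array([(rng.permutation(n) == np.arange(n)).sum() for _ in range(trials)])
print(matches.mean())  # close to 1, as the indicator calculation predicts
```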
3.2 Variance
Here are basically all of the facts you have to know about variance:
- It’s defined as $\mathrm{Var}(X) = E\big((X - E(X))^2\big)$, but it’s usually calculated using an equivalent formula, $\mathrm{Var}(X) = E(X^2) - (E(X))^2$.
- It is always nonnegative. In fact, the variance is only zero if $P(X = a) = 1$ for some constant $a$: in other words, $X$ takes on a certain value with probability $1$. If this is not the case, the variance will be positive.
- For a scalar $c$ (a number, not random) and a random variable $X$, $\mathrm{Var}(cX) = c^2 \mathrm{Var}(X)$ and $\mathrm{Var}(X + c) = \mathrm{Var}(X)$.
- For independent random variables $X$ and $Y$, $\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)$. For dependent random variables $X$ and $Y$, this need not hold: for instance, $\mathrm{Var}(X + X) = 4\,\mathrm{Var}(X)$, not $2\,\mathrm{Var}(X)$.
4. Handy math facts
- You are expected to know how to find the sum of an infinite geometric series: if $|r| < 1$, then $\sum_{k=0}^{\infty} a r^k = \frac{a}{1 - r}$; otherwise the sum does not exist (it diverges). For finite geometric series (and any $r \neq 1$), $\sum_{k=0}^{n} a r^k = a \cdot \frac{1 - r^{n+1}}{1 - r}$.
- You are also expected to be familiar with some $e^x$ approximations, but you usually won’t be asked to approximate without prompting. The Taylor series of $e^x$ is $e^x = \sum_{n=0}^{\infty} \frac{x^n}{n!}$. The compound interest formula also gives $e^x = \lim_{n \to \infty} \left(1 + \frac{x}{n}\right)^n$.
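If you want to convince yourself numerically, here’s a small sketch with assumed values $a = 1$, $r = 0.5$, and $x = 1$:

```python
from math import e, factorial

r, x = 0.5, 1.0
print(sum(r**k for k in range(200)), 1 / (1 - r))           # infinite geometric series
print(sum(r**k for k in range(11)), (1 - r**11) / (1 - r))  # finite series, n = 10
print(sum(x**n / factorial(n) for n in range(20)), e)       # Taylor series for e^x
print((1 + x / 1_000_000) ** 1_000_000, e)                  # compound interest limit
```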