Probability Cheatsheet v1.1.1

Counting

Multiplication Rule - Let's say we have a compound experiment (an experiment with multiple components). If the 1st component has n1 possible outcomes, the 2nd component has n2 possible outcomes, and the rth component has nr possible outcomes, then overall there are n1 n2 . . . nr possibilities for the whole experiment.

Sampling Table - The sampling table describes the different ways to take a sample of size k out of a population of size n. The columns denote whether order matters or not.

                         Order Matters      Order Doesn't Matter
With Replacement         n^k                C(n + k - 1, k)
Without Replacement      n!/(n - k)!        C(n, k)

Naive Definition of Probability - If the likelihood of each outcome is equal, the probability of any event happening is:

P(Event) = (number of favorable outcomes) / (number of outcomes)

Probability and Thinking Conditionally

Independence

Independent Events - A and B are independent if knowing one gives you no information about the other. A and B are independent if and only if one of the following equivalent statements holds:

P(A ∩ B) = P(A)P(B)
P(A|B) = P(A)

Conditional Independence - A and B are conditionally independent given C if P(A ∩ B|C) = P(A|C)P(B|C). Conditional independence does not imply independence, and independence does not imply conditional independence.

Unions, Intersections, and Complements

De Morgan's Laws - A useful relation that can make calculating probabilities of unions easier by relating them to intersections, and vice versa. De Morgan's Law says that the complement is distributive as long as you flip the sign in the middle.

(A ∪ B)^c ≡ A^c ∩ B^c
(A ∩ B)^c ≡ A^c ∪ B^c

Joint, Marginal, and Conditional Probabilities

Joint Probability - P(A ∩ B) or P(A, B) - Probability of A and B.
Marginal (Unconditional) Probability - P(A) - Probability of A.
Conditional Probability - P(A|B) - Probability of A, given that B occurred.
Conditional Probability is Probability - P(A|B) is a probability as well, restricting the sample space to B instead of Ω. Any theorem that holds for probability also holds for conditional probability.

Simpson's Paradox

It is possible to have

P(A | B, C) < P(A | B^c, C)  and  P(A | B, C^c) < P(A | B^c, C^c)

yet still

P(A | B) > P(A | B^c).

Bayes' Rule and Law of Total Probability

Law of Total Probability with partitioning set B1, B2, B3, . . . , Bn, and with extra conditioning (just add C!):

P(A) = P(A|B1)P(B1) + P(A|B2)P(B2) + · · · + P(A|Bn)P(Bn)
P(A) = P(A ∩ B1) + P(A ∩ B2) + · · · + P(A ∩ Bn)
P(A|C) = P(A|B1, C)P(B1|C) + · · · + P(A|Bn, C)P(Bn|C)
P(A|C) = P(A ∩ B1|C) + P(A ∩ B2|C) + · · · + P(A ∩ Bn|C)

Law of Total Probability with B and B^c (special case of a partitioning set), and with extra conditioning (just add C!):

P(A) = P(A|B)P(B) + P(A|B^c)P(B^c)
P(A) = P(A ∩ B) + P(A ∩ B^c)
P(A|C) = P(A|B, C)P(B|C) + P(A|B^c, C)P(B^c|C)
P(A|C) = P(A ∩ B|C) + P(A ∩ B^c|C)

Bayes' Rule, and with extra conditioning (just add C!):

P(A|B) = P(A ∩ B)/P(B) = P(B|A)P(A)/P(B)
P(A|B, C) = P(A ∩ B|C)/P(B|C) = P(B|A, C)P(A|C)/P(B|C)

Odds Form of Bayes' Rule, and with extra conditioning (just add C!):

P(A|B)/P(A^c|B) = [P(B|A)/P(B|A^c)] · [P(A)/P(A^c)]
P(A|B, C)/P(A^c|B, C) = [P(B|A, C)/P(B|A^c, C)] · [P(A|C)/P(A^c|C)]
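As a worked illustration of the rules above, here is a minimal Python sketch with made-up numbers for P(A), P(B|A), and P(B|A^c); it applies the law of total probability, Bayes' rule, and the odds form, and checks that they agree.

```python
# A minimal sketch (hypothetical numbers) of Bayes' rule with the law of total
# probability: P(A|B) = P(B|A)P(A) / [P(B|A)P(A) + P(B|A^c)P(A^c)].
p_A = 0.01           # prior P(A), e.g. prevalence of a condition (assumed)
p_B_given_A = 0.95   # P(B|A), e.g. test sensitivity (assumed)
p_B_given_Ac = 0.10  # P(B|A^c), e.g. false-positive rate (assumed)

p_B = p_B_given_A * p_A + p_B_given_Ac * (1 - p_A)   # law of total probability
p_A_given_B = p_B_given_A * p_A / p_B                # Bayes' rule

# Odds form gives the same answer: posterior odds = likelihood ratio * prior odds.
posterior_odds = (p_B_given_A / p_B_given_Ac) * (p_A / (1 - p_A))
assert abs(p_A_given_B - posterior_odds / (1 + posterior_odds)) < 1e-12
print(p_A_given_B)   # about 0.088
```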
Random Variables and their Distributions

PMF, CDF, and Independence

Probability Mass Function (PMF) (Discrete Only) - gives the probability that a random variable takes on the value x:

P_X(x) = P(X = x)

Cumulative Distribution Function (CDF) - gives the probability that a random variable takes on the value x0 or less:

F_X(x0) = P(X ≤ x0)

Independence - Intuitively, two random variables are independent if knowing one gives you no information about the other. X and Y are independent if for ALL values of x and y:

P(X = x, Y = y) = P(X = x)P(Y = y)

Expected Value and Indicators

Distributions

Probability Mass Function (PMF) (Discrete Only) is a function that takes in the value x, and gives the probability that a random variable takes on the value x. The PMF is a positive-valued function, and Σ_x P(X = x) = 1.

P_X(x) = P(X = x)

Cumulative Distribution Function (CDF) is a function that takes in the value x, and gives the probability that a random variable takes on a value at most x.

F(x) = P(X ≤ x)

Expected Value, Linearity, and Symmetry

Expected Value (aka mean, expectation, or average) can be thought of as the "weighted average" of the possible outcomes of our random variable. Mathematically, if x1, x2, x3, . . . are all of the possible values that X can take, the expected value of X can be calculated as follows:

E(X) = Σ_i x_i P(X = x_i)

Note that for any X and Y, with scaling coefficients a and b and constant c, the following property of Linearity of Expectation holds:

E(aX + bY + c) = aE(X) + bE(Y) + c

If two random variables have the same distribution, then by symmetry their expected values are equal, even when they are dependent.

Conditional Expected Value is calculated like expectation, only conditioned on an event A:

E(X|A) = Σ_x x P(X = x|A)

Indicator Random Variables

An Indicator Random Variable is a random variable that takes on either 1 or 0. The indicator is always an indicator of some event. If the event occurs, the indicator is 1; otherwise it is 0. They are useful for many problems that involve counting and expected value.

Distribution - I_A ∼ Bern(p), where p = P(A).

Fundamental Bridge - The expectation of an indicator for A is the probability of the event: E(I_A) = P(A).

Notation: I_A = 1 if A occurs, 0 if A does not occur.

Variance

Var(X) = E(X^2) - [E(X)]^2

Expectation and Independence

If X and Y are independent, then E(XY) = E(X)E(Y).

Continuous RVs, LotUS, and UoU

Continuous Random Variables

What's the probability that a CRV is in an interval? Use the CDF (or the PDF, see below). To find the probability that a CRV takes on a value in the interval [a, b], subtract the respective CDFs:

P(a ≤ X ≤ b) = P(X ≤ b) - P(X ≤ a) = F(b) - F(a)

Note that for an r.v. with a Normal distribution,

P(a ≤ X ≤ b) = P(X ≤ b) - P(X ≤ a) = Φ((b - µ)/σ) - Φ((a - µ)/σ)

What is the Cumulative Distribution Function (CDF)? It is the following function of x:

F(x) = P(X ≤ x)

What is the Probability Density Function (PDF)? The PDF, f(x), is the derivative of the CDF:

F′(x) = f(x)

Or alternatively,

F(x) = ∫_{-∞}^{x} f(t) dt

Note that by the fundamental theorem of calculus,

F(b) - F(a) = ∫_a^b f(x) dx

Thus to find the probability that a CRV takes on a value in an interval, you can integrate the PDF, thus finding the area under the density curve.

How do I find the expected value of a CRV? Where in discrete cases you sum over the probabilities, in continuous cases you integrate over the densities:

E(X) = ∫_{-∞}^{∞} x f(x) dx
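A short numerical sketch of these continuous-RV facts, assuming an Expo(λ = 2) distribution (an arbitrary choice) and using scipy for the integrals: the interval probability equals F(b) - F(a), and the mean equals the integral of x·f(x).

```python
# A small sketch checking the CRV facts above for an assumed Expo(lambda = 2):
# P(a <= X <= b) = F(b) - F(a) = integral of the PDF, and E(X) = integral of x*f(x).
import numpy as np
from scipy import integrate

lam = 2.0
f = lambda x: lam * np.exp(-lam * x)      # PDF of Expo(lam)
F = lambda x: 1 - np.exp(-lam * x)        # CDF of Expo(lam)

a, b = 0.5, 1.5
area, _ = integrate.quad(f, a, b)
print(area, F(b) - F(a))                  # both ~0.318

mean, _ = integrate.quad(lambda x: x * f(x), 0, np.inf)
print(mean)                               # ~0.5 = 1/lam
```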
Law of the Unconscious Statistician (LotUS)

Expected Value of a Function of an RV - Normally, you would find the expected value of X this way:

E(X) = Σ_x x P(X = x)
E(X) = ∫_{-∞}^{∞} x f(x) dx

LotUS states that you can find the expected value of a function of a random variable, g(X), this way:

E(g(X)) = Σ_x g(x) P(X = x)
E(g(X)) = ∫_{-∞}^{∞} g(x) f(x) dx

What's a function of a random variable? A function of a random variable is also a random variable. For example, if X is the number of bikes you see in an hour, then g(X) = 2X could be the number of bike wheels you see in an hour. Both are random variables.

What's the point? You don't need to know the PDF/PMF of g(X) to find its expected value. All you need is the PDF/PMF of X.

Universality of Uniform

When you plug any continuous random variable into its own CDF, you get a Uniform(0, 1) random variable. When you plug a Uniform(0, 1) into an inverse CDF, you get the corresponding random variable. For example, let's say that a random variable X has CDF

F(x) = 1 - e^{-x}

By the Universality of the Uniform, if we plug X into this function then we get a uniformly distributed random variable:

F(X) = 1 - e^{-X} ∼ Unif(0, 1)

Similarly, since F(X) ∼ Unif(0, 1), we have X ∼ F^{-1}(U). The key point is that for any continuous random variable X, we can transform it into a Uniform random variable and back by using its CDF.

Moment Generating Functions (MGFs)

Moments

Moments describe the shape of a distribution. The kth moment of a random variable X is

µ′_k = E(X^k)

The mean, variance, and skewness of a distribution can be expressed by its moments. Specifically:

Mean: E(X) = µ′_1
Variance: Var(X) = E(X^2) - E(X)^2 = µ′_2 - (µ′_1)^2

Moment Generating Functions

MGF - For any random variable X, the function of a dummy variable t

M_X(t) = E(e^{tX})

is the moment generating function (MGF) of X, if it exists for a finitely-sized interval centered around 0. Note that the MGF is just a function of a dummy variable t.

Why is it called the Moment Generating Function? Because the kth derivative of the moment generating function, evaluated at 0, is the kth moment of X:

µ′_k = E(X^k) = M_X^{(k)}(0)

This is true by Taylor expansion of e^{tX}:

M_X(t) = E(e^{tX}) = Σ_{k=0}^{∞} E(X^k) t^k / k! = Σ_{k=0}^{∞} µ′_k t^k / k!

Or by differentiation under the integral sign and then plugging in t = 0:

M_X^{(k)}(t) = d^k/dt^k E(e^{tX}) = E(d^k/dt^k e^{tX}) = E(X^k e^{tX})
M_X^{(k)}(0) = E(X^k e^{0·X}) = E(X^k) = µ′_k

MGF of linear combinations - If we have Y = aX + c, then

M_Y(t) = E(e^{t(aX + c)}) = e^{ct} E(e^{(at)X}) = e^{ct} M_X(at)

Uniqueness of the MGF - If it exists, the MGF uniquely determines the distribution. This means that for any two random variables X and Y, they are distributed the same (their CDFs/PDFs are equal) if and only if their MGFs are equal. You can't have different PDFs when you have two random variables with the same MGF.

Summing Independent R.V.s by Multiplying MGFs - If X and Y are independent, then

M_{X+Y}(t) = E(e^{t(X+Y)}) = E(e^{tX}) E(e^{tY}) = M_X(t) · M_Y(t)

The MGF of the sum of two independent random variables is the product of the MGFs of those two random variables.

Joint PDFs and CDFs

Joint Distributions

Review: Joint Probability of events A and B: P(A ∩ B). Both the joint PMF and joint PDF must be non-negative and sum/integrate to 1:

Σ_x Σ_y P(X = x, Y = y) = 1
∫_x ∫_y f_{X,Y}(x, y) dy dx = 1

Like in the univariate case, you sum/integrate the PMF/PDF to get the CDF.
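A toy sketch of the joint-distribution facts above (the joint PMF values below are made up): the joint PMF sums to 1, and, as described in the next section, marginal and conditional PMFs fall out by summing over the other variable and dividing.

```python
# A toy sketch (made-up joint PMF) illustrating joint, marginal, and conditional PMFs.
import numpy as np

# rows = values of X (0, 1); columns = values of Y (0, 1, 2); numbers are assumed
joint = np.array([[0.10, 0.20, 0.10],
                  [0.25, 0.15, 0.20]])

assert np.isclose(joint.sum(), 1.0)       # joint PMF sums to 1
p_X = joint.sum(axis=1)                   # marginal PMF of X: sum over y
p_Y = joint.sum(axis=0)                   # marginal PMF of Y: sum over x
cond_Y_given_X0 = joint[0] / p_X[0]       # conditional PMF of Y given X = 0
print(p_X, p_Y, cond_Y_given_X0)
```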
Conditional Distributions

Review: By Bayes' Rule, P(A|B) = P(B|A)P(A)/P(B). Similar relations apply to conditional distributions of random variables.

For discrete random variables:

P(Y = y|X = x) = P(X = x, Y = y)/P(X = x) = P(X = x|Y = y)P(Y = y)/P(X = x)

For continuous random variables:

f_{Y|X}(y|x) = f_{X,Y}(x, y)/f_X(x) = f_{X|Y}(x|y) f_Y(y)/f_X(x)

Hybrid Bayes' Rule:

f(x|A) = P(A|X = x) f(x)/P(A)

Marginal Distributions

Review: The Law of Total Probability says that for an event A and partition B1, B2, . . . , Bn: P(A) = Σ_i P(A ∩ B_i).

To find the distribution of one (or more) random variables from a joint distribution, sum or integrate over the irrelevant random variables.

Getting the Marginal PMF from the Joint PMF:

P(X = x) = Σ_y P(X = x, Y = y)

Getting the Marginal PDF from the Joint PDF:

f_X(x) = ∫_y f_{X,Y}(x, y) dy

Independence of Random Variables

Review: A and B are independent if and only if either P(A ∩ B) = P(A)P(B) or P(A|B) = P(A). Similar conditions apply to determine whether random variables are independent: two random variables are independent if their joint distribution function is simply the product of their marginal distributions, or if a conditional distribution is the same as the corresponding marginal distribution. In other words, random variables X and Y are independent if and only if, for all x and y, one of the following holds:

• The joint PMF/PDF/CDF is the product of the marginal PMFs/PDFs/CDFs.
• The conditional distribution of X given Y is the same as the marginal distribution of X.

Multivariate LotUS

Review: E(g(X)) = Σ_x g(x)P(X = x), or E(g(X)) = ∫_{-∞}^{∞} g(x) f_X(x) dx.

For discrete random variables:

E(g(X, Y)) = Σ_x Σ_y g(x, y) P(X = x, Y = y)

For continuous random variables:

E(g(X, Y)) = ∫_{-∞}^{∞} ∫_{-∞}^{∞} g(x, y) f_{X,Y}(x, y) dx dy

Covariance and Transformations

Covariance and Correlation

Covariance is the two-random-variable equivalent of variance, defined by the following:

Cov(X, Y) = E[(X - E(X))(Y - E(Y))] = E(XY) - E(X)E(Y)

Note that

Cov(X, X) = E(XX) - E(X)E(X) = Var(X)

Correlation is a rescaled variant of covariance that is always between -1 and 1:

Corr(X, Y) = Cov(X, Y)/√(Var(X)Var(Y)) = Cov(X, Y)/(σ_X σ_Y)

Covariance and Independence - If two random variables are independent, then they are uncorrelated. The converse is not necessarily true.

X ⊥ Y −→ Cov(X, Y) = 0
X ⊥ Y −→ E(XY) = E(X)E(Y)

Covariance and Variance - Note that

Var(X + Y) = Var(X) + Var(Y) + 2Cov(X, Y)
Var(X1 + X2 + · · · + Xn) = Σ_{i=1}^{n} Var(X_i) + 2 Σ_{i<j} Cov(X_i, X_j)

In particular, if X and Y are independent then they have covariance 0, thus

X ⊥ Y =⇒ Var(X + Y) = Var(X) + Var(Y)

Also, if X1, X2, . . . , Xn are identically distributed and have the same covariance relationships, then

Var(X1 + X2 + · · · + Xn) = nVar(X1) + 2 C(n, 2) Cov(X1, X2)

Covariance and Linearity - For random variables W, X, Y, Z and constants a, b:

Cov(X, Y) = Cov(Y, X)
Cov(X + a, Y + b) = Cov(X, Y)
Cov(aX, bY) = ab Cov(X, Y)
Cov(W + X, Y + Z) = Cov(W, Y) + Cov(W, Z) + Cov(X, Y) + Cov(X, Z)
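A Monte Carlo sketch of the variance identity Var(X + Y) = Var(X) + Var(Y) + 2Cov(X, Y); the dependence between x and y below is an arbitrary construction chosen only for illustration.

```python
# A Monte Carlo sketch of Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y)
# for two deliberately dependent variables (the construction is arbitrary).
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1_000_000)
y = 0.6 * x + rng.normal(size=1_000_000)   # dependent on x by construction

cov_xy = np.mean((x - x.mean()) * (y - y.mean()))
lhs = np.var(x + y)
rhs = np.var(x) + np.var(y) + 2 * cov_xy
print(lhs, rhs)                            # nearly equal
```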
Binomial

Let us say that X is distributed Bin(n, p). We know the following:

Story - X is the number of "successes" that we will achieve in n independent trials, where each trial can be either a success or a failure, each with the same probability p of success. We can also say that X is a sum of multiple independent Bern(p) random variables. Let X ∼ Bin(n, p) and X_j ∼ Bern(p), where all of the Bernoullis are independent. Then we can express X as

X = X1 + X2 + X3 + · · · + Xn

Example - If Jeremy Lin shoots 10 free throws and each one independently has a 3/4 chance of going in, then the number of free throws he makes is distributed Bin(10, 3/4). That is, letting X be the number of free throws that he makes, X is a Binomial random variable distributed Bin(10, 3/4).

Binomial Coefficient - C(n, k) is a function of n and k, read "n choose k", and means: out of n possible objects, how many ways can I choose k of them? The formula for the binomial coefficient is:

C(n, k) = n!/(k!(n - k)!)

Geometric

Let us say that X is distributed Geom(p). We know the following:

Story - X is the number of "failures" that we will achieve before we achieve our first success. Our successes have probability p.

Example - If each pokeball we throw has a 1/10 probability of catching Mew, the number of failed pokeballs will be distributed Geom(1/10).

First Success

Equivalent to the Geometric distribution, except it counts the total number of "draws" until the first success. This is 1 more than the number of failures. If X ∼ FS(p), then E(X) = 1/p.

Negative Binomial

Let us say that X is distributed NBin(r, p). We know the following:

Story - X is the number of "failures" that we will achieve before we achieve our rth success. Our successes have probability p.

Example - Thundershock has 60% accuracy and can faint a wild Raticate in 3 hits. The number of misses before Pikachu faints Raticate with Thundershock is distributed NBin(3, 0.6).

Hypergeometric

Let us say that X is distributed HGeom(w, b, n). We know the following:

Story - In a population of b undesired objects and w desired objects, X is the number of "successes" we will have in a draw of n objects, without replacement.

Examples - 1) Let's say that we have only b Weedles (failure) and w Pikachus (success) in Viridian Forest. We encounter n Pokemon in the forest, and X is the number of Pikachus in our encounters. 2) The number of Aces that you draw in 5 cards (without replacement). 3) You have w white balls and b black balls, and you draw n balls. You will draw X white balls. 4) Elk Problem - You have N elk, you capture n of them, tag them, and release them. Then you recollect a new sample of size m. How many tagged elk are in the new sample?

PMF - The probability mass function of a Hypergeometric:

P(X = k) = C(w, k) C(b, n - k) / C(w + b, n)

Poisson

Let us say that X is distributed Pois(λ). We know the following:

Story - There are rare events (low-probability events) that can occur in many different ways (a high number of possible occurrences) at an average rate of λ occurrences per unit space or time. The number of events that occur in that unit of space or time is X.

Example - A certain busy intersection has an average of 2 accidents per month. Since an accident is a low-probability event that can happen in many different ways, the number of accidents in a month at that intersection is distributed Pois(2). The number of accidents that happen in two months at that intersection is distributed Pois(4).
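A quick simulation sketch matching two of the stories above (parameters are arbitrary choices): the Binomial as a sum of independent Bernoulli(p) trials, and the Geometric as the number of failures before the first success.

```python
# A sketch matching stories to simulation: Binomial as a sum of Bernoullis,
# Geometric as failures before the first success. Parameters are arbitrary.
import numpy as np

rng = np.random.default_rng(1)
n, p, reps = 10, 0.75, 200_000

# Binomial story: X = X1 + ... + Xn with Xi ~ Bern(p)
bern_sums = rng.binomial(1, p, size=(reps, n)).sum(axis=1)
print(bern_sums.mean(), n * p)                     # ~7.5 vs 7.5

# Geometric story: count failures before the first success (p = 0.1 here)
def failures_before_success(p, rng):
    k = 0
    while rng.random() >= p:   # each trial succeeds with probability p
        k += 1
    return k

geom_draws = [failures_before_success(0.1, rng) for _ in range(reps)]
print(np.mean(geom_draws), (1 - 0.1) / 0.1)        # ~9 vs 9
```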
Multivariate Distributions

Multinomial

Let us say that the vector X = (X1, X2, X3, . . . , Xk) ∼ Mult_k(n, p), where p = (p1, p2, . . . , pk).

Story - We have n items, and each can fall into any one of the k buckets independently with probabilities p = (p1, p2, . . . , pk).

Example - Let us assume that every year, 100 students in the Harry Potter universe are randomly and independently sorted into one of four houses with equal probability. The number of people in each of the houses is distributed Mult4(100, p), where p = (0.25, 0.25, 0.25, 0.25). Note that X1 + X2 + · · · + X4 = 100, and the counts are dependent.

Multinomial Coefficient - The number of permutations of n objects where you have n1, n2, n3, . . . , nk of each of the different variants is the multinomial coefficient:

C(n; n1, n2, . . . , nk) = n!/(n1! n2! . . . nk!)

Joint PMF - For n = n1 + n2 + · · · + nk,

P(X = (n1, . . . , nk)) = C(n; n1, n2, . . . , nk) p1^{n1} p2^{n2} . . . pk^{nk}

Lumping - If you lump together multiple categories in a Multinomial, then it is still Multinomial. A Multinomial with two dimensions (success, failure) is a Binomial distribution.

Variances and Covariances - For (X1, X2, . . . , Xk) ∼ Mult_k(n, (p1, p2, . . . , pk)), we have that marginally X_i ∼ Bin(n, p_i) and hence Var(X_i) = n p_i (1 - p_i). Also, for i ≠ j, Cov(X_i, X_j) = -n p_i p_j, which is a result from class.

Marginal PMF and Lumping:

X_i ∼ Bin(n, p_i)
X_i + X_j ∼ Bin(n, p_i + p_j)
(X1, X2, X3) ∼ Mult3(n, (p1, p2, p3)) −→ (X1, X2 + X3) ∼ Mult2(n, (p1, p2 + p3))
X1, . . . , X_{k-1} | X_k = n_k ∼ Mult_{k-1}(n - n_k, (p1/(1 - p_k), . . . , p_{k-1}/(1 - p_k)))

Multivariate Uniform

See the univariate Uniform for stories and examples. For multivariate Uniforms, all you need to know is that probability is proportional to volume. More formally, probability is the volume of the region of interest divided by the total volume of the support. Every point in the support has equal density, of value 1/(Total Area).

Multivariate Normal (MVN)

A vector X = (X1, X2, X3, . . . , Xk) is Multivariate Normal if every linear combination is normally distributed (i.e., t1 X1 + t2 X2 + · · · + tk Xk is Normal for any constants t1, t2, . . . , tk). The parameters of the Multivariate Normal are the mean vector µ = (µ1, µ2, . . . , µk) and the covariance matrix whose (i, j)th entry is Cov(X_i, X_j). For any MVN distribution: 1) any sub-vector is also MVN; 2) if any two elements of a Multivariate Normal are uncorrelated, then they are independent. Note that 2) does not apply to most random variables.

Distribution Properties

Important CDFs

Exponential: F(x) = 1 - e^{-λx}, x ∈ (0, ∞)
Uniform(0, 1): F(x) = x, x ∈ (0, 1)

Poisson Properties (Chicken and Egg Results)

We have X ∼ Pois(λ1) and Y ∼ Pois(λ2), with X ⊥ Y.

1. X + Y ∼ Pois(λ1 + λ2)
2. X | (X + Y = k) ∼ Bin(k, λ1/(λ1 + λ2))
3. If we have Z ∼ Pois(λ), and we randomly and independently "accept" every item in Z with probability p, then the number of accepted items Z1 ∼ Pois(λp), the number of rejected items Z2 ∼ Pois(λq) (where q = 1 - p), and Z1 ⊥ Z2.

Convolutions of Random Variables

A convolution of n random variables is simply their sum.

1. X ∼ Pois(λ1), Y ∼ Pois(λ2), X ⊥ Y −→ X + Y ∼ Pois(λ1 + λ2)
2. X ∼ Bin(n1, p), Y ∼ Bin(n2, p), X ⊥ Y −→ X + Y ∼ Bin(n1 + n2, p). Note that a Binomial can thus be thought of as a sum of i.i.d. Bernoullis.
3. X ∼ Gamma(n1, λ), Y ∼ Gamma(n2, λ), X ⊥ Y −→ X + Y ∼ Gamma(n1 + n2, λ). Note that a Gamma can thus be thought of as a sum of i.i.d. Expos.
4. X ∼ NBin(r1, p), Y ∼ NBin(r2, p), X ⊥ Y −→ X + Y ∼ NBin(r1 + r2, p)
5. All of the above are approximately Normal when λ, n, r are large, by the Central Limit Theorem.
6. Z1 ∼ N(µ1, σ1^2), Z2 ∼ N(µ2, σ2^2), Z1 ⊥ Z2 −→ Z1 + Z2 ∼ N(µ1 + µ2, σ1^2 + σ2^2)

Special Cases of Random Variables

1. Bin(1, p) ∼ Bern(p)
2. Beta(1, 1) ∼ Unif(0, 1)
3. Gamma(1, λ) ∼ Expo(λ)
4. χ²_n ∼ Gamma(n/2, 1/2)
5. NBin(1, p) ∼ Geom(p)
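A Monte Carlo sketch of the chicken-and-egg thinning result (Poisson property 3 above); λ and p are arbitrary choices.

```python
# A Monte Carlo sketch of the chicken-and-egg result: if X ~ Pois(lam) and each
# item is independently "accepted" with probability p, the accepted and rejected
# counts are independent Poissons with means lam*p and lam*(1-p).
import numpy as np

rng = np.random.default_rng(2)
lam, p, reps = 6.0, 0.3, 300_000

x = rng.poisson(lam, size=reps)
accepted = rng.binomial(x, p)                 # thin each count with probability p
rejected = x - accepted

print(accepted.mean(), lam * p)               # ~1.8
print(rejected.mean(), lam * (1 - p))         # ~4.2
print(np.corrcoef(accepted, rejected)[0, 1])  # ~0 (uncorrelated, in fact independent)
```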
Reasoning by Representation

Beta-Gamma relationship - If X ∼ Gamma(a, λ), Y ∼ Gamma(b, λ), and X ⊥ Y, then

• X/(X + Y) ∼ Beta(a, b)
• X + Y ⊥ X/(X + Y)

This is also known as the bank–post office result.

Binomial-Poisson Relationship - Bin(n, p) → Pois(λ) as n → ∞, p → 0, with np = λ.

Order Statistics of the Uniform - U_(j) ∼ Beta(j, n - j + 1)

Universality of Uniform - For any X with CDF F(x), F(X) ∼ Unif(0, 1)

Formulas

In general, remember that PDFs integrate (and PMFs sum) to 1 over the support.

Geometric Series:

a + ar + ar^2 + · · · + ar^{n-1} = Σ_{k=0}^{n-1} ar^k = a (1 - r^n)/(1 - r)

Exponential Function (e^x):

e^x = Σ_{n=0}^{∞} x^n/n! = 1 + x + x^2/2! + x^3/3! + · · · = lim_{n→∞} (1 + x/n)^n

Gamma and Beta Distributions - You can sometimes solve complicated-looking integrals by pattern-matching to the following:

∫_0^∞ x^{t-1} e^{-x} dx = Γ(t)
∫_0^1 x^{a-1} (1 - x)^{b-1} dx = Γ(a)Γ(b)/Γ(a + b)

where Γ(n) = (n - 1)! if n is a positive integer.

Bayes' Billiards (special case of Beta):

∫_0^1 x^k (1 - x)^{n-k} dx = 1/[(n + 1) C(n, k)]

Euler's Approximation for Harmonic Sums:

1 + 1/2 + 1/3 + · · · + 1/n ≈ log n + 0.57721 . . .

Stirling's Approximation:

n! ∼ √(2πn) (n/e)^n

Miscellaneous Definitions

Medians - A continuous random variable X has median m if P(X ≤ m) = 50%. A discrete random variable X has median m if P(X ≤ m) ≥ 50% and P(X ≥ m) ≥ 50%.

Log - Statisticians generally use log to refer to ln.

i.i.d. random variables - Independent, identically distributed random variables.

Example Problems

Contributions from Sebastian Chiu

Calculating Probability (1)

A textbook has n typos, which are randomly scattered amongst its n pages. You pick a random page; what is the probability that it has no typos? Answer - There is a (1 - 1/n) probability that any specific typo isn't on your page, and thus a (1 - 1/n)^n probability that there are no typos on your page. For n large, this is approximately e^{-1} = 1/e, by a definition of e^x.

Calculating Probability (2)

In a group of n people, what is the expected number of distinct birthdays (month and day)? What is the expected number of birthday matches? Answer - Let X be the number of distinct birthdays, and let I_j be the indicator for whether the jth day is represented.

E(I_j) = 1 - P(no one born on day j) = 1 - (364/365)^n

By linearity, E(X) = 365(1 - (364/365)^n). Now let Y be the number of birthday matches and let J_i be the indicator that the ith pair of people have the same birthday. The probability that any two people share a birthday is 1/365, so E(Y) = C(n, 2)/365.

Linearity of Expectation

This problem is commonly known as the hat-matching problem. n people have n hats each. At the end of the party, they each leave with a random hat. What is the expected number of people who leave with the right hat? Answer - Each hat has a 1/n chance of going to the right person. By linearity of expectation, the average number of hats that go to their owners is n(1/n) = 1.
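A small simulation sketch of the hat-matching answer (n = 20 is an arbitrary choice); the empirical mean number of matches should be close to 1, regardless of n.

```python
# A Monte Carlo sketch of the hat-matching answer: with n people leaving with a
# random hat, the expected number of correct hats is 1.
import numpy as np

rng = np.random.default_rng(3)
n, reps = 20, 100_000
matches = np.array([(rng.permutation(n) == np.arange(n)).sum() for _ in range(reps)])
print(matches.mean())   # ~1.0
```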
First Success and Linearity of Expectation

This problem is commonly known as the coupon collector problem. There are n total coupons, and on each draw you get a random coupon. What is the expected number of coupons needed until you have a complete set? Answer - Let N be the number of coupons needed; we want E(N). Write N = N1 + · · · + Nn, where N1 is the number of draws to get our first distinct coupon, N2 is the number of additional draws needed to get our second distinct coupon, and so on. By the story of First Success, N2 ∼ FS((n - 1)/n) (after collecting the first coupon type, there's a (n - 1)/n chance you'll get something new). Similarly, N3 ∼ FS((n - 2)/n), and in general N_j ∼ FS((n - j + 1)/n). By linearity,

E(N) = E(N1) + · · · + E(Nn) = n/n + n/(n - 1) + · · · + n/1 = n Σ_{j=1}^{n} 1/j

which is approximately n log(n) by Euler's approximation for harmonic sums.

First Step Conditioning

In every time period, Bobo the amoeba can die, live, or split into two amoebas with probabilities 0.25, 0.25, and 0.5, respectively. All of Bobo's offspring have the same probabilities. Find P(D), the probability that Bobo's lineage eventually dies out. Answer - We use the law of total probability, and define the events B0, B1, and B2, where B_i means that Bobo has split into i amoebas. We note that P(D|B0) = 1 since his lineage has died, P(D|B1) = P(D), and P(D|B2) = P(D)^2 since both lines of his lineage must die out in order for Bobo's lineage to die out.

P(D) = 0.25 P(D|B0) + 0.25 P(D|B1) + 0.5 P(D|B2) = 0.25 + 0.25 P(D) + 0.5 P(D)^2

Solving the quadratic equation, we get that P(D) = 0.5 or 1. We dismiss 1 as an extraneous solution since the expected number of Bobos increases every generation. Thus our answer is P(D) = 0.5.

Orderings of i.i.d. Random Variables

I call 2 UberX's and 3 Lyfts at the same time. If the times it takes for the rides to reach me are i.i.d., what is the probability that all the Lyfts will arrive first? Answer - Since the arrival times of the five cars are i.i.d., all 5! orderings of the arrivals are equally likely. There are 3!2! orderings in which the Lyfts arrive first, so the probability that the Lyfts arrive first is 3!2!/5! = 1/10. Alternatively, there are C(5, 3) ways to choose 3 of the 5 slots for the Lyfts to occupy, where each of the choices is equally likely. One of those choices has all 3 of the Lyfts arriving first, thus the probability is 1/C(5, 3) = 1/10.

Expectation of Negative Hypergeometric

What is the expected number of cards that you draw before you pick your first Ace in a shuffled deck? Answer - Consider a non-Ace; denote it card j. Let I_j be the indicator that card j will be drawn before the first Ace. Note that if j is before all 4 of the Aces in the deck, then I_j = 1. The probability that this occurs is 1/5, because out of 5 cards (the 4 Aces and the non-Ace), the probability that the non-Ace comes first is 1/5. (This is the probability that any specific non-Ace appears before all of the Aces in the deck, e.g. the probability that the Jack of Spades appears before all of the Aces.) Now let X be the number of cards drawn before the first Ace. Then X = I1 + I2 + . . . + I48, where each indicator corresponds to one of the 48 non-Aces. Thus,

E(X) = E(I1) + E(I2) + . . . + E(I48) = 48/5 = 9.6

Minimum and Maximum of Random Variables

What is the CDF of the maximum of n independent Uniformly-distributed random variables? Answer - Note that

P(min(X1, X2, . . . , Xn) ≥ a) = P(X1 ≥ a, X2 ≥ a, . . . , Xn ≥ a)

Similarly,

P(max(X1, X2, . . . , Xn) ≤ a) = P(X1 ≤ a, X2 ≤ a, . . . , Xn ≤ a)

We will use that principle to find the CDF of U_(n), where U_(n) = max(U1, U2, . . . , Un) and the U_i ∼ Unif(0, 1) are i.i.d.:

P(max(U1, U2, . . . , Un) ≤ a) = P(U1 ≤ a, U2 ≤ a, . . . , Un ≤ a) = P(U1 ≤ a)P(U2 ≤ a) . . . P(Un ≤ a) = a^n

Pattern Matching with the e^x Taylor Series

For X ∼ Pois(λ), find E(1/(X + 1)). Answer - By LotUS,

E(1/(X + 1)) = Σ_{k=0}^{∞} [1/(k + 1)] e^{-λ} λ^k/k! = (e^{-λ}/λ) Σ_{k=0}^{∞} λ^{k+1}/(k + 1)! = (e^{-λ}/λ)(e^λ - 1)
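A quick Monte Carlo check of this LotUS computation, using an arbitrary λ = 3; note that (e^{-λ}/λ)(e^λ - 1) simplifies to (1 - e^{-λ})/λ.

```python
# A quick check of the LotUS computation above: for X ~ Pois(lam),
# E(1/(X+1)) should equal (1 - exp(-lam))/lam. lam = 3 is an arbitrary choice.
import numpy as np

rng = np.random.default_rng(4)
lam = 3.0
x = rng.poisson(lam, size=1_000_000)
print(np.mean(1.0 / (x + 1)))          # Monte Carlo estimate
print((1 - np.exp(-lam)) / lam)        # exact: e^-lam (e^lam - 1)/lam
```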
Adam and Eve's Laws

William really likes speedsolving Rubik's Cubes. But he's pretty bad at it, so sometimes he fails. On any given day, William will attempt N ∼ Geom(s) Rubik's Cubes. Suppose each time, he has an independent probability p of solving the cube. Let T be the number of Rubik's Cubes he solves during a day. Find the mean and variance of T. Answer - Note that T|N ∼ Bin(N, p). As a result, we have by Adam's Law that

E(T) = E(E(T|N)) = E(Np) = p(1 - s)/s

Similarly, by Eve's Law, we have that

Var(T) = E(Var(T|N)) + Var(E(T|N)) = E(Np(1 - p)) + Var(Np)
       = p(1 - p)(1 - s)/s + p^2(1 - s)/s^2 = p(1 - s)(p + s(1 - p))/s^2

MGF - Distribution Matching

(Referring to the Rubik's Cube question above) Find the MGF of T. What is the name of this distribution and its parameter(s)? Answer - By Adam's Law, we have that

E(e^{tT}) = E(E(e^{tT}|N)) = E((pe^t + q)^N) = s Σ_{n=0}^{∞} (pe^t + 1 - p)^n (1 - s)^n
          = s/[1 - (1 - s)(pe^t + 1 - p)] = s/[s + (1 - s)p - (1 - s)pe^t]

Intuitively, we would expect that T is distributed Geometrically, because T is just a filtered version of N, which itself is Geometrically distributed. The MGF of a Geometric random variable X ∼ Geom(θ) is

E(e^{tX}) = θ/[1 - (1 - θ)e^t]

So we want to get our MGF into this form to identify what θ is. Taking our original MGF, it would appear that dividing by s + (1 - s)p allows us to do this. Therefore, we have that

E(e^{tT}) = s/[s + (1 - s)p - (1 - s)pe^t] = [s/(s + (1 - s)p)] / [1 - ((1 - s)p/(s + (1 - s)p)) e^t]

By pattern-matching, it thus follows that T ∼ Geom(θ), where

θ = s/(s + (1 - s)p)

MGF - Finding Moments

Find E(X^3) for X ∼ Expo(λ) using the MGF of X. Answer - The MGF of an Expo(λ) is M(t) = λ/(λ - t). To get the third moment, we can take the third derivative of the MGF and evaluate it at t = 0:

E(X^3) = 6/λ^3

But a much nicer way to use the MGF here is via pattern recognition: note that M(t) looks like it came from a geometric series:

1/(1 - t/λ) = Σ_{n=0}^{∞} (t/λ)^n = Σ_{n=0}^{∞} (n!/λ^n)(t^n/n!)

The coefficient of t^n/n! here is the nth moment of X, so we have E(X^n) = n!/λ^n for all nonnegative integers n. So again we get the same answer.

Markov Chains

Suppose X_n is a two-state Markov chain, with states 0 and 1 and transition matrix (rows and columns indexed by states 0 and 1)

Q = [ 1 - α     α   ]
    [   β     1 - β ]

Find the stationary distribution s = (s0, s1) of X_n by solving sQ = s, and show that the chain is reversible under this stationary distribution. Answer - By solving sQ = s, we have that

s0 = s0(1 - α) + s1 β   and   s1 = s0 α + s1(1 - β)

By solving this system of linear equations, it follows that

s = (β/(α + β), α/(α + β))

To show that this chain is reversible under this stationary distribution, we must show s_i q_ij = s_j q_ji for all i, j. This is done if we can show s0 q01 = s1 q10. Indeed,

s0 q01 = αβ/(α + β) = s1 q10

thus our chain is reversible under the stationary distribution.

Markov Chains, continued

William and Sebastian play a modified game of Settlers of Catan, where every turn they randomly move the robber (which starts on the center tile) to one of the adjacent hexagons.

a) Is this Markov chain irreducible? Is it aperiodic? Answer - Yes to both. The Markov chain is irreducible because it can get from anywhere to anywhere else. The Markov chain is also aperiodic because the robber can return to a tile in 2, 3, 4, 5, . . . moves. Those numbers have a GCD of 1, so the chain is aperiodic.
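A numerical sketch of the two-state chain result, for arbitrary values of α and β: the stationary distribution found as a left eigenvector of Q should match (β, α)/(α + β), and it should satisfy the reversibility condition s0·q01 = s1·q10.

```python
# A numerical sketch of the two-state chain above, for assumed alpha, beta.
import numpy as np

alpha, beta = 0.3, 0.8                      # arbitrary transition probabilities
Q = np.array([[1 - alpha, alpha],
              [beta, 1 - beta]])

# Stationary distribution: left eigenvector of Q with eigenvalue 1, i.e. Q^T s = s.
vals, vecs = np.linalg.eig(Q.T)
s = np.real(vecs[:, np.argmin(np.abs(vals - 1))])
s = s / s.sum()

print(s, np.array([beta, alpha]) / (alpha + beta))   # should agree
print(s[0] * Q[0, 1], s[1] * Q[1, 0])                # reversibility: equal
```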
b) What is the stationary distribution of this Markov chain? Answer - Since this is a random walk on an undirected graph, the stationary distribution is proportional to the degree sequence. The degree of each corner tile is 3, the degree of each edge tile is 4, and the degree of each center tile is 6. To normalize this degree sequence, we divide by its sum. The sum of the degrees is 6(3) + 6(4) + 7(6) = 84. Thus the stationary probability of being on a corner is 3/84 = 1/28, on an edge is 4/84 = 1/21, and in the center is 6/84 = 1/14.

c) What fraction of the time will the robber be in the center tile in this game? Answer - From above, 1/14.

d) What is the expected number of moves it will take for the robber to return? Answer - Since this chain is irreducible and aperiodic, to get the expected time to return we can just invert the stationary probability. Thus on average it will take 14 turns for the robber to return to the center tile.
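A simulation sketch of the "random walk on an undirected graph" fact used above, on a small toy graph rather than the Catan board: the stationary probability of a node is its degree divided by the sum of all degrees, and the mean return time is the reciprocal of that probability.

```python
# A sketch of: stationary distribution of a random walk on an undirected graph
# is proportional to degree, and mean return time = 1 / stationary probability.
import numpy as np

# adjacency list of an arbitrary 4-node undirected graph (not the Catan board)
adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 3], 3: [1, 2]}
degrees = {v: len(nbrs) for v, nbrs in adj.items()}
total_degree = sum(degrees.values())
print({v: d / total_degree for v, d in degrees.items()})   # theoretical stationary dist.

rng = np.random.default_rng(5)
state, steps = 0, 200_000
visits = {v: 0 for v in adj}
last_visit, gaps = 0, []
for t in range(1, steps + 1):
    state = int(rng.choice(adj[state]))    # move to a uniformly random neighbor
    visits[state] += 1
    if state == 0:
        gaps.append(t - last_visit)        # time since the last visit to node 0
        last_visit = t
print({v: c / steps for v, c in visits.items()})   # empirical stationary dist.
print(np.mean(gaps), total_degree / degrees[0])    # mean return time to node 0
```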
Problem Solving Strategies

Contributions from Jessy Hwang, Yuan Jiang, Yuqi Hou

1. Getting Started. Start by defining events and/or defining random variables. ("Let A be the event that I pick the fair coin"; "Let X be the number of successes.") Clear notation = clear thinking! Then decide what it is that you're supposed to be finding, in terms of your notation ("I want to find P(X = 3|A)"). Try simple and extreme cases. To make an abstract experiment more concrete, try drawing a picture or making up numbers that could have happened. Pattern recognition: does the structure of the problem resemble something we've seen before?

2. Calculating Probability of an Event. Use combinatorics if the naive definition of probability applies. Look for symmetries or something to condition on, then apply Bayes' Rule or the Law of Total Probability. Is the probability of the complement easier to find?

3. Finding the Distribution of a Random Variable. Check the support of the random variable: what values can it take on? Use this to rule out distributions that don't fit. Is there a story for one of the named distributions that fits the problem at hand? Can you write the random variable as a function of an r.v. with a known distribution, say Y = g(X)? Then work directly from the definition of the PDF or PMF, expressing P(Y ≤ y) or P(Y = y) in terms of events involving X only. For PDFs, find the CDF first and then differentiate. If you're trying to find the joint distribution of two independent random variables, just multiply their marginal distributions. Do you need the distribution at all? If the question only asks for the expected value of X, you might be able to find this without knowing the entire distribution of X; see the next item.

4. Calculating Expectation. If it has a named distribution, check out the table of distributions. If it's a function of an r.v. with a named distribution, try LotUS. If it's a count of something, try breaking it up into indicator random variables. If you can condition on something, consider using Adam's Law. Also consider the variance formula.

5. Calculating Variance. Consider independence, named distributions, and LotUS. If it's a count of something, break it up into a sum of indicator random variables. If you can condition on something, consider using Eve's Law.

6. Calculating E(X^2). Do you already know E(X) or Var(X)? Remember that Var(X) = E(X^2) - E(X)^2.

7. Calculating Covariance. If it's a count of something, break it up into a sum of indicator random variables. If you're trying to calculate the covariance between two components of a Multinomial distribution, X_i and X_j, then the covariance is -n p_i p_j.

8. Symmetry. If X and Y are i.i.d., have you considered using symmetry?

9. Calculating Probabilities of Orderings of Random Variables. Have you considered looking at order statistics? Remember that any ordering of i.i.d. random variables is equally likely.

10. Is this the birthday problem? Is this a multinomial problem?

11. Determining Independence. Use the definition of independence. Think of extreme cases to see if you can find a counterexample.

12. Does something look like Simpson's Paradox? Make sure you're looking at 3 events.

13. Finding the PDF. If the question gives you two r.v.s, where you know the PDF of one r.v. and the other r.v. is a function of the first one, then the problem wants you to use a transformation of variables (Jacobian). You can also find the PDF by differentiating the CDF.

14. Do a painful integral. If your integral looks painful, see if you can write it in terms of a PDF (like Gamma or Beta), so that the integral equals 1.

15. Before moving on. Plug in some simple and extreme cases to make sure that your answer makes sense.

Biohazards

Section author: Jessy Hwang

1. Don't misuse the naive definition of probability. When answering "What is the probability that in a group of 3 people, no two have the same birth month?", it is not correct to treat the people as indistinguishable balls being placed into 12 boxes, since that assumes the list of birth months {January, January, January} is just as likely as the list {January, April, June}, when the latter is six times more likely.

2. Don't confuse unconditional and conditional probabilities, or go in circles with Bayes' Rule. P(A|B) = P(B|A)P(A)/P(B). It is not correct to say "P(B) = 1 because we know that B happened"; P(B) is the probability before we have information about whether B happened. It is also not correct to use P(A|B) in place of P(A) on the right-hand side.

3. Don't assume independence without justification. In the matching problem, the probability that card 1 is a match and card 2 is a match is not 1/n^2. The Binomial and Hypergeometric are often confused; the trials are independent in the Binomial story and not independent in the Hypergeometric story, due to the lack of replacement.

4. Don't confuse random variables, numbers, and events. Let X be an r.v. Then f(X) is an r.v. for any function f. In particular, X^2, |X|, F(X), and I_{X>3} are r.v.s. P(X^2 < X | X ≥ 0), E(X), Var(X), and f(E(X)) are numbers. X = 2 and F(X) ≥ -1 are events. It does not make sense to write ∫_{-∞}^{∞} F(X) dx, because F(X) is a random variable. It does not make sense to write P(X), because X is not an event.

5. A random variable is not the same thing as its distribution. To get the PDF of X^2, you can't just square the PDF of X; the right way is to use a one-variable transformation. To get the PDF of X + Y, you can't just add the PDF of X and the PDF of Y; the right way is to compute the convolution.

6. E(g(X)) does not equal g(E(X)) in general. See the St. Petersburg paradox for an extreme example. The right way to find E(g(X)) is with LotUS.
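A tiny numerical illustration of Biohazard 6, with g(x) = x^2 and X uniform on {0, 1, . . . , 9} (an arbitrary choice): computing E(g(X)) by LotUS gives a different number than g(E(X)).

```python
# A small demonstration that E(g(X)) generally differs from g(E(X)).
# Here g(x) = x**2 and X is uniform on {0, 1, ..., 9} (arbitrary choice).
import numpy as np

x_values = np.arange(10)
pmf = np.full(10, 0.1)

E_X = np.sum(x_values * pmf)
E_gX = np.sum(x_values**2 * pmf)   # LotUS: sum of g(x) * P(X = x)
print(E_gX, E_X**2)                # 28.5 vs 20.25 -- not equal
```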