ECE 313 — Probability with Engineering Applications, Fall 2000
Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign
ECE 313 - Lecture 13 © 2000 Dilip V. Sarwate, University of Illinois at Urbana-Champaign, All Rights Reserved

Introduction
• The conditional probability of an event B given that event A occurred is our revised estimate of the chances that B occurred in light of partial knowledge of the outcome of the experiment, viz. knowing that A occurred
• To avoid trivialities, we assume that A, sometimes called the conditioning event, has nonzero probability

Definition of conditional probability
• The conditional probability of B given A is denoted by P(B|A)
• Read this as “the probability of B given A” or “the probability of B conditioned on A”
• Definition: If P(A) > 0, P(B|A) is defined as P(B|A) = P(AB)/P(A)
• P(B|A) can be larger than, smaller than, or the same as P(B)

Consistent with various models
• The definition of conditional probability is consistent with
  ◦ the classical approach to probability
  ◦ the relative frequency approach
• Conditional probabilities can also be discussed for events defined in terms of random variables, e.g., what are P{X = k | X > n} or P{X ≤ k | a < X < b}?
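Under the classical (equally likely outcomes) approach, the definition P(B|A) = P(AB)/P(A) reduces to counting. The sketch below is an illustration of our own, not an example from the lecture: it assumes two fair dice, and the helper names (`prob`, `cond_prob`, and the event functions) are hypothetical. It shows P(B|A) coming out larger than, smaller than, and equal to P(B).

```python
from fractions import Fraction
from itertools import product

# Illustrative sample space (our assumption, not from the lecture):
# two fair dice, 36 equally likely ordered pairs.
OMEGA = list(product(range(1, 7), repeat=2))


def prob(event):
    """Classical probability: fraction of equally likely outcomes in the event."""
    return Fraction(sum(1 for w in OMEGA if event(w)), len(OMEGA))


def cond_prob(b, a):
    """P(B|A) = P(AB)/P(A), defined only when P(A) > 0."""
    p_a = prob(a)
    if p_a == 0:
        raise ValueError("conditioning event must have nonzero probability")
    return prob(lambda w: a(w) and b(w)) / p_a


def sum_at_least_10(w):
    return w[0] + w[1] >= 10


def sum_is_seven(w):
    return w[0] + w[1] == 7


def first_is_six(w):
    return w[0] == 6


def first_at_most_4(w):
    return w[0] <= 4


def first_is_even(w):
    return w[0] % 2 == 0


print(prob(sum_at_least_10))                              # 1/6
print(cond_prob(sum_at_least_10, first_is_six))           # 1/2  (larger than P(B))
print(cond_prob(sum_at_least_10, first_at_most_4))        # 1/24 (smaller than P(B))
print(cond_prob(sum_is_seven, first_is_even), prob(sum_is_seven))  # 1/6 1/6 (same)
```

Exact `Fraction` arithmetic is used so that the results can be compared with hand calculations without rounding error.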
Geometric RVs are memoryless
• Let X denote a geometric random variable with parameter p
• For k > 0, P{X = k+r | X > r} = P{X = k}
• Given that the event {X > r} has occurred, that is, the first r trials all ended in a “failure”, the probability that we need to wait an additional k trials to observe the first success is the same as P{X = k}
• It’s as if the first r trials are forgotten!

Binomial random variables
• Let X denote a binomial random variable with parameters (n, p)
• Given that the event {X = k} has occurred, the conditional probability that the j-th trial resulted in a success is k/n, independent of the value of p
• The conditional probability of successes on both the i-th and j-th trials (i ≠ j) is k(k–1)/[n(n–1)]
• and so on

Axioms are satisfied
• Conditional probabilities are a probability measure, that is, they satisfy the axioms of probability theory
• All the consequences of the axioms (rules of probability) also apply to conditional probabilities
• Caveat: Everything must be conditioned on the same event. No mixing and matching allowed

Rules? What rules?
• P(Ω|A) = 1
• P(∅|A) = 0
• P(Bᶜ|A) = 1 – P(B|A)
• If B ⊂ C, then P(B|A) ≤ P(C|A)
• If BC = ∅, then P((B ∪ C)|A) = P(B|A) + P(C|A)
• More generally, P((B ∪ C)|A) = P(B|A) + P(C|A) – P(BC|A)
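Both claims above can be checked by exact computation. The sketch below is our own illustration: it assumes the geometric pmf P{X = k} = (1 – p)^(k–1) p for k ≥ 1 (X counts trials up to and including the first success), and it enumerates all 2^n outcome sequences for the binomial case. The function names are hypothetical and trial indices are 0-based.

```python
from fractions import Fraction
from itertools import product


def geometric_pmf(k, p):
    """P{X = k}: k - 1 failures followed by a success (assumed convention, k >= 1)."""
    return (1 - p) ** (k - 1) * p


def geometric_tail(r, p):
    """P{X > r}: the first r trials are all failures."""
    return (1 - p) ** r


def memoryless_holds(p, k_max=10, r_max=10):
    # For k > 0, {X = k + r} is a subset of {X > r}, so
    # P{X = k + r | X > r} = P{X = k + r} / P{X > r}.
    return all(
        geometric_pmf(k + r, p) / geometric_tail(r, p) == geometric_pmf(k, p)
        for k in range(1, k_max + 1)
        for r in range(0, r_max + 1)
    )


def binomial_conditionals(n, k, p, i=0, j=1):
    """Given exactly k successes in n independent Bernoulli(p) trials, return
    (P(trial j succeeds | X = k), P(trials i and j both succeed | X = k)), i != j."""
    p_x_eq_k = p_j = p_ij = Fraction(0)
    for outcome in product((0, 1), repeat=n):
        if sum(outcome) != k:
            continue
        weight = p ** k * (1 - p) ** (n - k)  # same weight for every such sequence
        p_x_eq_k += weight
        if outcome[j]:
            p_j += weight
        if outcome[i] and outcome[j]:
            p_ij += weight
    return p_j / p_x_eq_k, p_ij / p_x_eq_k


print(memoryless_holds(Fraction(1, 3)))            # True
print(binomial_conditionals(5, 2, Fraction(1, 3)))  # (Fraction(2, 5), Fraction(1, 10))
```

With n = 5 and k = 2 the results match k/n = 2/5 and k(k–1)/[n(n–1)] = 1/10; rerunning with a different p gives the same pair, as the slide claims.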
Left side versus right side
• An expression such as P((B ∪ C)|(A ∪ D)) is commonly written as P(B ∪ C|A ∪ D)
• Everything to the right of the vertical bar is the conditioning event; it is a single set
• Everything to the left of the vertical bar is the conditioned event; it is a single set
• Even if A, B, C, and D are disjoint, P(B ∪ C|A ∪ D) ≠ P(B) + P(C|A) + P(D)

Is that all there is to it?
• OK, so you can update your probabilities to conditional probabilities if you know that event A occurred
  ◦ Is that all there is to it?
  ◦ Is the notion of conditional probability just a one-trick pony?
  ◦ Surely life holds more than that?
• Actually, conditional probabilities are fundamental tools in probabilistic analyses

The chain rule or product rule
• P(B|A) = P(AB)/P(A)
• P(AB) = P(B|A)P(A)
• Note that P(AB) can also be expressed as P(A|B)P(B)
• The conditional probability P(B|A) can be used to compute the joint probability P(AB)
• Multiply the conditional probability P(B|A) by P(A), the probability of the conditioning event

Generalization of the chain rule
• More generally, P(ABCD…) = P(A)P(B|A)P(C|AB)P(D|ABC)…
• The product of the first two terms is P(AB)
• P(C|AB)P(AB) = P(ABC), so the product of the first three terms is P(ABC), and so on
• For ABCD… to occur, A must occur, and if A has occurred, so must B (with probability P(B|A)); if both A and B have occurred, then C must occur (with probability P(C|AB)), and so on
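The generalized chain rule can be checked against a direct count. The example is our own (not from the lecture): draw cards without replacement from a standard 52-card deck and let Hi be the event that the i-th card is a heart. Then P(H1H2H3) = P(H1)P(H2|H1)P(H3|H1H2), which should equal the number of all-heart 3-subsets divided by the number of 3-subsets. Function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def chain_rule_all_hearts(draws=3, hearts=13, deck=52):
    """P(H1 H2 ... Hd) = P(H1) P(H2|H1) P(H3|H1H2) ... (draws without replacement)."""
    prob = Fraction(1)
    for i in range(draws):
        # Given that the first i cards were hearts, hearts - i of the
        # remaining deck - i cards are hearts.
        prob *= Fraction(hearts - i, deck - i)
    return prob


def counting_all_hearts(draws=3, hearts=13, deck=52):
    """Same probability by counting equally likely subsets of the deck."""
    return Fraction(comb(hearts, draws), comb(deck, draws))


print(chain_rule_all_hearts())  # 11/850
print(counting_all_hearts())    # 11/850
```

The loop mirrors the slide's reasoning term by term: each factor is the conditional probability of the next event given that all earlier ones occurred.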
Applications of the chain rule
• Example: A random sample of size k is drawn without replacement from the set {1, 2, … , n}. What is the probability that the sample is exactly {1, 2, 3, … , k–1, n}?
• Simple answer: There are $\binom{n}{k}$ equally likely subsets that could have been drawn, and so the desired probability is just $\binom{n}{k}^{-1}$

Applications
• Example: Box I has 3 green and 2 red balls, while Box II has 2 green and 2 red balls. A ball is drawn at random from Box I and transferred to Box II. Then, a ball is drawn at random from Box II. What is the probability that the ball drawn from Box II is green?
• Note that the color of the ball transferred from Box I to Box II is not known

Example (continued)
• The color of the ball transferred is not known, but it’s either green or red for sure!
[Figure: Box I and Box II]

Example (continued)
• Box I has 3g, 2r; Box II has 2g, 2r
• After the transfer, Box II has 5 balls in it
• G = event that the ball drawn from Box II is green
• A = event that the ball transferred is red
• P(G|A) = 2/5
• P(G|Aᶜ) = 3/5
• P(A) = 2/5
• P(G) = P(G|A)P(A) + P(G|Aᶜ)P(Aᶜ) = (2/5)(2/5) + (3/5)(3/5) = 13/25
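Both examples above can be cross-checked in a few lines. This is a minimal sketch of our own with hypothetical function names. For the sampling example it applies the chain rule to the event that every successive draw lands in a fixed k-subset S (any specific k-subset, such as {1, 2, … , k–1, n}, gives the same value), which should agree with one over the number of k-subsets. For the box example it enumerates the 5 × 5 equally likely (transferred ball, drawn ball) pairs and compares the count with the total-probability formula.

```python
from fractions import Fraction
from math import comb


def specific_sample_prob(n, k):
    """Chain rule: the i-th draw (i = 0, ..., k - 1) must pick one of the k - i
    still-unused members of a fixed k-subset S from the n - i remaining numbers."""
    prob = Fraction(1)
    for i in range(k):
        prob *= Fraction(k - i, n - i)
    return prob


def box_transfer_green_by_enumeration():
    """Count green draws over all equally likely (transferred, drawn) pairs."""
    box1 = ["g"] * 3 + ["r"] * 2
    favorable = total = 0
    for transferred in box1:
        box2 = ["g"] * 2 + ["r"] * 2 + [transferred]
        for drawn in box2:
            total += 1
            favorable += drawn == "g"
    return Fraction(favorable, total)


def box_transfer_green_by_total_probability():
    p_a = Fraction(2, 5)           # A: transferred ball is red
    p_g_given_a = Fraction(2, 5)   # Box II then holds 2 green, 3 red
    p_g_given_ac = Fraction(3, 5)  # Box II then holds 3 green, 2 red
    return p_g_given_a * p_a + p_g_given_ac * (1 - p_a)


print(specific_sample_prob(10, 4), Fraction(1, comb(10, 4)))  # 1/210 1/210
print(box_transfer_green_by_enumeration())                    # 13/25
print(box_transfer_green_by_total_probability())              # 13/25
```

Enumeration is only practical for tiny examples like this one, but it is a useful independent check on a conditional-probability calculation.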
A built-in test for checking answers
• The probability of event A is a weighted average of P(A|B) and P(A|Bᶜ)
• P(A) = P(A|B)P(B) + P(A|Bᶜ)P(Bᶜ) = P(A|B)P(B) + P(A|Bᶜ)[1 – P(B)]
• The linear function y = a·x + b·(1 – x) has value b at x = 0 and value a at x = 1
• For 0 < x < 1, y lies between a and b
• Hence P(A) lies between P(A|B) and P(A|Bᶜ)

Example (checking our work)
• P(G|A) = 2/5
• P(G|Aᶜ) = 3/5
• P(G) = P(G|A)P(A) + P(G|Aᶜ)P(Aᶜ) = (2/5)(2/5) + (3/5)(3/5) = 13/25
• Check: P(G|A) = 2/5 ≤ P(G) = 13/25 ≤ P(G|Aᶜ) = 3/5
• If the check is satisfied, it does not imply that your work is right; there may be other mistakes, e.g., had you computed P(G) = 12/25, the check would still be satisfied
• But if the check is not satisfied, you have definitely made a mistake

Generalizations of the theorem I
• P(A) = P(A|B)P(B) + P(A|Bᶜ)P(Bᶜ)
• Since conditional probabilities form a probability measure, a similar result also holds for conditional probabilities
• P(A|C) = P(A|BC)P(B|C) + P(A|BᶜC)P(Bᶜ|C)
• All probabilities in the first equation are now conditioned on C (in addition to any previously existing conditioning)
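The built-in test is easy to automate. In this sketch (our own; the function names are hypothetical) `total_probability_two` computes P(A) = P(A|B)P(B) + P(A|Bᶜ)[1 – P(B)], and `passes_built_in_check` tests the necessary, but not sufficient, condition that P(A) lies between the two conditional probabilities. The usage applies it to the box example, with G in the role of A and A in the role of B.

```python
from fractions import Fraction


def total_probability_two(p_given_b, p_given_bc, p_b):
    """P(A) = P(A|B)P(B) + P(A|B^c)(1 - P(B)): a weighted average."""
    return p_given_b * p_b + p_given_bc * (1 - p_b)


def passes_built_in_check(p_event, p_given_b, p_given_bc):
    """Necessary (not sufficient) condition: P(A) lies between P(A|B) and P(A|B^c)."""
    lo, hi = sorted((p_given_b, p_given_bc))
    return lo <= p_event <= hi


p_g_given_a, p_g_given_ac, p_a = Fraction(2, 5), Fraction(3, 5), Fraction(2, 5)
p_g = total_probability_two(p_g_given_a, p_g_given_ac, p_a)
print(p_g)                                                                # 13/25
print(passes_built_in_check(p_g, p_g_given_a, p_g_given_ac))              # True
print(passes_built_in_check(Fraction(12, 25), p_g_given_a, p_g_given_ac)) # True (still wrong!)
print(passes_built_in_check(Fraction(17, 25), p_g_given_a, p_g_given_ac)) # False
```

The third line illustrates the slide's warning: a wrong answer of 12/25 still passes, so the check can only catch errors, never certify correctness.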
Example
P(A|C) = P(A|BC)P(B|C) + P(A|BᶜC)P(Bᶜ|C)
• A = event that a flight is late in arriving
• B = event that the flight is arriving at O’Hare
• C = event that the flight is a United Airlines flight
• P(A|BC) = probability that a United Airlines flight is late arriving at O’Hare
• P(A|BᶜC) = probability that a United Airlines flight is late arriving elsewhere

Example (continued)
P(A|C) = P(A|BC)P(B|C) + P(A|BᶜC)P(Bᶜ|C)
• A = event that a flight is late in arriving
• B = event that the flight is arriving at O’Hare
• C = event that the flight is on United Airlines
• P(B|C) = probability that a United Airlines flight is arriving at O’Hare
• P(Bᶜ|C) = probability that a United Airlines flight is arriving elsewhere

Example (continued)
P(A|C) = P(A|BC)P(B|C) + P(A|BᶜC)P(Bᶜ|C)
• P(A|BC), P(A|BᶜC), P(B|C), and P(Bᶜ|C) can all be estimated (for example, via relative frequencies) by United Airlines or by the FAA
• P(A|C), the probability that a United Airlines flight is late, can then be computed (and published in the newspapers)

Generalizations of the theorem II
• Given a countable partition A1, A2, … , An, … of the sample space, P(B) = P(B|A1)P(A1) + P(B|A2)P(A2) + … + P(B|An)P(An) + …
• The theorem as presented originally is the finite case n = 2 of this more general result
• The two generalizations can also be combined: condition throughout on C!
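Both generalizations can be verified on a small sample space. The example is our own assumption, not the lecture's (no airline data is used): roll two fair dice, partition by the value of the first die (A1, …, A6), let B be “the sum is at least 10” and C be “the second die is even”. The sketch computes P(B) = Σ P(B|Ai)P(Ai) and the conditioned version P(B|C) = Σ P(B|AiC)P(Ai|C), and each should match the direct calculation. Helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Illustrative sample space (our assumption): two fair dice, 36 equally likely outcomes.
OMEGA = list(product(range(1, 7), repeat=2))


def prob(event):
    return Fraction(sum(1 for w in OMEGA if event(w)), len(OMEGA))


def cond_prob(b, a):
    """P(B|A) = P(AB)/P(A); every conditioning event used below has P > 0."""
    return prob(lambda w: a(w) and b(w)) / prob(a)


def big_sum(w):         # B: the sum is at least 10
    return w[0] + w[1] >= 10


def second_is_even(w):  # C: the second die is even
    return w[1] % 2 == 0


# Partition A_1, ..., A_6 by the first die (default arg binds each i).
partition = [lambda w, i=i: w[0] == i for i in range(1, 7)]

# Theorem of total probability: P(B) = sum_i P(B|A_i) P(A_i)
p_b_total = sum(cond_prob(big_sum, a) * prob(a) for a in partition)

# Conditioned on C throughout: P(B|C) = sum_i P(B|A_i C) P(A_i|C)
p_b_given_c_total = sum(
    cond_prob(big_sum, lambda w, a=a: a(w) and second_is_even(w))
    * cond_prob(a, second_is_even)
    for a in partition
)

print(p_b_total, prob(big_sum))                                  # 1/6 1/6
print(p_b_given_c_total, cond_prob(big_sum, second_is_even))     # 2/9 2/9
```

The conditioned sum is the same formula with every probability additionally conditioned on C, exactly as in the “condition throughout on C” remark.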
Generalization of built-in test I
• Suppose that P(B) = P(B|A1)P(A1) + P(B|A2)P(A2) + … + P(B|An)P(An) + …
• This is a weighted sum of the P(B|Ai), with weights P(Ai) that sum to 1
• If P(B|Aj) is the smallest of the P(B|Ai), then replacing each P(B|Ai) by P(B|Aj) gives P(B) ≥ P(B|Aj)·[P(A1) + P(A2) + … ] = P(B|Aj)

Generalization of built-in test II
• Suppose that P(B) = P(B|A1)P(A1) + P(B|A2)P(A2) + … + P(B|An)P(An) + …
• If P(B|Ak) is the largest of the P(B|Ai), then replacing each P(B|Ai) by P(B|Ak) gives P(B) ≤ P(B|Ak)·[P(A1) + P(A2) + … ] = P(B|Ak)
• Conclusion: P(B|Aj) ≤ P(B) ≤ P(B|Ak)

Another Example
• You and a friend (also taking ECE 313) are at a party with N–1 other people when suddenly a conga line forms. Assume that all (N+1)! orderings are equally likely
• What is the probability that your friend is ahead of you in the conga line?
• Answer: 1/2 (by symmetry)
• If the answer were some other value p ≠ 1/2, then by the same symmetry you would be ahead of your friend with probability p as well, and the two probabilities would not add up to 1

Do it by the theorem…
• Both you and your friend are equally likely to be anywhere in the conga line
• P(you are in j-th position) = 1/(N + 1)
• P(friend ahead | you in j-th) = (j – 1)/N
• Why j–1? Why N and not N+1?
• P(friend ahead) = sum over j = 1, … , N + 1 of [(j–1)/N]·[1/(N+1)] = [0 + 1 + … + N]/[N·(N + 1)] = 1/2
• since 1 + 2 + … + N = N·(N + 1)/2
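The symmetry answer and the theorem-of-total-probability answer can be compared for small N. This sketch is our own illustration with hypothetical names: it assumes position 1 is the front of the line, enumerates all (N+1)! equally likely orderings of you, your friend, and N–1 other guests, and separately evaluates the sum from the slide with exact fractions.

```python
from fractions import Fraction
from itertools import permutations


def friend_ahead_by_enumeration(n):
    """N - 1 other guests plus you and your friend: N + 1 people, all orderings
    equally likely. Index 0 is the front of the conga line (our convention)."""
    people = ["you", "friend"] + [f"guest{m}" for m in range(n - 1)]
    ahead = total = 0
    for line in permutations(people):
        total += 1
        ahead += line.index("friend") < line.index("you")
    return Fraction(ahead, total)


def friend_ahead_by_theorem(n):
    """Sum over your position j = 1, ..., N + 1 of
    P(friend ahead | you at j) * P(you at j) = [(j - 1)/N] * [1/(N + 1)]."""
    return sum(Fraction(j - 1, n) * Fraction(1, n + 1) for j in range(1, n + 2))


print(all(friend_ahead_by_enumeration(n) == Fraction(1, 2) for n in range(1, 6)))  # True
print(all(friend_ahead_by_theorem(n) == Fraction(1, 2) for n in range(1, 50)))     # True
```

Enumeration grows as (N+1)!, so it is only a sanity check for tiny N; the theorem-based sum is cheap for any N.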
Summary
• The chain rule or product rule allows us to compute a joint probability (i.e., the probability of an intersection) as the product of various conditional probabilities
• The theorem of total probability allows us to find an unconditional probability from conditional probabilities
• We discussed some examples of the applications of these rules