Theory of Probability - Brett Bernstein

Lecture 3: Finishing Basic Probability Review

Exercises

1. Model flipping two fair coins using a sample space and a probability measure. Compute the probability of getting at least 1 head.

2. You are flipping a coin twice that has a .75 chance of landing heads. What is the probability of getting exactly 1 head?

3. Let $S = \{1, 2, \ldots, 100\}$ with all outcomes equally likely. What is the probability of the event that a chosen integer isn't divisible by 2, 3, or 5?

4. We want to distribute 10 identical pieces of candy to 4 (distinguishable) children. For each piece of candy we roll a 4-sided die and give it to the associated child. What is the probability that the children receive 2, 5, 1, 2 pieces of candy, respectively (i.e., the first child receives 2, the second receives 5, etc.)?

5. Suppose you are dealt an ordered sequence of $k$ cards from a 52 card deck without replacement. Compute the probability of getting a particular $k$ card (unordered) hand.

Solutions

1. Let $S = \{(a_1, a_2) : a_i \text{ is } H \text{ or } T\}$ with all outcomes equally likely. This intuitive choice for the probability measure will later be seen to come from the fact that the coin flips are modeled as independent of each other. Then we have
$$P(\text{at least one head}) = P(\{HT, TH, HH\}) = \frac{3}{4},$$
where we wrote $(H, T)$ as $HT$ for brevity.

2. Let $S = \{(a_1, a_2) : a_i \in \{H, T\}\}$. We define $P$ by
$$P(\{(a_1, a_2)\}) = .75^{\#\text{ of heads}} \cdot .25^{\#\text{ of tails}}$$
using the general finite space. As above, we will see later that this intuitive choice of measure comes from modeling the coin flips as independent. Note that
$$.75^2 + .25^2 + 2(.75)(.25) = 1.$$
The result is $2(.75)(.25) = 6/16$.
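The two small coin models above can be sanity-checked by enumerating all four outcomes of two flips. A minimal sketch, not from the notes (the helper names are mine):

```python
from itertools import product

# Brute-force check of Solutions 1 and 2 over the four outcomes of two flips.
def prob(outcome, p):
    """Probability of a (flip1, flip2) outcome when each flip lands H with probability p."""
    result = 1.0
    for flip in outcome:
        result *= p if flip == "H" else (1 - p)
    return result

outcomes = list(product("HT", repeat=2))

# Solution 1: fair coin, at least one head.
at_least_one = sum(prob(o, 0.5) for o in outcomes if "H" in o)

# Solution 2: .75 coin, exactly one head.
exactly_one = sum(prob(o, 0.75) for o in outcomes if o.count("H") == 1)

print(at_least_one)  # 0.75
print(exactly_one)   # 0.375, i.e., 6/16
```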
3. Let $D_2, D_3, D_5$ denote the subsets of $S$ divisible by 2, 3, 5 respectively. Then
$$|D_2| = 50,\ |D_3| = 33,\ |D_5| = 20,\ |D_2 \cap D_3| = 16,\ |D_2 \cap D_5| = 10,\ |D_3 \cap D_5| = 6,\ |D_2 \cap D_3 \cap D_5| = 3.$$
This gives
$$|D_2 \cup D_3 \cup D_5| = 50 + 33 + 20 - 16 - 10 - 6 + 3 = 74.$$
The answer is
$$1 - \frac{74}{100} = \frac{26}{100}.$$

4. Here $S = \{(d_1, \ldots, d_{10}) : d_i \in \{1, 2, 3, 4\}\}$ and all outcomes are equally likely. The probability is thus $\binom{10}{2,5,1,2}/4^{10}$ since $|S| = 4^{10}$ and there are
$$\binom{10}{2, 5, 1, 2} = \frac{10!}{2!\,5!\,1!\,2!}$$
elements of $S$ that give the correct counts.

5. Each ordered hand occurs with an equal probability of
$$\frac{1}{52 \cdot 51 \cdots (52 - k + 1)}.$$
Thus an unordered hand occurs with probability
$$\frac{k!}{52 \cdot 51 \cdots (52 - k + 1)} = \binom{52}{k}^{-1},$$
as expected.

Aside on Measure Theory

Example 1 (Infinite Coin Flipping). Suppose we have a fair coin and wish to flip it infinitely many times. We could use the following sample space:
$$S = \{(a_1, a_2, \ldots) : a_i \in \{H, T\}\}.$$
If the model accurately reflects coin flipping, we would want the following calculations to hold:

1. The probability the first flip is heads should be $1/2$.

2. The probability the first two flips are heads should be $1/4$.

3. In general, if we specify the first $k$ flips, we expect the probability to be $1/2^k$.
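These finite-dimensional requirements are easy to check in a truncated model with finitely many fair flips. A small sketch, consistent with the list above (the choice of $k$ and of the prefix is mine):

```python
from itertools import product

# In a truncated model with k fair flips (all 2**k sequences equally
# likely), any fully specified prefix of length j <= k has probability
# 1/2**j, as the list above demands.
k = 10
outcomes = list(product("HT", repeat=k))

# Event: the first three flips are H, H, T (an arbitrary choice of mine).
event = [o for o in outcomes if o[:3] == ("H", "H", "T")]
p = len(event) / len(outcomes)
print(p)  # 0.125, i.e., 1/2**3
```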
This leads to the uncomfortable situation that the probability of any particular sequence is 0, since it must be less than $1/2^k$ for all $k$ by monotonicity. Thus any measure we choose must assign positive probability to some events, while assigning zero probability to every singleton event. Formulaically, we must have
$$P(\{s\}) = 0 \quad \text{for any } s \in S.$$
One key aspect of why this is possible is that $S$ is an uncountable set, so it can have probability 1 even though all the singletons have probability zero. This is one of the nuances of countable additivity.

Example 2 (Aside: Lebesgue Measure). Let $S = [0, 1]$ and define $P((a, b)) = b - a$. For more general events $A$ we define
$$P(A) = \inf\left\{\sum_i (b_i - a_i) : A \subseteq \bigcup_i (a_i, b_i)\right\}.$$
That is, we approximate general sets by looking at coverings by intervals. The infimum forces us to look at tighter and tighter coverings. In addition to allowing us to deal with sets of real numbers, we can use this model to study sequences of fair coin flips by treating each sequence as a binary expansion. One odd consequence of this theory is that not every subset of $[0, 1]$ can be called an event if we want countable additivity to hold. As such, we restrict to a special class of events called the measurable sets.

A Few More Interesting Examples

1. Assuming you are randomly dealt 5 cards from a standard deck of playing cards, what is the probability of getting two pair (a poker hand)? What about a full house?

2. How many students are needed in a class before it is likely that at least 2 have the same birthday (assume 365 days in a year, each day equally likely)?

3. Suppose $n$ people put their ID cards down, the IDs are shuffled, and then randomly returned. What is the probability nobody gets their ID back? What if $n$ is very large?

Solutions

1. Consider the sample space of all 5 card subsets of the 52 cards, where each is equally likely. To count the number of possible ways to get two pair, we first choose the two values for the pairs, and the value of the remaining card.
Then we choose the suits for each pair and the extra card:
$$\frac{\binom{13}{2}\binom{11}{1}\binom{4}{2}^2\binom{4}{1}}{\binom{52}{5}} \approx .0475.$$
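A quick numerical sanity check of this count, also comparing it against an ordered-deal count (a sketch; the variable names are mine):

```python
from math import comb, factorial

# Unordered two-pair count: choose the two pair values, their suits, and
# the kicker, as in the formula above.
unordered = comb(13, 2) * comb(11, 1) * comb(4, 2)**2 * comb(4, 1)

# Ordered count: deal cards one at a time in the pattern 1-1-2-2-3, then
# multiply by the 5!/(2!2!2!) distinguishable orderings of that pattern.
ordered = 52 * 3 * 48 * 3 * 44 * (factorial(5) // (2 * 2 * 2))

# Every unordered hand corresponds to exactly 5! ordered deals.
assert ordered == unordered * factorial(5)

p_two_pair = unordered / comb(52, 5)
print(round(p_two_pair, 4))  # 0.0475
```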
The probability of a full house is
$$\frac{13 \cdot 12 \cdot \binom{4}{3}\binom{4}{2}}{\binom{52}{5}} \approx .0014.$$
We could also count these using ordered hands. For two pair we have
$$\frac{52 \cdot 3 \cdot 48 \cdot 3 \cdot 44 \cdot \frac{5!}{2!\,2!\,2!}}{52 \cdot 51 \cdot 50 \cdot 49 \cdot 48}.$$
Here we count all possible ways of choosing hands in the order 11223, where the numbers denote values. Then we must consider all orderings of this string, but divide by an extra $2!$ since swapping the 1's and 2's doesn't yield a distinct hand. To me this calculation seems trickier. For a full house we have
$$\frac{52 \cdot 3 \cdot 2 \cdot 48 \cdot 3 \cdot \frac{5!}{3!\,2!}}{52 \cdot 51 \cdot 50 \cdot 49 \cdot 48}.$$

2. 23 students. The probability of $k$ students and no matches is
$$\frac{365 \cdot 364 \cdots (365 - k + 1)}{365^k}.$$
Here the sample space is the collection of all sequences of length $k$ where each value is between 1 and 365, and each sequence is equally likely. Even though 23 seems small compared to 365, the number of pairs of people is actually $\binom{23}{2} = 253$.

3. The probability is roughly $1/e$. To compute the precise probability we use inclusion-exclusion. Our sample space is the collection of all permutations of $1, \ldots, n$ with each equally likely (there are $n!$ of them). Let $A_i$ denote the event that the $i$th person picks up his ID (i.e., that the permutation has $i$ in the $i$th spot). Then we can compute the probability of at least one person getting their correct ID by inclusion-exclusion:
$$P\left(\bigcup_{i=1}^n A_i\right) = \sum_{i=1}^n P(A_i) - \sum_{i<j} P(A_i \cap A_j) + \cdots + (-1)^{n-1} P(A_1 \cap \cdots \cap A_n).$$
Note that
$$P(A_i) = \frac{1}{n},\quad P(A_i \cap A_j) = \frac{1}{n(n-1)},\quad P(A_i \cap A_j \cap A_k) = \frac{1}{n(n-1)(n-2)},\ \ldots$$
In general, if there are $p$ distinct $A_i$ events we are intersecting, the probability is $\frac{(n-p)!}{n!}$.
This gives
$$P\left(\bigcup_{i=1}^n A_i\right) = n \cdot \frac{1}{n} - \binom{n}{2}\frac{1}{n(n-1)} + \cdots + (-1)^{n-1}\frac{1}{n!} = \sum_{k=1}^n \binom{n}{k}\frac{(n-k)!}{n!}(-1)^{k-1} = \sum_{k=1}^n \frac{(-1)^{k-1}}{k!} \approx 1 - e^{-1},$$
where the approximation improves as $n$ grows (the error is less than $1/n!$). The answer to our problem is thus approximately $1/e$. It is interesting that the probability converges to a number strictly between 0 and 1. One intuition for the result is that the events are weakly dependent and well approximated by independent events.

Conditional Probability

Conditioning Formula

Conditioning is the process where we update our beliefs (probabilities) in light of newly found information. As you can imagine, this is a very practical skill to master. It turns out that conditioning can also be used to greatly simplify a calculation, by using the Law of Total Probability. Joseph Blitzstein says "Conditioning is the Soul of Statistics."

Let $A, B$ be events with $P(B) > 0$. We use the notation $P(A|B)$ (read "probability of $A$ given $B$") to denote the probability that $A$ occurs assuming $B$ will occur. In other words, assuming we have the information that $B$ will occur, $P(A|B)$ is the probability that $A$ will occur. Formally, it is defined to be
$$P(A|B) = \frac{P(A \cap B)}{P(B)}.$$
We also write $P(A|B, C) = P(A|B \cap C) = P(A|BC)$. The motivation for this formula can be understood by looking at examples of sample spaces, and drawing Venn diagrams. It can also be instructive to look at the formula in the following form:
$$P(A \cap B) = P(A|B)P(B).$$
That is, the probability of $A$ and $B$ both occurring is the probability of $B$ occurring, times the probability of $A$ occurring given that $B$ has occurred. This can be repeated iteratively, to get the following formula:

Theorem 3. Assuming $P(E_1) > 0, \ldots, P(E_1 \cap \cdots \cap E_{n-1}) > 0$ (so that every conditional probability below is defined),
$$P(E_1 \cap E_2 \cap \cdots \cap E_n) = P(E_1)P(E_2|E_1)P(E_3|E_1, E_2) \cdots P(E_n|E_1, \ldots, E_{n-1}).$$
Proof. We simply write out all the formulas and cancel:
$$P(E_1)P(E_2|E_1) \cdots P(E_n|E_1, \ldots, E_{n-1}) = P(E_1) \cdot \frac{P(E_1 \cap E_2)}{P(E_1)} \cdot \frac{P(E_1 \cap E_2 \cap E_3)}{P(E_1 \cap E_2)} \cdots \frac{P(E_1 \cap \cdots \cap E_n)}{P(E_1 \cap \cdots \cap E_{n-1})} = P(E_1 \cap \cdots \cap E_n).$$

Example 4 (Tree Diagrams). We are going to make 3 coin flips, but with the following rules. The first coin is fair, and each time you get a heads, the next coin has half the chance of getting heads (i.e., if your first flip is heads, the second coin will have $1/4$ probability of heads, and $3/4$ probability of tails). What are the probabilities of getting 0, 1, 2, 3 heads? One way to solve this problem is to draw a tree whose leaves are all 8 possibilities. The probabilities on each edge are conditional, and to compute the probabilities of the leaves, we use the previous theorem.

Conditioning Exercises

1. A jar contains a 4-sided die and a 6-sided die. We uniformly at random pick a die out of the jar and roll it. Assuming the roll is 3, what is the chance we picked the 6-sided die? Do you expect it to be bigger or smaller than 1/2?

2. Flip a fair coin 5 times in a row. Assuming we get a total of 3 heads, what is the probability the first flip was heads?

3. A standard deck is shuffled, and two cards are dealt face down. The dealer looks at both cards.

(a) If the dealer tells you "At least one of the cards is an Ace," what is the probability there are two Aces?

(b) If the dealer tells you "One of the cards is the Ace of Spades," what is the probability there are two Aces?

Solutions

1. Let $A$ be the event of rolling a 3, and $B$ be the event of drawing the 6-sided die. Then we have
$$P(A \cap B) = \frac{1}{2} \cdot \frac{1}{6} \quad \text{and} \quad P(A) = \frac{1}{2} \cdot \frac{1}{6} + \frac{1}{2} \cdot \frac{1}{4}.$$
Then we have
$$P(B|A) = \frac{P(A \cap B)}{P(A)} = \frac{1/12}{1/12 + 1/8} = \frac{1}{1 + 3/2} = \frac{2}{5}.$$
Note that we would have gotten the same result for rolls 1, 2, 3, 4. For rolls 5, 6 we know it is 6-sided with probability 1.
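The remark above (the same posterior for rolls 1 through 4, and certainty for rolls 5 and 6) can be verified by running the same Bayes computation for every possible roll value. A sketch using exact fractions (the variable names are mine):

```python
from fractions import Fraction

# Exact Bayes computation for the jar-of-dice exercise, repeated for each
# roll value: posterior[r] = P(6-sided die | roll is r).
half = Fraction(1, 2)
posterior = {}
for roll in range(1, 7):
    p_roll_given_d6 = Fraction(1, 6)
    p_roll_given_d4 = Fraction(1, 4) if roll <= 4 else Fraction(0)
    # Law of total probability over which die was drawn.
    p_roll = half * p_roll_given_d6 + half * p_roll_given_d4
    posterior[roll] = half * p_roll_given_d6 / p_roll

print(posterior[3])  # 2/5
print(posterior[6])  # 1
```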
2. Let $A$ be the event of getting 3 heads, and let $B$ be the event that the first flip is heads. Then
$$P(A) = \binom{5}{3}\frac{1}{2^5} \quad \text{and} \quad P(A \cap B) = \binom{4}{2}\frac{1}{2^5}.$$
Thus we have
$$P(B|A) = \frac{P(A \cap B)}{P(A)} = \frac{\binom{4}{2}}{\binom{5}{3}} = \frac{3}{5}.$$

3. (a) Let $A$ denote the event of getting two aces, and $B$ the event of getting at least one ace. Then we have
$$P(B) = \frac{4 \cdot 51 + 48 \cdot 4}{52 \cdot 51} \quad \text{and} \quad P(A \cap B) = P(A) = \frac{4 \cdot 3}{52 \cdot 51}.$$
Thus the answer is
$$\frac{P(A \cap B)}{P(B)} = \frac{4 \cdot 3}{4 \cdot 51 + 48 \cdot 4} = \frac{12}{396} = \frac{1}{33}.$$

(b) Let $A$ denote the event of getting two aces, and $B$ the event of getting the ace of spades. Then we have
$$P(B) = \frac{2 \cdot 51}{52 \cdot 51} \quad \text{and} \quad P(A \cap B) = \frac{2 \cdot 3}{52 \cdot 51}.$$
Thus the answer is
$$\frac{P(A \cap B)}{P(B)} = \frac{6}{2 \cdot 51} = \frac{6}{102} = \frac{1}{17}.$$
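Solutions 2 and 3 can both be checked by brute-force enumeration of the (finite) sample spaces. A sketch, where the deck encoding is my own: cards are numbered 0 to 51, cards 0 through 3 are the aces, and card 0 is the ace of spades.

```python
from itertools import combinations, product
from fractions import Fraction

# Solution 2: among the 5-flip sequences with exactly 3 heads (all equally
# likely by symmetry), what fraction begin with heads?
seqs = [s for s in product("HT", repeat=5) if s.count("H") == 3]
p_first_heads = Fraction(sum(s[0] == "H" for s in seqs), len(seqs))
print(p_first_heads)  # 3/5

# Solution 3: all unordered two-card hands from the deck, equally likely.
hands = list(combinations(range(52), 2))
two_aces = [h for h in hands if all(c < 4 for c in h)]
at_least_one_ace = [h for h in hands if any(c < 4 for c in h)]
has_spade_ace = [h for h in hands if 0 in h]

print(Fraction(len(two_aces), len(at_least_one_ace)))              # 1/33
print(Fraction(len([h for h in two_aces if 0 in h]),
               len(has_spade_ace)))                                # 1/17
```

Note how the two dealer statements condition on different events of different sizes (198 hands versus 51 hands), which is why the answers differ.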