Discrete Mathematics and Probability Theory Spring 2014 Anant Sahai Note 12

EECS 70, Note 12: Probability Examples Based on Counting

We will now look at examples of random experiments and their corresponding sample spaces, along with possible probability spaces and events. As we do so, we'll add a couple more tools to our repertoire: (1) Stirling's Approximation; and (2) how to combine multiple independent experiments into a single probability space.

Fair Coin Flipping and Stirling's Approximation

Suppose we have an unbiased coin, and our experiment consists of flipping the coin 4 times. The sample space $\Omega$ consists of the sixteen possible sequences of H's and T's. For a fair coin, the probabilities are assigned uniformly; the probability of each sample point is $\frac{1}{16}$.

Consider event $A_2$: the event that there are exactly two heads. The probability of any particular outcome with two heads (such as HTHT) is $\frac{1}{16}$. So the key is to count $|A_2|$. How many such outcomes are there? There are $\binom{4}{2} = 6$ ways of choosing the positions of the heads, and these choices completely specify the sequence. So $\Pr[A_2] = \frac{6}{16} = \frac{3}{8}$.

More generally, if we flip the coin $n$ times, we get a sample space $\Omega$ of cardinality $2^n$. The sample points are all possible sequences of $n$ H's and T's. Now consider the event $A_r$ that we get exactly $r$ H's when we flip the coin $n$ times. This event consists of exactly $\binom{n}{r}$ sample points. Each has probability $\frac{1}{2^n}$. So the probability of this event is $\Pr[A_r] = \binom{n}{r} / 2^n$.

It is interesting to observe that as $n$ gets larger, the denominator above grows exponentially in $n$. But the number of different events $A_r$ is just $n+1$ and so grows only linearly with $n$. So how does the probability distribute itself across the different $A_r$? That is what the fair coin tossing examples were showing you experimentally in the first probability lecture note. But to get an analytic grasp on what is going on, we're going to have to find a way to translate between the world of factorials (that we see in the numerator $\binom{n}{r}$) and the world of exponentials (that we see in the denominator).
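As a quick check on the counting argument above, here is a minimal Python sketch (assuming Python 3.8+ for `math.comb`). The function names are hypothetical and chosen only for illustration. It counts the outcomes in $A_r$ by brute-force enumeration and compares the count with $\binom{n}{r}$.

```python
from fractions import Fraction
from itertools import product
from math import comb


def prob_exactly_r_heads(n, r):
    """Pr[A_r] for n fair flips, using the formula C(n, r) / 2^n."""
    return Fraction(comb(n, r), 2 ** n)


def count_by_enumeration(n, r):
    """Count sequences of n flips with exactly r heads by listing all 2^n of them."""
    return sum(1 for seq in product("HT", repeat=n) if seq.count("H") == r)


n = 4
# The enumeration and the binomial coefficient agree: |A_2| = 6.
assert count_by_enumeration(n, 2) == comb(n, 2) == 6

for r in range(n + 1):
    print(r, prob_exactly_r_heads(n, r))
```

Exact fractions are used so that the result for $r = 2$ comes out as $\frac{3}{8}$ rather than a rounded decimal.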
The key tool for doing this is the famous Stirling's Approximation:

$$n! \approx \sqrt{2\pi n}\left(\frac{n}{e}\right)^n. \tag{1}$$

In practice, this is a very good approximation. (Plot it to see this!) It is also very surprising at first glance. To understand something as simple as factorial, which just involves the multiplication of wholesome integers in order, we need to invoke three irrational numbers $\sqrt{2}, \pi, e$ that seem to come from nowhere. Why should multiplying numbers in order have anything to do with the length of the diagonal of a square, the ratio of the circumference of a circle to its diameter, and the base of natural logarithms?!? This kind of linkage across very different areas is a part of the deep beauty of mathematics.
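In place of a plot, the following short Python sketch prints the ratio of Stirling's approximation to the exact factorial for a few values of $n$. The helper name `stirling` and the choice of sample values are assumptions made for illustration.

```python
import math


def stirling(n):
    """Stirling's approximation: sqrt(2*pi*n) * (n/e)^n."""
    return math.sqrt(2 * math.pi * n) * (n / math.e) ** n


for n in (1, 5, 10, 20, 50, 100):
    ratio = stirling(n) / math.factorial(n)
    # The ratio approaches 1 from below as n grows.
    print(f"n={n:3d}  approx/exact = {ratio:.6f}")
```

The ratio stays a little below 1 and climbs toward it as $n$ grows, which is one way to see that the relative error shrinks.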

We can use Stirling's approximation to get a better intuitive handle on $\binom{n}{r}$. To help us simplify, let's define $q = \frac{r}{n}$.

$$
\begin{aligned}
\binom{n}{r} &= \frac{n!}{r!\,(n-r)!} && (2)\\
&\approx \frac{\sqrt{2\pi n}\left(\frac{n}{e}\right)^{n}}{\sqrt{2\pi r}\left(\frac{r}{e}\right)^{r}\sqrt{2\pi (n-r)}\left(\frac{n-r}{e}\right)^{n-r}} && (3)\\
&= \frac{\sqrt{n}}{\sqrt{2\pi r(n-r)}} \cdot \frac{n^{n}}{r^{r}(n-r)^{n-r}} && (4)\\
&= \frac{1}{\sqrt{2\pi n q(1-q)}} \cdot \frac{n^{n}}{(qn)^{qn}\,((1-q)n)^{(1-q)n}} && (5)\\
&= \frac{1}{\sqrt{2\pi n q(1-q)}} \cdot \frac{1}{q^{qn}(1-q)^{(1-q)n}} && (6)\\
&= \frac{1}{\sqrt{2\pi n q(1-q)}} \left(\frac{1}{q^{q}(1-q)^{(1-q)}}\right)^{n} && (7)
\end{aligned}
$$

Notice that the last term is actually something exponential in $n$. For example, plug in $q = \frac{1}{2}$ and you will get $\frac{1}{q^{q}(1-q)^{(1-q)}} = 2$, so the last term is $2^n$. This gives us the curious observation that about $\sqrt{\frac{2}{\pi n}}$ of all possible $n$-length binary strings have exactly the same number of zeros and ones. Or in the language of probability, that is the approximate chance of getting exactly the same number of heads and tails when tossing a large (even) number of coins. Could this have something to do with what we had observed earlier experimentally in a sequence of coin tosses? We'll see later in the course.

Card Shuffling

The random experiment consists of shuffling a deck of cards. $\Omega$ is equal to the set of the $52!$ permutations of the deck. The probability space is uniform. Note that we're really talking about an idealized mathematical model of shuffling here; in real life, there will always be a bit of bias in our shuffling. However, the mathematical model is close enough to be useful.

Poker Hands

Here's another experiment: shuffling a deck of cards and dealing a poker hand. In this case, $S$ is the set of 52 cards and our sample space is $\Omega = \{\text{all possible poker hands}\}$, which corresponds to choosing $k = 5$ objects without replacement from a set of size $n = 52$ where order does not matter. Hence, as we saw in the previous Note,

$$|\Omega| = \binom{52}{5} = \frac{52 \cdot 51 \cdot 50 \cdot 49 \cdot 48}{5 \cdot 4 \cdot 3 \cdot 2 \cdot 1} = 2{,}598{,}960.$$

Since the deck is assumed to be randomly shuffled, each outcome is equally likely and we are therefore dealing with a uniform probability space.

Let $A$ be the event that the poker hand is a flush. [For those who are not (yet) addicted to gambling, a flush is a hand in which all cards have the same suit, say Hearts.]
Since the probability space is uniform, computing $\Pr[A]$ reduces to simply computing $|A|$, or the number of poker hands which are flushes. There are 13 cards in each suit, so the number of flushes in each suit is $\binom{13}{5}$. The total number of flushes is therefore $4 \cdot \binom{13}{5}$. Then we have

$$\Pr[\text{hand is a flush}] = \frac{4 \cdot \binom{13}{5}}{\binom{52}{5}} = \frac{4 \cdot 13! \cdot 5! \cdot 47!}{5! \cdot 8! \cdot 52!} = \frac{4 \cdot 13 \cdot 12 \cdot 11 \cdot 10 \cdot 9}{52 \cdot 51 \cdot 50 \cdot 49 \cdot 48} \approx 0.002.$$
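The flush calculation above can be reproduced in a few lines of Python (assuming Python 3.8+ for `math.comb`). The variable names are illustrative only. As in the text, every five-card hand of a single suit is counted as a flush, straight flushes included.

```python
from fractions import Fraction
from math import comb

total_hands = comb(52, 5)      # |Omega| = 2,598,960 five-card hands
flushes = 4 * comb(13, 5)      # 4 suits, choose 5 of the 13 cards in the suit

p_flush = Fraction(flushes, total_hands)
print(total_hands, flushes)
print(p_flush, float(p_flush))  # exact fraction, then a decimal near 0.002
```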

As an exercise, you should compare to what Stirling's approximation would yield for the above exact calculation.

Balls and Bins

In this experiment, we will throw 20 (labeled) balls into 10 (labeled) bins. Assume that each ball is equally likely to land in any bin, regardless of what happens to the other balls.

If you wish to understand this situation in terms of sampling a sequence of $k$ elements from a set $S$ of cardinality $n$: here the set $S$ consists of the 10 bins, and we are sampling with replacement $k = 20$ times. The order of sampling matters, since the balls are labeled.

The sample space $\Omega$ is equal to $\{(b_1, b_2, \ldots, b_{20}) : 1 \le b_i \le 10\}$, where the component $b_i$ denotes the bin in which ball $i$ lands. The cardinality of the sample space, $|\Omega|$, is equal to $10^{20}$: each element $b_i$ in the sequence has 10 possible choices, and there are 20 elements in the sequence. More generally, if we throw $m$ balls into $n$ bins, we have a sample space of size $n^m$. The probability space is uniform; as we said earlier, each ball is equally likely to land in any bin.

Let $A$ be the event that bin 1 is empty. Since the probability space is uniform, we simply need to count how many outcomes have this property. This is exactly the number of ways all 20 balls can fall into the remaining nine bins, which is $9^{20}$. Hence, $\Pr[A] = \frac{9^{20}}{10^{20}} = \left(\frac{9}{10}\right)^{20} \approx 0.12$.

Let $B$ be the event that bin 1 contains at least one ball. This event is the complement $\bar{A}$ of $A$, i.e., it consists of precisely those sample points which are not in $A$. So $\Pr[B] = 1 - \Pr[A] \approx 0.88$. More generally, if we throw $m$ balls into $n$ bins, we have:

$$\Pr[\text{bin 1 is empty}] = \left(\frac{n-1}{n}\right)^{m} = \left(1 - \frac{1}{n}\right)^{m}.$$

As we shall see, balls and bins is another probability space that shows up very often in EECS: for example, we can think of it as modeling a load balancing scheme, in which each job is sent to a random processor.

It is also a more general model for problems we have previously considered. For example, flipping a fair coin 3 times is a special case in which the number of balls ($m$) is 3 and the number of bins ($n$) is 2. Rolling two dice (an example in the previous lecture notes) is a special case in which $m = 2$ and $n = 6$.
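A small Python sketch can confirm the empty-bin formula. It evaluates $\left(\frac{n-1}{n}\right)^m$ for the 20-ball, 10-bin case and, since $10^{20}$ outcomes are far too many to list, cross-checks the formula by brute force only on a tiny instance. The function names and the small test sizes are assumptions chosen for illustration.

```python
from fractions import Fraction
from itertools import product


def prob_bin1_empty(m, n):
    """Formula from the text: ((n - 1) / n)^m for m balls and n bins."""
    return Fraction(n - 1, n) ** m


def prob_bin1_empty_by_counting(m, n):
    """Enumerate all n^m outcomes (b_1, ..., b_m) and count those avoiding bin 1."""
    outcomes = product(range(1, n + 1), repeat=m)
    empty = sum(1 for outcome in outcomes if 1 not in outcome)
    return Fraction(empty, n ** m)


# Brute force is only feasible for small m and n.
assert prob_bin1_empty(3, 4) == prob_bin1_empty_by_counting(3, 4)

p = prob_bin1_empty(20, 10)
print(float(p), 1 - float(p))  # roughly 0.12 and 0.88
```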
Birthday Paradox

The "birthday paradox" is a remarkable phenomenon that examines the chances that two people in a group have the same birthday. It is a "paradox" not because of a logical contradiction, but because it goes against intuition. For ease of calculation, we take the number of days in a year to be 365. Then $U = \{1, \ldots, 365\}$, and the random experiment consists of drawing a sample of $n$ elements from $U$, where the elements are the birth dates of $n$ people in a group. Then $|\Omega| = 365^n$. This is because each sample point is a sequence of possible birthdays for $n$ people; so there are $n$ points in the sequence and each point has 365 possible values.

Let $A$ be the event that at least two people have the same birthday. If we want to determine $\Pr[A]$, it might be simpler to instead compute the probability of the complement of $A$, $\Pr[\bar{A}]$. $\bar{A}$ is the event that no two people have the same birthday. Since $\Pr[A] = 1 - \Pr[\bar{A}]$, we can then easily compute $\Pr[A]$.

We are again working in a uniform probability space, so we just need to determine $|\bar{A}|$. Equivalently, we are computing the number of ways there are for no two people to have the same birthday. There are 365 choices for the first person, 364 for the second, ..., $365 - n + 1$ choices for the $n$th person, for a total of $365 \times 364 \times \cdots \times (365 - n + 1)$. Note that this is simply an application of the first rule of counting; we are sampling without replacement and the order matters.

Thus we have

$$\Pr[\bar{A}] = \frac{|\bar{A}|}{|\Omega|} = \frac{365 \times 364 \times \cdots \times (365 - n + 1)}{365^n}.$$

Then

$$\Pr[A] = 1 - \frac{365 \times 364 \times \cdots \times (365 - n + 1)}{365^n}.$$

This allows us to compute $\Pr[A]$ as a function of the number of people, $n$. Of course, as $n$ increases $\Pr[A]$ increases. In fact, with $n = 23$ people you should be willing to bet that at least two people do have the same birthday, since then $\Pr[A]$ is larger than 50%! For $n = 60$ people, $\Pr[A]$ is over 99%.

This is a somewhat surprising aspect of the nature of random fluctuations. We will study the engineering impacts of this later when we think about load balancing and hash tables.

Unfair coins and how to combine experiments into a single probability space

Suppose that instead of a fair coin, we have a coin of bias $p$, and our experiment consists of flipping the coin 4 times. The sample space $\Omega$ still consists of the sixteen possible sequences of H's and T's. However, the probability space depends on $p$. If $p = \frac{1}{2}$ the probabilities are assigned uniformly; the probability of each sample point is $\frac{1}{16}$. What if the coin comes up heads with probability $\frac{2}{3}$ and tails with probability $\frac{1}{3}$ (i.e. the bias is $p = \frac{2}{3}$)? Then the probabilities of different outcomes are different. For example, $\Pr[HHHH] = \frac{2}{3} \times \frac{2}{3} \times \frac{2}{3} \times \frac{2}{3} = \frac{16}{81}$, while $\Pr[TTHH] = \frac{1}{3} \times \frac{1}{3} \times \frac{2}{3} \times \frac{2}{3} = \frac{4}{81}$. [Note: We have cheerfully multiplied probabilities here; but why is this allowed? It is not always OK!]

The multiplication here can be justified in terms of uniform probabilities by thinking of the underlying experiment as rolling a fair three-sided die instead. We say H when the die comes up 1 or 2 and say T when the die comes up 3. In that case, there are a grand total of $81 = 3^4$ possible die-roll sequences. And the 16 above comes from the $2 \times 2 \times 2 \times 2$ different outcomes that all get labeled HHHH. The same argument works for justifying the calculation of TTHH's probability. What we are doing here is arguing what the $\frac{2}{3}$ probability of heads means.
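The three-sided die argument can be checked directly by listing all $3^4 = 81$ equally likely roll sequences and counting how many receive a given H/T label. This is a minimal sketch: the `LABEL` table and the function name are hypothetical, chosen only to mirror the labeling described above.

```python
from fractions import Fraction
from itertools import product

# Faces 1 and 2 are read as heads, face 3 as tails, so Pr[H] = 2/3.
LABEL = {1: "H", 2: "H", 3: "T"}


def prob_by_die_rolls(target):
    """Fraction of equally likely die-roll sequences whose labels spell `target`."""
    rolls = list(product((1, 2, 3), repeat=len(target)))
    matching = sum(
        1 for roll in rolls if "".join(LABEL[face] for face in roll) == target
    )
    return Fraction(matching, len(rolls))


for sequence in ("HHHH", "TTHH", "HTHT"):
    print(sequence, prob_by_die_rolls(sequence))
```

Counting in the uniform die space gives the same values as multiplying the per-flip probabilities, which is the point of the justification.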
In general, when we make a new unified probability space out of two completely independent and separate experiments (these are experiments that cannot influence each other in any way), then the new outcomes are pairs of outcomes. So the new sample space is $\Omega = \Omega_1 \times \Omega_2$, using set theoretic notation for the set of all pairs where the first element is from $\Omega_1$ and the second is from $\Omega_2$. The rule for evaluating the probability of the individual outcomes $\omega = (\omega_1, \omega_2)$ is just $\Pr[\omega] = \Pr_1[\omega_1] \cdot \Pr_2[\omega_2]$. This is generalized in the natural way to lists longer than two. For the unfair coins here, we are looking at 4-tuples.

What type of events can we consider in this setting? Let event $S$ be the event that all four coin tosses are the same. Then $S = \{HHHH, TTTT\}$. $HHHH$ has probability $\left(\frac{2}{3}\right)^4$ and $TTTT$ has probability $\left(\frac{1}{3}\right)^4$. Thus, $\Pr[S] = \Pr[HHHH] + \Pr[TTTT] = \left(\frac{2}{3}\right)^4 + \left(\frac{1}{3}\right)^4 = \frac{17}{81}$.

Next, consider the event $A_2$: the event that there are exactly two heads. We had seen this one earlier in the fair coin setting. This time, the probability of any particular outcome with two heads (such as HTHT) is $\left(\frac{2}{3}\right)^2 \left(\frac{1}{3}\right)^2$. Notice that the order of the heads doesn't matter in doing this calculation. This is because real multiplication commutes. Once again, the key is to count $|A_2|$. There are $\binom{4}{2} = 6$ ways of choosing the positions of the heads, and so $\Pr[A_2] = 6 \left(\frac{2}{3}\right)^2 \left(\frac{1}{3}\right)^2 = \frac{24}{81} = \frac{8}{27}$.

More generally, if we flip the biased coin $n$ times, we get a sample space $\Omega$ of cardinality $2^n$. This is the same as the fair coin toss case. If the coin has bias $p$, and if we consider any sequence of $n$ coin flips with exactly $r$ H's, then the probability of this sequence is $p^r (1-p)^{n-r}$.

As before, we can consider the event $A_r$ that we get exactly $r$ H's when we flip the coin $n$ times. This event consists of exactly $\binom{n}{r}$ sample points and so the probability of this event is $\Pr[A_r] = \binom{n}{r} p^r (1-p)^{n-r}$.

We can use our earlier Stirling's approximation-based calculation to shed some light on what this is like. As before, let $q = \frac{r}{n}$.

$$
\begin{aligned}
\Pr[A_r] &= \binom{n}{r} p^{r} (1-p)^{n-r} && (8)\\
&\approx \frac{1}{\sqrt{2\pi n q(1-q)}} \left(\frac{1}{q^{q}(1-q)^{(1-q)}}\right)^{n} \left(p^{q}(1-p)^{(1-q)}\right)^{n} && (9)\\
&= \frac{1}{\sqrt{2\pi n q(1-q)}} \left(\left(\frac{p}{q}\right)^{q} \left(\frac{1-p}{1-q}\right)^{(1-q)}\right)^{n} && (10)
\end{aligned}
$$

It is very interesting to observe what happens when $p = q$ above. In that case, Stirling's approximation says that the probability of getting this outcome is like $\frac{1}{\sqrt{2\pi n p(1-p)}}$.

Biased coin-tossing sequences show up in many contexts: for example, they might model the behavior of $n$ trials of a faulty system, which fails each time with probability $p$.

Conditional Probability Examples

With counting available to us, we can consider some more simple examples that deal with conditional probability.

Card Dealing

What is the probability that, when dealing 2 cards and the first card is known to be an ace, the second card is also an ace?

Let $B$ be the event that the first card is an ace, and let $A$ be the event that the second card is an ace. Note that $\Pr[A] = \Pr[B] = \frac{1}{13}$.

To compute $\Pr[A \mid B]$, we need to figure out $\Pr[A \cap B]$. This is the probability that both cards are aces. Note that there are $52 \cdot 51$ sample points in the sample space, since each sample point is a sequence of two cards. A sample point is in $A \cap B$ if both cards are aces. This can happen in $4 \cdot 3 = 12$ ways.

Since each sample point is equally likely, $\Pr[A \cap B] = \frac{12}{52 \cdot 51}$. The probability of event $B$, drawing an ace in the first trial, is $\frac{4}{52}$. Therefore,

$$\Pr[A \mid B] = \frac{\Pr[A \cap B]}{\Pr[B]} = \frac{3}{51}.$$

Note that this says that if the first card is an ace, it makes it less likely that the second card is also an ace.

Balls and Bins Revisited

Imagine we have two bins containing black and white balls, and further suppose that we wanted to know what is the chance that we picked Bin 1 given that we picked a white ball, i.e., $\Pr[\text{Bin 1} \mid \text{white}]$. Assume that we are unbiased when choosing a bin so that each bin is chosen with probability $\frac{1}{2}$.

A wrong approach is to say that the answer is clearly $\frac{2}{3}$, since we know there are a total of three white balls, two of which are in Bin 1. However, this picture is misleading because the bins have equal "weight". Instead, what we should do is appropriately scale each sample point. (The figures are not reproduced here. The calculations below are consistent with Bin 1 holding two white and three black balls, each weighted $\frac{1}{10}$, and Bin 2 holding one white and one black ball, each weighted $\frac{1}{4}$.)

The scaled picture shows that the sample space $\Omega$ is equal to the union of the events contained in Bin 1 ($A_1$) and Bin 2 ($A_2$), so $\Omega = A_1 \cup A_2$. This is NOT the same as viewing this as two independent experiments. We get either something from Bin 1 or something from Bin 2. We don't get one from each. We can use the definition of conditional probability to see that

$$\Pr[\text{Bin 1} \mid \text{white}] = \frac{\frac{1}{10} + \frac{1}{10}}{\frac{1}{10} + \frac{1}{10} + \frac{1}{4}} = \frac{\frac{2}{10}}{\frac{9}{20}} = \frac{4}{9}.$$

Let us try to achieve this probability using Bayes' rule. To apply Bayes' rule, we need to compute $\Pr[\text{white} \mid \text{Bin 1}]$, $\Pr[\text{Bin 1}]$ and $\Pr[\text{white}]$. $\Pr[\text{white} \mid \text{Bin 1}]$ is the chance that we pick a white ball given that we picked Bin 1, which is $\frac{2}{5}$. $\Pr[\text{Bin 1}]$ is $\frac{1}{2}$ as given in the description of the problem. Finally, $\Pr[\text{white}]$ can be computed using the Total Probability rule:

$$\Pr[\text{white}] = \Pr[\text{white} \mid \text{Bin 1}] \times \Pr[\text{Bin 1}] + \Pr[\text{white} \mid \text{Bin 2}] \times \Pr[\text{Bin 2}] = \frac{2}{5} \times \frac{1}{2} + \frac{1}{2} \times \frac{1}{2} = \frac{9}{20}.$$

Observe that we can apply the Total Probability rule here because the event Bin 1 is the complement of the event Bin 2. Finally, if we plug the above values into Bayes' rule we obtain the probability that we picked Bin 1 given that we picked a white ball:

$$\Pr[\text{Bin 1} \mid \text{white}] = \frac{\frac{2}{5} \times \frac{1}{2}}{\frac{9}{20}} = \frac{\frac{2}{10}}{\frac{9}{20}} = \frac{4}{9}.$$

All we have done above is combine Bayes' rule and the Total Probability rule. We could have equivalently applied Bayes' rule directly.

Notice here that we were able to do the calculation without being very precise about what the underlying probability space is. This is often the case when dealing with conditional probability, but it is good to be able to precisely define what outcomes are if you need to. Here, the most natural choice of the underlying sample space is to consider the composition of three independent and distinct experiments: tossing a fair coin, picking something from Bin 1, and picking something from Bin 2. If we further view each of the balls as being distinct, this is a uniform probability space in which each of the 20 possible outcomes has probability $\frac{1}{2} \times \frac{1}{5} \times \frac{1}{2} = \frac{1}{20}$. If we instead view balls only in terms of their color, then there are only $2 \times 2 \times 2 = 8$ possibilities and they are not all equally likely.

Summary

The examples above illustrate the importance of doing probability calculations systematically, rather than just "intuitively." Recall the key steps in all our calculations:

1. What is the sample space (i.e., the experiment and its set of possible outcomes)?
2. What is the probability of each outcome (sample point)?
3. What is the event we are interested in (i.e., which subset of the sample space)?
4. Finally, compute the probability of the event by adding up the probabilities of the sample points inside it.

Whenever you meet a probability problem, you should always go back to these basics to avoid potential pitfalls. Even experienced researchers make mistakes when they forget to do this: witness the many erroneous "proofs," submitted by mathematicians to newspapers at the time, of the fact that the switching strategy in the Monty Hall problem does not improve the odds.
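The systematic recipe can be sketched in Python on the two-bin example, treated as the composition of three independent experiments with 20 equally likely outcomes. The bin contents below (two white and three black balls in Bin 1, one white and one black in Bin 2) are an assumption inferred from the probabilities $\frac{2}{5}$ and $\frac{1}{2}$ used in the text. All ball labels and helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Assumed bin contents, inferred from Pr[white | Bin 1] = 2/5 and Pr[white | Bin 2] = 1/2.
BIN1 = ["w1", "w2", "b1", "b2", "b3"]
BIN2 = ["w3", "b4"]

# Sample space: (coin result, ball picked from Bin 1, ball picked from Bin 2).
omega = list(product(("Bin 1", "Bin 2"), BIN1, BIN2))

# Probability of each outcome: uniform over the 20 sample points.
pr = {outcome: Fraction(1, len(omega)) for outcome in omega}


def drawn_ball(outcome):
    """The ball we actually end up with is the one from the bin the coin selected."""
    coin, from_bin1, from_bin2 = outcome
    return from_bin1 if coin == "Bin 1" else from_bin2


# The events of interest, as subsets of the sample space.
white = {o for o in omega if drawn_ball(o).startswith("w")}
bin1 = {o for o in omega if o[0] == "Bin 1"}


def prob(event):
    """Add up the probabilities of the sample points inside the event."""
    return sum((pr[o] for o in event), Fraction(0))


p_white = prob(white)
p_bin1_given_white = prob(white & bin1) / p_white
print(len(omega), p_white, p_bin1_given_white)
```

Under these assumptions the enumeration agrees with the Bayes' rule calculation, without ever multiplying conditional probabilities by hand.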