Bernoulli Trials, Binomial and Hypergeometric Distributions

Definitions:

Bernoulli Trial: A random event whose outcome is true (1) or false (0).

Binomial Distribution: n Bernoulli trials.

p    The probability of a true (1) outcome (also called a success)
q    1-p. The probability of a false (0) outcome (also called a failure)
n    The number of trials
$f(x \mid n, p)$    The probability density function with sample size n and probability of success p.

Discussion:

MATLAB Example: Flip a coin 10 times. The probability of a heads is 0.6.

First, generate 10 random numbers in the interval (0,1):

-->X = rand(1,10)

X is now a row of 10 values, each between 0 and 1.

Change this so each trial is binary (0 or 1) with the probability of a 1 being 0.6:

-->X = 1*(X < 0.6)

X is now a row of 10 zeros and ones.

Question: What is the probability of getting a '1' for a single trial?

Answer: 0.6. Call this 'p' in general.

JSG 1 rev 8/2/11
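One way to convince yourself that the thresholding step produces a 1 with probability p is to repeat it over many trials and look at the fraction of ones. The sketch below assumes a trial count N of 100000 (a value chosen for illustration, not taken from the notes); N and p_hat are illustrative names.

```matlab
% Estimate the success probability of a thresholded uniform random number.
% N and p_hat are illustrative names that do not appear in the notes.
p = 0.6;              % probability of a 1 (heads)
N = 100000;           % number of simulated trials (assumed value)
X = rand(1, N);       % N uniform random numbers in (0,1)
X = 1 * (X < p);      % 1 where the draw falls below p, otherwise 0
p_hat = mean(X);      % fraction of trials that came up 1
disp(p_hat)           % close to 0.6; the exact value varies from run to run
```

With N this large, the estimate should typically land within a few thousandths of 0.6.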
Question: What is the probability of getting m heads in 2 trials?

Answer 1: Enumerate all possibilities and probabilities:

result    probability
1 1       0.6 * 0.6
1 0       0.6 * 0.4
0 1       0.4 * 0.6
0 0       0.4 * 0.4

or, recalling that p = 0.6:

m    P(m)
2    $1 \cdot p^2 (1-p)^0$
1    $2 \cdot p^1 (1-p)^1$
0    $1 \cdot p^0 (1-p)^2$

Problem: What is the probability of flipping m heads in 3 trials (n=3)?

Enumerating all combinations (with 1 meaning a heads (success)):

m    combinations     probability
3    111              $1 \cdot (0.6)^3 (0.4)^0$
2    110, 101, 011    $3 \cdot (0.6)^2 (0.4)^1$
1    100, 010, 001    $3 \cdot (0.6)^1 (0.4)^2$
0    000              $1 \cdot (0.6)^0 (0.4)^3$

The right column is the number of combinations that result in m heads, times the probability of a heads (p) occurring m times, times the probability of a tails (1-p) occurring n-m times, where n is the number of flips.

Problem: What is the probability of flipping m heads in n trials?

This is a binomial distribution with parameters n and p: the number of trials (n) and the probability of a success for a given trial (p).

$f(m \mid n, p) = \binom{n}{m} p^m (1-p)^{n-m}$

This is called the probability density function (pdf) for a binomial distribution. (Because m is discrete, many texts call it a probability mass function.)

For example, if p = 0.6, the probability of flipping 3 heads in 10 trials is

$f(3) = \binom{10}{3} (0.6)^3 (0.4)^7 = 0.0425$
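The binomial formula above can be checked numerically. This is a minimal sketch using MATLAB's built-in nchoosek for the number of combinations; binom_pdf is an illustrative helper name, not something defined in the notes.

```matlab
% Binomial pdf f(m | n, p) evaluated with nchoosek.
% binom_pdf is an illustrative helper name, not part of the notes.
binom_pdf = @(m, n, p) nchoosek(n, m) * p^m * (1 - p)^(n - m);

p = 0.6;
f3 = binom_pdf(3, 10, p);     % probability of 3 heads in 10 flips

% The n = 3 case: probabilities of m = 0, 1, 2, 3 heads
f = zeros(1, 4);
for m = 0:3
    f(m + 1) = binom_pdf(m, 3, p);
end

fprintf('f(3 | 10, 0.6) = %.4f\n', f3);   % prints 0.0425
disp(f)                                   % 0.064  0.288  0.432  0.216
disp(sum(f))                              % the probabilities over all m sum to 1
```

The four values for n = 3 match the enumeration: 1, 3, 3, 1 combinations times the corresponding powers of 0.6 and 0.4.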
What's kind of neat is that the shape of the probability density function (pdf) becomes more 'bell shaped' as the number of trials increases:

n = 10;                % number of trials
f = zeros(1, n+1);     % f(i) holds the probability of m = i-1 heads
for i=1:n+1
   m = i-1;
   f(i) = factorial(n) / ( factorial(m) * factorial(n-m)) * (0.6 ^ m) * (0.4 ^ (n-m) );
end
sum(f)                 % should be 1

[Figures: pdf for a binomial distribution with n = 10, and pdf for a binomial distribution with a larger n]

In the limit as n goes to infinity, you get a continuous distribution called a 'normal distribution.' This is the Central Limit Theorem - coming up soon...
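The factorial form works for small n, but factorial(171) already overflows to Inf in double precision, so it cannot be used to look at the shape for very large n. A possible workaround, sketched here under the assumption that n = 1000 is a large enough example, is to evaluate the logarithm of the pdf with gammaln (using gammaln(k+1) = log(k!)) and exponentiate at the end. The names logf and m_peak are illustrative.

```matlab
% Binomial pdf for large n, computed in log space to avoid factorial overflow.
% n = 1000 is an assumed example value; logf and m_peak are illustrative names.
n = 1000;
p = 0.6;
m = 0:n;

% log of n! / (m! (n-m)!) * p^m * (1-p)^(n-m), using gammaln(k+1) = log(k!)
logf = gammaln(n + 1) - gammaln(m + 1) - gammaln(n - m + 1) ...
       + m * log(p) + (n - m) * log(1 - p);
f = exp(logf);

[~, idx] = max(f);
m_peak = m(idx);      % number of heads with the highest probability

disp(sum(f))          % total probability, 1 to within rounding error
disp(m_peak)          % the peak sits at m = 600, which is n*p
```

Calling plot(m, f) or bar(m, f) afterwards shows the bell shape directly.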
Pascal's Triangle

A kind of neat pattern for the combination $\binom{n}{m}$ is as follows:

Start with the number 1 (or 0 1 0) in row #1.
Offset row #2 by 1/2 a digit. Add the numbers to the left and right of each spot in row #1 to generate row 2.
Offset row #3 by 1/2 a digit. Add the numbers to the left and right of each spot in row #2 to generate row 3.

The result is as follows:

                  1
                1   1
              1   2   1
            1   3   3   1
          1   4   6   4   1
        1   5  10  10   5   1
      1   6  15  20  15   6   1

Each row is the value of $\binom{n}{m}$ as m goes from zero to n - which is one way of computing combinations. If you shade in the odd entries, you get a pretty picture as well (in the limit becoming the Sierpinski Triangle).

[Figure: Sierpinski Triangle / Pascal's Triangle - source Wikipedia.com]
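The add-your-neighbours rule can be sketched in a few lines: padding a row with a zero on the left and on the right, then adding the two shifted copies, gives the next row. The names nrows and row are illustrative, and printing 7 rows is an arbitrary choice matching the triangle above.

```matlab
% Build rows of Pascal's triangle by adding left and right neighbours.
% nrows and row are illustrative names.
nrows = 7;
row = 1;                          % row #1
for k = 1:nrows
    disp(row)                     % prints rows #1 through #7
    row = [0 row] + [row 0];      % pad with a zero on each side and add
end
% After the loop, row holds row #8:  1 7 21 35 35 21 7 1
```

Each printed row should agree with nchoosek(n, m) for m = 0..n, where n is one less than the row number.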
Binomial Distributions with Multiple Outcomes:

Problem: Two people are playing tennis. Person A has a 60% chance of winning a given point. The match is over when the first person wins 4 points (best of 7 series). What's the chance person A wins the match?

Solution: Treat each point as one game and assume all 7 games are played. Person A wins if he/she wins 4, 5, 6, or 7 games. (The match will actually end once you get to 4 wins, but this covers all combinations thereof.) The probability of player A winning is the sum of the probabilities of these four outcomes. The probability of each case is:

4 wins:  $f(4) = \binom{7}{4} (0.6)^4 (0.4)^3 = 0.2903$
5 wins:  $f(5) = \binom{7}{5} (0.6)^5 (0.4)^2 = 0.2613$
6 wins:  $f(6) = \binom{7}{6} (0.6)^6 (0.4)^1 = 0.1306$
7 wins:  $f(7) = \binom{7}{7} (0.6)^7 (0.4)^0 = 0.0280$

The total is 0.7102.

Person A has a 71% chance of winning the match. The 'better' player will win the match 71% of the time. An interesting question is how to set up a tournament so that the best player wins.

Problem: Two people are playing tennis. Person A has a 60% chance of winning a given point. The match is over when the first person wins by 4 points. What's the chance person A wins the match?

Solution: This is a totally different problem. If person A wins a point and then loses the next one, you're back where you started. The net result is potentially an infinite series.
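The best-of-7 calculation is a sum of four binomial terms, which is easy to reproduce. A minimal sketch follows; P_match is an illustrative name.

```matlab
% Probability that player A wins at least 4 of 7 games when p = 0.6.
% P_match is an illustrative name.
p = 0.6;
n = 7;
f = zeros(1, 4);
for k = 4:7
    f(k - 3) = nchoosek(n, k) * p^k * (1 - p)^(n - k);   % exactly k wins
end
P_match = sum(f);

fprintf('%.4f  ', f);                                  % 0.2903  0.2613  0.1306  0.0280
fprintf('\n');
fprintf('P(A wins the match) = %.4f\n', P_match);      % 0.7102
```

Changing p shows how much a 7-game series amplifies a per-game advantage: at p = 0.5 the sum is exactly 0.5.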
Hypergeometric Distribution:

Problem: Suppose a box contains A white balls and B black balls. Each trial, you take one ball out of the bin and then put it back into the bin. Find the probability distribution function for drawing x white balls in n trials.

Solution: This is sampling with replacement. Each trial has the same probability of success (drawing a white ball):

$p = \frac{A}{A+B}$

The probability of x white balls is then a binomial distribution:

$f(x \mid n, p) = \binom{n}{x} p^x q^{n-x}$

Problem: Suppose you do not replace the ball after you select it.

This changes the problem considerably. First, you cannot draw more white balls than there are white balls in the bin, and you can't draw more white balls than the total number of balls you draw:

$x \le \min(n, A)$

You also can't draw fewer than zero white balls, and once the black balls run out the remaining draws must be white:

$x \ge \max(0, n - B)$

The probability of drawing x white balls in n draws follows from counting:

There are $\binom{A}{x}$ ways of drawing x white balls.
There are $\binom{B}{n-x}$ ways of drawing the remaining n-x black balls.
The total number of ways you can draw n balls is $\binom{A+B}{n}$.

So, the pdf for a Hypergeometric distribution is:

$f(x \mid A, B, n) = \frac{\binom{A}{x} \binom{B}{n-x}}{\binom{A+B}{n}}$
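A quick sanity check on the hypergeometric pdf is that it sums to 1 over the allowed range of x. The sketch below uses assumed values A = 8, B = 3, n = 5, picked only so that the lower bound max(0, n - B) is not zero; hyper_pdf is an illustrative helper name.

```matlab
% Hypergeometric pdf f(x | A, B, n) summed over its support.
% hyper_pdf and the values of A, B, n are illustrative assumptions.
hyper_pdf = @(x, A, B, n) nchoosek(A, x) * nchoosek(B, n - x) / nchoosek(A + B, n);

A = 8;  B = 3;  n = 5;          % 8 white, 3 black, draw 5 without replacement
x_min = max(0, n - B);          % at least 2 white: only 3 black balls exist
x_max = min(n, A);              % at most 5 white: only 5 balls are drawn

total = 0;
for x = x_min:x_max
    total = total + hyper_pdf(x, A, B, n);
end

disp([x_min x_max])             % 2  5
disp(total)                     % the pdf sums to 1 over the support
```

Staying inside x_min..x_max also keeps every nchoosek call valid, since none is asked to choose more items than are available.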
Problem: A bin has 8 white balls and 10 black balls. You draw 5 balls without replacement. Find the probability that three of the balls are white:

$f(3 \mid 8, 10, 5) = \frac{\binom{8}{3} \binom{10}{2}}{\binom{18}{5}} = \frac{(56)(45)}{8568} = 0.2941$

Problem: Find the probability of drawing three or more white balls.

This is the sum of the probability of drawing 3, 4, or 5 white balls.

$f(4 \mid 8, 10, 5) = \frac{\binom{8}{4} \binom{10}{1}}{\binom{18}{5}} = \frac{(70)(10)}{8568} = 0.0817$

$f(5 \mid 8, 10, 5) = \frac{\binom{8}{5} \binom{10}{0}}{\binom{18}{5}} = \frac{(56)(1)}{8568} = 0.0065$

The sum of these three is 0.3823.
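The worked example can be reproduced directly from the hypergeometric formula. A minimal sketch, with hyper_pdf and P3plus as illustrative names:

```matlab
% 8 white balls, 10 black balls, 5 draws without replacement.
% hyper_pdf and P3plus are illustrative names.
hyper_pdf = @(x, A, B, n) nchoosek(A, x) * nchoosek(B, n - x) / nchoosek(A + B, n);

A = 8;  B = 10;  n = 5;
f3 = hyper_pdf(3, A, B, n);     % exactly three white balls
f4 = hyper_pdf(4, A, B, n);     % exactly four white balls
f5 = hyper_pdf(5, A, B, n);     % exactly five white balls
P3plus = f3 + f4 + f5;          % three or more white balls

fprintf('f(3) = %.4f, f(4) = %.4f, f(5) = %.4f\n', f3, f4, f5);
fprintf('P(x >= 3) = %.4f\n', P3plus);
```

Unrounded, the sum is 3276/8568, which is about 0.38235, so it prints as 0.3824 rather than the 0.3823 obtained by adding the three rounded terms.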
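Sidelight: If you're curious, here's how to solve the problem of winning by a fixed margin. If you're not that curious or you find this confusing, don't worry about it. You won't cover this topic until you take an advanced statistics course on Martingales and Markov Chains.

The matrix used below has seven states, which corresponds to a match that ends when one player gets 3 games ahead. (A margin of 4 works the same way but needs nine states, from 'player A is up 4 games' down to 'player B is up 4 games'.)

Let's define seven states:

1    player A is up 3 games (player A wins)
2    player A is up 2 games
3    player A is up 1 game
4    the match is even
5    player B is up 1 game
6    player B is up 2 games
7    player B is up 3 games (player B wins)

Initially, the match is even so p=1 in the 4th state:

$X_0 = \begin{bmatrix} x_1 & x_2 & x_3 & x_4 & x_5 & x_6 & x_7 \end{bmatrix}^T = \begin{bmatrix} 0 & 0 & 0 & 1 & 0 & 0 & 0 \end{bmatrix}^T$

After one game, there's a 60% chance you'll go to state 3 (player A is up a game) and a 40% chance you'll go to state 5 (player B is up a game). The same follows for the other states, except for the ends. If player A has won (state 1), you'll end up at state 1 100% of the time, and likewise for state 7 if player B has won.

$X_1 = \begin{bmatrix} 1 & 0.6 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0.6 & 0 & 0 & 0 & 0 \\ 0 & 0.4 & 0 & 0.6 & 0 & 0 & 0 \\ 0 & 0 & 0.4 & 0 & 0.6 & 0 & 0 \\ 0 & 0 & 0 & 0.4 & 0 & 0.6 & 0 \\ 0 & 0 & 0 & 0 & 0.4 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0.4 & 1 \end{bmatrix} X_0$

or

$X_1 = A X_0$

After 2 games, the same matrix is applied to $X_1$:

$X_2 = A X_1$

In general: $X_2 = A^2 X_0$ and, after n games,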
$X_n = A^n X_0$

In MATLAB, finding the value of $A^n$ for n=64 (almost infinity) results in

-->A = zeros(7,7);
-->A(1,1) = 1;
-->A(1,2) = 0.6;
-->A(2,3) = 0.6;
-->A(3,4) = 0.6;
-->A(4,5) = 0.6;
-->A(5,6) = 0.6;
-->A(3,2) = 0.4;
-->A(4,3) = 0.4;
-->A(5,4) = 0.4;
-->A(6,5) = 0.4;
-->A(7,6) = 0.4;
-->A(7,7) = 1;
-->A

    1    0.6  0    0    0    0    0
    0    0    0.6  0    0    0    0
    0    0.4  0    0.6  0    0    0
    0    0    0.4  0    0.6  0    0
    0    0    0    0.4  0    0.6  0
    0    0    0    0    0.4  0    0
    0    0    0    0    0    0.4  1

Note that the columns add up to one. There's a 100% chance you wind up somewhere after each game.

Computing $A^{64}$ by repeated squaring:

-->A2 = A*A;
-->A4 = A2*A2;
-->A8 = A4*A4;
-->A16 = A8*A8;
-->A32 = A16 * A16;
-->A64 = A32 * A32

    1    0.9518700    0.8796818    0.7713994    0.6089963    0.3653917    0
    0    0.0000045    0            0.0000136    0            0.0000102    0
    0    0            0.0000136    0            0.0000204    0            0
    0    0.0000060    0            0.0000181    0            0.0000136    0
    0    0            0.0000091    0            0.0000136    0            0
    0    0.0000020    0            0.0000060    0            0.0000045    0
    0    0.0481174    0.1202956    0.2285628    0.3909697    0.6345800    1

The initial state is X:

-->X = [0,0,0,1,0,0,0]'
After 64 games, the probability you're in a given state is:

-->X64 = A64 * X
X64 =
    0.7713994
    0.0000136
    0
    0.0000181
    0
    0.0000060
    0.2285628

There's a
0.7713 chance player A will have won by the 64th game (state 1),
0.2285 chance player B will have won by the 64th game (state 7), and
0.0000377 chance the match is still on-going (the sum of states 2..6).
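The same calculation can be written more compactly. This sketch builds the 7-state transition matrix with a loop rather than element by element and uses the matrix power operator instead of repeated squaring; both are illustrative alternatives to the steps in the notes, and N, q, and X0 are names introduced here.

```matlab
% Seven-state match: build the transition matrix with a loop and
% propagate the initial state 64 games forward.
% N, q, and X0 are illustrative names; the loop is an alternative construction.
p = 0.6;
q = 1 - p;
N = 7;

A = zeros(N, N);
A(1, 1) = 1;                 % state 1: player A has won (absorbing)
A(N, N) = 1;                 % state 7: player B has won (absorbing)
for j = 2:N-1
    A(j - 1, j) = p;         % player A wins the game: move one state up
    A(j + 1, j) = q;         % player B wins the game: move one state down
end

X0 = zeros(N, 1);
X0(4) = 1;                   % the match starts even (state 4)
X64 = A^64 * X0;             % state probabilities after 64 games

disp(sum(A, 1))              % every column of A sums to 1
fprintf('%.7f\n', X64);      % first entry 0.7713994, last entry 0.2285628
```

Changing N to 9 and starting in state 5 gives the win-by-4 version of the problem with the same code.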