Assignment 4: Permutations and Combinations

CS244: Randomness and Computation
Assigned February 18 Due February 27 March 10, 2015

Note: Python doesn't have a nice built-in function to compute binomial coefficients, and surprisingly, matplotlib does not seem to either. If you want to take the trouble, you can download the scipy package (same site as numpy) and then type

    import scipy.special
    scipy.special.binom(n, k)

It is probably simpler just to use the following function for computing these coefficients:

    def combinations(n, k):
        # After step j the running product equals C(n, j+1).
        prod = 1.0
        for j in range(k):
            prod = prod * (n - j) / (j + 1)
        return prod

I will post this code on the website.

I can't seem to leave the birthday stuff alone, and I had to restrain myself from making every problem about birthdays.

1. Use the exponential approximation to estimate how many people need to be present in order for the probability of a coincidental birthday to be greater than 0.9, 0.95, 0.99. (So there are three answers here.) Then answer the questions again using the exact probabilities; you may want to write just a little bit of Python code. Compare the two results; they should be quite close.

Solution. We approximate the probability of no coincidental birthday in a group of $k$ people by $e^{-k^2/2N}$, where $N = 365$. We thus have to solve equations of the form

$$e^{-k^2/730} = a,$$

where $a = 0.1, 0.05, 0.01$. Taking logs of both sides and simplifying a little gives

$$k = \sqrt{730 \ln \frac{1}{a}}.$$

Substituting 10, 20, and 100 for $1/a$ gives the solutions 40.998, 46.76, 57.98. Let's round these up to 41, 47, 58. To check the answer against the exact probabilities, we use the following code:

    >>> def birthday_coincidence(numpeople):
    ...     # j accumulates the probability that all birthdays are distinct
    ...     j = 1
    ...     for k in range(1, numpeople):
    ...         j *= 1.0 * (365 - k) / 365
    ...     return 1 - j
    ...
    >>> birthday_coincidence(40)
    0.891231809817949
    >>> birthday_coincidence(41)
    0.9031516114817354
    >>> birthday_coincidence(46)
    0.9482528433672548
    >>> birthday_coincidence(47)
    0.9547744028332994
    >>> birthday_coincidence(57)
    0.9901224593411699
    >>> birthday_coincidence(58)
    0.9916649793892612
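The approximate thresholds quoted in the solution to Problem 1 can be reproduced in a few lines. This is only a sketch: the function name `approx_threshold` is an illustrative choice and is not part of the posted course code.

```python
import math

def approx_threshold(p, N=365):
    """Return (real-valued k, k rounded up) where the exponential
    approximation 1 - exp(-k^2 / (2N)) first reaches probability p."""
    # Solve exp(-k^2 / (2N)) = 1 - p for k.
    k = math.sqrt(2 * N * math.log(1.0 / (1.0 - p)))
    return k, math.ceil(k)

for p in (0.9, 0.95, 0.99):
    k, people = approx_threshold(p)
    print(p, round(k, 2), people)
```

Rounding up with `math.ceil` mirrors the "round these up" step in the solution, since the number of people must be a whole number.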

In the first two cases, our approximation gave the best possible answer. In the third case it was off by one (57 would have been a better answer).

2. Suppose you have a database of biographies of prominent people from the past. Each biography contains a date of birth and a date of death. If there are 1000 records in the database, what is the probability that two of them share both a date of birth and a date of death (we are ignoring the year of birth and the year of death, and just looking at the month and the day)? You should use the exponential estimate for the generalized birthday problem.

Solution. It's just the birthday problem with $k = 1000$ and $N = 365^2$. The probability of at least one shared birthday-death day pair is approximately

$$1 - e^{-1000^2/(2 \cdot 365^2)} = 0.9766.$$

It's almost a sure thing. In fact, with only 500 people, the probability of a shared pair is already about 0.61, well over 50%.

3. You walk into a room with $k$ people. What is the probability that someone in the room has the same birthday as you? (Observe that this is very different from the question we asked earlier, about whether there is any pair of people in the room with the same birthday.) Express this exactly, and then approximate it using the exponential approximation $1 - x \approx e^{-x}$ for small positive $x$. How many people need to be in the room for the probability to exceed one-half?

Solution. The probability that a randomly chosen person has a different birthday from me is $1 - \frac{1}{365}$. The probability that $k$ people all have different birthdays from me is $\left(1 - \frac{1}{365}\right)^k$, so the probability that at least one of them shares my birthday is exactly $1 - \left(1 - \frac{1}{365}\right)^k$. With the exponential approximation, the probability of no match is about $e^{-k/365}$. Let's find the value of $k$ that makes this one-half: we take logs and reciprocals and get

$$k \approx 365 \ln 2 = 252.998.$$

An exact calculation shows

$$\left(1 - \frac{1}{365}\right)^{253} = 0.4995,$$

so our approximation was very accurate: 253 people are needed.
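The estimates in Problems 2 and 3 are easy to evaluate numerically. The sketch below assumes nothing beyond the formulas in the solutions; the function names `shared_pair_estimate` and `match_with_me` are illustrative, not from the course code.

```python
import math

def shared_pair_estimate(k, N):
    """Exponential estimate 1 - exp(-k^2 / (2N)) for the generalized birthday problem."""
    return 1.0 - math.exp(-k * k / (2.0 * N))

def match_with_me(k, N=365):
    """Exact probability that at least one of k people shares my birthday."""
    return 1.0 - (1.0 - 1.0 / N) ** k

# Problem 2: N = 365^2 possible (birthday, death day) pairs, k = 1000 records.
N_pairs = 365 ** 2
print(shared_pair_estimate(1000, N_pairs))
# Group size at which the estimate crosses one-half: sqrt(2 N ln 2).
print(math.sqrt(2 * N_pairs * math.log(2)))

# Problem 3: the exponential approximation suggests k = 365 ln 2.
print(365 * math.log(2))
print(match_with_me(252), match_with_me(253))
```

Printing the values for 252 and 253 side by side shows directly where the exact probability first exceeds one-half.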

4. If I asked you to compute the probabilities of various poker hands, it would take you less than a millisecond to find the Wikipedia page "Poker odds" with all the answers, complete with the number of relevant outcomes for each hand expressed in terms of binomial coefficients. So I had to make up some new poker hands and ask you their probabilities. Explain your reasoning carefully, and try to express your answers both in terms of binomial coefficients and powers, and as numerical values. It's easy to be led astray here, and a very good way to check your answer is to write a simulation. You are not required to do this for the homework, but it's not a bad idea if you want to see if you were right.

(a) The picture cards are the three ranks Jack, Queen, King. What is the probability of getting all picture cards?

Solution. There are 12 picture cards, so the total number of 5-card hands containing only picture cards is $\binom{12}{5}$. The desired probability is thus

$$\binom{12}{5} \Big/ \binom{52}{5} = 0.0003047,$$

which is quite a lot smaller than I would have guessed!

(b) Two of the suits contain black cards, and two of the suits contain red cards. What is the probability of having all 5 cards be the same color?

Solution. There are 26 red cards and 26 black cards. We can proceed as in (a) to compute the probability of getting all red cards. The desired probability is twice this value. So the answer is:

$$2 \binom{26}{5} \Big/ \binom{52}{5} = 0.0506.$$

(c) What is the probability of having all five cards belong to exactly two of the suits?

Remember there are two ways this can split: 3 of one suit and 2 of the other, or 4 of one suit and 1 of the other. For a 3-2 split, there are 4 ways to choose the 3-suit, and then 3 ways to choose the 2-suit. Once the 3-suit is chosen, there are $\binom{13}{3}$ ways to choose 3 cards from it, and also $\binom{13}{2}$ ways to choose 2 cards from the 2-suit. So the number of distinct hands in which there are three cards from one suit and two from the other is

$$4 \cdot 3 \cdot \binom{13}{3} \binom{13}{2}.$$

By essentially identical reasoning, the number of distinct hands in which there are four cards from one suit and one from another suit is

$$4 \cdot 3 \cdot \binom{13}{4} \binom{13}{1}.$$

Put it all together and the total number of hands in question is

$$4 \cdot 3 \cdot \binom{13}{3} \binom{13}{2} + 4 \cdot 3 \cdot \binom{13}{4} \binom{13}{1} = 379236.$$

So the desired probability is

$$379236 \Big/ \binom{52}{5} = 0.1459.$$

Here is another way to get the same result; it's hard to say if this is simpler or not. First choose our two suits: there are $\binom{4}{2} = 6$ ways to do this. Then choose 5 cards from the 26 cards in the 2 suits. This gives

$$6 \binom{26}{5} = 394680.$$

The problem is that in this tabulation, we have also counted the hands that consist of cards from a single suit, and moreover, we have counted each of these hands several times. For example, there are $\binom{13}{5}$ hands consisting entirely of hearts, and in our tabulation, each of these has been counted three times (once for each of the suit pairs hearts-spades, hearts-clubs, hearts-diamonds). That means we must subtract

$$4 \cdot 3 \cdot \binom{13}{5} = 15444$$

from our total. And, what do you know, $394680 - 15444 = 379236$. It always feels good when two different methods give the same answer!

5. There are two candidates in an election. Candidate A has received 55% of the votes, candidate B 45%. There is a very large number of voters (several million, let's say). We randomly sample 100 voters. This is sampling without replacement (we should not poll the same voter twice!), but the voter pool is so large that you can treat it as a problem of sampling with replacement, which makes the calculation somewhat easier. What is the probability that in the sample, candidate B receives more votes? Express this answer as a formula using the binomial coefficients, and then compute the probability exactly.

HINT: Think of the underlying experiment as flipping 100 biased coins in succession. We saw how to express the probability of getting exactly $k$ heads in terms of binomial coefficients, so here you will have a sum of about 50 such probabilities. You will thus need to write a little code to answer the question.

Solution. Just as a reality check, we would expect this answer to be less than one-half, because candidate A received more votes overall. By the coin analogy, the probability that candidate B receives exactly $k$ votes is

$$\binom{100}{k}\, 0.45^k\, 0.55^{100-k}.$$

Thus the probability that candidate B receives strictly more votes than candidate A is the sum of all these values as $k$ varies from 51 to 100. A quick computation with Python gives

$$\sum_{k=51}^{100} \binom{100}{k}\, 0.45^k\, 0.55^{100-k} = 0.1346.$$

This shows you something about the effectiveness of polling: if we have a truly representative sample, and a 55-45 margin, then we can predict the result of the election correctly 87% of the time by sampling only 100 people. With a sample of 200 people, the success rate rises to 93%.

6. (Real birthdays) This is the most involved problem in terms of programming, although not all that deep in terms of math. One very useful part of the problem concerns how to sample from a given nonuniform distribution. I am going to give you actual data on birth dates in the United States from one year. You are to simulate the birthday problem using this distribution, and then superimpose a plot of the result on the one obtained from exact calculation using the uniform distribution model. The birth data for 1978 is posted on the course website.
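As an aside on Problems 4 and 5 above, the counts and the binomial sum can be checked with exact integer arithmetic. This is a sketch under one assumption: it uses `math.comb`, which exists in Python 3.8 and later, in place of the `combinations` function given in the assignment. The variable and function names are illustrative.

```python
from math import comb  # assumption: Python 3.8 or later

TOTAL_HANDS = comb(52, 5)

# Problem 4(a): all five cards drawn from the 12 picture cards.
all_picture = comb(12, 5)

# Problem 4(b): all five cards red, or all five cards black.
one_color = 2 * comb(26, 5)

# Problem 4(c): exactly two suits, counted both ways described in the solution.
by_split = 4 * 3 * comb(13, 3) * comb(13, 2) + 4 * 3 * comb(13, 4) * comb(13, 1)
by_subtraction = comb(4, 2) * comb(26, 5) - 4 * 3 * comb(13, 5)

def prob_b_leads(n=100, p=0.45):
    """Probability that strictly more than half of n sampled voters favor B."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))

print(all_picture / TOTAL_HANDS, one_color / TOTAL_HANDS, by_split / TOTAL_HANDS)
print(by_split, by_subtraction)
print(prob_b_leads(100), prob_b_leads(200))
```

Keeping the counts as integers until the final division avoids any rounding question when comparing the two methods for part (c).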
I found this at the Chance website from Dartmouth, which also has the Grinstead-Snell book, but I don't know the original source for the data. You will want to read the second column into a Python list. If you've forgotten (or never knew) how to do this, you can use the following code (of course you have to change the full path name for the file).

    infile = open('/Users/straubin/teaching/244/244website/birthday.txt', 'r')
    bdaylist = []
    for j in range(365):
        # each line is split on whitespace; the count is the second field
        s = infile.readline().split()
        bdaylist.append(int(s[1]))
    infile.close()

Figure 1: Distribution of US birthdays in 1978

(a) Make a scatter plot or a stem plot of the data. You can see the nonuniformity very clearly: it is somewhat exaggerated if you display the plot with the default settings, so I suggest that you base the y-coordinates at 0, using ylim(0,11000). I find the results astonishing. There is indeed a seasonal variation (explained by what? planning for the optimal time for the baby to be born? seasonal variation in sexual behavior? in fertility?) but the amazing thing is that it looks as if there are two entirely separate data series, with roughly the same seasonal variation, but one significantly lower than the other. (Speculation about the variation is not part of the assignment, just some random musing.)
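One way to investigate the two apparent series is to average the counts separately for each day of the week. The sketch below is not part of the posted course code: the function name `mean_births_by_weekday` is illustrative, and it assumes that the list produced by the reading code is ordered by day of the year starting from January 1. The sample data in the usage lines is made up purely to show the call.

```python
from datetime import date, timedelta

def mean_births_by_weekday(bdaylist, year=1978):
    """Average the daily counts separately for each day of the week.

    Assumption: bdaylist[i] is the count for day i of the year,
    with index 0 corresponding to January 1.
    """
    totals = [0] * 7
    days = [0] * 7
    start = date(year, 1, 1)
    for offset, count in enumerate(bdaylist):
        wd = (start + timedelta(days=offset)).weekday()  # Monday is 0, Sunday is 6
        totals[wd] += count
        days[wd] += 1
    return [t / d if d else 0.0 for t, d in zip(totals, days)]

# Made-up data for illustration only: lower counts on Saturdays and Sundays.
fake = [8000 if (date(1978, 1, 1) + timedelta(days=i)).weekday() >= 5 else 10000
        for i in range(365)]
names = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
for name, mean in zip(names, mean_births_by_weekday(fake)):
    print(name, mean)
```

With the real data, passing `bdaylist` in place of the made-up list gives seven averages that can be compared directly.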

Solution. The code to produce the plot is posted on the website, and the plot is in Figure 1. There definitely is a seasonal variation: the plot shows a spike in births around the end of October and at the very end of the year, and a low in mid-April. The October and April births point to more people conceiving around Christmastime and fewer at the height of summer (I'm not sure why). I wonder if the year-end spike is there for tax purposes; that shows some very careful planning! But the real surprise is that the data falls into two series, and the difference between these two series is larger than the variation within each series. What could explain this extraordinary non-seasonal variation? If you study the data carefully, you'll realize that the lower series largely consists of two days out of every week. This is probably due to scheduled Caesarean sections, which constitute a very significant fraction of total births in the United States; the hospitals don't like to schedule these for weekends.

(b) You are to plot, for k = 1 to about 65, the probability of coincidental birthdays using this probability distribution. The first hurdle is that you need some way of generating random birthdays based on this probability model. You can code this by hand using rand(), but there is a built-in method in numpy. You have to add to your program

    import numpy.random as npr

You then use a function called choice. To see how this works, a call to

    npr.choice([1,2,3,4,5,6], p=[0.3, 0.3, 0.2, 0.1, 0.05, 0.05])

will generate a value in {1, 2, 3, 4, 5, 6} distributed according to the probability mass function p: that is, 1 will occur with probability 0.3, 2 with probability 0.3, etc. Use this function and the information read in from the file to randomly generate birthdays according to the given distribution.

The second hurdle is efficiently performing the simulation. You can do this any way you like, but there is a nice trick for speeding things up, based on the following insight: Consider the experiment where we repeatedly sample people from the population until we find a birthday that is the same as one we have already drawn, and look at the random variable that gives the number of rounds this experiment lasts. Our original plot of birthday probabilities is just the cumulative distribution function of this random variable assuming uniformly distributed birthdays. This means that we can repeatedly sample birthdays with the choice function, and make a cumulative histogram. You have to scale things so that the values rise from 0 to 1, and use the value returned by the histogram function to produce a line plot.

(c) Add to your program the code that generates the plot for the probability of coincidental birthdays under the uniform distribution, and superimpose the two. Do you see much difference between the two plots? How adequately does the uniform probability distribution model the real-life version of this problem?

Solution. The code to produce the plot is posted on the website, and the plot itself is shown in Figure 2. It performs 10,000 trials of the experiment of repeatedly sampling birthdays until a duplicate birthday is found, and recording the number of samples. It then plots a cumulative histogram of the result (as a line graph, not a bar graph). This is superimposed on a calculation of the exact probabilities assuming 365 equally likely birthdays. There is only a very slight difference between the two plots. So for purposes of the repeated-birthday problem, the simple uniformity model gives accurate results, in spite of the nonuniformity present in the real-life data.
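The experiment described in parts (b) and (c) can be sketched as follows. This is not the code posted on the course website. To stay within the standard library it swaps in `random.choices` with weights, which plays the same role as `npr.choice` with `p=`; all function names are illustrative, and the weights in the usage lines are hypothetical stand-ins for the real 1978 counts. Plotting is left out.

```python
import random
from itertools import accumulate

def first_repeat_round(days, cum_weights, rng):
    """Draw birthdays one at a time until one repeats; return how many were drawn."""
    seen = set()
    while True:
        day = rng.choices(days, cum_weights=cum_weights, k=1)[0]
        if day in seen:
            return len(seen) + 1
        seen.add(day)

def simulated_cdf(weights, trials=10000, max_k=65, seed=0):
    """Estimate P(some repeat among the first k people) for k = 1..max_k."""
    rng = random.Random(seed)
    days = range(len(weights))
    cum_weights = list(accumulate(weights))
    hist = [0] * (max_k + 1)
    for _ in range(trials):
        r = first_repeat_round(days, cum_weights, rng)
        if r <= max_k:
            hist[r] += 1
    # Cumulative histogram, scaled by the number of trials.
    return [c / trials for c in accumulate(hist)][1:]

def exact_uniform(max_k=65, N=365):
    """Exact repeat probabilities for k = 1..max_k with N equally likely days."""
    probs, no_repeat = [], 1.0
    for k in range(1, max_k + 1):
        no_repeat *= (N - (k - 1)) / N
        probs.append(1.0 - no_repeat)
    return probs

# Hypothetical stand-in weights; with the real data, pass bdaylist instead.
weights = [80 if d % 7 < 2 else 100 for d in range(365)]
sim = simulated_cdf(weights)
exact = exact_uniform()
print(sim[22], exact[22])  # entries for k = 23
```

The two returned lists have one entry per value of k, so they can be passed directly to a line-plot function to superimpose the simulated and exact curves.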

Figure 2: Probability of a repeated birthday, showing the results of a simulation based on the 1978 data, and the exact probabilities assuming uniform distribution of birthdays.