STAT 203 - Chapter 14
From Randomness to Probability

This is the topic that started my love affair with statistics, although I should mention that we will only skim the surface of Probability. Let me tell you a little story about myself...

So, what is the difference between chaos and randomness? At first they seem the same, but... Both have outcomes that cannot be anticipated with certainty, although random events tend to settle down in the long run.

* Talk about a coin toss.

The idea of finding what happens in the long run is how we estimate the probability of certain events happening. Let's use the example of tossing a coin (something for which we already know what should happen in the long run). But let's pretend that we don't know how often Heads should come up (50% of the time). We can't predict what will happen on each toss. This is a random phenomenon. What is the chance, or Probability, that the coin comes up Heads?

Let's estimate this. We will perform 100 coin tosses (10 people, 10 times each), and plot the Relative Frequency of Heads vs. the Number of Tosses, to demonstrate how, in the long run, this approaches the True probability of tossing a Head.

Note: In statistics, we do have methods of estimating how many times we should toss the coin in order to get a good estimate of the long-run relative frequency.
Round   # of Heads   Rel. Freq. of H
-----   ----------   ---------------
  1        /10       =
  2        /20       =
  3        /30       =
  4        /40       =
  5        /50       =
  6        /60       =
  7        /70       =
  8        /80       =
  9        /90       =
 10        /100      =

The more times we toss the coin, the more the Relative Frequency will stabilize around the True value of 50%.

In this example we already knew the probability of tossing a Head, but we use exactly the same idea to estimate probabilities that we do not know. We estimate the Probability of an event using its long-run Relative Frequency.

Some Probability Concepts and Definitions:

The Sample Space (S) is the set of all possible outcomes from a random phenomenon.
(e.g.) Coin toss - S = {H, T}, Roll a Die - S = {1, 2, 3, 4, 5, 6}.

An Event is a specific outcome, or a combination of outcomes, from a random phenomenon. We usually denote an event using capital letters (from the start of the alphabet).
(e.g.) The events
A = {Toss a H}
C = {Roll a 6}
D = {Roll an even number}
E = {Draw a red card from a deck of cards}
F = {Draw a Queen or higher from a deck of cards}

The probability that event C occurs (the probability of rolling a 6) is written as P(C), although it is also acceptable to write it as P(Roll a 6) if this is clearer to you.
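The coin-toss table above can also be filled in by simulation. The following is only a minimal sketch: it assumes a fair coin (p_heads = 0.5), and the function name, the seed and the use of Python's random module are illustrative choices, not part of the class experiment.

```python
import random

def running_relative_frequency(n_rounds=10, tosses_per_round=10,
                               p_heads=0.5, seed=203):
    """Simulate rounds of coin tosses.

    Returns one tuple per round:
    (round number, cumulative heads, cumulative tosses, relative frequency).
    The fair coin and the seed are assumptions made for illustration.
    """
    rng = random.Random(seed)
    rows = []
    heads = 0
    for round_no in range(1, n_rounds + 1):
        # Each toss is Heads with probability p_heads.
        heads += sum(rng.random() < p_heads for _ in range(tosses_per_round))
        tosses = round_no * tosses_per_round
        rows.append((round_no, heads, tosses, heads / tosses))
    return rows

# Print the table in the same layout as the class worksheet.
for round_no, heads, tosses, rel_freq in running_relative_frequency():
    print(f"{round_no:>5}   {heads:>3}/{tosses:<4} = {rel_freq:.2f}")
```

Raising n_rounds (say to 1000) gives a longer run in which the last column should wander less and less, which is the behaviour the plot is meant to show.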
We call each toss of the coin, or roll of the die, etc., a Trial.

Some Properties of P(A):

For any event A;

1. 0 ≤ P(A) ≤ 1
(i.e.) The event has somewhere between a 0% and 100% probability of happening. The larger P(A) is, the more likely the event is to occur.

2. The sum of the probabilities of all the non-overlapping outcomes that make up the sample space is 1 (100%).
(e.g.) For a coin, P(H) + P(T) = 1
For a die, P(1) + P(2) + P(3) + P(4) + P(5) + P(6) = 1.

Previously, we did a series of coin tosses (or Trials). If the coin came up Heads on a toss, did that make the next toss more/less/equally likely to come up Heads?

Trials are said to be Independent if the outcome of one trial does not affect the outcome of the other trials.

* Mention roulette example
The Law of Large Numbers (LLN):

Don't confuse this to mean numbers that are large. We are talking about doing a large number of trials.

The Law of Large Numbers states that as the number of trials increases, the Relative Frequency of an event gets closer to its True probability.

(e.g.) Toss a coin 10 times and get 4 Heads - 40% chance of H is our estimate.
Toss a coin 1000 times and get 480 Heads - 48% chance of H is our estimate.

If we let n represent the number of trials, then as n increases in this example, the absolute difference from the expected number of Heads increases (off by 1 vs. off by 20), but the relative difference decreases (off by 10% vs. off by 2%).

You may have heard people talk about the "law of averages"; don't believe them. Some people confuse this law to believe that random phenomena somehow compensate for what has happened in the past.
(e.g.) Tossed 8 Heads in a row, and in the long run H and T must balance out, therefore we are due for a bunch of T (???). Not true. The tosses are independent, so the coin has no memory of the earlier Heads.

So, we will now officially designate the long-run relative frequency of an event as (our estimate of) the Probability that the event occurs.

Notes:

1. Sometimes outcomes are not all equally likely.
(e.g.) For rolling a die, each number is equally likely, so the probability of any outcome is just 1 divided by the number of outcomes (roll a 4, with probability 1/6)... but, you either win or lose the lottery, although the probability of winning is not 1/2!!!

2. We define the probability as the long-run relative frequency, although sometimes this does not exist.
(e.g.) The probability that you get an A in this course (you will not take it repeatedly...). We call such an estimate, based on one's own experiences, a Personal Probability. We only deal with formally defined probabilities.
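The "off by 1 vs. off by 20" and "off by 10% vs. off by 2%" comparison above is plain arithmetic, sketched here with exact fractions. The helper name differences is hypothetical, and the true probability of 1/2 is the fair-coin assumption from the example.

```python
from fractions import Fraction

def differences(n_tosses, n_heads, p_true=Fraction(1, 2)):
    """Return (absolute difference in number of Heads,
    difference between the relative frequency and p_true)."""
    expected_heads = p_true * n_tosses
    absolute = abs(n_heads - expected_heads)
    relative = abs(Fraction(n_heads, n_tosses) - p_true)
    return absolute, relative

small = differences(10, 4)       # 4 Heads in 10 tosses
large = differences(1000, 480)   # 480 Heads in 1000 tosses

for label, (absolute, relative) in [("n = 10", small), ("n = 1000", large)]:
    print(f"{label}: off by {absolute} Heads, off by {float(relative):.0%}")
```

The count of Heads drifts further from half of n, while the relative frequency moves closer to 1/2. Only the second of these is what the Law of Large Numbers talks about.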
Some Probability Rules for Events:

It is simplest to work with the probabilities of individual outcomes ((e.g.) roll a 6), although the events we are often interested in consist of a combination of outcomes ((e.g.) roll a 3 or higher).

Also, we are often interested in the occurrence/non-occurrence of more than one event.
(e.g.) Get a heart or (maybe "and" instead of "or"?) a Jack on a randomly selected card from a deck of cards.

In light of this, we will define a few rules for the probabilities of events. To aid in our discussion, let's call the events...

A = {Toss a H}
B = {Toss a T}
C = {Roll a 6}
D = {Roll an even number}
E = {Draw a red card from a deck of cards}
F = {Draw an Ace from a deck of cards}
G = {Draw the Ace of one particular suit from a deck of cards}

First, let's define some terminology and notation. For the events A and B...

A or B (also written A ∪ B, or A union B) means that either event A or event B or both events A and B occur.
(e.g.) (C or D) is when we roll a 6 or an even number, or an even number that is a 6.

A and B (also written A ∩ B, or A intersect B) means that the events A and B occur simultaneously.
(e.g.) (E and F) is that we draw a red card from a deck and we draw an Ace from the deck (we get the A♥ or A♦).

The Complement of an Event A is the event that A does not occur, and is written A^C.
(e.g.) C^C is the event that we do not roll a 6.
(e.g.) F^C is the event that we do not draw an Ace from the deck of cards.

We say that events are Disjoint if they cannot occur at the same time. In other words, the occurrence of one prevents the occurrence of the other. Non-overlapping is a good word to keep in mind here.
(e.g.) For a coin, H and T are Disjoint. If we get a H, then we can't possibly get a T (for the same trial).
(e.g.) For a die, 1, 2, 3, 4, 5, 6 are Disjoint events, since if we roll a 2 then we cannot possibly have any of the other outcomes.

If the events A and B are disjoint, then P(A and B) = 0.

People also use the term mutually exclusive in place of disjoint.

We say that the events A and B are Independent if the occurrence/non-occurrence of one (A) does not affect the probability of the occurrence/non-occurrence of the other (B).
(e.g.) The events B and C from above are independent, but the events E and G are not independent (knowing that the card is red changes the chance that it is that particular Ace).

As mentioned earlier, the sample space (S) is the set of all possible outcomes of a trial, therefore P(S) = 1.
(i.e.) One of the possible outcomes has to happen.

Venn diagrams of the above:
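For a deck of cards, disjointness and independence can be checked by listing all 52 equally likely cards. The sketch below makes two assumptions that are not in the notes: independence is tested with the product check P(A and B) = P(A) × P(B), and the single Ace is taken to be the Ace of hearts purely for illustration. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["hearts", "diamonds", "clubs", "spades"]
DECK = list(product(RANKS, SUITS))  # 52 equally likely (rank, suit) cards

def prob(event):
    """Probability of an event = favourable cards / total cards."""
    return Fraction(sum(1 for card in DECK if event(card)), len(DECK))

def is_red(card):
    return card[1] in ("hearts", "diamonds")

def is_spade(card):
    return card[1] == "spades"

def is_ace(card):
    return card[0] == "A"

def is_ace_of_hearts(card):
    # Assumed suit, chosen only for illustration.
    return card == ("A", "hearts")

def disjoint(a, b):
    """Disjoint events can never happen on the same draw."""
    return prob(lambda c: a(c) and b(c)) == 0

def independent(a, b):
    """Product check for independence (an assumption of this sketch)."""
    return prob(lambda c: a(c) and b(c)) == prob(a) * prob(b)

print("red vs spade disjoint:      ", disjoint(is_red, is_spade))
print("red vs Ace independent:     ", independent(is_red, is_ace))
print("red vs Ace of hearts indep.:", independent(is_red, is_ace_of_hearts))
```

Note that disjoint and independent are different ideas: red and spade are disjoint, yet they are not independent, because knowing the card is red rules out a spade completely.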
Probability Rules:

Complement Rule:

P(A) = 1 - P(A^C)

(e.g.) For event C above, P(C^C) = P(Don't roll a 6) = 1 - P(C) = 1 - 1/6 = 5/6.

This rule can save a lot of time/calculation.
(e.g.) In 1000 coin tosses, what is the probability that I get 3 or more Heads...(long time). Why not use the complement rule?

Addition Rule:

If the events A and B are Disjoint, then the probability of (A or B) is...(later, we will present the rule for when they are not disjoint events)

P(A or B) = P(A) + P(B)

(e.g.) Let's call event A = {Roll a 3} and B = {Roll an even number}; then A and B are disjoint.
P(A or B) = P(A) + P(B) = 1/6 + 3/6 = 4/6.

What is the probability of drawing a Jack or higher from a deck of cards?

Multiplication Rule:

If the events A and B are Independent, then the probability of (A and B) is...(later, we will present some rules for when they are not independent)

P(A and B) = P(A) × P(B)

(e.g.) From the events above, P(A and C) = P(A) × P(C) = 1/2 × 1/6 = 1/12.

What is the probability you toss 4 heads in a row?
What is the probability you roll snake eyes in craps?
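The three worked examples above can be double-checked by counting outcomes directly. This is only a sketch, assuming one fair coin and one fair die are used together so that the 12 (coin, die) pairs are equally likely. The names below are hypothetical.

```python
from fractions import Fraction
from itertools import product

# All 12 equally likely (coin, die) outcomes, assuming a fair coin and die.
SAMPLE_SPACE = list(product("HT", range(1, 7)))

def prob(event):
    """Probability of an event = favourable outcomes / total outcomes."""
    favourable = [o for o in SAMPLE_SPACE if event(o)]
    return Fraction(len(favourable), len(SAMPLE_SPACE))

def toss_h(o):
    return o[0] == "H"

def roll_6(o):
    return o[1] == 6

def roll_3(o):
    return o[1] == 3

def roll_even(o):
    return o[1] % 2 == 0

# Complement Rule: P(don't roll a 6) = 1 - P(roll a 6)
p_not_6 = prob(lambda o: not roll_6(o))
print(p_not_6, "=", 1 - prob(roll_6))

# Addition Rule (disjoint events): P(3 or even) = P(3) + P(even)
p_3_or_even = prob(lambda o: roll_3(o) or roll_even(o))
print(p_3_or_even, "=", prob(roll_3) + prob(roll_even))

# Multiplication Rule (independent events): P(H and 6) = P(H) x P(6)
p_h_and_6 = prob(lambda o: toss_h(o) and roll_6(o))
print(p_h_and_6, "=", prob(toss_h) * prob(roll_6))
```

Fractions are printed in lowest terms, so 4/6 shows up as 2/3.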
Examples:

1. Consider the events listed below, and fill in the following table. All of these pertain to a single trial.

A = {Toss a H}
B = {Toss a T}
C = {Roll a 3 or higher}
D = {Roll an even number}
E = {Roll a 1}
F = {Draw a black card}
G = {Draw a }
H = {Draw the Ace }
I = {Draw a card lower than 7}

Events   Disjoint?   Independent?
------   ---------   ------------
A, B
A, C
C, D
C, E
D, E
F, G
F, H
F, I
G, H
G, I
H, I

What is P(A or B)?
What is P(A and B)?
What is P(A and C)?
What is P(F or H)?
What is P(E and H)?
2. If you are not too familiar with gambling on the spreads, don't worry, because I will explain this example more in class. The government runs many lottery-type games, one being sports gambling (I think it's called pro-picks in BC). You must pick a minimum of 3 games to bet on the spreads, and win them all, in order to win money. For a three-game parlay, they pay 5 dollars for every dollar bet. Is this worth it?
(i.e.) What is the probability of winning three games when betting on the spreads?

3. What is the probability that you play the lotto game Pick 3 and win?

4. A used car lot has cars from 5 different countries. 35% of their cars are from the U.S.A., 30% are from Japan, 20% are from Germany, 10% are from France and 5% are from Korea.
(a) What is the probability that a car selected at random from the lot is from Japan or Korea?
(b) What is the probability that a car selected at random is not from the U.S.A.?
(c) What is the probability that a car selected at random is not from Germany or France?

5. This class is 75% females and 25% males. We also have 7% of students in their first year, 31% in their second year, 25% in their third year and 37% in their fourth year or higher. If we select a student at random, what is the probability of each of the events below? We will assume that year level and gender are independent.
(a) They are not in their fourth year or higher?
(b) They are female and in their second year?
(c) They are male and in their second year or lower?
(d) Note that we assumed that gender and year level are independent. Do you think this is a reasonable assumption?