Chapter 1: Introduction to Statistics


Section 1-1: Descriptive Statistics

The first 3 chapters of this course will develop the concepts involved with Descriptive Statistics. Descriptive Statistics is the act of collecting, organizing, displaying and summarizing information.

Collecting Information

Statistics starts with the collecting of information. The act of collecting information can be as simple as recording the age of every student in your current statistics class. It can also be far more complicated than that. If you wanted to find out the age of every student at Folsom Lake College for the Spring 2011 semester, several problems must be considered. Should you count every student who is enrolled on the first day of the semester? Should you count the students who enroll during the first weeks of class? What about students who drop in the first few weeks or never show up? You must decide how to clearly describe the people you are going to include. Collecting data begins with a clear, precise description of the information you want to collect.

The actual collecting of the data can also pose a problem. Will the college make the data available to you? If not, how will you go about collecting the data? Even if you can and do collect the students' ages, can you trust the information? People can and do give information that is untrue. Not everyone lists their age truthfully. Information can be recorded incorrectly or be hard to read. For example, a student writes down their age as 78. Was the student honest in declaring their age? Was the age of 78 a misprint? The 7 in the 78 could be correct, but the student could also be an 18 year old whose poorly written 1 looks like a 7.

The act of collecting information is really an art. It is very important to be specific in the description of who, what, when or where the information describes. It is also very important to collect and record the information in a manner that helps ensure that the information reflects an honest record. Verifying the accuracy of the data as it is collected and recorded can be a major cost in the process. The methods of collecting and verifying information are not within the scope of this course. The student is asked to assume that all information given in this course has been collected and verified with methods that are consistent with accepted practices.

Section 1-1 Lecture, Page 1 of 10, 2012 Eitel

An Example of Problems in Collecting Information

State, county and local church records in Texas contain records of a Joe Hardy, a Joseph Hardy, a Joe E. Hardy and a J. Edward Hardy, all born in 1891. These same records show the birth of a Julia Baskin born in 1894, a Julia Baskin born in 1893, and an Elisa Julia Baskin born in 1894. The various local records also show several combinations of these people marrying each other in either 1919 or 1920. The original records were handwritten. They were later stored on microfilm and now exist in several digital formats. This information has been kept for all these years as an official record of two people who lived, married, gave birth and died in the towns of Oklahoma and northern Texas.

1. Is the information that was collected, stored and protected all these years correct?
2. The original collection of a piece of information is called the first generation record. Each time the same information is rewritten, copied, or compiled along with other information, a new generation of that piece of information begins. Does information become more accurate as the generations increase?
3. What would a reasonable person do with the information mentioned above if they were compiling their own record of this information?
4. If someone believes that there were only 2 people reflected in all these documents, how should they correct the information? Should they edit the original first generation documents? Should they create a new generation of records with a record of the birth, marriage and death using the dates they think are correct?

Organizing Information

Many methods exist to organize information. A common way to organize information is to place it into a list. The list may be put into a table format. Putting the data into a table in an Excel spreadsheet allows the list to be sorted alphabetically or numerically to increase the usefulness of the data. Sorting data after it has been put into a table may be the most common way to organize data today.

Sorting data into groups based on common attributes is very useful. Sorting data into groups based on the month of the year may help you see trends over time. Sorting data into groups based on the state of birth may help you see trends based on geographic location. Sorting data into groups based on year of birth may help you see trends based on age.

Database programs like Access or Oracle allow for more complicated searches of the information. These programs allow you to search for all of the information that has several common attributes. These programs also allow the user to control the way the results of the search are displayed. Database programs like these are becoming the standard for the storage and display of large amounts of information.
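The sorting and grouping described above can be sketched in a few lines of Python. This is only an illustration, not a method from the lecture: the records, names and field order are made up for the example.

```python
from collections import defaultdict

# Hypothetical records: (name, state of birth, year of birth).
records = [
    ("Ana", "Texas", 1994),
    ("Ben", "Oklahoma", 1991),
    ("Cy", "Texas", 1991),
    ("Di", "Oklahoma", 1993),
]

# Sort the list numerically by year of birth, then alphabetically by name.
by_year = sorted(records, key=lambda r: (r[2], r[0]))

# Group the names by a common attribute, here the state of birth.
by_state = defaultdict(list)
for name, state, year in by_year:
    by_state[state].append(name)

print(by_year)
print(dict(by_state))
```

A spreadsheet or database program does the same two things, sorting on a column and grouping on a shared attribute, on a much larger scale.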

Summarizing Information

It is very common to view small amounts of data in a sorted list. The use of a sorted list becomes a problem when there is a large amount of data. It is also common for large data sets to have many repeating values. When large data sets with repeating values occur, it is common to summarize the data into two types of tables.

A Frequency Table takes all the repeats of a value, lists each different value once, and records how often (how frequently) each of the different values occurs.

A Frequency Table

x = number of cars owned by your family    Frequency of x
0                                          1
1                                          2
2                                          7
3                                          6
4                                          4

When the data contains a large number of different values, collecting the data into groups or classes based on a range of values allows the data to be viewed in a smaller table that summarizes the information. This type of table is called a Class Frequency Table.

A Class Frequency Table

x: age of child    Frequency of x
0 - 4              2
5 - 9              8
10 - 14            5
15 - 19            10
20 - 24            5
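As a rough sketch of how both kinds of table can be built from raw data, the Python below counts repeated values and then counts values by class. The list of car counts is constructed to match the frequency table above; the list of ages and the class width of 5 are assumptions made only for this example.

```python
from collections import Counter

# Raw data matching the frequency table above (cars owned per family).
cars = [0] + [1] * 2 + [2] * 7 + [3] * 6 + [4] * 4

# Frequency table: each different value once, with how often it occurs.
freq = Counter(cars)
for value in sorted(freq):
    print(value, freq[value])

# Class frequency table: hypothetical ages grouped into classes of width 5.
ages = [1, 3, 5, 6, 7, 12, 16, 18, 21]
class_freq = Counter(age // 5 for age in ages)
for k in sorted(class_freq):
    print(f"{5 * k}-{5 * k + 4}", class_freq[k])
```

Integer division by the class width assigns each age to its class, so ages 5, 6 and 7 all land in the 5-9 class.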

Displaying Information

After data has been put in a table and grouped into a summary table it is common to use a graph to provide a visual view of the data. The most common graphs used are bar graphs, line graphs or pie charts. Excel and other software programs provide these three graphs as well as a wide variety of other types of graphs. Each type of graph provides a different way to view data.

[Figures: a Bar Graph and a Line Graph of Freq (x) against NFL age at retirement in years; a Circle Graph of Favorite Color between Red, Blue and Green based on 600 students, with slices of 300, 200 and 100 students.]

Population

A Population is the entire collection of things or people that will be studied. A population can consist of very few members or have a very large number of members. If you are recording the height of every person who played in the NBA this season then that population would be much larger than if you were recording the height of every member of your family. The key idea is that the population must include every member that meets the description of that population.

Sample

A Sample is a collection that consists of only part of the entire population. If the population is defined to be every person in this statistics class then a sample of that population could consist of only the students that sit in the front row. If the population is defined to be every person who was in the school play then a sample of that population could consist of only the people whose names begin with the letter A. Each set is a sample because each set represents only part of the entire population.

Population versus Sample

A Population is the entire collection of things that will be studied.
A Sample is part of an entire population.

Population (P) or Sample (S) Examples

Classify the following information as being found from a Population (P) or a Sample (S).

A) I tested 2 students in a class of 30 and they were both nearsighted.
B) The Major League Baseball teams in California are the Oakland Athletics, the San Diego Padres, the San Francisco Giants, the Los Angeles Dodgers and the Los Angeles Angels.
C) I selected Mary and Bob out of the 40 students in my math class to grade their homework.
D) Sue selected 16 of the 50 people at the Bookstore to see how long it took them to buy their books.

Answers: A) S B) P C) S D) S

Census

A Census is a collection of information about a characteristic that was collected from the entire population. When we collect information about every member of the population the information collected is called a census. If the population is defined to be every person in a math class that took Test One then a census of that population would be the actual test scores for every student in the class that took Test One. The same population could have a census taken of the time it took each student to complete the test. The same population produced two different sets of data and each set of data was a census of the entire population of class members.

Examples

Yes (Y) or No (N): Was a Census used to find the following information?

A) Sue reported that all 3 of her children used contacts.
B) Sam asked 25 of the 40 students in his class if they are left handed.
C) Mary observed that every person enrolled in her class had met the prerequisite.
D) John watched 155 people out of the 4300 people at the Mall and recorded how long it took each of the 155 people to find a parking place.

Answers: A) N B) N C) Y D) N

It is very hard to take a real Census

Every 10 years the government takes what it calls a population census to determine the number of people that live in the country on a state by state basis. Is this a real census? There are several reasons that this is not an actual census. It takes so long to collect the data that people being born and people dying cannot be correctly accounted for. People move in and out of the country, as well as from state to state, during the census process, so the data never includes all of the population. To be a Census, every member of the population must have its information collected. You cannot skip a few people. This means that many collections of data that claim to be a census are really a sample of the entire population and not a census.

Parameter (Numerical Measurements of a Population)

A Parameter is a measurement describing a numerical property of a population. There are three numerical Parameters of a population that will be of interest in this course:

1. Population Average: The average value of all the numerical values in the population.
2. Population Proportion: What percent (or proportion) of the population has a given attribute.
3. Population Variance: How much the numerical values in the population vary from the average.

It is often difficult or impossible to find the values of the population parameters because collecting the population data may prove too difficult or expensive. It is possible to take a sample that consists of part of the entire population.

Statistic (Numerical Measurements of a Sample)

A Statistic is a measurement describing a numerical property of a sample. There are three numerical Statistics of a sample that will be of interest in this course:

1. Sample Average: The average value of the numerical values in the sample.
2. Sample Proportion: What percent (or proportion) of the sample has a given attribute.
3. Sample Variance: How much the numerical values in the sample vary from the average.

Even though all the sample values come from the population, the sample does not contain all of the population values. For this reason the numerical properties of a sample will generally not be exactly the same as the numerical properties of the population.

Parameter versus Statistic

A Parameter is a measurement describing a numerical property of a population.
A Statistic is a measurement describing a numerical property of a sample.

Classify the following as a Parameter (P) or a Statistic (S).

A) 12 students in the FLC graduating class of 400 are transferring to U.C. Davis.
B) I asked every student enrolled in my class if they had done their homework and 45% of them said that they had not.
C) Every head coach in the NFL is male.
D) During the midnight opening of the last Harry Potter film 89 people out of the 300 in attendance had a wand with them.

Answers: A) S B) P C) P D) S
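To make the distinction concrete, the sketch below computes the three parameters for a small made-up population and the three matching statistics for a sample drawn from it. The scores, the cutoff of 80 and the choice of the first five values as the sample are all assumptions for illustration; the sample here is deliberately not random, much like the front-row example earlier.

```python
from statistics import mean, pvariance, variance

# Hypothetical population: a test score for every student in a class of 10.
population = [72, 85, 90, 64, 78, 95, 88, 70, 81, 77]
# A sample: only part of the population (the first five students).
sample = population[:5]

# Parameters describe the whole population.
pop_average = mean(population)
pop_proportion = sum(s >= 80 for s in population) / len(population)
pop_variance = pvariance(population)   # divides by N

# Statistics describe only the sample.
sample_average = mean(sample)
sample_proportion = sum(s >= 80 for s in sample) / len(sample)
sample_variance = variance(sample)     # divides by n - 1

print(pop_average, pop_proportion, pop_variance)
print(sample_average, sample_proportion, sample_variance)
```

The statistics come out close to the parameters but not equal to them, which is the gap that inferential statistics tries to measure.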

Inferential Statistics

Inferential statistics involves the use of methods that allow us to take numerical properties of a sample of the population and then use those numerical properties to infer what numerical properties of the entire population MIGHT be true. Inferential Statistics also involves methods that produce a measure of how reliable the inference about the entire population is.

Samples must represent the entire population

The sample data is the basis for inferring what the population data MAY be. For this reason it is critical that the sample data represents the entire population. A sample should have all the different components of the population represented in about the same proportion as they exist in the population. If the sample does not represent the population very closely then any inference about the population based on that sample will not be dependable.

If the population being examined is the students in this statistics class then the sample must be taken so that it represents all those students. If the sample contained all females while the population contained an equal number of male and female students then that sample would not represent the population very well. If the population is all the registered voters in Folsom, California and a sample of those voters contains only people over the age of 65 then that sample would not represent the population very well.

Random Sample

A sample is considered to be a Random Sample if every individual member of the population has an equal chance of being selected.

A sample must be taken from the population in such a way that the selection process produces a Random Sample. No inferential statistics techniques exist that can produce dependable results about the population based on a sample whose selection process did not produce a Random Sample. If the selection process DOES NOT produce a Random Sample then the inference process has no value. In almost every statistical procedure used in this course the first requirement is that the sample data collected is from a random sample as defined above. In this course we will not state the method used to get a random sample for each problem. We will simply state that the data collected represents a random sample.

A simplistic way to think of creating a random sample is to put every member of the population into a bag and shake the bag until all the members are well mixed. Then reach in and take out as many members of the population as needed for your random sample.

The concept of a truly random selection is far more complicated than one may think. Higher level math is required to discuss the concept. Major developments in this area cause some researchers to argue that there is no such thing as a truly random sample. These researchers say that within every random data set there are areas that are very orderly and thus any sample taken will never be truly random. Even the random number generators used by Intel and gambling casinos have been shown to not be as random as previously thought. It is beyond the scope of this course to discuss this concept even at a rudimentary level. An advanced statistics course in experimental design can be taken that will present many methods that can be used to help ensure a random sample.
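The bag-and-shake picture corresponds to what software calls sampling without replacement. Below is a minimal sketch using Python's standard random module; the population of 40 labeled students and the seed value are made up for the example, and the generator is pseudorandom, so the caveats above about true randomness apply to it as well.

```python
import random

# Hypothetical population: 40 students identified by label.
population = [f"student_{i}" for i in range(1, 41)]

# A seeded generator makes the draw repeatable; the seed is arbitrary.
rng = random.Random(2012)

# Draw 5 members without replacement; each member is equally likely to be chosen.
sample = rng.sample(population, k=5)
print(sample)
```

No member can appear twice in the result, just as a name drawn from the bag is not put back.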

Example 1: An Inference about a Population Mean (Average)

I want to know the average number of units taken by college students in California during the Fall 2012 semester. Due to the cost of collecting all the information and the fact that students are adding and dropping classes it would not be possible to collect accurate and up to date population information.

I collect a random sample of 4000 college students in California during the Fall 2012 semester and record how many units they were enrolled in. The average number of units for the 4000 sampled students was found to be 12.5 units.

Would the 12.5 units taken by the students in the sample be the exact average for all the students enrolled in California during the Fall 2012 semester? It should not surprise you if I said that you CANNOT use a sample to get an exact value about the population. Information based on 4000 students cannot be expected to be the same as information based on the entire population.

Inferential statistics allows us to take the fact that the sample of 4000 students were enrolled in an average of 12.5 units and use that sample average to infer what numerical properties MIGHT be true about the entire population. Inferential Statistics also involves methods that produce a measure of how reliable the conclusions about the population parameters are.

The methods of Inferential statistics allow us to make the following statement:

I am 95% confident that the average number of units taken by all the college students in California during the Fall 2012 semester falls between 11.8 and 13.2 units.

Example 2: An Inference about a Population Proportion

I want to know the percent (or proportion) of the population of registered voters in the United States that support a bill in Congress that would sell the state of Alaska to Canada. Due to the cost of collecting all the population information and the fact that some voters are changing their decisions every day based on TV ads and talk shows it would not be possible to collect accurate and up to date population information.

I collect a random sample of 500 people who are registered voters in the United States and record if they are in favor of selling Alaska to Canada. 56% of the people sampled said they were in favor of selling Alaska. 85% of those sampled said that they would be in favor if Sarah Palin was part of the deal. (It's a joke.)

The methods of Inferential statistics allow us to make the following statement:

I am 99% confident that the percent of registered voters in the United States that support a bill in Congress that would sell the state of Alaska to Canada falls between 52% and 60%.
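For readers curious how a statement like this can be produced, here is a minimal sketch of one common technique, the normal-approximation interval for a proportion. The choice of this method and the function name proportion_interval are assumptions, not taken from the lecture, and for a sample of 500 with 56% in favor it gives roughly 50% to 62%, somewhat wider than the illustrative 52% to 60% quoted above.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(p_hat, n, confidence):
    """Normal-approximation confidence interval for a population proportion."""
    # Critical value z for the requested two-sided confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Margin of error: z times the estimated standard error of p_hat.
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Figures from Example 2: 56% of 500 sampled voters, 99% confidence.
low, high = proportion_interval(0.56, 500, 0.99)
print(f"{low:.3f} to {high:.3f}")
```

A larger sample or a lower confidence level would shrink the margin of error and narrow the interval.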

Example 3: An Inference about a Population Standard Deviation

I want to know how much the actual volume of Dr. Pepper varies from the reported volume of 20 oz. Due to the cost of collecting all the population information and the fact that bottles are constantly being produced and used it would not be possible to collect accurate and up to date population information.

I collect a random sample of 50 of the 20 oz. bottles of Dr. Pepper, personally record the actual volume of each, and then drink each sample. It is a tough job but someone has to do it. I then find how much the volume of these 50 bottles varies from the reported 20 ounces.

The methods of Inferential statistics allow us to make the following statement:

I am 98% confident that the amount the volume of Dr. Pepper varies from the reported volume of 20 oz. falls between 0.1 oz. and 0.3 oz.

Chapters 1 to 3 in this course will involve the study of descriptive statistics. Chapters 5 to 9 in this course will involve the study of Inferential statistics.