STA 218: Statistics for Management

Al Nosedal. University of Toronto. Fall 2017

My momma always said: Life was like a box of chocolates. You never know what you're gonna get. Forrest Gump.

Population, Sample, Sampling Design The population in a statistical study is the entire group of individuals about which we want information. A sample is a part of the population from which we actually collect information. We use a sample to draw conclusions about the entire population. A sampling design describes exactly how to choose a sample from the population. The first step in planning a sample survey is to say exactly what population we want to describe. The second step is to say exactly what we want to measure, that is, to give exact definitions of our variables.

Customer satisfaction A department store mails a customer satisfaction survey to people who make credit card purchases at the store. This month, 45,000 people made credit card purchases. Surveys are mailed to 1000 of these people, chosen at random, and 137 people return the survey form. a) What is the population of interest for this survey? b) What is the sample? From what group is information actually obtained?

Solutions a) The population is all 45,000 people who made credit card purchases. b) The sample is the 137 people who returned the survey form.

How to sample badly The final step in planning a sample survey is the sampling design. A sampling design is a specific method for choosing a sample from the population. The easiest design, though not the best, simply chooses individuals close at hand. A sample selected by taking the members of the population that are easiest to reach is called a convenience sample. Convenience samples often produce unrepresentative data.

Bias The design of a statistical study is biased if it systematically favors certain outcomes.

Voluntary response sample A voluntary response sample consists of people who choose themselves by responding to a broad appeal. Voluntary response samples are biased because people with strong opinions are most likely to respond.

Sampling on campus You see a woman student standing in front of the student center, now and then stopping other students to ask them questions. She says that she is collecting student opinions for a class assignment. Explain why this sampling method is almost certainly biased.

Solution It is a convenience sample; she is only getting opinions from students who are at the student center at a certain time of day. This might underrepresent some group: commuters, graduate students, or nontraditional students, for example.

Simple Random Sampling A simple random sample (SRS) of size n consists of n individuals from the population chosen in such a way that every set of n individuals has an equal chance to be the sample actually selected.

Random digits A table of random digits is a long string of the digits 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 with these two properties: 1. Each entry in the table is equally likely to be any of the 10 digits 0 through 9. 2. The entries are independent of each other. That is, knowledge of one part of the table gives no information about any other part.
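
The two properties above can be imitated in software. The following is a minimal sketch, assuming we want one line of 40 digits printed in blocks of five in the style of a printed table; the seed and the variable names are arbitrary choices, not part of the course material.

```r
# Sketch: generate one line of a random digit table.
set.seed(1)  # arbitrary seed, used only so the line can be reproduced

# Property 1: each entry is equally likely to be any digit 0-9.
# Property 2: replace = TRUE makes the draws independent of each other.
digits <- sample(0:9, size = 40, replace = TRUE)

# Group the digits in blocks of five, as printed tables usually do.
blocks <- split(digits, rep(1:8, each = 5))
digit_line <- paste(sapply(blocks, paste, collapse = ""), collapse = " ")
print(digit_line)
```

The grouping into blocks only makes the line easier to read; it has no statistical meaning.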

Using Table B to choose an SRS Label: Give each member of the population a numerical label of the same length. Table: To choose an SRS, read from Table B successive groups of digits of the length you used as labels. Your sample contains the individuals whose labels you find in the table.

Apartment living You are planning a report on apartment living in a college town. You decide to select four apartment complexes at random for in-depth interviews with residents. Use software or Table B to select a simple random sample of 4 of the following apartment complexes. If you use Table B, start at line 122.

Ashley Oaks        Country View      Mayfair Village
Bay Pointe         Country Villa     Nobb Hill
Beau Jardin        Crestview         Pemberly Courts
Bluffs             Del-Lynn          Peppermill
Brandon Place      Fairington        Pheasant Run
Briarwood          Fairway Knolls    River Walk
Brownstone         Fowler            Sagamore Ridge
Burberry Place     Franklin Park     Salem Courthouse
Cambridge          Georgetown        Village Square
Chauncey Village   Greenacres        Waterford Court

Solution
01 Ashley Oaks        11 Country View      21 Mayfair Village
02 Bay Pointe         12 Country Villa     22 Nobb Hill
03 Beau Jardin        13 Crestview         23 Pemberly Courts
04 Bluffs             14 Del-Lynn          24 Peppermill
05 Brandon Place      15 Fairington        25 Pheasant Run
06 Briarwood          16 Fairway Knolls    26 River Walk
07 Brownstone         17 Fowler            27 Sagamore Ridge
08 Burberry Place     18 Franklin Park     28 Salem Courthouse
09 Cambridge          19 Georgetown        29 Village Square
10 Chauncey Village   20 Greenacres        30 Waterford Court

Solution (cont.) With Table B, enter at line 122 and choose 13 = Crestview, 15 = Fairington, 05 = Brandon Place, and 29 = Village Square.

R Code
set.seed(2016)   # use to reproduce the sample below
sample(1:30, 4)  # 2nd argument is the sample size

R Code ## [1] 6 5 24 4
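
The numeric labels only become a sample once they are mapped back to the complexes. The sketch below assumes the labels 01 to 30 are assigned in the order of the labeled list in the solution; the vector name `complexes` and the variable names are illustrative.

```r
# Complexes in label order (01 = Ashley Oaks, ..., 30 = Waterford Court).
complexes <- c(
  "Ashley Oaks", "Bay Pointe", "Beau Jardin", "Bluffs", "Brandon Place",
  "Briarwood", "Brownstone", "Burberry Place", "Cambridge", "Chauncey Village",
  "Country View", "Country Villa", "Crestview", "Del-Lynn", "Fairington",
  "Fairway Knolls", "Fowler", "Franklin Park", "Georgetown", "Greenacres",
  "Mayfair Village", "Nobb Hill", "Pemberly Courts", "Peppermill", "Pheasant Run",
  "River Walk", "Sagamore Ridge", "Salem Courthouse", "Village Square", "Waterford Court"
)

# Labels read from Table B in the solution: 13, 15, 05, 29.
table_b_sample <- complexes[c(13, 15, 5, 29)]
print(table_b_sample)
# "Crestview" "Fairington" "Brandon Place" "Village Square"

# Labels printed in the R output above: 6, 5, 24, 4.
r_output_sample <- complexes[c(6, 5, 24, 4)]
print(r_output_sample)
# "Briarwood" "Brandon Place" "Peppermill" "Bluffs"

# Software can also sample the names directly, without numeric labels.
picked <- sample(complexes, 4)
```

Because `sample()` draws without replacement by default, no complex can be chosen twice, which matches the definition of an SRS.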

Inference about the Population The purpose of a sample is to give us information about a larger population. The process of drawing conclusions about a population on the basis of sample data is called inference because we infer information about the population from what we know about the sample. Inference from convenience samples or voluntary response samples would be misleading because these methods of choosing a sample are biased. We are almost certain that the sample does not fairly represent the population. The first reason to rely on random sampling is to eliminate bias in selecting samples from the list of available individuals.

Sampling frame The list of individuals from which a sample is actually selected is called the sampling frame. Ideally, the frame should list every individual in the population, but in practice this is often difficult. A frame that leaves out part of the population is a common source of undercoverage. Suppose that a sample of households in a community is selected at random from the telephone directory. What households are omitted from this frame? What types of people do you think are likely to live in these households? These people will probably be underrepresented in the sample.

Solution This design would omit households without telephones or with unlisted numbers. Such households would likely be made up of poor individuals (who cannot afford a phone), those who choose not to have phones, and those who do not wish to have their phone number published.

Cautions about sample surveys Random selection eliminates bias in the choice of a sample from a list of the population. When the population consists of human beings, however, accurate information from a sample requires more than a good sampling design. To begin, we need an accurate and complete list of the population. Because such a list is rarely available, most samples suffer from some degree of undercoverage. A sample survey of households, for example, will miss not only homeless people but prison inmates and students in dormitories. An opinion poll conducted by calling landline telephone numbers will miss households that have only cell phones as well as households without a phone. The results of national sample surveys therefore have some bias if the people not covered differ from the rest of the population. A more serious source of bias in most sample surveys is nonresponse, which occurs when a selected individual cannot be contacted or refuses to cooperate.

Nonresponse Academic sample surveys, unlike commercial polls, often discuss nonresponse. A survey of drivers began by randomly sampling all listed residential telephone numbers in the United States. Of 45,956 calls to these numbers, 5029 were completed. What was the rate of nonresponse for this sample? (Only one call was made to each number. Nonresponse would be lower if more calls were made.)

Solution The response rate was 5029/45956 ≈ 0.1094, so the nonresponse rate was 1 - 0.1094 = 0.8906.
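
The same arithmetic can be checked in R. This is a small sketch using the counts from the exercise; the variable names are illustrative.

```r
calls     <- 45956  # numbers called (one call each)
completed <- 5029   # completed interviews

response_rate    <- completed / calls
nonresponse_rate <- 1 - response_rate

# Round to four decimals, as in the solution.
print(round(c(response = response_rate, nonresponse = nonresponse_rate), 4))
# response 0.1094, nonresponse 0.8906
```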

Undercoverage and nonresponse Undercoverage occurs when some groups in the population are left out of the process of choosing the sample. Nonresponse occurs when an individual chosen for the sample can't be contacted or refuses to participate.

Stratified Random Sample To select a stratified random sample, first divide the population into groups of similar individuals, called strata. Then choose a separate SRS in each stratum and combine these SRSs to form the full sample.
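
One way to carry out the two steps in software is sketched below. The function `stratified_sample` and the small population of 12 labeled individuals in three strata are hypothetical, introduced only for illustration; they are not from the course material.

```r
# Sketch: draw a separate SRS inside each stratum, then combine them.
# values:        one entry per individual (here, an ID label)
# strata:        the stratum each individual belongs to
# n_per_stratum: named vector giving the SRS size for each stratum
stratified_sample <- function(values, strata, n_per_stratum) {
  groups <- split(values, strata)  # step 1: divide the population into strata
  picks <- lapply(names(groups), function(g) {
    members <- groups[[g]]
    # step 2: SRS within the stratum (sample positions, then look them up)
    members[sample.int(length(members), n_per_stratum[[g]])]
  })
  unlist(picks, use.names = FALSE)  # combine the SRSs into the full sample
}

# Hypothetical population: IDs 1-6 in stratum A, 7-10 in B, 11-12 in C.
ids    <- 1:12
strata <- rep(c("A", "B", "C"), times = c(6, 4, 2))

chosen <- stratified_sample(ids, strata, c(A = 3, B = 2, C = 1))
print(chosen)
```

Sampling positions with `sample.int()` rather than calling `sample()` on the members avoids R's special case in which `sample(x)` with a single number `x` samples from `1:x`.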

Toy example Suppose we have a population of size 5. We measure a variable for each of these 5 individuals and obtain the values 50, 55, 60, 70, and 80. The population mean, which we denote by µ, is µ = (50 + 55 + 60 + 70 + 80)/5 = 63.

Toy example Let us see what happens if we use a simple random sample of size 2 to estimate the population mean, µ.
Sample   Measurements   Sample mean x̄_i
1        (50, 55)       52.5
2        (50, 60)       55
3        (50, 70)       60
4        (50, 80)       65
5        (55, 60)       57.5
6        (55, 70)       62.5
7        (55, 80)       67.5
8        (60, 70)       65
9        (60, 80)       70
10       (70, 80)       75

Toy example
Sample   x̄_i - µ
1        52.5 - 63 = -10.5
2        55 - 63 = -8
3        60 - 63 = -3
4        65 - 63 = 2
5        57.5 - 63 = -5.5
6        62.5 - 63 = -0.5
7        67.5 - 63 = 4.5
8        65 - 63 = 2
9        70 - 63 = 7
10       75 - 63 = 12
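
The ten simple random samples and their errors can be enumerated in R rather than by hand. This is a sketch for the toy population only; the variable names are illustrative.

```r
population <- c(50, 55, 60, 70, 80)
mu <- mean(population)           # population mean, 63

# Every possible sample of size 2; each column is one sample.
samples <- combn(population, 2)
xbar    <- colMeans(samples)     # sample mean of each column
err     <- xbar - mu             # error of each sample mean

print(data.frame(sample = seq_along(xbar),
                 x1 = samples[1, ], x2 = samples[2, ],
                 xbar = xbar, error = err))

print(range(err))   # -10.5 12.0
print(mean(xbar))   # 63
```

Averaged over all ten equally likely samples, the sample mean equals µ = 63, even though any single sample can miss by as much as 12.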

Toy example Let us see what happens if we use a stratified random sample of size 2 (one individual from each stratum) to estimate the population mean, µ. Assume that stratum 1 is {50, 55, 60} and stratum 2 is {70, 80}.
Sample   Measurements   Sample mean x̄_i
1        (50, 70)       60
2        (50, 80)       65
3        (55, 70)       62.5
4        (55, 80)       67.5
5        (60, 70)       65
6        (60, 80)       70

Toy example
Sample   x̄_i - µ
1        60 - 63 = -3
2        65 - 63 = 2
3        62.5 - 63 = -0.5
4        67.5 - 63 = 4.5
5        65 - 63 = 2
6        70 - 63 = 7
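
The six stratified samples can be enumerated the same way, which makes the comparison with simple random sampling explicit. This is a sketch for the toy strata only, taking one individual per stratum; the variable names are illustrative.

```r
stratum1 <- c(50, 55, 60)
stratum2 <- c(70, 80)
mu <- mean(c(stratum1, stratum2))  # population mean, 63

# All pairs with one value from each stratum. Listing x2 first makes it
# vary fastest, so the rows come out in the same order as the table.
pairs <- expand.grid(x2 = stratum2, x1 = stratum1)
xbar  <- (pairs$x1 + pairs$x2) / 2
err   <- xbar - mu

print(data.frame(sample = seq_along(xbar),
                 x1 = pairs$x1, x2 = pairs$x2,
                 xbar = xbar, error = err))

print(range(err))  # -3 7
```

The stratified errors range from -3 to 7, compared with -10.5 to 12 for the simple random samples of the same size, because no stratified sample can consist of two low values or two high values.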