Statistical and operational complexities of the studies I Sample design: Use of sampling and replicated weights

Andrés Sandoval-Hernández
IEA DPC
Workshop on using PISA, PIAAC, TIMSS & PIRLS, TALIS datasets
Ispra, Italy, June 24-27, 2014
Note: These slides were prepared as part of the IEA training portfolio with the collaboration of IEA staff and resource persons.

Table of Contents
- Introduction
- The sampling design
- The assessment design

Some of the Challenges...
- Domain is very broad
- Limited testing time (physical & psychological)
Challenges:
- Need to administer the items in a sensible design
- Need to summarize performance on the items
- Need to account for unreliability of estimates

What's Common?
Complex Sample Design
- Probabilistic, stratified, multistage sample designs
- Need to take the sample design into account when computing estimates
Complex Assessment Design
- Multiple matrix sampling designs: nobody takes all items, and not every item is given to every participant
- Need to take measurement uncertainty into account when computing estimates

How Does Sampling Help?
- Impossible to test everyone on everything
  - Too many people
  - Too many items
  - Too expensive
- Not necessary to test everyone on everything
  - Blood sample
  - Soup sample
- Some people are tested on some things
- Results should be seen in the context of the student and item sample design


General Sample Design
Populations of Interest
- TIMSS: 8th and 4th graders
- PIRLS: 4th graders
- PISA: 15-year-old students
- PIAAC: adults
- TALIS: teachers (ISCED levels 1, 2 and 3)

General Sample Design - Students
- The basic student sampling design is referred to as a two-stage stratified cluster sample design
- 1st Stage: Selection of schools (PIRLS & TIMSS, PISA, TALIS)
- 2nd Stage (PIRLS & TIMSS): Selection of classes within schools
- 2nd Stage (PISA): Selection of students within schools
- 2nd Stage (TALIS): Selection of teachers within schools
- Consider the alternatives...

General Sample Design - PIAAC
Multistage stratified cluster sampling design
- Multistage
  - There are different stages/levels of selection
  - For example: municipalities, then blocks, then households, then persons
- Stratified
  - Selection takes place across different segments of the population
  - Achieved by systematic selection across a sorted list, or targeted selection within different groups
- Cluster
  - Multiple individuals are selected from within segments of the population
  - Segments could be municipalities, blocks, etc.
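As a rough illustration of "systematic selection across a sorted list", the sketch below draws an equal-probability systematic sample from a frame sorted by a stratification variable. The function name, the toy frame and the region labels are assumptions made for this example only; the selection procedures actually used in the studies are more elaborate.

```python
import random


def systematic_sample(frame, n, seed=None):
    """Equal-probability systematic selection of n units from an ordered frame."""
    if not 0 < n <= len(frame):
        raise ValueError("n must be between 1 and len(frame)")
    rng = random.Random(seed)
    interval = len(frame) / n          # sampling interval
    start = rng.random() * interval    # random start in [0, interval)
    picks = (int(start + k * interval) for k in range(n))
    # min() guards against a floating-point index equal to len(frame)
    return [frame[min(i, len(frame) - 1)] for i in picks]


# Hypothetical frame: 20 units sorted by region (10 in "A", then 10 in "B").
frame = [("A", i) for i in range(10)] + [("B", i) for i in range(10)]
sample = systematic_sample(frame, n=4, seed=1)
regions = [region for region, _ in sample]
print(sample)
```

Because the frame is sorted by region and the interval is 5, every possible random start yields two units from each region. This is how sorting the list before systematic selection acts as a form of stratification.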

PIAAC Sampling Design
Sampling Frames
Three broad types
- Population registers: administrative lists of residents are maintained at either national or regional level
- Master samples: lists of dwelling units or primary sampling units are maintained at national level for official surveys
- Area frames: a frame of geographic clusters formed by combining adjacent geographic areas, respecting their population sizes, and taking into consideration travel distances for interviewers

PIAAC Sampling Design
Sampling Frames
- Required to cover at least 95 percent of the target population
- Limited exclusion (non-coverage) of groups in the target population
  - Hard-to-reach groups such as the populations of remote and isolated regions

PIAAC Sampling Design
Sample Sizes
- The minimum sample size required depended on two variables:
  - Number of cognitive domains assessed
  - Number of languages in which the assessment was administered
- Participating countries had the choice of assessing all three domains or assessing literacy and numeracy only
- Minimum sample size required for one language:
  - 5,000 completed cases if all three domains were assessed
  - 4,500 if only literacy and numeracy were assessed

PIAAC Sampling Design
Sample Sizes
- To fully report results in more than one language, the required sample size is either 4,500 or 5,000 cases per reporting language
- When not reporting results separately by language, the required sample size is at least 5,000 completed cases collected in the principal language
- A completed case is defined as an interview in which the respondent provided answers to key background questions, including age, gender, highest level of schooling and employment status, and completed the core cognitive instrument

Why Do It This Way?
Among many reasons
- Availability of information
- Cost reduction
- Ensure representation of target population groups
- Achieve desired precision levels for target groups
- Redundancy

What are the Consequences?
- We DO NOT have a simple random selection/sample (SRS) from the population
- Think of selecting clusters, and then persons within clusters
- Persons within a cluster are likely to be more similar to each other than to persons in other clusters
- This matters when we compute sampling errors

[Figure: Selecting Individuals vs. Clusters]

[Figure: Random vs. Systematic Selection]

[Figure: Oversampling]

[Figure: Exclusions from the Population]

Sampling Weights
- A sampling weight is the inverse of a person's probability of selection
- The selection probability of each person is known
- Weights take into account characteristics of the sample and the selection procedure
  - Stratification or disproportional sampling of subgroups
  - Adjustments for nonresponse
  - Poststratification to external control totals
- Sampling weights must ALWAYS be used to get correct population estimates
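To make "inverse of the probability of selection" concrete, here is a minimal sketch for a two-stage design with a simple nonresponse adjustment. The function names and all numbers are hypothetical, and the weighting procedures used in the actual studies involve more steps than shown here.

```python
def base_weight(p_school, p_person_within_school):
    """Inverse of the overall selection probability in a two-stage design."""
    return 1.0 / (p_school * p_person_within_school)


def nonresponse_adjusted(weight, n_sampled, n_responding):
    """Spread the weight of nonrespondents over the respondents in the same group."""
    return weight * n_sampled / n_responding


# Hypothetical school selected with probability 0.1; 25 of its 50 students sampled.
w = base_weight(0.1, 25 / 50)            # each sampled student stands for about 20 students
w_adj = nonresponse_adjusted(w, 25, 20)  # only 20 of the 25 sampled students took part
print(round(w, 2), round(w_adj, 2))
```

The overall selection probability is the product of the stage-wise probabilities, so the weight grows whenever a unit was unlikely to be selected or when fewer sampled persons respond.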

Computing Means
Unweighted:
\[ \bar{x} = \mathrm{Mean}(x) = \frac{\sum x}{N} \]
Weighted:
\[ \bar{x} = \mathrm{Mean}(x) = \frac{\sum (wgt \cdot x)}{\sum wgt} \]
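The two formulas can be compared on a toy data set. The scores and weights below are made up for illustration; the point is only that the two means differ as soon as the weights are unequal.

```python
def unweighted_mean(x):
    """Sum of the values divided by the number of cases."""
    return sum(x) / len(x)


def weighted_mean(x, wgt):
    """Sum of weight * value divided by the sum of the weights."""
    if len(x) != len(wgt):
        raise ValueError("x and wgt must have the same length")
    return sum(w * v for w, v in zip(wgt, x)) / sum(wgt)


# Hypothetical scores and sampling weights for three respondents.
scores = [400.0, 500.0, 600.0]
weights = [10.0, 10.0, 80.0]

plain = unweighted_mean(scores)             # 500.0
weighted = weighted_mean(scores, weights)   # 570.0
print(plain, weighted)
```

The third respondent represents far more people than the other two, so the weighted mean moves toward that respondent's score.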

Estimating Sampling Variance
- A simple random sample (SRS) of 4,500 students drawn from all students in the population covers the population diversity better than a sample of 100 schools with 45 students sampled in each school
- A two-stage design has more uncertainty associated with its estimates than an SRS of the same size
- The increase in uncertainty in a two-stage design is directly related to how large the differences between schools are relative to the differences within schools

Estimating Sampling Variance
- Consider the extreme where all schools are different but, within each school, all students are identical
- If we sample 100 schools and 45 students per school, the effective sample size is really only 100, as opposed to 4,500
- This is an extreme case, but it shows that in general, with such designs, the effective sample size is smaller than the actual sample size
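The extreme case can be reproduced with a common rule of thumb that is not on the slides: Kish's approximation for equal-sized clusters, deff = 1 + (m - 1) * rho, where m is the cluster size and rho is the intraclass correlation (the share of variance that lies between clusters). The sketch below assumes this approximation; the function names are illustrative.

```python
def design_effect(cluster_size, icc):
    """Kish's approximation for equal-sized clusters (an assumption, not from the slides)."""
    return 1.0 + (cluster_size - 1) * icc


def effective_sample_size(n_clusters, cluster_size, icc):
    """Actual sample size divided by the design effect."""
    return n_clusters * cluster_size / design_effect(cluster_size, icc)


extreme = effective_sample_size(100, 45, icc=1.0)   # students identical within schools
srs_like = effective_sample_size(100, 45, icc=0.0)  # schools do not differ at all
typical = effective_sample_size(100, 45, icc=0.2)   # hypothetical intermediate case
print(extreme, srs_like, round(typical))
```

With rho = 1 the 4,500 students carry the information of only 100 independent observations, while with rho = 0 the cluster sample behaves like an SRS of 4,500.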

Estimating Sampling Variance
- Sampling automatically results in some uncertainty (called "error" or "variance")
- Which factors can influence the magnitude of variance in a sample?
  - How we sample
  - Sample size
  - Variability within the population
- Think of an SRS and how these factors influence the variance of an estimate

Using Replicate Weights
- How to properly estimate sampling variance from a complex design? Replicate samples
- Brief explanation:
  - Delete different sub-samples from the full sample to form G replicate samples
  - Adjust the weights of the remaining units to account for the deleted units; the new weights are called replicate weights
  - Produce an estimate using the full sample weight and an estimate from each set of replicate weights
  - Calculate the variance of these estimates

Using Replicate Weights
- The variation of the replicate estimates around the full sample estimate provides a measure of the variance of the full sample estimate
- Advantages of replication:
  - Convenient to use
  - Effects of non-response and other adjustments can be reflected in replicate weights
  - Estimates can be computed for subpopulations
  - Applicable to most statistics

Estimating Sampling Variance
- Jackknife Repeated Replication (JK2): used in TIMSS, PIRLS
- Balanced Repeated Replication (BRR): used in PISA, TALIS
- These procedures make use of replicate weights
  - In PISA, replicate weights are stored in the database
  - In TIMSS, PIRLS and ICCS, replicate weights are computed on the fly

How the JK2 works...
Strata  Cluster  R0   R1   R2   R3   R4   R5   R6   R7   R8   R9   R10
1       1        1.0  0.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0
1       2        1.0  2.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0
2       3        1.0  1.0  0.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0
2       4        1.0  1.0  2.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0
3       5        1.0  1.0  1.0  2.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0
3       6        1.0  1.0  1.0  0.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0
4       7        1.0  1.0  1.0  1.0  0.0  1.0  1.0  1.0  1.0  1.0  1.0
4       8        1.0  1.0  1.0  1.0  2.0  1.0  1.0  1.0  1.0  1.0  1.0
5       9        1.0  1.0  1.0  1.0  1.0  2.0  1.0  1.0  1.0  1.0  1.0
5       10       1.0  1.0  1.0  1.0  1.0  0.0  1.0  1.0  1.0  1.0  1.0
6       11       1.0  1.0  1.0  1.0  1.0  1.0  2.0  1.0  1.0  1.0  1.0
6       12       1.0  1.0  1.0  1.0  1.0  1.0  0.0  1.0  1.0  1.0  1.0
7       13       1.0  1.0  1.0  1.0  1.0  1.0  1.0  2.0  1.0  1.0  1.0
7       14       1.0  1.0  1.0  1.0  1.0  1.0  1.0  0.0  1.0  1.0  1.0
8       15       1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  0.0  1.0  1.0
8       16       1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  2.0  1.0  1.0
9       17       1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  2.0  1.0
9       18       1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  0.0  1.0
10      19       1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  0.0
10      20       1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  2.0
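The pattern on the JK2 slide (within one stratum per replicate, one cluster is dropped and the other is doubled, while every other stratum keeps a factor of 1) can be generated with a few lines. This is a sketch under that reading of the slide; the function and parameter names are hypothetical and do not refer to variables in any study database.

```python
def jk2_multipliers(zone, doubled, n_zones):
    """Multipliers for replicates R1..Rn for one unit.

    zone    : 1-based stratum (zone) of the unit's cluster
    doubled : True if the unit's cluster is doubled when its own zone is
              processed, False if it is the cluster that gets dropped
    """
    own = 2.0 if doubled else 0.0
    return [own if r == zone else 1.0 for r in range(1, n_zones + 1)]


def jk2_replicate_weights(full_weight, zone, doubled, n_zones):
    """Replicate weights are the full-sample weight times each multiplier."""
    return [full_weight * m for m in jk2_multipliers(zone, doubled, n_zones)]


# Clusters 1, 2 and 5 from the slide (10 strata, two clusters each).
cluster_1 = jk2_multipliers(zone=1, doubled=False, n_zones=10)
cluster_2 = jk2_multipliers(zone=1, doubled=True, n_zones=10)
cluster_5 = jk2_multipliers(zone=3, doubled=True, n_zones=10)
print(cluster_1)
print(cluster_5)
```

Column R0 on the slide is the full-sample weight itself, so it is not produced here. Which cluster of a pair is dropped varies from stratum to stratum on the slide.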

How the BRR Fay's Variant works...
Strata  Cluster  R0   R1   R2   R3   R4   R5   R6   R7   R8   R9   R10  R11  R12
1       1        1.0  1.5  1.5  1.5  1.5  1.5  1.5  1.5  1.5  1.5  1.5  1.5  1.5
1       2        1.0  0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5
2       3        1.0  1.5  0.5  1.5  0.5  1.5  1.5  1.5  0.5  0.5  0.5  1.5  0.5
2       4        1.0  0.5  1.5  0.5  1.5  0.5  0.5  0.5  1.5  1.5  1.5  0.5  1.5
3       5        1.0  1.5  0.5  0.5  1.5  0.5  1.5  1.5  1.5  0.5  0.5  0.5  1.5
3       6        1.0  0.5  1.5  1.5  0.5  1.5  0.5  0.5  0.5  1.5  1.5  1.5  0.5
4       7        1.0  1.5  1.5  0.5  0.5  1.5  0.5  1.5  1.5  1.5  0.5  0.5  0.5
4       8        1.0  0.5  0.5  1.5  1.5  0.5  1.5  0.5  0.5  0.5  1.5  1.5  1.5
5       9        1.0  1.5  0.5  1.5  0.5  0.5  1.5  0.5  1.5  1.5  1.5  0.5  0.5
5       10       1.0  0.5  1.5  0.5  1.5  1.5  0.5  1.5  0.5  0.5  0.5  1.5  1.5
6       11       1.0  1.5  0.5  0.5  1.5  0.5  0.5  1.5  0.5  1.5  1.5  1.5  0.5
6       12       1.0  0.5  1.5  1.5  0.5  1.5  1.5  0.5  1.5  0.5  0.5  0.5  1.5
7       13       1.0  1.5  0.5  0.5  0.5  1.5  0.5  0.5  1.5  0.5  1.5  1.5  1.5
7       14       1.0  0.5  1.5  1.5  1.5  0.5  1.5  1.5  0.5  1.5  0.5  0.5  0.5
8       15       1.0  1.5  1.5  0.5  0.5  0.5  1.5  0.5  0.5  1.5  0.5  1.5  1.5
8       16       1.0  0.5  0.5  1.5  1.5  1.5  0.5  1.5  1.5  0.5  1.5  0.5  0.5
9       17       1.0  1.5  1.5  1.5  0.5  0.5  0.5  1.5  0.5  0.5  1.5  0.5  1.5
9       18       1.0  0.5  0.5  0.5  1.5  1.5  1.5  0.5  1.5  1.5  0.5  1.5  0.5
10      19       1.0  1.5  1.5  1.5  1.5  0.5  0.5  0.5  1.5  0.5  0.5  1.5  0.5
10      20       1.0  0.5  0.5  0.5  0.5  1.5  1.5  1.5  0.5  1.5  1.5  0.5  1.5
11      21       1.0  1.5  0.5  1.5  1.5  1.5  0.5  0.5  0.5  1.5  0.5  0.5  1.5
11      22       1.0  0.5  1.5  0.5  0.5  0.5  1.5  1.5  1.5  0.5  1.5  1.5  0.5
12      23       1.0  1.5  1.5  0.5  1.5  1.5  1.5  0.5  0.5  0.5  1.5  0.5  0.5
12      24       1.0  0.5  0.5  1.5  0.5  0.5  0.5  1.5  1.5  1.5  0.5  1.5  1.5
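A pattern of 1.5 and 0.5 multipliers like the one on the BRR slide can be derived from a Hadamard matrix of +1/-1 entries: with Fay factor k, one cluster of a pair gets 1 + h(1 - k) and the other gets 1 - h(1 - k). The sketch below assumes k = 0.5 (which gives 1.5 and 0.5) and builds a small order-4 matrix with the Sylvester construction for illustration. The slide uses 12 replicates, which needs an order-12 matrix that this construction cannot produce; all names here are hypothetical.

```python
def sylvester_hadamard(order):
    """Hadamard matrix of +1/-1 entries; order must be a power of two."""
    if order < 1 or order & (order - 1):
        raise ValueError("order must be a power of two")
    h = [[1]]
    while len(h) < order:
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h


def fay_multipliers(hadamard, n_strata, fay=0.5):
    """For each stratum, return (first_cluster, second_cluster) multiplier lists."""
    n_reps = len(hadamard)
    if n_strata > n_reps:
        raise ValueError("need at least as many replicates as strata")
    pairs = []
    for s in range(n_strata):
        first = [1.0 + hadamard[r][s] * (1.0 - fay) for r in range(n_reps)]
        second = [1.0 - hadamard[r][s] * (1.0 - fay) for r in range(n_reps)]
        pairs.append((first, second))
    return pairs


pairs = fay_multipliers(sylvester_hadamard(4), n_strata=4, fay=0.5)
for first, second in pairs:
    print(first, second)
```

Within every replicate the two multipliers of a stratum sum to 2, so no cluster is ever dropped entirely. That is the practical difference from the 0/2 pattern of JK2.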

Estimating Sampling Variance
Think of the following:
- If I take clusters out, recalculate, and results DO change
  - What can I say about the rest of the clusters not sampled?
  - What can I say about other samples I could have drawn?
- If I take clusters out, recalculate, and results DO NOT change
  - What can I say about the rest of the clusters not sampled?
  - What can I say about other samples I could have drawn?

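Calculating Sampling Variance
\[ Var(\hat{\theta}) = f \cdot \sum_{r=1}^{R} (\hat{\theta}_r - \hat{\theta}_0)^2 \]
where \hat{\theta}_0 is the full sample estimate, \hat{\theta}_r is the estimate from replicate r, and
- Using JK2: f = 1.0
- Using BRR (w/ Fay): f = \frac{1}{R \cdot (1 - FayFac)^2}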
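The variance formula on this slide can be sketched as a single function: sum the squared deviations of the replicate estimates from the full sample estimate and multiply by the factor f. The function name and the example estimates are made up for illustration.

```python
import math


def replicate_variance(full_estimate, replicate_estimates, method="JK2", fay=0.5):
    """f times the sum of squared deviations from the full-sample estimate."""
    sum_sq = sum((est - full_estimate) ** 2 for est in replicate_estimates)
    if method == "JK2":
        f = 1.0
    elif method == "BRR":
        f = 1.0 / (len(replicate_estimates) * (1.0 - fay) ** 2)
    else:
        raise ValueError("method must be 'JK2' or 'BRR'")
    return f * sum_sq


# Hypothetical estimates: a full-sample mean and four replicate means.
theta_0 = 500.0
theta_r = [498.0, 503.0, 501.0, 499.0]

var_jk2 = replicate_variance(theta_0, theta_r, method="JK2")           # 4 + 9 + 1 + 1 = 15
se_jk2 = math.sqrt(var_jk2)                                            # standard error
var_brr = replicate_variance(theta_0, theta_r, method="BRR", fay=0.0)  # f = 1/4 without Fay
print(var_jk2, round(se_jk2, 3), var_brr)
```

The standard error is the square root of this variance. Setting fay=0.0 corresponds to BRR without Fay's modification, which is used in the last call only to show how f changes.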

Sampling Summary
- Always use sampling weights
- Always take the design into account when computing sampling variance


TIMSS, PIRLS, PIAAC & PISA Assessment Design*
- Rotated block assessment design: blocks of items are rotated among several booklets
- No individual answered all items
- In TIMSS, students encounter booklets with both mathematics and science items (PIRLS booklets contain reading items only)
- In PISA and PIAAC, individuals encounter booklets with items from one or more domains
- Each booklet contained both trend items and new items
* TALIS does not include an assessment
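The idea of rotating blocks among booklets can be illustrated with a simple cyclic rotation. This is a hypothetical scheme chosen for brevity, not the booklet design of any of the studies; the function name and block labels are assumptions.

```python
def rotate_blocks(blocks, blocks_per_booklet):
    """Booklet i holds consecutive blocks starting at block i, wrapping around."""
    n = len(blocks)
    if not 0 < blocks_per_booklet <= n:
        raise ValueError("blocks_per_booklet must be between 1 and len(blocks)")
    return [
        [blocks[(start + offset) % n] for offset in range(blocks_per_booklet)]
        for start in range(n)
    ]


booklets = rotate_blocks(["A", "B", "C", "D"], blocks_per_booklet=2)
for number, content in enumerate(booklets, start=1):
    print(number, content)
```

In this toy design each block appears in two booklets, once in each position, and no booklet contains all blocks, so no individual sees every item.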

PISA Assessment Design
             2000   2003*  2006   2009   2012*
Reading      MAJOR  Minor  Minor  MAJOR  Minor
Mathematics  Minor  MAJOR  Minor  Minor  MAJOR
Science      Minor  Minor  MAJOR  Minor  Minor
* Problem solving

PIAAC Assessment Design
Source: http://www.oecd.org/site/piaac/

Any questions?
Thank you for your attention!