Why Randomize? Jim Berry Cornell University


Session Overview
I. Basic vocabulary for impact evaluation
II. Randomized evaluation
III. Other methods of impact evaluation
IV. Conclusions
J-PAL WHY RANDOMIZE 3

Components of Programme Evaluation
- Needs Assessment: What is the problem?
- Programme Theory Assessment: How, in theory, does the programme fix the problem?
- Process Evaluation: Does the programme work as planned?
- Impact Evaluation: Were its goals achieved? What was the magnitude?
- Cost Effectiveness: Given the magnitude and cost, how does it compare to alternatives?

BASIC VOCABULARY FOR IMPACT EVALUATION

Example: Immunization Incentives
The Problem: Despite the availability of free immunization, full coverage rates among children remain extremely low in many developing countries.
Intervention:
- Reliable, monthly immunization camps set up in villages in Udaipur
- Small incentives offered to mothers conditional on having their child immunized; a larger incentive when the immunization course is completed

Which one of these would make a good question for impact evaluation? (Audience responses in parentheses.)
A. What percentage of 3-year-old children in Rajasthan were not fully immunized? (81%)
B. What is the correlation between regular immunization camps and immunization rates? (8%)
C. Does holding regular immunization camps and providing incentives to parents improve immunization rates of children? (12%)

Causal Inference
Cause-and-effect language is used every day in many contexts, but it means something very specific in impact evaluation. We can think of causality as the singular effect of a program on an outcome of interest, independent of any other intervening factors. Our goal is to estimate the size of this effect accurately and with confidence.

How to measure impact? Impact (also called the causal effect) is defined as a comparison between:
1. The outcome some time after the program has been introduced
2. The outcome at that same point in time had the program not been introduced (the counterfactual)

What is the impact of this program? [Chart: immunization rates over time, with the program start marked.]

What is the impact of this program? (Audience responses in parentheses.)
A. Positive (69%)
B. Negative (31%)
C. Zero (0%)
D. Not enough info (0%)

What is the impact of this program? [Chart: immunization rates over time; the impact is the gap between the observed trend and the counterfactual after the program starts.]

Impact: What is it? [Chart: immunization rates over time, repeated across animation slides; the impact is the vertical gap between observed and counterfactual outcomes.]

Counterfactual
The counterfactual represents the state of the world that program participants would have experienced in the absence of the program (i.e., had they not participated in the program).
Problem: The counterfactual cannot be observed.
Solution: We need to mimic or construct the counterfactual.

Constructing the counterfactual
- Usually done by selecting a group of individuals that did not participate in the program
- This group is usually referred to as the control group or comparison group
- How this group is selected is a key decision in the design of any impact evaluation

Selecting the comparison group
Idea: Select a group that is exactly like the group of participants in all ways except one: their exposure to the program being evaluated.
Goal: To be able to attribute differences in outcomes between the participants and the comparison group to the program (and not to other factors).
An impact evaluation is only as good as the comparison group it uses to mimic the counterfactual.

Impact evaluation methods
1. Randomized Experiments: Use random assignment of the program to create a comparison group that mimics the counterfactual. Also known as:
- Random Assignment Studies
- Randomized Field Trials
- Social Experiments
- Randomized Controlled Trials (RCTs)
- Randomized Controlled Experiments

Impact evaluation methods
2. Non- or Quasi-Experimental Methods: Argue that a certain excluded group mimics the counterfactual.
a. Pre-Post
b. Simple Difference
c. Differences-in-Differences
d. Multivariate Regression
e. Statistical Matching
f. Interrupted Time Series
g. Instrumental Variables
h. Regression Discontinuity

Example: Balsakhi Program

Balsakhi Program: Background
Problem:
- Many children in the 3rd and 4th standard were not even at the 1st standard level of competency
- Class sizes were large
- The social distance between the teacher and many of the students was large
Proposed solution: Hire local women (balsakhis) from the community and train them to teach basic competencies (reading, numeracy) to the lowest-performing students.
- Implemented by Pratham, an NGO from India
- In Vadodara, the balsakhi program was run in government primary schools in 2002-2003
- Teachers decided which children would get the balsakhi

Balsakhi: Outcomes
Children were tested at the beginning of the school year (pretest) and at the end of the year (post-test).
QUESTION: How can we estimate the impact of the balsakhi program on test scores?

Randomized Evaluation
Suppose we evaluated the balsakhi program using a randomized evaluation.
QUESTION #1: What would this entail? How would we do it?
QUESTION #2: What would be the advantage of using this method to evaluate the impact of the balsakhi program?

The basics
- Take a sample of program applicants
- Randomly assign them to either:
  - Treatment Group: offered the program
  - Control Group: not allowed to receive the program (during the evaluation period)
- The two groups will, on average, have the same observable and unobservable characteristics, since assignment is purely by chance, provided we have a large enough number of units
- Impact = difference in outcomes between the treatment and control groups after the program
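The steps above can be sketched in code. This is an illustrative sketch only, not J-PAL's software: `randomize_and_estimate` and `outcome_fn` are hypothetical names, and the toy outcome rule (a baseline score plus roughly 6 points if treated) is invented for the example.

```python
import random
import statistics

def randomize_and_estimate(applicants, outcome_fn, seed=0):
    """Randomly assign applicants to treatment or control, then
    estimate impact as the difference in mean outcomes.
    outcome_fn(unit, treated) is a hypothetical stand-in for
    observing each unit's outcome after the evaluation period."""
    rng = random.Random(seed)
    pool = list(applicants)
    rng.shuffle(pool)                      # assignment is purely by chance
    half = len(pool) // 2
    treatment, control = pool[:half], pool[half:]
    treated_mean = statistics.mean(outcome_fn(u, True) for u in treatment)
    control_mean = statistics.mean(outcome_fn(u, False) for u in control)
    return treated_mean - control_mean     # estimated impact

# Toy data: each unit has some baseline score; the program adds ~6 points.
applicants = list(range(1000))
impact = randomize_and_estimate(
    applicants, lambda u, treated: (u % 50) + (6.0 if treated else 0.0))
```

With 1000 units the two groups have nearly identical average baseline scores, so the estimated impact lands close to the true 6-point effect; with only a handful of units, chance imbalances would dominate.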

Key advantage of experiments
Because members of the groups (treatment and control) do not differ systematically at the outset of the experiment, any difference that subsequently arises between them can be attributed to the program rather than to other factors. If properly designed and conducted, randomized experiments provide the most credible method to estimate the impact of a program.

Testing Assumptions: Randomized Evaluations
What is the main assumption of a randomized evaluation that must hold for it to give the true impact of the program? No randomization failure: randomization must generate two statistically identical groups.
How can you test whether this assumption holds? A balance test: compare the groups' characteristics at baseline (the beginning of the program).
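A balance test can be sketched as follows. This is a simplified illustration (the `balance_check` helper and the toy baseline scores are invented for the example); in practice one would compare many baseline characteristics, often with t-tests or a regression.

```python
import random
import statistics

def balance_check(treated_baseline, control_baseline):
    """Compare one baseline characteristic across the two groups.
    Returns the difference in means and an approximate t-statistic;
    |t| below roughly 2 is consistent with balanced groups."""
    m_t = statistics.mean(treated_baseline)
    m_c = statistics.mean(control_baseline)
    v_t = statistics.variance(treated_baseline)
    v_c = statistics.variance(control_baseline)
    se = (v_t / len(treated_baseline) + v_c / len(control_baseline)) ** 0.5
    return m_t - m_c, (m_t - m_c) / se

# Toy data: baseline scores drawn from one population, then split
# purely at random, so we expect the groups to look statistically identical.
rng = random.Random(1)
baseline = [rng.gauss(25, 10) for _ in range(2000)]
rng.shuffle(baseline)
diff, t_stat = balance_check(baseline[:1000], baseline[1000:])
```

Because the split here really is random, the difference in baseline means is small relative to its standard error; a large |t| on several characteristics would signal a randomization failure.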

Basic set-up of a Randomized Evaluation
Total Population → Target Population (the rest are not in the evaluation) → Evaluation Sample → Random Assignment → Treatment Group / Control Group

When to do a Randomized Evaluation?
- When there is an important question you want or need to know the answer to
- Timing: not too early and not too late
- The program is representative, not gold-plated, or tests a basic concept you need tested
- There is the time, expertise, and money to do it right
- Develop an evaluation plan to prioritize

When NOT to do a Randomized Evaluation?
- When the program is premature and still requires considerable tinkering to work well
- When the project is on too small a scale to randomize into two representative groups
- If a positive impact has already been proven using rigorous methodology and resources are sufficient to cover everyone
- After the program has already begun and you are not expanding elsewhere

NON- AND QUASI-EXPERIMENTAL METHODS

Non- or Quasi-Experimental Methods
Let us look at other methods of estimating impact using the data from the schools that got a balsakhi:
1. Pre-post (before vs. after)
2. Simple difference
3. Difference-in-differences
Other methods can be effective if the specific conditions needed for each method's assumptions hold.
Limitation: The conditions needed for them to be valid do not always apply.

1 - Pre-post (Before vs. After)
Look at the average change in test scores over the school year for the balsakhi children.

1 - Pre-post (Before vs. After)
Average post-test score for children with a balsakhi: 51.22
Average pretest score for children with a balsakhi: 24.80
Difference: 26.42
QUESTION: Under what conditions can this difference (26.42) be interpreted as the impact of the balsakhi program?

Which of the following represents the counterfactual in this case? (Audience responses in parentheses.)
A. Balsakhi students before participating in the programme (50%)
B. The non-balsakhi students in the same schools (38%)
C. Students from other schools in Vadodara where the balsakhi programme is not being implemented (12%)
D. None of the above (0%)

What would have happened without balsakhi? Method 1: Before vs. After. [Chart: average test scores in 2002 vs. 2003; implied impact = 26.42 points?]

2 - Simple difference
Compare the test scores of children who got a balsakhi with the test scores of children who did not.

2 - Simple difference
Average score for children with a balsakhi: 51.22
Average score for children without a balsakhi: 56.27
Difference: -5.05
QUESTION: Under what conditions can this difference (-5.05) be interpreted as the impact of the balsakhi program?

Which of the following represents the counterfactual in this case? (Audience responses in parentheses.)
A. Balsakhi students before participating in the programme (79%)
B. The non-balsakhi students in the same schools (11%)
C. Students from other schools in Vadodara where the balsakhi programme is not being implemented (11%)
D. None of the above (0%)

What would have happened without balsakhi? Method 2: Simple Comparison. [Chart: 2003 average scores for the two groups; implied impact = -5.05 points?]

Selection Bias
[Diagram: a population splits into participants and non-participants, tracked from baseline through the intervention to endline.]
Is the endline difference between the groups due to the program, or to pre-existing differences?
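A small simulation makes the selection-bias problem concrete. The numbers here are invented, not the Balsakhi data: when the program is assigned to the weakest students, the simple difference comes out strongly negative even though the true effect is positive, while random assignment recovers the true effect.

```python
import random
import statistics

rng = random.Random(42)
TRUE_EFFECT = 6.0   # known here only because we simulate the data

# Latent ability drives test scores; under teacher selection it also
# drives who participates (the weakest students get the program).
ability = [rng.gauss(50, 10) for _ in range(10000)]

def outcome(a, treated):
    return a + (TRUE_EFFECT if treated else 0.0)

# Simple difference under selection: the weakest half participates.
cutoff = statistics.median(ability)
participants = [outcome(a, True) for a in ability if a < cutoff]
others = [outcome(a, False) for a in ability if a >= cutoff]
simple_diff = statistics.mean(participants) - statistics.mean(others)

# Simple difference under random assignment: a coin flip decides.
treat, control = [], []
for a in ability:
    if rng.random() < 0.5:
        treat.append(outcome(a, True))
    else:
        control.append(outcome(a, False))
random_diff = statistics.mean(treat) - statistics.mean(control)
```

Under selection, the pre-existing ability gap between the groups swamps the program effect, so `simple_diff` is large and negative; under random assignment, `random_diff` lands close to the true 6 points. This mirrors the Balsakhi pattern, where the simple difference was negative.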

3 - Difference-in-differences
Compare the gains in test scores of children who got a balsakhi with the gains in test scores of children who did not.


3 - Difference-in-differences
Average score for children with a balsakhi: pretest 24.80, post-test 51.22, difference 26.42
Average score for children without a balsakhi: pretest 36.67, post-test 56.27, difference 19.60
Difference-in-differences: 6.82
QUESTION: Under what conditions can this difference (6.82) be interpreted as the impact of the balsakhi program?
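The arithmetic behind the three estimates can be checked directly from the scores above (a plain calculation, not an estimation routine):

```python
# Scores from the Balsakhi tables above.
balsakhi = {"pre": 24.80, "post": 51.22}
no_balsakhi = {"pre": 36.67, "post": 56.27}

pre_post = balsakhi["post"] - balsakhi["pre"]               # 26.42
comparison_gain = no_balsakhi["post"] - no_balsakhi["pre"]  # 19.60
simple_difference = balsakhi["post"] - no_balsakhi["post"]  # -5.05
diff_in_diff = pre_post - comparison_gain                   # 6.82
```

Difference-in-differences subtracts the comparison group's gain from the treated group's gain, netting out any change that would have happened anyway, provided both groups would have gained at the same rate without the program.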

What would have happened without balsakhi? Method 3: Difference-in-differences. [Chart: gains of 26.42 vs. 19.60 points between 2002 and 2003; implied impact = 6.82 points?]

4 - Other Methods
There are more sophisticated non-experimental methods to estimate program impacts:
- Regression
- Matching
- Instrumental Variables
- Regression Discontinuity
These methods rely on being able to mimic the counterfactual under certain assumptions.
Problem: The assumptions are not testable.

Which of these methods do you think is closest to the truth?
(1) Pre-post: 26.42*
(2) Simple Difference: -5.05*
(3) Difference-in-Differences: 6.82*
(4) Regression: 1.92
*: Statistically significant at the 5% level
Options: A. Pre-Post, B. Simple Difference, C. Difference-in-Differences, D. Regression, E. Don't know. (Audience responses: 52%, 30%, 11%, 4%, 4%.)

Impact of Balsakhi - Summary
(1) Pre-Post: 26.42*
(2) Simple Difference: -5.05*
(3) Difference-in-Differences: 6.82*
(4) Regression: 1.92
(5) Randomized Experiment: 5.87*
*: Statistically significant at the 5% level
Bottom Line: Which method we use matters!

IV. CONCLUSIONS

Conclusions - Why Randomize?
There are many ways to estimate a program's impact. This course argues in favor of one: randomized experiments.
Conceptual argument: If properly designed and conducted, randomized experiments provide the most credible method to estimate the impact of a program.
Empirical argument: Different methods can generate different impact estimates.

THANK YOU!