Making Use of Benford's Law for the Randomized Response Technique. Andreas Diekmann, ETH Zurich



1. The Newcomb-Benford Law
Imagine a little bet. Two bettors bet on the first digit of an unknown house number drawn at random. The loser has to pay one euro to the winner. Player A wins if the digit is in the range 1 to 4. Player B wins if the digit is in the range 5 to 9. Is this a fair bet? It is not. Paradoxically, although player B covers five digits and player A only four, the bet is rather unfavourable to player B. The first digits of house numbers follow a logarithmic distribution known as Benford's law. The odds are 70:30 in favour of player A in terms of objective probabilities.

Hungerbühler 2

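Benford's Law

$P(d_1) = \log_{10}(1 + 1/d_1)$

d_1:     1     2     3     4     5     6     7     8     9
P(d_1):  .301  .176  .125  .097  .079  .067  .058  .051  .046

$P(D_1 = d_1, \ldots, D_k = d_k) = \log_{10}\left[1 + \left(\sum_{i=1}^{k} d_i \, 10^{k-i}\right)^{-1}\right]$

with $d_1 = 1, 2, \ldots, 9$ and $d_j = 0, 1, \ldots, 9$ ($j = 2, \ldots, k$).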
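The two formulas above can be checked numerically. The following is a minimal Python sketch, not part of the original slides; the function names `benford_first_digit` and `benford_leading_digits` are illustrative choices.

```python
import math


def benford_first_digit(d: int) -> float:
    """P(D1 = d) = log10(1 + 1/d) for a first digit d in 1..9."""
    if not 1 <= d <= 9:
        raise ValueError("first digit must be in 1..9")
    return math.log10(1 + 1 / d)


def benford_leading_digits(digits: list[int]) -> float:
    """Joint probability that a number starts with the digits d1, ..., dk."""
    if not digits or not 1 <= digits[0] <= 9:
        raise ValueError("d1 must be in 1..9")
    if any(not 0 <= d <= 9 for d in digits[1:]):
        raise ValueError("d2..dk must be in 0..9")
    k = len(digits)
    # Sum of d_i * 10^(k-i): the integer formed by the leading digits.
    value = sum(d * 10 ** (k - i) for i, d in enumerate(digits, start=1))
    return math.log10(1 + 1 / value)


# First-digit table, rounded to three decimals as on the slide.
first_digit_table = {d: round(benford_first_digit(d), 3) for d in range(1, 10)}
print(first_digit_table)
```

Summing the joint probabilities of all two-digit prefixes that begin with a given first digit recovers that digit's first-digit probability, which is a quick consistency check on the general formula.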

[Figure: Distribution of first digits of OLS regression coefficients from articles published in the American Journal of Sociology. Axes: first digit (1 to 9) against frequency. Series: actual, Benford, upper bound, lower bound. The deviation from Benford is significant. Source: Diekmann.]

Hungerbühler 2 Digits in the Bible Compilation of Digits in the Elberfelder Konkordanz


Benford's Law and the number of votes for candidate Ahmadinejad (Roukema 2009)

Sensitive Questions
Allen H. Barton, 1958. Asking the Embarrassing Question. Public Opinion Quarterly 22: 67-68

Barton's (1958) method for a very sensitive question

Maybe RRT is a better method for asking sensitive questions?

2. The Randomized Response Technique (RRT): A Method to Guarantee Full Anonymity for Sensitive Questions
Subjects respond either to a sensitive question A (e.g. on shoplifting or tax evasion) or to a random question B ("Was your mother's birthday in an even month?"). Assignment to question A or B is by a random device (a die, a coin etc.). The meaning of an individual answer cannot be identified. However, it is possible to estimate the proportion of shoplifting etc. and other statistics on the aggregate level.

Because the random mechanism is known, one can estimate the probability of answering yes to the sensitive question by Bayes' formula. The RRT has the advantage of guaranteeing anonymity, but not without costs. The price is a loss in efficiency. In addition to sampling error, the probabilistic RRT device enlarges the variance of the estimated proportion of positive responses to the sensitive question.

In formal terms: $p$ is the probability of answering the question of interest A, and $q = 1 - p$ is the probability of answering the random question B. $\pi_y = P(\text{yes} \mid B)$ is the probability of responding yes to the random question. We are looking for an estimate of $\pi_x = P(\text{yes} \mid A)$, the expected proportion of respondents answering yes to the question of interest. If we denote the overall proportion of yes answers in the sample by $\lambda$, we have: $\lambda = p\,\pi_x + (1 - p)\,\pi_y$ ($\lambda$, $p$ and $\pi_y$ are known).

Solving for $\pi_x$ yields: $\pi_x = \lambda/p - \pi_y (1 - p)/p$. $p$ and $\pi_y$ are determined ex ante by the researcher's RRT design. A special case is the forced response design with $\pi_y = 1$. In this case, a person is forced to respond yes to the random question. The variance of the estimator is: $\mathrm{Var}(\pi_x) = \lambda(1 - \lambda)/(n p^2)$
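The estimator and its standard error can be sketched in a few lines of Python. This is an illustration under stated assumptions, not code from the talk: the function name `rrt_estimate` and the example counts (400 yes answers among 1000 respondents) are hypothetical and chosen only to show the arithmetic.

```python
import math


def rrt_estimate(n_yes: int, n: int, p: float, pi_y: float = 1.0) -> tuple[float, float]:
    """Return (pi_x, standard error) for a randomized response design.

    n_yes: number of yes answers observed in the sample
    n:     sample size
    p:     probability of being directed to the question of interest A
    pi_y:  probability of yes on the random question B (1.0 = forced response)
    """
    if n <= 0 or not 0 < p <= 1:
        raise ValueError("need n > 0 and 0 < p <= 1")
    lam = n_yes / n                          # overall proportion of yes answers
    pi_x = lam / p - pi_y * (1 - p) / p      # solve lam = p*pi_x + (1-p)*pi_y
    se = math.sqrt(lam * (1 - lam) / (n * p ** 2))
    return pi_x, se


# Hypothetical counts, forced response design with p = 0.7.
pi_x, se = rrt_estimate(n_yes=400, n=1000, p=0.7)
print(f"pi_x = {pi_x:.3f}, SE = {se:.3f}")
```

Dividing by $p$ is what inflates the standard error relative to a direct question: with $p = 1$ the expression reduces to the usual binomial standard error.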

3. The Benford distribution as a randomizing device
In face-to-face interviews, a pack of cards, a die, a coin or some other device may be used to generate randomized outcomes. For example, if a person tosses heads, he or she is instructed to answer the random question; if the result is tails, the question of interest has to be answered. This technique runs into difficulties in telephone interviews and is particularly problematic in self-administered interviews such as mailed questionnaires or online surveys. As an alternative, I suggest making use of the Benford distribution.

House numbers (1st digit): 1, 2, 3, 4 versus 5, 6, 7, 8, 9
By Benford's law, the probability that digit 1, 2, 3 or 4 turns up is .699 or roughly .7. The probability of drawing a first digit from the set of remaining digits is .3. The 70:30 rule provides a mechanism to split the sample into a set of respondents answering the question of interest A and a set of respondents answering the random question B. For example, subjects are asked to think of the address of a friend and to keep the house number in mind. Depending on whether the first digit belongs to the set {1, 2, 3, 4} or to the set {5, 6, 7, 8, 9}, a person has to answer question A or question B. Other sets may be constructed if a researcher prefers smaller or larger probabilities for the question of interest. However, first we should ask: Do house numbers follow the Benford distribution at all?
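The split probability for any choice of digit set follows directly from the first-digit formula. The sketch below is illustrative; `digit_set_probability` is a hypothetical helper name, and the alternative set {1, 2} is only an example of a design with a smaller probability for the question of interest.

```python
import math


def digit_set_probability(digits: set[int]) -> float:
    """Benford probability that the first digit falls in the given set."""
    if any(not 1 <= d <= 9 for d in digits):
        raise ValueError("first digits must be in 1..9")
    return sum(math.log10(1 + 1 / d) for d in digits)


# Design from the talk: {1, 2, 3, 4} leads to the question of interest A.
p = digit_set_probability({1, 2, 3, 4})
q = 1 - p
print(f"p = {p:.3f}, q = {q:.3f}")

# Example of an alternative design with a smaller p.
p_small = digit_set_probability({1, 2})
print(f"p for {{1, 2}} = {p_small:.3f}")
```

For a set of consecutive digits starting at 1 the sum telescopes, so the probability of {1, ..., m} is simply $\log_{10}(m + 1)$; for {1, 2, 3, 4} this is $\log_{10} 5$.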

House numbers collected from the telephone directory of Zurich (percentage by first digit)

First digit:   1       2       3       4       5      6      7      8      9
House number:  29,99%  1,9%    13,1%   1,84%   8,4%   ,9%    4,%    4,4%   ,12%
Benford:       30,10%  17,61%  12,49%  9,69%   7,92%  6,69%  5,80%  5,12%  4,58%

I am indebted to S. Wehrli for compiling the data.

4. The Benford illusion and other advantages of the method
The price for the anonymity of the method is an increase in the variance of the estimator for the proportion of yes-responses ($\pi_x$) to the question of interest. The variance is (Fox and Tracy 1986): $\mathrm{Var}(\pi_x) = \lambda(1 - \lambda)/(n (1 - q)^2)$. It follows that the variance increases with the probability $q = 1 - p$ of arriving at the random question. On the other hand, the larger $q$, the larger the degree of anonymity. This is the formal expression of the conflict between efficiency and anonymity.
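The trade-off can be made concrete by evaluating the standard error for several values of $q$. The sketch below assumes a forced response design ($\pi_y = 1$) and uses hypothetical values for the true proportion (0.12) and the sample size (300); the function name `rrt_standard_error` is likewise illustrative.

```python
import math


def rrt_standard_error(pi_x: float, q: float, n: int, pi_y: float = 1.0) -> float:
    """Standard error of the RRT estimator for a given true pi_x and q."""
    if not 0 <= q < 1 or n <= 0:
        raise ValueError("need 0 <= q < 1 and n > 0")
    p = 1 - q
    lam = p * pi_x + q * pi_y  # expected overall proportion of yes answers
    return math.sqrt(lam * (1 - lam) / (n * (1 - q) ** 2))


# Hypothetical scenario: true proportion 0.12, 300 respondents.
q_values = [0.0, 0.1, 0.3, 0.5]
errors = [rrt_standard_error(pi_x=0.12, q=q, n=300) for q in q_values]
for q, se in zip(q_values, errors):
    print(f"q = {q:.1f}  SE = {se:.4f}")
```

In this scenario the standard error at $q = 0.3$ is roughly twice that of a direct question ($q = 0$), which is the efficiency cost paid for anonymity.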

Benford Illusion
Using the Benford distribution for the RRT has the advantage of diminishing the conflict between efficiency and anonymity. The reason is that the perceived probabilities and the objective probabilities differ. Many people believe that the chance of picking a one, two, three or four is much smaller than 70 percent. This discrepancy, or Benford illusion, has the positive effect that the perceived q, and therefore the perceived anonymity, is larger than the objective q. With the little trick of the Benford illusion, the anonymity can be increased without loss in efficiency.

There are other advantages, too. The method does not require any physical device such as a coin or a die to generate random numbers. In most previous studies, the RRT is applied to sensitive questions in face-to-face interviews. However, it is unlikely that most people asked to fill in online surveys or mailed questionnaires follow instructions properly if a coin or die is required.

5. Application
Shoplifting Questionnaire
Imagine a friend or relative who does not live in your house and whose address is known to you. Keep in mind the first digit of the house number.
If the digit is 5, 6, 7, 8 or 9, skip the next question and mark yes.
If the digit is 1, 2, 3 or 4, please answer the following question: In the last five years, did you ever intentionally take a shopping item without paying for it?

Study 1: Shoplifting
RRT experiment, questionnaire in a lecture (summer semester, M. Abraham, Bern)
Yes = 114, No = 181, n = 295
Expected forced yes answers: 88.5; yes answers to the question of interest: 114 - 88.5 = 25.5
Result: p(shoplifting) = 25.5/206.5 = 0.12
$\pi_x$ = .12 (SE = .04)

Study 2: Shoplifting
Questionnaire in lecture Szydlik
Result: n = 93 π x = 9/ =.14 (SE =.3)

6. Do subjects underestimate the probability of 1, 2, 3, 4? ("Benford illusion")
[Figure: Estimated frequency of house numbers with first digit 1, 2, 3 or 4. Histogram, percent of answers. N = 289. Lecture M. Abraham, Bern.]

[Figure: Estimated frequency of house numbers starting with 1, 2, 3 or 4, in per cent, against percentage of answers. Lecture Szydlik.]

Underestimation of Objective Probability (student population) subjective (mean) objective Study 1, Bern 1 Study 2, Zurich 4

7. Do subjects generate Benford-distributed house numbers?
As we have seen, objective data follow the Benford distribution. However, are the digits produced by the respondents in accordance with Benford as well? This is a crucial assumption. Otherwise, the method wouldn't work.
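One common way to check such an assumption is a chi-square goodness-of-fit test of the reported first digits against the Benford probabilities. The slides do not say which test was used, so the sketch below is an assumption, and both count vectors are hypothetical and serve only to illustrate the calculation. The critical value 15.507 is the standard chi-square quantile for 8 degrees of freedom at the 5 percent level.

```python
import math

# Chi-square critical value for df = 8 (nine digits minus one), alpha = 0.05.
CHI2_CRIT_DF8_ALPHA05 = 15.507


def benford_chi_square(counts: list[int]) -> float:
    """Chi-square statistic of observed first-digit counts against Benford."""
    if len(counts) != 9:
        raise ValueError("need one count for each first digit 1..9")
    n = sum(counts)
    stat = 0.0
    for d, observed in enumerate(counts, start=1):
        expected = n * math.log10(1 + 1 / d)
        stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical counts close to Benford (1000 respondents).
benford_like = [301, 176, 125, 97, 79, 67, 58, 51, 46]
# Hypothetical counts from respondents who pick digits almost uniformly.
uniform_like = [112, 111, 111, 111, 111, 111, 111, 111, 111]

for label, counts in (("Benford-like", benford_like), ("uniform-like", uniform_like)):
    stat = benford_chi_square(counts)
    verdict = "reject" if stat > CHI2_CRIT_DF8_ALPHA05 else "do not reject"
    print(f"{label}: chi2 = {stat:.2f} -> {verdict} Benford at the 5% level")
```

For the RRT design itself only the split between {1, 2, 3, 4} and {5, 6, 7, 8, 9} matters, so a researcher could alternatively compare just the observed share of the first set with .699.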

I am indebted to B. Jann for compiling the data. Survey B. Jann, Wages in Switzerland, 2/2, N = 313