ESTIMATION OF OVERCOVERAGE IN THE CENSUS OF CANADA USING AN AUTOMATED APPROACH. Claude Julien, Statistics Canada Ottawa, Ontario, Canada K1A 0T6

KEY WORDS: Coverage evaluation, two-phase design, stratification

1. Introduction

In any survey or census, errors due to the coverage of the target population may occur and thus affect the accuracy of the estimates or counts. It is therefore important to measure these errors and to study the reasons for their presence. An error due to the omission of a unit of the target population is referred to as undercoverage. Conversely, when a unit is enumerated more than once or a unit not belonging to the target population is enumerated at least once, there is said to be overcoverage.

As part of the 1991 Census of Canada Coverage Error Measurement Program, an Overcoverage study (see Dibbs and Royce 1990) is currently being developed. This study uses several methods to detect and estimate different types of overcoverage:

1. a post-censal survey of households to identify fictitious and out-of-scope persons, and to collect additional addresses where persons may have been double-counted;
2. a survey of usual residents in collective dwellings to identify those who are also enumerated in a private or another collective dwelling;
3. an Automated Match Study to identify overcoverage caused by errors occurring during the census data collection operation.

This paper describes the Automated Match Study (AMS). The next two sections present the objectives of the AMS and how it combines automated and clerical operations to fulfil them. The last two sections describe and evaluate three sample designs that combine both operations to produce an estimate of total within EA overcoverage that is as precise and exact as possible.

2. Objectives of the AMS

The Canadian Census of Population is conducted by dividing the country into approximately 45,000 Enumeration Areas (EA). In general, an EA is an area for which one Census Representative (CR) is responsible.
The CR is given a map of the EA and is required to identify and list all the dwellings in the Visitation Record (VR). In the majority of EAs, the CR leaves a census questionnaire at each household to be completed by one of the residents and mailed back on Census Day. Households that do not mail back their questionnaire are followed up by telephone or by personal visit two or three weeks after Census Day.

During the Census data collection operation, some households can be enumerated more than once. For example, a household might mail back its questionnaire a few days late and then complete another during telephone follow-up. Duplication may also occur when a CR unknowingly drops off two questionnaires at the same dwelling. In this case, the residents of the dwelling might complete and return both questionnaires, or complete one and answer the other during subsequent follow-up operations. Within EA overcoverage occurs when the CR fails to detect the duplication. Although the questionnaires and the VR pass several quality checks, the overcoverage can be left undetected and therefore some persons are present more than once on the census database from which counts are tabulated.

The objectives of the AMS are (a) to detect within EA overcoverage as efficiently and effectively as possible, and (b) to estimate total within EA overcoverage as precisely and exactly as possible.

3. Automated and clerical operations

3.1 Description

In the Canadian Census of Population, the names and addresses of respondents are not captured. In this context, manually searching all questionnaires in an EA for double-counting is a costly, tedious and error-prone operation. A totally automated approach is not feasible either. The AMS approach combines both strategies. In the first step, a computer program extracts information from the census database and reports pairs of households that are similar enough to possibly include common persons, i.e. overcoverage.
In the second step, the census questionnaires completed by these households are verified by a clerk who reports the presence or absence of overcoverage.

Within EA overcoverage is more likely to occur among similar households enumerated in the same neighbourhood. In the AMS, the similarity and proximity of two households are determined by a specially designed computer program. This program compares the sex and the date of birth of the household members and produces the following statistics: the size of each household, the number of similar persons and the proximity of the households. These statistics are used to classify a pair of households according to the likelihood that it contains overcoverage. For example, a pair of four-person households with four similar persons is put into a high likelihood class, whereas a pair of four-person households with only one similar person is put into a low likelihood class.

The comparison is done for each pair of households in an EA. The average EA contains 300 households and yields 44,850 comparisons. It is therefore impractical to manually verify each pair. Depending on the class, all or some pairs are selected and printed on a form. The characteristics of all the members are printed side by side for each household. In the verification operation, clerks are assigned to look at the census questionnaire for each household and to indicate on the form which persons are double-counted.

3.2 Feasibility of the AMS

Using 1986 Census data, we evaluated the feasibility of the AMS methodology (see Julien, 1991). In the study, we carried out the automated matching operation in 380 EAs. Two persons from different households were similar when both had the same sex, month of birth and year of birth (the day of birth was not used because it was not available on the database). Two households were considered to be in the same neighbourhood when their household numbers differed by five or less. A household number is given by the Census Representative when canvassing the EA. Table 1 gives the average number of pairs per EA for each class. In 40 of the 380 EAs, all pairs in classes 1 to 5 were verified. Table 1 provides the incidence of overcoverage per class ($I_c$), defined as the ratio of the number of pairs with overcoverage divided by the number of pairs verified. The results reveal that neighbouring households with more than one similar person are almost all cases of overcoverage. In the other classes, the incidence of overcoverage varies between 1% and 50%. The fact that the incidence of overcoverage varies substantially among the classes demonstrates the efficiency of the automated operation.

The feasibility study also identified two weaknesses of the AMS methodology: verification in large classes and response errors. Pairs in classes 6 and 7 were not verified because these classes contain too many pairs of households. The incidence of overcoverage is expected to be very low and very many pairs would have to be verified in order to observe just one case of overcoverage. This problem can be handled by ignoring these classes totally, and tolerating a slight underestimate of total within EA overcoverage, or by verifying a sample of pairs, and obtaining an unbiased yet potentially imprecise estimate.

The AMS methodology relies on the assumption that persons enumerated more than once present similar characteristics in each enumeration, i.e. the same sex and date of birth are reported. The study showed that 15% of the persons enumerated more than once had a different date of birth reported. Consequently, it is expected that a few cases of overcoverage fall in class 6 or 7, instead of falling in classes 1 to 5 where they would be easier to detect. It is worth noting that the AMS is carried out at the household level. A pair of households that contains overcoverage will present no similarity, and thus fall into class 7, only if all overcovered persons have their date of birth reported erroneously.
Fortunately, the chance that such a situation occurs decreases quickly as the number of overcovered persons increases. One potential improvement is to relax the criteria used to determine similar persons at the expense of increasing the number of pairs in classes 1 to 6. Unfortunately, this leads to the first weakness of dealing with larger classes.

4. Three sample designs

The ultimate goal of the AMS is to produce an unbiased estimate of total within EA overcoverage with a specified level of precision by allocating the available resources between the cheap automated matching operation and the expensive clerical verification operation. In order to obtain a reasonable level of precision with a sensible amount of resources we decided to exclude the unlikely pairs falling into classes 6 and 7 from the verification operation. Consequently the target population is not completely covered, but the bias is expected to be small (between 1% and 5%).

In this section, we compare three sample designs: a simple random sample design (SRS), a two-phase design using a stratified estimator (TP_STR) and a two-phase design using a ratio estimator (TP_RAT). The notation and formulas employed hereafter are described in the Appendix.

In the SRS, all EAs that are selected for the automated matching operation are also selected for the manual verification operation. The overcoverage observed is simply multiplied by the inverse of the sampling rate to produce an unbiased estimate. Since the verification operation is expensive and time consuming, the number of EAs verified will be small. Given the rareness of overcoverage, the SRS with a small sample is expected to yield a very imprecise estimate.

Since it is much cheaper to process an EA through the matching operation than it is to verify the resulting pairs, the rationale of the two-phase approach is to verify fewer EAs than by the SRS method and to use the extra resources available to match a much larger first phase sample of EAs.
The results of the first phase sample are then used as auxiliary information to obtain more precise estimates than the SRS approach.

The TP_STR approach utilizes the results of the first phase sample to distinguish two or more strata of EAs in which the proportion of overcoverage differs greatly. The results are also used to estimate the stratum weights. The second phase sample of EAs is verified to estimate the average number of overcovered persons per EA in each stratum. The estimated weights and averages are combined to produce an unbiased estimate that is expected to be more precise than the SRS estimate. The stratification provides a better use of the resources available for verification, i.e. it enables the disproportionate allocation of the second phase sample. For example, the strata consisting of EAs with highly likely pairs would be allocated a relatively bigger share of the second phase sample.

An alternative approach is to consider the population of all within EA pairs of households and to estimate the total within EA overcoverage in each class ($Y_c$). A difficulty arises because the size of the population in each class, $M_c$, the number of pairs, is unknown. A two-phase design with a ratio estimator offers an attractive solution. In this approach the results from the first phase sample are used to estimate the $M_c$. A second phase sample of EAs is verified to estimate the average number of overcovered persons per pair in the c-th class. This is done by using the ratio estimator $\bar{y}_c / \bar{M}_c$. The estimated sizes and averages are combined to produce an estimate that is expected to be more precise than the SRS estimate. However, it is also expected to be biased because of the use of the ratio estimator.

5. Evaluation of the sample designs

In this section, we describe a simulation study in which the three sample designs were compared to evaluate (a) how they perform with such a rare population, (b) the gains of the two-phase approaches and (c) the bias incurred by the TP_RAT approach.
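As a rough illustration of how the two two-phase point estimates of section 4 combine first-phase and second-phase information, the following Python sketch may help. It is not the program used in the study: the function names, the input layout (stratum labels per EA, per-class dictionaries of pair counts) and the handling of empty strata or classes are assumptions made for this example only.

```python
from collections import defaultdict


def tp_str_estimate(N, first_phase_strata, second_phase):
    """Two-phase stratified estimate of total within EA overcoverage.

    N                  -- number of EAs in the population
    first_phase_strata -- stratum label of each EA in the first-phase sample
    second_phase       -- (stratum label, overcovered persons) per verified EA
    """
    n_first = len(first_phase_strata)
    first_counts = defaultdict(int)
    for h in first_phase_strata:
        first_counts[h] += 1

    totals = defaultdict(float)
    sizes = defaultdict(int)
    for h, persons in second_phase:
        totals[h] += persons
        sizes[h] += 1

    estimate = 0.0
    for h, n_h_first in first_counts.items():
        if sizes[h] == 0:
            # Assumption: every stratum seen in phase one must be verified.
            raise ValueError(f"no verified EA in stratum {h}")
        weight = n_h_first / n_first      # estimated stratum weight
        mean_h = totals[h] / sizes[h]     # mean overcoverage per verified EA
        estimate += weight * mean_h
    return N * estimate


def tp_rat_estimate(N, first_phase_pairs, second_phase):
    """Two-phase ratio estimate of total within EA overcoverage.

    first_phase_pairs -- one dict {class: number of pairs} per matched EA
    second_phase      -- (pairs dict, overcovered-persons dict) per verified EA
    """
    n_first = len(first_phase_pairs)
    classes = set()
    for pairs in first_phase_pairs:
        classes.update(pairs)

    estimate = 0.0
    for c in sorted(classes):
        # Estimated number of pairs in class c, from the first phase.
        m_hat = N / n_first * sum(p.get(c, 0) for p in first_phase_pairs)
        pairs_verified = sum(p.get(c, 0) for p, _ in second_phase)
        persons = sum(y.get(c, 0) for _, y in second_phase)
        if pairs_verified == 0:
            continue  # assumption: an unobserved class contributes nothing
        # Overcovered persons per verified pair, scaled up by m_hat.
        estimate += m_hat * persons / pairs_verified
    return estimate
```

The stratified version weights per-EA averages by estimated stratum shares, whereas the ratio version scales a per-pair rate by an estimated number of pairs; that difference is why only the second is expected to carry a ratio-estimator bias.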
5.1 The population

As mentioned in section 3, 1986 Census data from 380 EAs were processed through the computer matching program. The pairs produced were classified into the seven classes shown in Table 1. A manual verification of pairs in classes 1 to 5 was carried out for 40 of the 380 EAs to determine the incidence of overcoverage in each class ($I_c$), presented in Table 1. Using the results of the verification, the presence of overcoverage was simulated for the 340 EAs that were not verified. A random number between 0 and 1 was generated for each j-th pair of the i-th EA in classes 1 to 5.

When this random number was less than $I_c$, overcoverage was determined and the number of overcovered persons for that pair, $y_{cij}$, was set to the size of the smaller household of the pair; otherwise, no overcoverage was determined and $y_{cij}$ was set to 0. The 40 EAs that were verified plus the 340 EAs that were simulated made up a population of more than 233,000 persons of which 314 were overcovered. More information on this population is given in Table 2.

5.2 The sample selection

From this population, an initial sample of $n'$ EAs was selected. This sample was used as a first-phase sample for the TP_STR and TP_RAT methods. For the TP_STR method, the number of pairs in each class for each EA ($M_{1i}, M_{2i}, \ldots, M_{7i}$) was used to divide the selected sample into 3 strata: stratum 1 was all EAs with at least one pair of households in class 1 ($M_{1i} > 0$), stratum 2 was EAs with no pair in class 1 but at least one pair in class 2 ($M_{1i} = 0$; $M_{2i} > 0$), and stratum 3 was all other EAs. Using the population statistics presented in Table 2, optimal values of the second-phase sampling fractions $\nu_h$ (see Cochran, 1977, p. 331) were calculated under the assumption that the cost of verification is the same for each stratum and 10 times higher than the cost of the automated matching operation. These sampling fractions were applied to the number of EAs observed in each stratum ($n'_1, n'_2, n'_3$) to obtain the second-phase sample size that was selected from each stratum ($n_h = \nu_h n'_h$). For the TP_RAT method, the sum of the $n_h$ gave the second-phase sample size that was selected from the initial sample. In order to compare designs that are cost-equivalent, combining the automated and clerical operations, the SRS sample size was set to $n_{SRS} = (n' + 10\, n_{TP}) / 11$, rounded to the nearest integer. The SRS sample was selected from the initial sample independently from both second phase samples and thus was equivalent to a random sample taken from the whole population.
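The simulation rule of section 5.1 and the sample-size arithmetic of section 5.2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the second-phase sizes are left unrounded because the rounding rule applied per stratum is not stated, and the cost ratio of 10 is the one assumed in the text.

```python
import random


def simulate_pair(incidence, size_a, size_b, rng):
    """Simulated number of overcovered persons for one pair of households.

    A uniform random number is compared with the incidence of the pair's
    class; on a "hit" the count is the size of the smaller household.
    """
    if rng.random() < incidence:
        return min(size_a, size_b)
    return 0


def second_phase_sizes(first_phase_counts, fractions):
    """n_h = nu_h * n'_h per stratum (unrounded; rounding is an open choice)."""
    return {h: fractions[h] * first_phase_counts[h] for h in first_phase_counts}


def cost_equivalent_srs_size(n_first, n_second, cost_ratio=10):
    """SRS size costing the same as matching n_first EAs and verifying n_second.

    Matching one EA costs 1 unit and verifying one costs cost_ratio units,
    so an SRS unit (matched and verified) costs 1 + cost_ratio units.
    """
    return (n_first + cost_ratio * n_second) / (1 + cost_ratio)


if __name__ == "__main__":
    rng = random.Random(1986)  # fixed seed, only to make the demo repeatable
    print(simulate_pair(0.94, 4, 3, rng))
    print(second_phase_sizes({1: 10, 2: 20, 3: 50}, {1: 0.6, 2: 0.5, 3: 0.1}))
    print(cost_equivalent_srs_size(80, 19))
```

With the average second-phase sizes reported in Table 3 (19 for a first phase of 80 EAs, 28 for 120), the cost formula gives about 24.5 and 36.4, in line with the average SRS sizes shown in that table.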
5.3 Presentation of the results

The selection method described in section 5.2 was carried out 300 times each for first-phase sample sizes of 80, 120, 160 and 200 EAs. With each simulation, $\hat{Y}_k$, the estimate of the number of overcovered persons, and $cv(\hat{Y}_k)$, the estimate of the coefficient of variation, were calculated for each sample design (k = 1, 2, 3). The results of the simulation are given in Table 3. For each design the averages of the 300 $\hat{Y}_k$ and $cv(\hat{Y}_k)$ are provided, as well as the actual coverage rate of the 95% confidence interval. The latter statistic was calculated by computing, for each simulation, the 95% confidence interval estimated by each option and counting the proportion of the intervals actually covering the true population value (314). This statistic is expected to be close to .95 for the unbiased SRS and TP_STR designs. It should also point out any significant bias resulting from the TP_RAT. Furthermore, to show the direction of the bias, Table 3 also gives the proportion of confidence intervals that are too low (underestimation) and too high (overestimation).

The 300 estimates of total within EA overcoverage produced by each design averaged close to the population value of 314. The TP_STR estimate averaged closest to the population value with all sample sizes. The TP_STR also yielded the most precise estimate. Its average estimated coefficient of variation (cv) was at least 34% lower than the cv of the SRS estimate and at least 12% lower than that of the TP_RAT estimate. However, one has to keep in mind that the TP_STR design was evaluated under optimal conditions, i.e. the optimal second-phase sampling rates were known. The observed coverage rates of the SRS and TP_STR designs are very similar and slightly lower than the expected 95% coverage rate. This might be caused by the rareness of the population, shown by the highly skewed distribution of $Y_i$ given in Table 2. The coverage rate of the TP_RAT design is much lower, especially with small samples.
This indicates that the ratio design tends to underestimate the variance. This design also produced five times more cases of overestimation than the other two designs.

6. Conclusion

The objectives of the AMS are (a) to detect overcoverage occurring within an Enumeration Area as efficiently and effectively as possible, and (b) to estimate total within EA overcoverage as precisely and exactly as possible. The first objective was met by combining an automated matching operation with a clerical verification operation. The former classifies pairs of households according to the likelihood that they contain overcoverage. The latter reports the presence or absence of overcoverage by verifying the census questionnaires completed by the most "suspicious" pairs. This method is very effective in that it isolates most of the overcoverage in a few relatively small classes of pairs of households. However, in order to be efficient the pairs falling in the largest and least likely classes must be ignored, i.e. excluded from the verification operation. Consequently the target population is not completely covered, although the bias is expected to be small.

The second objective was met by comparing three sample designs in a simulation study. A two-phase design with a stratified estimator was the best of the three options. In this design, a large first phase sample of EAs is processed through the automated matching operation. Using the results from this operation, the EAs are stratified according to the likelihood that they contain overcoverage. Disproportionate sampling is then applied in the second phase. A bigger share of the second phase sample is allocated to the strata of EAs that are more likely to contain overcoverage. The second phase sample of EAs is verified to estimate the average number of overcovered persons per EA in each stratum. These averages are combined with the estimated stratum weights to yield an unbiased estimate.

The AMS will be carried out sometime between November 1991 and April 1992.
To get an idea of the number of EAs to process through the automated matching and clerical verification operations, we assumed that the statistics presented in Table 2 applied to the population of 45,000 EAs and calculated the first and second phase sample sizes required to achieve a specified level of precision. The results are shown in Figure 1. To obtain a coefficient of variation of 10% we need to match a first phase sample of 788 EAs and verify a second phase sample of 175 of them. A cv of 5% requires a first phase sample of 3059 EAs and a second phase sample of 681 EAs. Currently another simulation is under way to implement the sample design at the province level and to estimate the optimal second-phase sampling rates.

ACKNOWLEDGMENT

The author would like to acknowledge Don Royce, Dave Dolson and Ruth Dibbs for their useful comments; and Laurie Reedman for her technical and programming assistance.

REFERENCES

Cochran, W.G. (1977), Sampling Techniques. New York: John Wiley & Sons.

Dibbs, R. and Royce, D. (1990), "Measuring Overcoverage in the 1991 Census of Canada", Proceedings of the Government Statistics Section, American Statistical Association, 24-27.

Julien, C. (1991), "Assessing the Feasibility of an Automated Match Study to Estimate Overcoverage in the Census", Technical Report, Social Survey Methods Division, Statistics Canada.

Kovar, J., Ghangurde, P., Germain, M.-F., Lee, H., and Gray, G. (1985), "Variance Estimation in Sample Surveys", Methodology Branch Working Paper No. BSMD 85-049E, Statistics Canada.

APPENDIX: Notation and formulas for the estimate of total within EA overcoverage and the estimate of variance

Let $N$ denote the size of the EA population, $n'$ the size of the first phase sample, $n$ the size of the second phase sample; let $M$ denote the number of pairs and $y$ represent the number of overcovered persons; let $i$, $c$, $j$ and $h$ respectively denote the EA, the class, the pair and the stratum.

$$Y = \sum_{i=1}^{N} \sum_{c} \sum_{j} y_{cij} = \sum_{i=1}^{N} Y_i$$

is the total within EA overcoverage.

Simple random sampling:

$$\hat{Y}_1 = N \bar{y} = \frac{N}{n} \sum_{i=1}^{n} Y_i, \qquad v(\hat{Y}_1) = N^2 \left( \frac{1}{n} - \frac{1}{N} \right) s^2(y), \qquad \text{where } s^2(y) = \frac{\sum_{i=1}^{n} (Y_i - \bar{y})^2}{n - 1}.$$

Two-phase stratification:

$$\hat{Y}_2 = N \sum_{h=1}^{3} \frac{n'_h}{n'} \, \bar{y}_h, \qquad v(\hat{Y}_2) = N^2 \left[ \, \ldots \, \right], \qquad \text{where } s_h^2(y) = \frac{\sum_{i=1}^{n_h} (Y_{hi} - \bar{y}_h)^2}{n_h - 1}.$$

(The bracketed expression of $v(\hat{Y}_2)$ is not legible in this transcription.)

Two-phase ratio:

$$\hat{Y}_3 = \sum_{c} \left( \frac{N}{n'} \sum_{i=1}^{n'} M_{ci} \right) \left( \frac{\sum_{i=1}^{n} Y_{ci}}{\sum_{i=1}^{n} M_{ci}} \right), \qquad v(\hat{Y}_3) = N^2 \left[ \left( \frac{1}{n'} - \frac{1}{N} \right) s^2(y) + \left( \frac{1}{n} - \frac{1}{n'} \right) s^2(d) \right],$$

where $s^2(d)$ is the second phase sample variance of the residuals $d_i = Y_i - \sum_{c} (\bar{y}_c / \bar{M}_c) \, M_{ci}$.
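For readers who prefer code to notation, the simple random sampling case can be written out as a minimal Python sketch. It assumes the usual expansion estimator (N times the sample mean) with variance estimate N^2 (1/n - 1/N) s^2(y); the function name and the example figures are hypothetical and are not taken from the study.

```python
import math


def srs_estimate(N, y):
    """Expansion estimate of a total under simple random sampling.

    N -- number of EAs in the population
    y -- overcovered persons observed in each sampled (verified) EA
    Returns (estimated total, estimated variance, estimated cv).
    """
    n = len(y)
    if n < 2:
        raise ValueError("at least two sampled EAs are needed for a variance")
    mean = sum(y) / n
    s2 = sum((v - mean) ** 2 for v in y) / (n - 1)   # sample variance s^2(y)
    total = N * mean                                  # N * ybar
    variance = N ** 2 * (1 / n - 1 / N) * s2          # with finite population correction
    cv = math.sqrt(variance) / total if total else float("inf")
    return total, variance, cv


if __name__ == "__main__":
    # Made-up counts for three verified EAs out of a population of ten.
    print(srs_estimate(10, [0, 2, 4]))
```

Because overcoverage is rare, most entries of `y` are zero in practice, which is what makes the sample variance, and hence the cv, large for small SRS samples.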

FIGURE 1. First and second sample sizes required to achieve target CV (horizontal axis: target CV, from 4% to 14%; vertical axis: sample size, from 0 to 5000; series: first sample size, second sample size)

TABLE 1. CLASSIFICATION AND VERIFICATION OF PAIRS OF HOUSEHOLDS

CLASS (c)   DESCRIPTION                                              NUMBER OF PAIRS PER EA   INCIDENCE OF OVERCOVERAGE
1           More than one similar person, same neighbourhood         0.10                     0.94
2           More than one similar person, outside neighbourhood      0.36                     0.35
3           Similar single-person household, same neighbourhood      0.13                     0.50
4           Similar single-person household, outside neighbourhood   1.85                     0.02
5           Only one similar person, same neighbourhood              1.50                     0
6           Only one similar person, outside neighbourhood           146                      (-)
7           No similar person                                        30646                    (-)
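The classification behind Table 1 can be illustrated with a short Python sketch. It follows the rules stated in section 3.2 (two persons are similar when sex, month of birth and year of birth agree; two households are neighbours when their household numbers differ by five or less). The order in which the rules are tested, the reading of "similar single-person household" as both households having one member, the one-to-one matching of persons and all function names are assumptions of this example, not a description of the actual matching program.

```python
from collections import Counter
from math import comb


def similar_persons(members_a, members_b):
    """Number of persons matching on (sex, birth year, birth month).

    Each member is a tuple such as ("F", 1950, 3). The multiset intersection
    pairs every person with at most one person of the other household.
    """
    common = Counter(members_a) & Counter(members_b)
    return sum(common.values())


def classify_pair(members_a, number_a, members_b, number_b, max_gap=5):
    """Return the Table 1 class (1 to 7) of a pair of households."""
    n_similar = similar_persons(members_a, members_b)
    same_neighbourhood = abs(number_a - number_b) <= max_gap
    if n_similar == 0:
        return 7
    if n_similar > 1:
        return 1 if same_neighbourhood else 2
    if len(members_a) == 1 and len(members_b) == 1:
        return 3 if same_neighbourhood else 4
    return 5 if same_neighbourhood else 6


if __name__ == "__main__":
    couple = [("F", 1950, 3), ("M", 1948, 7)]
    print(classify_pair(couple, 10, couple, 12))   # same two persons, close by
    print(comb(300, 2))                            # pairs in a 300-household EA
```

The last line reproduces the count quoted in section 3.1: 300 households give 300 x 299 / 2 = 44,850 pairs, which is why only the likelier classes are sent to clerical verification.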

TABLE 2. CHARACTERISTICS OF THE POPULATION USED FOR SIMULATING THE THREE SAMPLE DESIGNS

2.1 Frequency distribution of within EA overcoverage (chart not reproduced)

2.2 Population statistics

Stratum level
Statistic   h = 1    h = 2    h = 3    All
N           31       77       272      380
Y           141      142      31       314
S^2(y)      5.922    3.949    0.190    3.104
nu_h        0.6      0.5      0.1

Class level
Statistic   c = 1    c = 2    c = 3    c = 4    c = 5
M           40       138      50       705      572
Y           127      137      22       17       11
Incidence   0.93     0.35     0.44     0.02     0

S^2(d) = 1.051

TABLE 3. RESULTS OF THE SIMULATION OF THE THREE SAMPLE DESIGNS

                                    Initial sample of n' = 80      Initial sample of n' = 120
STATISTIC                           SRS      TP_STR   TP_RAT       SRS      TP_STR   TP_RAT
Average sample (n)                  24.5     19       19           36       28       28
Average estimate                    306      318      302          321      315      306
Average est. coeff. of variation    43.9%    28.9%    34.1%        34.3%    22.8%    26.5%
Observed coverage rate              .90      .90      .78          .94      .92      .82
Confidence interval too low         .09      .09      .17          .06      .07      .13
Confidence interval too high        --       --       .05          .00      --       .05

                                    Initial sample of n' = 160     Initial sample of n' = 200
STATISTIC                           SRS      TP_STR   TP_RAT       SRS      TP_STR   TP_RAT
Average sample (n)                  48       37       37           59       45       45
Average estimate                    313      314      320          308      311      318
Average est. coeff. of variation    29.4%    18.8%    21.7%        25.8%    16.2%    18.5%
Observed coverage rate              .91      .93      .87          .91      .92      .89
Confidence interval too low         .08      .06      .09          .08      .07      .07
Confidence interval too high        --       --       .04          --       --       .04
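The last three rows of Table 3 summarise how often the estimated 95% confidence intervals cover the true value of 314. A minimal Python sketch of that bookkeeping follows. It assumes a normal-approximation interval (estimate plus or minus 1.96 standard errors, with the standard error taken as cv times the estimate); the paper does not state how its intervals were built, so the multiplier and the function name are assumptions.

```python
def coverage_summary(estimates, cvs, true_value, z=1.96):
    """Share of intervals that cover, fall below, or lie above the true value.

    estimates -- one estimated total per simulation run
    cvs       -- matching estimated coefficients of variation (as fractions)
    Returns (coverage rate, share too low, share too high).
    """
    covered = too_low = too_high = 0
    for estimate, cv in zip(estimates, cvs):
        half_width = z * cv * estimate          # z standard errors
        if estimate + half_width < true_value:
            too_low += 1                        # whole interval under the truth
        elif estimate - half_width > true_value:
            too_high += 1                       # whole interval over the truth
        else:
            covered += 1
    runs = len(estimates)
    return covered / runs, too_low / runs, too_high / runs


if __name__ == "__main__":
    # Three made-up runs: one covering, one too low, one too high.
    print(coverage_summary([300, 100, 600], [0.1, 0.1, 0.1], 314))
```

An interval that is too low signals underestimation and one that is too high signals overestimation, so comparing the two shares shows the direction of any bias as well as its size.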