Evaluation of the Canadian Census Editing and Imputation System


Christine Bycroft and Allyson Seyb
Survey Methods, Christchurch
February 2004

Acknowledgement

This report was prepared by the Survey Methods Division, and published by the Information and Publishing Services Division of Statistics New Zealand.

Further information

For further information on this report, or on other reports or products, contact our Information Centre.

Liability statement

Statistics New Zealand gives no warranty that the information or data supplied contains no errors. However, all care and diligence has been used in processing, analysing and extracting the information. Statistics New Zealand shall not be liable for any loss or damage suffered by the customer consequent upon the use directly, or indirectly, of the information supplied in this product.

Reproduction of material

Any table or other material published in this report may be reproduced and published without further licence, provided that it does not purport to be published under government authority and that acknowledgement is made of this source.

Contents

Summary
Introduction
  Background
  Summary of main results
Comparison of SNZ and CANCEIS Approaches to Editing and Imputation
  SNZ census E & I
  Canadian census E & I
  Summary of differences between SNZ and CANCEIS
Parts of SNZ's Editing and Imputation System that Would Be Replaced by CANCEIS
  SNZ demographic edits
  SNZ imputations
  CANCEIS edits and imputations
Methodology for the Evaluation of CANCEIS
  Aims
  Test data
  Quality indicators
Results
  Introduced errors
  Editing and imputation accuracy for modified data
Computing Environment
Further Work
  More test data
  Edits
  Family coding
  Other issues
Appendix 1 SNZ Edits that would be Covered by CANCEIS
Appendix 2 Canadian Consistency Rules
Appendix 3 Summary of Quality Management Strategy for the 2001 Census
Appendix 4 Measurement of the Accuracy of the Editing and Imputation Processes (ESSE User Manual)
Appendix 5 Tables of Results
References


Summary

This research project aimed to evaluate CANCEIS, the editing and imputation system used by Statistics Canada for its census, and to recommend whether Statistics New Zealand should consider implementing the Canadian system in the next New Zealand census.

CANCEIS uses a nearest-neighbour imputation methodology based on the minimum change donor system of Fellegi and Holt. This is a more sophisticated editing and imputation system than Statistics New Zealand has the resources to produce. It appears to offer the potential to produce higher quality outputs without operator intervention for demographic variables, and may also allow family coding to be fully automated.

This report compares CANCEIS with the current Statistics New Zealand approach to census editing and imputation, and evaluates the performance of CANCEIS for some simple households and for a limited range of demographic variables.

1 Introduction

1.1 Background

This research project aimed to evaluate CANCEIS, the editing and imputation system used by Statistics Canada for their census, and to recommend whether Statistics New Zealand (SNZ) should consider implementing the Canadian system in the next New Zealand census. Investigations reported here were completed in February 2002, and a recommendation was made that the project to continue testing CANCEIS should proceed, with the aim of implementing the system in New Zealand's 2006 Census. However, due to census constraints, no further work has been done at the time of writing.

CANCEIS uses a nearest-neighbour imputation methodology based on the minimum change donor system of Fellegi and Holt. This is a more sophisticated editing and imputation (E & I) system than SNZ has the resources to produce. It appears to offer the potential to produce higher quality outputs without operator intervention for demographic variables, and may also allow family coding (see footnote 1) to be fully automated.

CANCEIS performs E & I for a wide range of variables. However, because this was a small research project with a limited budget, only the demographic variables of age, sex, relationship to reference person, legal marital status and social marital status were considered.

This report compares CANCEIS with the current SNZ approach to census E & I, and evaluates the performance of the editing and imputation carried out by CANCEIS on some simple households.

1.2 Summary of main results

These conclusions are based on our understanding of how CANCEIS works, and on the results of testing a limited set of two-person and four-person households from the provisional 2001 New Zealand Census of Population and Dwellings file.

CANCEIS produces high quality outputs for the variables of interest in this study: age, sex, relationship to reference person, legal marital status and social marital status. The outputs are all consistent within a household, and all non-response is removed.
It does this with an efficient automatic system based on a complete set of edits and no operator intervention. This system could replace the current manual operator resolution of a limited set of edits. We expect that this would reduce census processing time, but would require an investment in skilled staff for a longer development time than normal.

In brief:
- The editing carried out in CANCEIS is far more comprehensive than currently attempted by SNZ.
- The edits and imputations for the five variables are integrated in one module.
- CANCEIS is easy to run and produces comprehensive reports.
- The rate of error detection and imputation of the correct value is high for all the variables tested except age. This was assessed by quality indicators which we derived for the editing and imputation done by CANCEIS, for typical census errors and at error rates similar to those found in census. However, note that, in testing, only ages within 10 percent of the true age were accepted as correct; relaxing this rather strict criterion would improve the rate of error detection and imputation of the correct value for age.
- It requires some effort to understand the default edits, which are specified in decision logic tables (DLTs), but they are easy to change to suit different user requirements.
- Detailed further testing is recommended, particularly with respect to understanding the CANCEIS default edits and identifying changes needed.

The remainder of the paper is divided into sections as follows. Section 2 compares the Canadian and SNZ approaches to E & I and section 3 outlines the parts of the current SNZ census E & I system that would be replaced by CANCEIS. In section 4 we describe how we undertook testing of CANCEIS and the results of this testing are given in section 5. Section 6 outlines further work that would be needed if the project were to continue.

Footnote 1. Family codes define the family and household structure and are derived from relationship and living arrangement variables.

2 Comparison of SNZ and CANCEIS Approaches to Editing and Imputation

In population census data, the surveyed statistical unit has a hierarchical structure: the data are collected at the household level with information for each person within the household. The information collected contains errors and missing values. Editing and imputation techniques are used to detect and correct these errors in order to improve the quality of the information. An important aspect of editing and imputing hierarchical data is preserving relationships between persons in a household, as well as preserving relationships between variables for a particular person.
2.1 SNZ census E & I

The editing philosophy for the 2001 Census of Population and Dwellings is driven by:
- a desire to minimise the number of complex edits, and
- the Quality Management Strategy (QMS), which focuses on maintaining respondents' intentions and providing output data that is 'fit for use' rather than 'sanitised'.

The SNZ editing system for the 2001 Census consisted of two phases:
- a series of micro-edits run sequentially during processing, and
- macro-edits, which are checks of aggregate output data.

The number of micro-edits has been substantially reduced since the 1996 Census and more emphasis has been placed on macro-edits.

Since SNZ does not have the resources to produce all outputs at a high level of quality, the QMS places all output variables in one of three quality categories: foremost, defining or supplementary. Every effort is made to have the foremost variables of the highest possible quality, while more errors and inconsistencies will remain for variables in the other categories. Of the variables being considered in this project, age and sex are foremost variables, and the remainder (relationship to reference person, and social and legal marital status) are classed as defining variables.

Operator intervention for edit failures is restricted to edits involving foremost and defining variables. Resolution of micro-edits is limited to the correction of recognition errors, using a priority ruling for multiple responses in some variables and setting some variables to 'response unidentifiable' in cases where inconsistencies are not allowed. Apparent inconsistencies between variables may be left in the data if the respondent's statement is clear.

Where macro-edits (run on aggregate data) reveal errors, households have been re-processed.

Five variables are imputed for the 2001 Census: sex, age, usual residence, labour force status and Mäori descent (see footnote 2). Each variable has its own imputation method. Imputation is only carried out where the respondent has supplied an invalid response, not where a valid response is inconsistent with another variable. This project is concerned only with age and sex imputation.

Micro-edits and imputations are run individually and sequentially. It is not possible to go backwards and alter the outcome of a previous edit in the light of subsequent edits. The order of edits and imputation is critical. Family coding and age imputation are carried out last.

Almost all of the SNZ edits are within-person edits. There is a very small number of between-person edits involved in the derivation of family codes.

2.2 Canadian census E & I

In preparation for New Zealand's 2006 Census, this study aims to analyse a different approach to editing and imputation used by Statistics Canada. The Canadian edit and imputation system used for their 2001 Census, CANCEIS, is thought to be the best methodology to automatically handle hierarchical demographic data. CANCEIS is already in use in the Brazilian and Italian censuses and also in the Italian Labour Force Survey. In addition, the national statistical agencies in Spain and Switzerland are reviewing the Canadian methodology to determine if its use would be appropriate in their next censuses.

CANCEIS is a more generic version of the 1996 Canadian system, NIM. We are, however, evaluating only the part of CANCEIS that performs E & I for the demographic variables, and which remains essentially the same as NIM.

The Canadian E & I system is a minimum change donor system based on the approach proposed by Fellegi and Holt (1976). Fellegi and Holt's method states that a record should be made to satisfy all edit checks by changing the fewest possible number of variables.
NIM imputes for non-response and resolves inconsistent responses for the variables age, sex, legal marital status, social marital status and relationship for all persons in a household simultaneously. These demographic variables were successfully processed for the 1996 Canadian Census for over 11 million households in one month with up to 5,000 edit rules applied simultaneously (Bankier, 2000).

The main difference between NIM and Fellegi/Holt is that NIM searches for nearest neighbours first and then determines the minimum change imputation action based on these donors, whereas Fellegi/Holt searches for donors after deciding what changes are needed. This implementation of NIM allowed, for the first time, the simultaneous hot deck imputation of qualitative and quantitative variables for large E & I problems (Bankier, 2000).

NIM was designed to process qualitative variables with quantitative variables treated as a special case. For 2001, its successor CANCEIS treats all variables, including qualitative variables, as quantitative. As a result, CANCEIS is able to efficiently perform E & I with many quantitative variables simultaneously (Bankier, 2000).

CANCEIS is a very flexible system that uses edit and imputation rules supplied by the user. The edits are defined in groups called decision logic tables (DLTs). The default edits supplied with CANCEIS are easily altered to suit different requirements. This means that users can tailor the software to their own situation. In addition, the software checks for redundancies and logical inconsistencies of the edit rules. It allows simultaneous editing and imputation of a variety of variables for large datasets.

Footnote 2. Imputed values for Mäori descent are produced for the Electoral Commission but are not included in the Census file.

The software uses a hot deck imputation methodology (Bankier, 1999). The objectives of the methodology are:
- the imputed household should closely resemble the 'failed edit' household
- the imputed data for a household should come from a single donor household
- the imputed household should closely resemble the donor household
- equally good imputation actions, based on the available donors, should have a similar chance of being selected to avoid falsely inflating small but important groups in the population.

In the edit stage, CANCEIS examines the data and determines if a record is complete and consistent by applying edits (or consistency rules) provided by the user via DLTs. If any of these rules are violated, then the record fails and will later be put through an imputation stage to correct the data. Records pass if no conflicts are encountered and the record is complete. The data in these passing records will be used in the imputation stage to complete or correct the failed records. Any passing record that is used to correct a failed record is called a donor.

In the imputation stage, CANCEIS searches for donor records that resemble a failed record and uses data from those donor records to correct the failed record such that the minimum possible number of fields are changed.

We note that SNZ has experience with hot deck imputation, having used it successfully for the imputation of work and labour force status for New Zealand's 2001 Census.

2.3 Summary of differences between SNZ and CANCEIS

CANCEIS contrasts with SNZ's E & I system in many respects:
- CANCEIS has a complete set of micro-edits for all sizes of households, while SNZ limits the number of micro-edits to the most important.
- CANCEIS edits are specified in groups, called decision logic tables, while each SNZ edit is specified separately.
- CANCEIS assesses and resolves all edits for all people in the household simultaneously. SNZ edits and imputations are run sequentially, where the order in which they are run is vital.
- CANCEIS produces consistent responses (removing non-response) for all five variables. SNZ imputes only for age and sex, and only for non-response and clearly identified respondent error. Other inconsistencies can result in relationship and marital status being set to not specified, or may be allowed to remain. SNZ does not impute for relationship to reference person, social marital status or legal marital status.
- CANCEIS imputation uses a hot deck donor household, while SNZ imputation uses known probability distributions (cold deck imputation).
- CANCEIS is fully automated, while at SNZ, failed edits for these five variables are assessed by operators.
- Both SNZ and CANCEIS aim for minimum change to responses. CANCEIS does this automatically, assessing all relevant responses within a household, while SNZ limits the number of errors and fields for which changes will be made.
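
The nearest-neighbour, minimum change idea described in section 2.2 can be illustrated with a small sketch. This is not the CANCEIS algorithm: the edit rule, the field names, the mismatch-count distance and the `n_nearest` parameter are all hypothetical, the unit is a single person rather than a whole household, and the first acceptable action is taken rather than choosing at random among equally good actions.

```python
from itertools import combinations

# Hypothetical subset of the demographic variables, for illustration only.
FIELDS = ("age", "sex", "marital")


def passes_edits(person):
    """Hypothetical edits: no missing fields, and no married person under 16."""
    if any(person[f] is None for f in FIELDS):
        return False
    if person["marital"] == "married" and person["age"] < 16:
        return False
    return True


def distance(failed, donor):
    """Crude distance: number of fields on which the two records differ."""
    return sum(failed[f] != donor[f] for f in FIELDS)


def impute(failed, donors, n_nearest=2):
    """Find the nearest donors first, then the smallest set of fields that,
    when copied from a single donor, makes the failed record pass the edits."""
    nearest = sorted(donors, key=lambda d: distance(failed, d))[:n_nearest]
    for n_changes in range(1, len(FIELDS) + 1):
        for donor in nearest:
            for subset in combinations(FIELDS, n_changes):
                candidate = dict(failed)
                for f in subset:
                    candidate[f] = donor[f]
                if passes_edits(candidate):
                    return candidate
    return None  # no donor among the nearest can repair the record


# Donors are records that already pass all edits.
donors = [
    {"age": 9, "sex": "F", "marital": "never married"},
    {"age": 38, "sex": "F", "marital": "married"},
    {"age": 40, "sex": "M", "marital": "married"},
]
failed = {"age": 8, "sex": "F", "marital": "married"}  # fails the age/marital edit
result = impute(failed, donors)
print(result)
```

In this toy case either age or marital status could be changed to repair the record; the sketch simply returns the first single-field change that works for the nearest donor, whereas the methodology described above aims to give equally good actions a similar chance of selection.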

3 Parts of SNZ's Editing and Imputation System that Would Be Replaced by CANCEIS

3.1 SNZ demographic edits

The SNZ 2001 Census micro-edits that would be replaced by CANCEIS are summarised below (a full list is given in appendix 1). These are the edits which affect the variables age, sex, relationship to reference person, legal marital status and social marital status (which is derived from legal marital status and the living arrangements variable). They are split into two groups: within-person edits and between-person edits. Most of the between-person edits occur during family coding where, in 2001, age had already been set and could not be changed. There are only 24 edits.

Within-person edits

The within-person edits check for:
- blanks and invalid responses including multiple response
- inconsistencies between age and other variables
- inconsistent responses for relationship.

Between-person edits

The between-person edits check for:
- inconsistent relationships
- inconsistencies between age and family coding
- inconsistencies between 'living with' fields and family coding.

All edit failures are initially sent to an operator who corrects any recognition errors. The result is either the respondent's intended response (even though it may be inconsistent) or set to 'unidentifiable'. Only age and sex are then imputed.

3.2 SNZ imputations

CANCEIS would replace the current SNZ sex and age imputations. Both age and sex are imputed for all individuals including overseas visitors and substitutes (see footnote 3), as well as for absentees.

Sex may be imputed deterministically, either manually by an operator based on name, or automatically if the person is part of a couple or is a member of a single sex non-private dwelling. Otherwise sex is imputed stochastically, mainly assuming that 49 percent are male.

The age imputation consists of two parts, both run automatically without operator intervention.
The first part is deterministic: if the variable 'Dwelling Form age' (DFage) is available then it is copied to the age field. The second is stochastic and based on known age distributions from the previous census (ie cold deck imputation). Minimum and maximum ages are set using dwelling type, various personal fields and the person's role in the household (eg parent). Age is then imputed based on the age of a relative, if possible, or otherwise using given general distributions.

Footnote 3. A substitute (or dummy) form is created by SNZ when there is sufficient evidence to believe that a dwelling or person exists, but no dwelling form or individual form has been received.
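
The two-part logic of the SNZ imputations described in section 3.2 can be sketched as follows. This is only an illustration under stated assumptions, not SNZ's implementation: the function names, the flat placeholder age weights and the example age bounds are hypothetical, and the step that imputes age from the age of a relative is omitted.

```python
import random


def impute_sex_stochastic(rng, p_male=0.49):
    """Stochastic sex imputation, assuming 49 percent of cases are male."""
    return "M" if rng.random() < p_male else "F"


def impute_age(df_age, min_age, max_age, age_weights, rng):
    """Deterministic step first: copy the Dwelling Form age if it is present.
    Otherwise draw an age from a fixed (cold deck) distribution, restricted
    to the allowed range for this person."""
    if df_age is not None:
        return df_age
    ages = [a for a in sorted(age_weights) if min_age <= a <= max_age]
    weights = [age_weights[a] for a in ages]
    return rng.choices(ages, weights=weights, k=1)[0]


rng = random.Random(2001)  # fixed seed so the sketch is repeatable

# Placeholder distribution: a real cold deck would use weights taken from
# the previous census rather than equal weights.
flat_weights = {age: 1.0 for age in range(0, 121)}

copied = impute_age(34, 0, 120, flat_weights, rng)    # DFage available
drawn = impute_age(None, 18, 64, flat_weights, rng)   # hypothetical bounds
sexes = [impute_sex_stochastic(rng) for _ in range(10000)]
print(copied, drawn, sexes.count("M") / len(sexes))
```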

3.3 CANCEIS edits and imputations

In contrast to the minimal set of SNZ micro-edits, CANCEIS has a complete set of edits or 'consistency rules' to achieve consistency between all five variables. These are organised by household size, and may number in the hundreds for larger households; however, the algorithms used mean that the edits run very efficiently. They are also grouped into within-person and between-person edits. A summary of the types of edits can be found in appendix 2.

So, in addition to replacing the SNZ edits and imputations outlined above, CANCEIS has a great many more edits, and imputes for relationship, legal marital status and common-law status.

All resolution of edit failures in CANCEIS is automatic, and could replace all the SNZ manual operator intervention for edit failures after recognition. We assume that this would speed up census processing time, but no assessment has been made of how much difference CANCEIS would make. Statistics Canada also subscribes to a minimum intervention policy for their census, which we assume does not conflict with CANCEIS.

4 Methodology for the Evaluation of CANCEIS

4.1 Aims

We have viewed this project as a preliminary study with a small budget and have thus kept a narrow focus. Our aims have been to:
- gain a basic understanding of how CANCEIS works
- learn to use CANCEIS
- assess various qualitative factors such as ease of use
- provide quantitative indicators of the accuracy of CANCEIS editing and imputation
- compare CANCEIS with SNZ's E & I.

We have deliberately not attempted to look at the edits in detail (having used the default edits and default values for other parameters in all the testing), nor have we attempted any detailed timing or costing. There is a module that can be run before CANCEIS to identify potential couples, which improves the accuracy of the imputations. This has also been excluded at this stage.

A simulation study was carried out to evaluate the performance of the software (Seyb, 2002).
The main points from this study are reported here.

4.2 Test data

The study was carried out using two sets of provisional data from New Zealand's 2001 Census, representing two different household sizes. The data was drawn at different times during census processing and from different regions:
- 1,000 two-person households from the Canterbury region drawn at the start of processing, and
- 1,000 four-person households from the Auckland region drawn after processing was complete.

Note that the number of households is a limitation of the prototype tested.

At the beginning of the project, only Canterbury census data was available and we felt that, in the first instance, it would be best to use two-person households, which are simple in structure. For the second test we wanted more complex households from a more diverse population and chose four-person households from Auckland.

Only private dwellings were included in the test data. Any household with imputed values for sex or age (including substitutes) was excluded, as we were aiming for a dataset consisting entirely of real values. Households with absentees were also excluded.

Demographic variables used for the test are sex, age, legal marital status, common-law status and relationship to reference person. There is a one-to-one correspondence between the New Zealand and Canadian variables age, sex and legal marital status. The other variables were tailored to fit the Canadian definitions. Common-law status has only two values: 'yes' for living with a partner but not legally married, or 'no'; these are derived from the New Zealand variables legal marital status and social marital status. A concordance was created between the New Zealand and Canadian relationship to reference person variables.

The household composition of the test datasets is similar to the national composition for two- and four-person households. The largest groups are 'couple only' for the two-person households (72 percent compared with 73 percent nationally) and 'couple with children' for the four-person households (74 percent compared with 71 percent nationally).

The two original census datasets were put through CANCEIS, which imputed for any records which failed the Canadian edits. This data is referred to as the true data. In other words, we treat data that has come through the SNZ system with no imputations and has passed all CANCEIS edits as entirely correct.

4.3 Quality indicators

The Italian Statistical Agency (ISTAT) software ESSE (Editing Systems Standard Evaluation) was used to provide quantitative quality indicators for the CANCEIS system. The software is based on the artificial introduction of errors in a set of true data by a controlled generation process (Luzi and Della Rocca, 1998).
The quality is measured in terms of the system's ability to detect as many errors as possible, and restore the true values, without introducing new errors (Barcaroli et al, 2001).

We get indicators of the quality of CANCEIS editing and imputation by taking data that we consider to be real and true, introducing errors into this true data to produce so-called modified data, and then measuring how good CANCEIS is at recognising these errors and fixing them. This approach to the evaluation of an E & I system is appropriate when the aim is to produce the 'best' unit record dataset, as we do for census.

Errors can be generated using the MCAR model (missing completely at random), which assumes that the probability that an item is missing does not depend on the value of the item or on observed data of other variables. For the purposes of this study, item non-response in all variables is assumed to be MCAR.

ESSE allows for the introduction of several different types of error. The models studied were item non-response (the true value is randomly replaced by a missing value), interchange errors (the true value is replaced by a wrong one in the domain) and outliers in numeric variables. These models best represent the types of errors observed in population census data which has been scanned.

The user can choose the error rate independently for each error model above and for each variable. Errors are introduced into any or all of the demographic variables at a rate specified by the user. Initially, a variety of error rates (1 percent, 2 percent and 5 percent) and combinations of error types were tested. Ultimately, testing continued using only the rates thought to most closely approximate those observed in practice. Table 1 below reports the error rate adopted for each variable. These error rates are approximately equivalent to actual observed rates in New Zealand non-substitute census data.

Table 1 Error Rates and Error Types Used to Approximate Census Data Errors

Error type           Age     Sex   Marital status   Common-law status   Relationship to reference person
Item non-response    1%      1%    3%               3%                  3%
Interchange errors   1%      1%    1%               1%                  1%
Outliers             <0.1%   N/A   N/A              N/A                 N/A

Each cycle of simulation, editing and imputation was carried out 10 times, so that the quoted quality indices are averages of 10 runs.

We also considered putting raw 2001 Census data through CANCEIS so that the CANCEIS E & I results could then be compared with the actual census outcome. However, there is then no way of knowing which is the correct result (census or CANCEIS).

ESSE's accuracy indices

ESSE evaluates the quality of an editing and imputation system in terms of its capability to recognise errors and adequately replace them with the true values: the higher the proportion of corrected errors of the total, and the fewer new errors introduced, the more accurate the process (Barcaroli et al, 2001).

ESSE provides a range of accuracy indicators. We look here at the most important: those which identify how many new errors are introduced, and those that show how well CANCEIS has identified (editing accuracy) and corrected (imputation accuracy) the changes made in the modified data. These are:
- Editing accuracy E mod: fraction of modified data that are correctly classified as errors.
- Imputation accuracy I mod: fraction of imputed modified data whose true value is correctly restored.
- Editing and imputation accuracy E & I mod: fraction of modified data whose true value is correctly restored.
- Changes to true data E & I tru: fraction of true data whose true value is correctly restored.

Each index ranges from 0 (no accuracy) to 1 (maximum accuracy). If E & I tru is less than 1 (100 percent) then the editing and imputation system has introduced errors into the data.
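
As an illustration of the four indices just defined, the following sketch computes them for one variable from four parallel lists. It is an assumption-laden reading of the verbal definitions above, not ESSE's implementation (the exact formulas are in Appendix 4): the function names, the list-based data layout and the `tolerance_pct` parameter are hypothetical, with the 10 percent tolerance for age taken from the success criterion used in this study.

```python
def matches(variable, value, true_value, tolerance_pct=10):
    """A value counts as correct if it equals the true value, or, for age,
    if it lies within tolerance_pct percent of the true value."""
    if value is None:
        return False
    if variable == "age":
        return abs(value - true_value) * 100 <= tolerance_pct * true_value
    return value == true_value


def ratio(numerator, denominator):
    """Fraction, or None when the denominator is empty."""
    return numerator / denominator if denominator else None


def accuracy_indices(variable, true, modified, flagged, final):
    """true: original values; modified: values after error introduction;
    flagged: whether the edits classified each value as erroneous;
    final: values after editing and imputation."""
    n = len(true)
    mod_idx = [i for i in range(n) if modified[i] != true[i]]
    tru_idx = [i for i in range(n) if modified[i] == true[i]]
    detected = [i for i in mod_idx if flagged[i]]
    ok = lambda i: matches(variable, final[i], true[i])
    return {
        "E mod": ratio(len(detected), len(mod_idx)),
        "I mod": ratio(sum(ok(i) for i in detected), len(detected)),
        "E & I mod": ratio(sum(ok(i) for i in mod_idx), len(mod_idx)),
        "E & I tru": ratio(sum(ok(i) for i in tru_idx), len(tru_idx)),
    }


# Toy data for the sex variable: one missing value and one interchange error.
true = ["F", "M", "F", "M", "F"]
modified = [None, "M", "M", "M", "F"]
flagged = [True, False, True, False, False]
final = ["F", "M", "M", "M", "F"]  # first error fixed, second imputed wrongly
indices = accuracy_indices("sex", true, modified, flagged, final)
print(indices)
```

Integer arithmetic is used for the age tolerance so that boundary cases, such as an imputed age of 55 for a true age of 50, are not affected by floating-point rounding.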
For all the quality indicators and mathematical details of their calculation see Appendix 4.

Note that:
(1) The imputation process only imputes values previously classified by the editing process as erroneous.
(2) In the case of qualitative variables, the imputation process is successful if the new assigned value equals the original one.

(3) In the case of the age variable, the imputation process is successful if the new assigned value lies within 10 percent of the original value.

5 Results

Full tables of results are in Appendix 5. The following is a summary of the main results. A limitation of the prototype is that a maximum of 1,000 households can be put through at one time. We would expect better results in a production version where more donors would be available for unusual households.

5.1 Introduced errors

High values of the proportion of true data whose true value is correctly restored (E & I tru) indicate that the editing and imputation system does not introduce new errors in data. This feature depends on the restrictions imposed by the edits and on the characteristics of the error identification and imputation systems.

If the error identification system fails, then the system classifies as erroneous some true values. In our testing, values of almost 100 percent in all cases for both two- and four-person households indicate that the CANCEIS system introduces very few new errors.

If the edits are too restrictive, then the editing and imputation system could classify as erroneous some plausible but uncommon true values. There should only be errors introduced in our test data for households where there are already errors due to the ESSE modification, and the system has made wrong choices in identifying and fixing the real error. Because our initial true test data had to pass CANCEIS edits in order to be defined as true, households which are unmodified in the simulations will not contain errors introduced because edits are too stringent. This is a limitation of this analysis, and something that needs to be investigated further.
5.2 Editing and imputation accuracy for modified data

High values of the proportion of modified data whose true value is correctly restored (E & I mod) indicate that the editing and imputation system is able to detect a large proportion of the errors in the data and accurately restore the true value. The ability of the system to do this again depends on the set of defined edits and the characteristics of the error identification and imputation systems.

We have tested CANCEIS over a range of error rates, from 2 percent (1 percent non-response plus 1 percent interchange error) to 10 percent (5 percent non-response plus 5 percent interchange error), and for two- and four-person households. Since the errors are introduced independently for each of the five variables, the maximum overall error rate is the sum of the rates, ie 10 percent to 50 percent. See table 2 below.

Overall, the performance of the system in detecting and correcting errors is good for sex, legal marital status, common-law status and relationship to reference person. The accuracy for age is much lower. The accuracy generally decreases as the error rate increases, but only by small amounts at the error rates tested. Four-person households produce somewhat better results for legal and social marital status than two-person households, and worse results for sex, age and relationship.

Table 2 E & I mod Accuracy Indicator
Percentage values (1)

                                    Two-person households    Four-person households
                                    Error rate               Error rate
Variable                            2%     4%     10%        2%     4%     10%
Age                                 32%    32%    26%        19%    20%    17%
Common-law status                   94%    92%    91%        100%   100%   100%
Legal marital status                64%    65%    62%        70%    74%    67%
Relationship to reference person    88%    86%    83%        77%    78%    76%
Sex                                 88%    88%    85%        70%    72%    69%

(1) Note that the figures are all averages of 10 simulations.

Figure 1 Editing and Imputation Accuracy for Modified Data (E & I mod), four-person households
[Figure not reproduced: accuracy index plotted against percent non-response and interchange error, with one series each for age, common-law status, marital status, relationship and sex.]

The predictive accuracy of the imputation process is extremely good for common-law status, and quite poor for age. This is because of the different nature of these variables. Common-law status has two values, yes or no, and age has 121 possible valid values. A modified value of common-law status is always wrong and the edits are very good at picking this up. However, a modified age value may still be close to the real value and within our success criterion, so we would not expect this to be recognised by the edits. When an error is detected in common-law status, the defined edits force the true value to be imputed in most cases, while in age the edits can be passed by imputing values that may be very different from the true value.

As well, the low success rate for the age imputation is partly due to how a correct result has been defined. We have used a success criterion of within 10 percent of the true value. So for a true age of 50 years, an imputed value of between 45 and 55 years will be counted as successful. But, for a 10 year old, a successful imputed value must be between 9 and 11 years. A wider success range could be allowed, with a consequent improvement in accuracy indicators; perhaps within 20 percent of the true value is still a good result for an imputed age. (Note: ESSE will not allow an absolute value criterion, eg +/- five years.)

We have also tested CANCEIS with error rates close to those found in the 2001 Census (as per table 1) and attempted to make some comparisons with SNZ imputations. Note that the errors include both changes from the true value to another valid value, and missing data.

Table 3 shows E mod, which answers the question 'How many errors are detected by the edits?' and I mod, which answers 'When an error is found, how often is the correct value returned?' Both these values contribute to E & I mod, the overall accuracy for modified data.

We think that, apart from age, these are very good results for an E & I system: between 72 percent and 100 percent of errors have been successfully replaced by their true values. For age, it can be seen that for four-person households the detection of errors is quite good at 69 percent, but that the imputation of correct values is poor, with only 30 percent successfully imputed. For two-person households, the detection rate is slightly worse at 54 percent, but the imputation rate is better at 52 percent.
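The age success criterion used in these results (an imputed age within 10 percent of the true age) can be sketched as follows. The function names are hypothetical, and the use of whole-year ages from 0 to 120 is an assumption based on the 121 valid age values mentioned above.

```python
def age_imputation_successful(true_age, imputed_age, tolerance_percent=10):
    """True if the imputed age is within `tolerance_percent` of the true age.

    Integer arithmetic avoids floating-point rounding at the boundary.
    """
    return 100 * abs(imputed_age - true_age) <= tolerance_percent * true_age


def success_range(true_age, tolerance_percent=10):
    """Inclusive range of whole-year ages (0 to 120) counted as successful."""
    accepted = [age for age in range(0, 121)
                if age_imputation_successful(true_age, age, tolerance_percent)]
    return accepted[0], accepted[-1]


if __name__ == "__main__":
    print(success_range(50))      # true age 50, 10 percent tolerance
    print(success_range(10))      # true age 10, 10 percent tolerance
    print(success_range(10, 20))  # true age 10, 20 percent tolerance
```

One consequence of a relative criterion applied to whole-year ages is that, for a true age under 10, only the exact age counts as a success, which may help explain why the age indicators are low.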
For the other variables, the E mod and I mod values are very similar for both household sizes, and both very high.

Table 3 Accuracy Indicators
Percentage values for the Error Rates in Table 1

Household size            Variable                            Total error rate   E mod   I mod   E & I mod
Two-person households     Age                                 2%                 54%     52%     29%
                          Common-law status                   4%                 97%     100%    97%
                          Legal marital status                4%                 86%     86%     74%
                          Relationship to reference person    4%                 95%     96%     91%
                          Sex                                 2%                 100%    88%     88%
Four-person households    Age                                 2%                 69%     30%     21%
                          Common-law status                   4%                 100%    100%    100%
                          Legal marital status                4%                 88%     90%     79%
                          Relationship to reference person    4%                 92%     94%     87%
                          Sex                                 2%                 100%    72%     72%
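The report does not state a formula linking the three indicators, but the figures in Table 3 are consistent, to within rounding, with E & I mod being the product of E mod and I mod (errors must first be detected, then correctly imputed). The sketch below checks this against the table values; the product relationship is an observation from the published figures, and the function names are hypothetical.

```python
# (household size, variable): (E mod, I mod, E & I mod), in percent, from Table 3.
TABLE_3 = {
    ("two-person", "Age"): (54, 52, 29),
    ("two-person", "Common-law status"): (97, 100, 97),
    ("two-person", "Legal marital status"): (86, 86, 74),
    ("two-person", "Relationship to reference person"): (95, 96, 91),
    ("two-person", "Sex"): (100, 88, 88),
    ("four-person", "Age"): (69, 30, 21),
    ("four-person", "Common-law status"): (100, 100, 100),
    ("four-person", "Legal marital status"): (88, 90, 79),
    ("four-person", "Relationship to reference person"): (92, 94, 87),
    ("four-person", "Sex"): (100, 72, 72),
}


def product_percent(e_mod, i_mod):
    """Detection rate times imputation rate, both given in percent."""
    return e_mod * i_mod / 100


def max_discrepancy(table):
    """Largest gap, in percentage points, between the product and E & I mod."""
    return max(abs(product_percent(e, i) - combined)
               for e, i, combined in table.values())


if __name__ == "__main__":
    for (size, variable), (e, i, combined) in TABLE_3.items():
        print(f"{size:12s} {variable:34s} "
              f"product={product_percent(e, i):6.2f}  published={combined}")
    print(f"largest discrepancy: {max_discrepancy(TABLE_3):.2f} percentage points")
```

Every discrepancy is under one percentage point, which is what would be expected if the published E mod and I mod values were themselves rounded.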

For the sake of comparison, we have made a rough estimate of the quality of the SNZ sex and age imputations (ie I mod) for the non-substitute population. These are for the whole population, not by household size. Using the same success criterion as above (ie within 10 percent of the true value for age) we assume that:

- 100 percent of the deterministic imputations are successful
- 50 percent of stochastic imputations for sex are successful
- 50 percent of age imputations using the family are successful
- 20 percent of the age imputations using general distributions are successful.

Table 4 Census 2001 Imputation accuracy, I mod
Non-substitute Population Over All Household Sizes

        SNZ Census 2001    SNZ Census 2001          CANCEIS
        (all methods)      (without DFage/name)     (from Table 1)
Age     80%                30%                      52%, 30%
Sex     90%                55%                      88%, 72%

Bearing in mind that these assumptions are only approximate and untested, we can say that the SNZ imputation success rate for age is comparable to CANCEIS when age on the dwelling form (DFage) is not available. The use of DFage provides a much more accurate imputed age than is possible with other imputation methods. For sex, CANCEIS is almost as good as SNZ even with operator intervention (an operator can use the name or relationship), and much better than when the imputation was stochastic.

A comparison between SNZ and CANCEIS for E mod is not possible because we do not know the success rate of error detection that SNZ achieved in 2001. This would in any case be difficult to compare with CANCEIS because SNZ's edits attempt to find a different set of errors.

6 Computing Environment

CANCEIS is written in C+. Statistics Canada has supplied us with a prototype in the form of executable files; that is, we have no access to the actual code. We found no bugs or problems in running the program.

We have found CANCEIS very easy to use. It was necessary to reformat census data, changing person order and recoding New Zealand variables to Canadian codes. The reformatting to Canadian codes is a temporary testing feature; if the software is to be tested more extensively, codefiles of SNZ codes could be created.

In addition to the census data, input parameter files (including the default DLTs provided with the prototype) are used. The parameter files control the way the application runs. They are well documented, with explanations of each line and of how changes can be made. The edits and default parameters are easily changed.

The reports produced by CANCEIS are comprehensive. Basic reports are available (for example, summary statistics such as the number of records passed and failed), as well as, among others, details of the imputed households, the donors used, the distances of the donors from the imputed households, and timings for all aspects of the system. The user can tailor the reports produced to the level of detail required.

No proper timings have been carried out, except to note that CANCEIS always ran very quickly. Statistics Canada quote processing times with actual census data of 0.65, 1.39 and 3.54 seconds per 100 records processed (both passed and failed) for four-, six- and eight-person households, respectively. This processing was carried out on a Pentium II, 350 megahertz PC.

A copy of the CANCEIS prototype was obtained from Statistics Canada and the research team was granted a licence to evaluate the software product at no cost. Use of the software for New Zealand's 2006 Census would require a single licence fee, payable prior to installation of the software, and an annual technical services fee thereafter. The technical services fee covers the provision of upgraded versions of the software that improve functionality, correct known bugs or are necessary because of changes in the computing platform used by the software. The fee also covers technical assistance and guidance provided by Statistics Canada. However, further testing of the software as part of a development investigation could be carried out using the prototype already provided without purchasing the software.

7 Further Work

7.1 More test data

It would be better to confirm the quality indicators with more test data and different household sizes. This process would include looking in detail at the kind of households which are still in error after going through CANCEIS, and investigating where CANCEIS introduces errors because the edits are too restrictive. Raw recognition data could be used to test CANCEIS against the full range of errors and households that actually occur.

7.2 Edits

The CANCEIS edits have been developed for Canadian households and families and Canadian expectations of what are reasonable responses within those households and families. They appear to be an extension of the kinds of edits already used by SNZ. If we assume that Canadian and New Zealand households and family structures are fairly similar, then there will probably not be too many changes required to the edits. But before SNZ could adopt CANCEIS it is imperative that we thoroughly understand what the edits are doing, and adapt them, as required, for New Zealand conditions. This is likely to entail a considerable amount of work involving census and subject matter experts. It is particularly important that we understand how edits affect small groups of unusual households that might be inflated by the action of particular edits. As noted above, it is easy to change the DLTs, but identifying what needs to be changed would be a complex task.

7.3 Family coding

SNZ family coding uses the variables (age, sex, etc) that were used in testing CANCEIS (plus a few more variables) and consists of two parts: an automatic module that codes simple households, and an operator component for very complex households or households where inconsistent responses cause family coding edit failures. We have done no specific work on the impact of CANCEIS on family coding, but it seems reasonable to expect that producing consistent responses to the relationship variable, etc, should remove most (if not all) family coding edit failures.

In addition, there is a family formation program which identifies potential couples and parents and children before editing and imputation, and improves the performance of CANCEIS in detecting and resolving errors. Our testing has been carried out without these 'potential families' identified, but if we were to use CANCEIS it would make sense to also borrow the Canadian approach to finding families. If we did this we should have all the information needed to produce family codes automatically. Removing the operator family coding would be a significant change for census processing.

7.4 Other issues

Other issues that still need to be investigated include:

- How unusual or complex dwellings are treated (eg non-private dwellings, substitute households, visitors and absentees, large or complex households, other unusual households).
- Whether DFage would still be needed for age imputation if CANCEIS were used.
- Other questionnaire requirements if CANCEIS were used.
- The costs (to buy, implement and maintain).
- Timings.
- How CANCEIS would fit with other aspects of the processing system.
- The IT support that would be needed.
- Extensions to the use of CANCEIS (CANCEIS could be used for the E & I of any set of census variables, for example the inter-related work status variables, or in sample surveys).

Appendix 1 SNZ Edits that would be Covered by CANCEIS

SNZ edits (2001) which would be covered by CANCEIS

A. Within-person edits

The within-person edits check for:

- blanks and invalid responses, including multiple responses
- inconsistencies between age and other variables
- inconsistent responses for relationship.

Blanks and Invalid Responses Including Multiple Response

Each entry gives the variable, the edit number and the edit, followed by whether the failure is sent to an operator or resolved automatically, the operator action, and the outcome if a true and valid response is not found.

- sex. Edit 0: multiple response or no response. Sent to: Operator. Operator action: highlight true response if possible. If not found: impute.
- legal marital status. Edit 47: multiple response. Sent to: Operator. Operator action: ensure response reflects respondent's intention (1). If not found: set to unidentifiable.
- relationship. Edits 48, 49, 140, 141: multiple response or no response. Sent to: Operator. Operator action: resolve if possible. If not found: set to unidentifiable.
- date of birth. Edit 13: invalid date. Sent to: Operator. Operator action: correct recognition errors. If not found: leave.
- DF age. Edit 7: age > 100. Sent to: Operator. Operator action: correct recognition errors. If not found: leave.
- absentee age. Edit 14: age > 100. Sent to: Operator. Operator action: correct recognition errors. If not found: impute.
- living arrangements (for social marital status). Edit 73: live alone and with others. Sent to: Operator. Operator action: ensure response reflects respondent's intention, or amend if necessary (1). If not found: leave.

(1) This is a check to ensure obvious negative responses (paper marks, cross-outs, etc) are not included; operators do not make any other decisions.

Inconsistencies between age and other variables

Age derived from date of birth and age on the dwelling form (DFage) differ by more than one year. Edit 4.
An operator checks for recognition errors and incorrect person numbering. If the difference is more than 10 years, the operator determines which age is correct, and changes year of birth if that is incorrect.

Where only one age is given. Edits 2 and 89.
The edit fails if more than one of the following conditions is true:

- age < 16 and ever legally married
- age < 15 and live with partner
- age < 12 and live with child
- age < 4 and not 'not born 5 years ago'
- age < years in NZ
- age < years at usual residence
- age < 15 and adult income

The operator corrects recognition errors in any of the fields, otherwise leaves responses as given.

A person at home on census night cannot have a relationship to reference person of visitor. Edit 128.

There are other edits using age which act as a check on other variables, but where age is left unchanged.

B. Between-person edits

The between-person edits check for:

- inconsistent relationships
- inconsistencies between age and family coding
- inconsistencies between 'living with' fields (eg live with mother and father) and family coding.

Inconsistent relationships

These are definite edits, ie they must be corrected by the operator.

- Only one reference person is possible. Edit 168.
- There can be at most one spouse of the reference person. Edit 169.

Inconsistencies between age and family coding

The family coding edits occur after age has been fixed. They also occur before age imputation. The operator may change the family codes, or leave the family codes if they consider that age is wrong.

- Age < 16 and family code for parent or partner. Edits 142 and 143.
- Parent's age <= child's age. Edit 144.
- Grandparent's age <= grandchild's age. Edit 145.
- Person < 15 must be a child in a family nucleus except when they have a child or partner of their own or everyone in the household is under 15 years. Edit 146.
- A child forced into a family nucleus must be less than 15. Edit 176.

Inconsistencies between 'living with' fields and family coding

The operator may change the family codes, or change the 'living with' fields.

- A person coded as a couple must be living with a partner. Edit 148.
- A person living with their father or mother must be in a family nucleus. Edit 139.

There are other edits which check for inconsistencies among the family codes.

Appendix 2 Canadian Consistency Rules

Between person and within person control rules

In the table below 'person1' means reference person.

Strata (1)  Description

Within person consistency rules
1-8     Edits to check for age conflicts (under 15 and either ever married, adult or common-law)
1       Edit for 1 person household (hhld) (person1 is common-law)
2-8     Edits to check for inconsistencies related to common-law and marital status variables of person2
1-8     Edits to check for inconsistencies related to common-law and marital status variables of all other persons in the household
9       Edits to check for age conflicts in 9+ strata
9       Additional edits for age verification for the 9+ strata. For the 1-8 person hhlds these checks are covered in the between person edits
9       Edits to check for inconsistencies related to common-law and marital status

Between person consistency rules
2-8     Edits for person1 and person1's partner that check for inconsistencies between relationship and marital status, as well as relationship and common-law status
2-8     Edits for age checks between person1, person2 and their children
3-8     Edits for checking age between person1's step-son/daughter and person1's spouse
2-8     Edits for checking age between person1 and a lone father/mother
3-8     Edits for checking age between sibling and a lone father/mother
3-8     Edits for checking age between person1's spouse and a lone father/mother
3-8     Edits for checking age between father/mothers and person1 and father/mother-in-laws and spouse of person1
4-8     Edits for checking age between father/mothers and brother/sister of person1
2-8     Additional edits for age checks
4-8     Edits for checking the number of parents, parents-in-law and grand-parents
3-8     Edits to check for proper sex within couples (their words not mine!!!)
3-8     Edits checking if only one of the partners is in a common-law relationship
3-8     Edits to check whether one of the partners is not now married
3-8     Edits to check whether the first partner is neither now married nor common-law
3-8     Edits to check whether the first partner is neither now married nor common-law
3-8     Edits to check for the presence of a common-law spouse
3-8     Edits to check for the presence of a common-law spouse of other relative

Secondary rules
        Secondary edits to restrict widowed and age <
9       Secondary edits to restrict widowed and age < 24 for 9+ strata
2-8     Secondary edits for controlling ages between mothers and their children
2-8     Secondary edits for controlling ages between siblings

Within person validity rules
1-8     Edits that will determine which households fail due to position 1 not being person1 or not being an adult (age > 15)
2-8     Edits that check for conflicts with the person1 concept
9       Edits that check for conflicts with the person1 concept in 9+ strata

(1) Size of household to which the rule is applied.
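To make the idea of a between-person rule concrete, the sketch below expresses three of the SNZ edits from Appendix 1 (edits 168, 169 and 144) as checks on a household. It is illustrative only: the record layout and relationship labels are hypothetical, the parent and child check is simplified to the reference person and their children, and neither the SNZ system nor the CANCEIS decision logic tables are claimed to work this way.

```python
def failed_edits(household):
    """Return (edit number, message) pairs for each between-person rule that fails.

    `household` is a list of dicts with hypothetical keys "id", "age" and
    "relationship" (one of "reference", "spouse", "child", "other").
    """
    failures = []
    references = [p for p in household if p["relationship"] == "reference"]
    spouses = [p for p in household if p["relationship"] == "spouse"]
    children = [p for p in household if p["relationship"] == "child"]

    # Edit 168: only one reference person is possible.
    if len(references) > 1:
        failures.append((168, "more than one reference person"))

    # Edit 169: at most one spouse of the reference person.
    if len(spouses) > 1:
        failures.append((169, "more than one spouse of the reference person"))

    # Edit 144 (simplified): a parent's age must exceed the child's age.
    # The SNZ edit works from family codes, which this sketch does not model;
    # here "child" is read as a child of the reference person.
    for parent in references:
        for child in children:
            if parent["age"] <= child["age"]:
                failures.append(
                    (144, f"person {parent['id']} is not older than child {child['id']}")
                )
    return failures


if __name__ == "__main__":
    household = [
        {"id": 1, "age": 40, "relationship": "reference"},
        {"id": 2, "age": 38, "relationship": "spouse"},
        {"id": 3, "age": 45, "relationship": "child"},
    ]
    for edit_number, message in failed_edits(household):
        print(edit_number, message)
```

In an editing and imputation system, a household that fails any such rule would be passed to the imputation step, which then looks for the smallest set of changes that lets the household pass all the edits.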

Appendix 3 Summary of Quality Management Strategy for the 2001 Census

6.3 Processing

A high level editing management plan was written that indicated the direction that editing would take in the 2001 Census. The editing philosophy for the 2001 Census of Population and Dwellings was driven by:

- a desire to minimise the number of complex edits, and
- the Quality Management Strategy, which focused on keeping respondents' intentions and providing output data that is 'fit for use' rather than 'sanitised'.

The aim was to minimise interfield edits during processing and instead run macro checks for inconsistencies between fields at the macro evaluation stage. It was anticipated that this approach would give us a better understanding of the quality of the data and more control over changes. It is important to reinforce that this philosophy means that some inconsistency will remain in the data.

6.4 Output

The major change for 2001 macro-evaluation is the way that we dealt with 'problems' or improbabilities which reflect respondents' intended responses. The 2001 Quality Management Strategy says that we will not change respondents' answers to fit in with the SNZ view of the world. The aim is to tell users the real content and quality of the data, and acknowledge the problems rather than trying to disguise them. It is worth noting that this is an approach which has been adopted by both Statistics Canada and the Australian Bureau of Statistics.

As a result, the final output database may contain more seemingly inconsistent data than it has in previous censuses. This change in philosophy does not mean that data quality for 2001 will be worse than 1996, as our 'fixes' in 1996 did not necessarily make the data of better quality. In many cases our actions simply disguised the error for some data.

Appendix 4 Measurement of the Accuracy of the Editing and Imputation Processes (ESSE User Manual)

Statistically, we can measure the accuracy of the editing process as follows:

                    Editing classification
Original data       Erroneous    True
Modified            a            b
Not modified        c            d

where
a = number of modified data classified by the editing process as erroneous
b = number of modified data classified by the editing process as true
c = number of true (not modified) data classified by the editing process as erroneous
d = number of true (not modified) data classified by the editing process as true.
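The indicator formulas themselves are not included in this excerpt. As a minimal sketch, assuming that E mod is the share of modified data that the edits classify as erroneous (which matches the earlier description of E mod as answering 'How many errors are detected by the edits?'), it can be computed from the counts above as a / (a + b). The second quantity below, the share of true data wrongly flagged, is a natural companion measure; its name is hypothetical and the counts in the example are invented for illustration.

```python
def editing_indicators(a, b, c, d):
    """Editing accuracy measures from the 2 x 2 classification counts.

    a: modified data classified as erroneous
    b: modified data classified as true
    c: true (not modified) data classified as erroneous
    d: true (not modified) data classified as true
    """
    if a + b == 0 or c + d == 0:
        raise ValueError("need at least one modified and one unmodified value")
    return {
        # Assumed definition: share of modified data detected by the edits.
        "E_mod": a / (a + b),
        # Hypothetical companion measure: share of true data wrongly flagged.
        "false_flag_rate": c / (c + d),
    }


if __name__ == "__main__":
    # Invented counts, for illustration only.
    indicators = editing_indicators(a=54, b=46, c=20, d=880)
    print(f"E mod = {indicators['E_mod']:.0%}")
    print(f"true data wrongly flagged = {indicators['false_flag_rate']:.1%}")
```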


ECE/ system of. Summary /CES/2012/55. Paris, 6-8 June successfully. an integrated data collection. GE. United Nations Economic and Social Council Distr.: General 15 May 2012 ECE/ /CES/2012/55 English only Economic Commission for Europe Conference of European Statisticians Sixtieth plenary session Paris,

More information

Working with NHS and Taxfiler data to measure income and poverty in Toronto neighbourhoods

Working with NHS and Taxfiler data to measure income and poverty in Toronto neighbourhoods Working with NHS and Taxfiler data to measure income and poverty in Toronto neighbourhoods Wayne Chu Planning Analyst Social Development, Finance & Administration, City of Toronto CCSD Community Data Canada

More information

Sierra Leone - Multiple Indicator Cluster Survey 2017

Sierra Leone - Multiple Indicator Cluster Survey 2017 Microdata Library Sierra Leone - Multiple Indicator Cluster Survey 2017 Statistics Sierra Leone, United Nations Children s Fund Report generated on: September 27, 2018 Visit our data catalog at: http://microdata.worldbank.org

More information

Measuring Multiple-Race Births in the United States

Measuring Multiple-Race Births in the United States Measuring Multiple-Race Births in the United States By Jennifer M. Ortman 1 Frederick W. Hollmann 2 Christine E. Guarneri 1 Presented at the Annual Meetings of the Population Association of America, San

More information

Some Indicators of Sample Representativeness and Attrition Bias for BHPS and Understanding Society

Some Indicators of Sample Representativeness and Attrition Bias for BHPS and Understanding Society Working Paper Series No. 2018-01 Some Indicators of Sample Representativeness and Attrition Bias for and Peter Lynn & Magda Borkowska Institute for Social and Economic Research, University of Essex Some

More information

2016 Census Bulletin: Families, Households and Marital Status

2016 Census Bulletin: Families, Households and Marital Status 2016 Census Bulletin: Families, Households and Marital Status Kingston, Ontario Census Metropolitan Area (CMA) The 2016 Census Day was May 10, 2016. On August 2, 2017, Statistics Canada released its fourth

More information

Economic and Social Council

Economic and Social Council UNITED NATIONS E Economic and Social Council Distr. GENERAL ECE/CES/GE.41/2009/18 19 August 2009 Original: ENGLISH ECONOMIC COMMISSION FOR EUROPE CONFERENCE OF EUROPEAN STATISTICIANS Group of Experts on

More information

Using 2010 Census Coverage Measurement Results to Better Understand Possible Administrative Records Incorporation in the Decennial Census

Using 2010 Census Coverage Measurement Results to Better Understand Possible Administrative Records Incorporation in the Decennial Census Using Coverage Measurement Results to Better Understand Possible Administrative Records Incorporation in the Decennial Andrew Keller and Scott Konicki 1 U.S. Bureau, 4600 Silver Hill Rd., Washington, DC

More information

American Community Survey 5-Year Estimates

American Community Survey 5-Year Estimates DP02 SELECTED SOCIAL CHARACTERISTICS IN THE UNITED STATES 2012-2016 American Community Survey 5-Year Estimates Supporting documentation on code lists, subject definitions, data accuracy, and statistical

More information

INTEGRATED COVERAGE MEASUREMENT SAMPLE DESIGN FOR CENSUS 2000 DRESS REHEARSAL

INTEGRATED COVERAGE MEASUREMENT SAMPLE DESIGN FOR CENSUS 2000 DRESS REHEARSAL INTEGRATED COVERAGE MEASUREMENT SAMPLE DESIGN FOR CENSUS 2000 DRESS REHEARSAL David McGrath, Robert Sands, U.S. Bureau of the Census David McGrath, Room 2121, Bldg 2, Bureau of the Census, Washington,

More information

American Community Survey 5-Year Estimates

American Community Survey 5-Year Estimates DP02 SELECTED SOCIAL CHARACTERISTICS IN THE UNITED STATES 2011-2015 American Community Survey 5-Year Estimates Supporting documentation on code lists, subject definitions, data accuracy, and statistical

More information
