The Statistical Administrative Records System and Administrative Records Experiment 2000: System Design, Successes, and Challenges

Dean H. Judson
Planning, Research and Evaluation Division
U.S. Census Bureau

How Administrative Records Are Created and Used

Stages shown in the diagram:
- Events and Objects (population)
- Observed Events and Objects ("sampling frame")
- Recorded Events and Objects (administrative record)
- Database
- Presentation (query results and displays)

Factors shown acting on these stages:
- Policy changes which change the definition of events and objects
- Ontologies and thresholds for observation
- Data collection
- Data entry errors and coding schemes
- Data management issues
- Query structure and spurious structure

11/20/2000 U.S. CENSUS BUREAU 2

Ontologies and Data Quality

[Figure: four panels of state mappings (State 1 through State 4), titled Proper Representation, Incomplete Representation, Ambiguous Representation, and Meaningless States.]

Source: Wand and Wang, 1996:90

Background and History

Statistical Administrative Records System (StARS)
- Six large Federal input files: IRS 1040, IRS 1099, Selective Service, Medicare, Indian Health Service, HUD-TRACS
- One lookup file: SSA/Census Numident

AREX 2000
- Attempt to use StARS data to simulate an administrative records census

A Diagrammatic Depiction of Files Used to Create the Final StARS Database

[Flowchart. Labeled boxes recovered from the diagram:]

Person Edited Files (inputs):
- SSS Person Edited File (5.15)
- IRS 1040 Person Edited File (5.20)
- IRS 1099 Person Edited File (5.25)
- Medicare Person Edited File (5.30)
- HUD-TRACS Person Edited File (5.35)
- IHS Person Edited File (5.40)
- Medicaid Person Edited File (future possibility) (5.45)
- CHUMS Person Edited File (future possibility) (5.50)
- FAFSA Person Edited File (future possibility) (5.55)

Other files and outputs:
- Person Characteristic File (PCF) (aka 14.100) (5.05)
- Composite Person Output (5.60)
- Original Address Pointers (5.65)
- Address Output (aka 4.25) (5.70)
- Updated Address Pointers (5.80)
- Person Output (5.90)

Processing steps:
- Concatenate, sort, and unduplicate
- Unduplicate & Reset Address Pointers
- Merge (5.85)
- Return to 4

Characteristics of Files Included in the StARS System

IRS Individual Master 1040 File:
- Tax year data; April 2000 refers to tax year 1999
- TY 99 file arrives October 2000
- Business entities, estates, other institutions included
- 120 million records/year
- Households below the filing threshold do not need to file
- Tax Filing Unit ≠ Housing Unit
- Czajka, 2000: 10-20% of addresses are PO Boxes, business addresses, tax preparers
- Limited microdata content:
  - TY95+: SSNs of dependents requested, recorded
  - Czajka, 2000: 1987 study: 0.5% of primary filer, 1.6% of secondary filer, 3.4% of dependents' SSNs in error
  - Age, race, sex, Hispanic origin microdata not available

Characteristics of Files Included in the StARS System, cont.

IRS Information Returns (1099) File:
- Tax year data; April 2000 refers to tax year 1999
- TY 99 file arrives October 2000
- Business entities, estates, other institutions included
- 775 million records/year
- Recipient address ≠ Housing Unit
- Czajka, 2000: 10-20% of addresses are PO Boxes, business addresses, tax preparers
- Limited microdata content:
  - Age, race, sex, Hispanic origin microdata not available

Characteristics of Files Included in the StARS System, cont.

Selective Service File:
- About 13 million records
- Registration required in 1940, suspended in 1975, resumed in 1980
- Presumably, males 18-25 are required to inform SSS when they move
- Females, non-immigrant aliens, hospitalized, incarcerated, and institutionalized males, and members of the armed forces are exempt
- Limited microdata content:
  - Race, Hispanic origin microdata not available
- Address information may not be current

Characteristics of Files Included in the StARS System, cont.

Medicare Enrollment Database (EDB):
- Current and historical Medicare enrollment
- Active and inactive cases
- 35-40 million records at any one point in time; September 93: 77 million records (active + inactive)
- Proxy recipients listed on the file (e.g., John Doe's benefits c/o Jane Doe; John Doe's benefits c/o nursing home)
- A small portion of records at any point in time probably belong to deceased persons (Kim and Sater, 2000)
- Used in population estimates system for 65+ household population estimates

Characteristics of Files Included in the StARS System, cont.

Medicare EDB, cont.:
- Recipient Address ≠ Housing Unit
  - Proxy recipients
- Coverage is believed high (93-102%) but not perfect and unevenly distributed geographically
  - Snowbird states appear to have lower ratios of Medicare to 65+ population than non-snowbird states

Characteristics of Files Included in the StARS System, cont.

Indian Health Service patient file:
- About 10 million patient/transaction records
- Transaction record ≠ person record
- Unduplication: about 10 million patient records, 2 million unduplicated SSNs
- Many missing SSNs: about 20% missing SSNs
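
The gap between transaction records and person records can be illustrated with a minimal sketch. It assumes each transaction is a dict with hypothetical `ssn`, `name`, and `date` fields (these names are not taken from the IHS file layout), and it assumes that keeping the latest transaction per SSN is an acceptable rule. The actual StARS unduplication logic is not described on the slide.

```python
from collections import defaultdict


def unduplicate_by_ssn(transactions):
    """Collapse transaction records into one record per SSN.

    Records with a missing SSN cannot be unduplicated on SSN, so they are
    returned separately for some other treatment (e.g. name/address matching).
    """
    by_ssn = defaultdict(list)
    missing_ssn = []
    for rec in transactions:
        ssn = (rec.get("ssn") or "").strip()
        if ssn:
            by_ssn[ssn].append(rec)
        else:
            missing_ssn.append(rec)
    # Assumed rule: keep the most recent transaction (ISO dates sort as text).
    persons = {ssn: max(recs, key=lambda r: r["date"]) for ssn, recs in by_ssn.items()}
    return persons, missing_ssn


# Hypothetical sample data, for illustration only.
sample = [
    {"ssn": "111-11-1111", "name": "A. PATIENT", "date": "1998-03-02"},
    {"ssn": "111-11-1111", "name": "A. PATIENT", "date": "1999-07-15"},
    {"ssn": "222-22-2222", "name": "B. PATIENT", "date": "1997-11-30"},
    {"ssn": "", "name": "C. PATIENT", "date": "1999-01-20"},
    {"ssn": None, "name": "D. PATIENT", "date": "1999-05-05"},
]
persons, missing = unduplicate_by_ssn(sample)
print(len(sample), "transactions,", len(persons), "unduplicated SSNs,", len(missing), "with missing SSN")
```

Records without an SSN are set aside rather than dropped, since about 20% of the file falls in that group.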

Characteristics of Files Included in the StARS System, cont.

Housing and Urban Development Tenant Rental Assistance Certification System (HUD-TRACS):
- HUD subsidy payments
- Currently, about 3.3 million records
- Short form data for all members of household (Race/Hispanic only for head of household)
- Address information may represent project or landlord address

Characteristics of Files Included in the StARS System, cont.

Census NUMIDENT File:
- 750 million transaction records
- 400 million individual SSN records
- Post 1985: Enumeration at birth
- For each SSN: Date of birth, gender, race, place of birth
- About 50-60 million persons on the file are deceased but not identified as such
- No current residence information on the file
- Taxpayer ID Numbers (TINs) not on the file
- About 35% of SSNs on file have alternate names (marriage, divorce, etc.)
- 6% missing gender
- Race coding has changed (prior to 1980, 3 races: White, Black, Other); 20% either unknown or other
- About 25% of SSNs have transactions with different race codes

StARS Processing Diagrams

Two Goals:

For person data: One output record per person,
- assigned to an individual residence corresponding as closely as possible to Census residence definitions,
- in a household structure corresponding as closely as possible to Census household structure,
- containing microdata corresponding as closely as possible to Census short form microdata,
- and excluding persons who are not in the population of interest.

For address data: One output record per individual housing unit at a Basic Street Address,
- geocoded to Census TIGER geography,
- with address microdata and concepts corresponding as closely as possible to DMAF address fields and concepts,
- and excluding locations which are not in the population of interest.

StARS Processing Overview

[Flowchart, shown for the IHS file. Numbered boxes recovered from the diagram:]
- 15.05 Process file this cycle? (decision: Yes/No)
- 15.10 Hold for next cycle
- 15.15 Program Development: Address Processing Program
- 15.20 Address Data Processing
- 15.25 Address Output
- 15.30 Program Development: Person Editing Program
- 15.35 Person Editing
- 15.40 Edited IHS File
- 15.45 Program Development: SSN Verification Program
- 15.50 Social Security Number (SSN) Verification
- 15.55 Verified IHS File
- 15.60 Is current year's PCF available? (decision: Yes/No)
- 15.65 Create Person Characteristic File (PCF)
- 15.70 Person Characteristic File (PCF)
- 15.75 Process Person Data
- 15.80 Person Output
- 15.85 Program Development: Household Processing Program
- 15.90 Household Data Processing
- 15.95 Household Output
- 15.100 Program Development: Final Output Program
- 15.105 Final StARS Processing
- 15.110 Final StARS Output
- 15.115 Data Delivery
- End

Administrative Records Experiment in 2000 (AREX 2000)

- Five selected sites in Maryland and Colorado
  - MD: Baltimore city, Baltimore county
  - CO: El Paso county, Douglas county, Jefferson county
- Attempt to simulate an Administrative Records Census
- Not all aspects of an Administrative Records Census are simulated
  - Group Quarters survey
  - Coverage measurement survey
- Special operations not included in StARS
  - Request for physical address (PO boxes/RRs)
  - MAFGOR Geocoding
  - Field verification of addresses not matched to DMAF

AREX 2000 Overview Flowchart: Methods 1 and 2

[Flowchart. Numbered boxes recovered from the diagram, with the responsible division in parentheses where given. Part of the chart is labeled "Method 2 Only (Bottom-Up)".]
- 17.05 Start Planning & OMB Approval (PRED)
- 17.10 Acquire National Administrative Records File (PRED)
- 17.15 National Administrative Address Records File
- 17.20 Computer geocode the National File (GEO)
- 17.25 Maryland & Colorado (MD&CO) Geocoded Files (with test site records flagged)
- 17.30 Receive MD&CO Files from GEO (PRED)
- 17.35 Create StARS 1999 from MD&CO Files (PRED)
- 17.40 StARS 1999 Master Housing File (MHF) for MD&CO
- 17.45 Perform Exploratory Data Analysis (EDA) on test sites (PRED)
- 17.50 Additional Geocoded Test Site Records
- 17.55 Additional Ungeocoded Test Site Records
- 17.60 Copy test site records to create AREX Address File (PRED)
- 17.65 AREX Address File
- 17.700 Extract test site records from MD&CO Files (GEO)
- 17.75 Extract ungeocoded city-style records (GEO)
- 17.80 Clerical Resolution of Ungeocoded Addresses (MAFGOR) (GEO/FLD/RCCs)
- 17.85 Update AREX Address File with MAFGOR results (PRED)
- 17.90 Geocoded City-style AREX Addresses
- 17.95 Copy P.O. Box and rural-style addresses (PRED)
- 17.100 AREX P.O. Box and rural-style addresses (aka 2.40)
- 17.110 Request for Physical Addresses Mailout & Processing (DSCMO/NPC/GEO/RCCs)
- 17.115 Update AREX Address File with Req. for Phys. Addr. results (PRED)
- 17.120 DMAF
- 17.125 Obtain DMAF from DSCMO (PRED)
- 17.130 Pull off address records from DMAF by AREX test site counties (PRED)
- 17.135 Match Geocoded City-style AREX Addresses to DMAF (PRED)
- 17.140 Perform clerical review of match results (PRED)
- 17.145 Unmatched Admin. Record Addresses
- 17.150 Field Address Verification & Processing (FLD/DSCMO/NPC)
- 17.155 Update AREX Address File with Fld. Addr. Ver. & Proc. results (PRED)
- 17.160 Unmatched DMAF Addresses
- 17.165 Obtain person data from Census 2000 (DSCMO)
- 17.170 GQ Person Data from Census
- 17.175 StARS Person Data
- 17.180 AREX Address File (after MAFGOR, Request for Physical Addresses, and Field Address Verification updates)
- 17.185 Matched Addresses
- 17.190 Census 2000 Person Data
- 17.195 Post-Processing

For details, see AREX 2000: Administrative Records Research File Processing Flowcharts.

AREX 2000 Evaluation Plans

- Evaluation 1: Comparison of both methods' site- and block-level counts of population by race, Hispanic origin, age groups, and gender with comparable decennial census counts
- Evaluation 2: Analyzing selected components of the AREX implementation processing
- Evaluation 3: Comparison of bottom-up housing unit and household level information with comparable Census 2000 housing unit and household information
- Evaluation 4: Assessing the feasibility of using administrative records in lieu of a field interview to obtain data on nonresponding households

Major Analytic Issues with StARS Processing

Ontologies
- A delivery address suitable for receiving a payment check may not suffice for putting individuals at a street address
- Difficult to distinguish individual units within the Basic Street Address
- Race coding: Hispanic Origin is a separate race on NUMIDENT
- Transaction data ≠ person data
- How many names does a person have (and in what order)?

Proxies
- IRS & Medicare records

    JOHN WILSON
    C/O MARY SMITH
    1004 LAUREL LANE
    ROCKMONT, MD 22345

  The address is for Mary Smith. John Wilson may or may not live there.
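
A minimal sketch of flagging proxy addresses such as the one above. It assumes the address arrives as a list of text lines and that proxies are marked with "C/O" or "CARE OF"; both the function name `split_proxy` and the marker list are illustrative assumptions, not the StARS editing rules.

```python
import re

# Assumed proxy markers; real files may use other conventions.
CARE_OF = re.compile(r"^\s*(?:C/O|CARE OF)\s+(?P<proxy>.+)$", re.IGNORECASE)


def split_proxy(address_lines):
    """Return (proxy_name, remaining_lines).

    proxy_name is None when no care-of line is found. A non-None value only
    flags the record: it does not tell us where the named person lives.
    """
    proxy = None
    remaining = []
    for line in address_lines:
        m = CARE_OF.match(line)
        if m and proxy is None:
            proxy = m.group("proxy").strip()
        else:
            remaining.append(line)
    return proxy, remaining


proxy, rest = split_proxy(
    ["JOHN WILSON", "C/O MARY SMITH", "1004 LAUREL LANE", "ROCKMONT, MD 22345"]
)
print(proxy, rest)
```

The flag lets later steps treat the person-to-address link as uncertain instead of silently placing John Wilson at Mary Smith's address.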

Major Analytic Issues with StARS Processing

Addresses that are difficult to place on the ground
- Huang and Kim, 2000: About 10% of addresses are rural style
- PO Boxes: 45% for IHS, 9.5% for Medicare, 7.5% for IRS 1040, 6.8% for SSS, 3.8% for IRS 1099, 0.4% for HUD-TRACS
- Sater, 1995 IRS/CPS match: 86.5% of tax return cases had the same address as residence address, 94% coded to same county

    John Smith
    H&R BLOCK
    P.O. BOX 12
    GREENWAY, MD 29752

Addresses with both business and residential components

    Dean H. Judson
    JUDSON OLD GROWTH LOGGING & SPOTTED OWL EXTERMINATION SERVICES
    45850 BACKWOODS HIGHWAY
    BOONDOCKS, OR 96432
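
A minimal sketch of sorting address lines into PO Box, rural-style, and city-style groups before geocoding. The regular expressions and the category names are assumptions chosen for illustration; production address standardizers handle many more variants.

```python
import re

# Assumed patterns, deliberately narrow.
PO_BOX = re.compile(r"\b(?:P\.?\s*O\.?\s*BOX|POST\s+OFFICE\s+BOX)\b", re.IGNORECASE)
RURAL = re.compile(r"\b(?:RURAL\s+ROUTE|RR)\s*\d+\b", re.IGNORECASE)
CITY_STYLE = re.compile(r"^\s*\d+\s+\S+")  # house number followed by a street name


def classify_address(line):
    """Classify one delivery-address line by style."""
    if PO_BOX.search(line):
        return "po_box"
    if RURAL.search(line):
        return "rural_route"
    if CITY_STYLE.match(line):
        return "city_style"
    return "other"


for line in ["P.O. BOX 12", "BOX 2 RURAL ROUTE 37", "45850 BACKWOODS HIGHWAY", "H&R BLOCK"]:
    print(f"{line!r}: {classify_address(line)}")
```

Only the city-style group can be geocoded directly; the other groups are the ones that need operations such as a request for physical address.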

Major Analytic Issues with StARS Processing, cont.

Unduplication and matching
- When addresses or personal characteristics are measured with substantial variation, it is often not obvious whether a particular pair of records represents a duplicate. Yet, with multiple files, unduplication decisions must be made.

CHUMS-enhanced IMH File:
  A  Banana St
  B  17 Banana St
  C  19 Banana St Apt 5
  D  44 MLK, Jr. Blvd
  E  100 Route 4
  F  7 Marie Ln
  G  Wife Mrs. Smith
  H  5 Apple St
  I  27 Apple St
  J  Apple St
  K  9999 Apple St
  L  3 Apple St Apt 5
  M  1 Apple St
  N  3 Apple St Apt A
  O  3 Apple St ZZ
  P  3 Apple St
  Q  3 Apple St Apt 1

MAF:
  1 Apple St
  3 Apple St Apt 1
  3 Apple St Apt 2
  3 Apple St Apt 3
  3 Apple St Apt 4
  7 Apple St
  9 Apple St
  # Apple St
  # Martin Luther King, Jr. Blvd
  # Pennsylvania Ave
  7 Maria Ln

Major Analytic Issues with StARS Processing, cont.

Outcome of "CHUMS-enhanced IMH File" / MAF Match

Match on Street: NO; BSA: N/A; BSA+Unit: N/A
  1. Street is not in MAF, either it was just missing or it's a new street (Examples: A, B, C)
  2. Different, but valid representation of street name (Examples: D, E)
  3. Misspelling of street name (Example: F)
  4. Erroneous street name (Example: G)

Match on Street: YES; BSA: NO; BSA+Unit: N/A
  1. BSA is not in MAF, either it was just missing or it's a new BSA: there is a "hole" in MAF (Example: H)
  2. BSA is not in MAF, either it was just missing or it's a new BSA: a missing "street extension" (Example: I)
  3. Existing street with no incoming street number (Example: J)
  4. Erroneous street number (Example: K)

Match on Street: YES; BSA: YES; BSA+Unit: NO
  1. Unit not in MAF, either it was just missing or it's a new unit (Example: L)
  2. Valid match: a BSA without separate units (Example: M)
  3. Different representation of a unit (Example: N)
  4. Erroneous unit information (Example: O)
  5. Missing unit information (Example: P)

Match on Street: YES; BSA: YES; BSA+Unit: YES
  1. Valid match (Example: Q)
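
The three match levels can be sketched as a hierarchical exact-match check. This is an illustration only: the `Address` type and `match_level` function are hypothetical, addresses are assumed to be already parsed and standardized, and comparison is by exact string equality. It therefore reports the match outcome but cannot separate the possible explanations (for example, a misspelling versus a new street), which is where clerical or probabilistic methods come in.

```python
from typing import NamedTuple, Optional


class Address(NamedTuple):
    number: Optional[str]
    street: str
    unit: Optional[str] = None


def match_level(addr, maf):
    """Return (street, BSA, BSA+unit) match outcomes for one address."""
    streets = {m.street for m in maf}
    bsas = {(m.number, m.street) for m in maf}
    units = {(m.number, m.street, m.unit) for m in maf}
    if addr.street not in streets:
        return ("NO", "N/A", "N/A")
    if (addr.number, addr.street) not in bsas:
        return ("YES", "NO", "N/A")
    if (addr.number, addr.street, addr.unit) not in units:
        return ("YES", "YES", "NO")
    return ("YES", "YES", "YES")


# A subset of the MAF column from the example slide.
MAF = [
    Address("1", "APPLE ST"),
    Address("3", "APPLE ST", "APT 1"),
    Address("3", "APPLE ST", "APT 2"),
    Address("7", "APPLE ST"),
    Address("7", "MARIA LN"),
]

for label, addr in [
    ("B", Address("17", "BANANA ST")),
    ("F", Address("7", "MARIE LN")),
    ("H", Address("5", "APPLE ST")),
    ("L", Address("3", "APPLE ST", "APT 5")),
    ("Q", Address("3", "APPLE ST", "APT 1")),
]:
    print(label, match_level(addr, MAF))
```

Note that record F fails at the street level purely because of spelling, which an exact comparison cannot distinguish from case B.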

Major Analytic Issues with StARS Processing, cont.

Variations in data from different sources
- Huang and Kim, 2000: Of the 50% of SSNs found on multiple files,
  - about 1% have more than one gender recorded
  - about 32% have multiple addresses
  - about 2% have multiple races

Imputation from the NUMIDENT
- Many files have limited microdata. For those that are found on the NUMIDENT, we can impute microdata from the approximately equivalent NUMIDENT fields.
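
A minimal sketch of imputing missing microdata from a lookup file keyed by SSN. The field names (`date_of_birth`, `gender`, `race`), the dict-based lookup, and the function `impute_from_numident` are assumptions for illustration; they are not the NUMIDENT layout or the StARS procedure.

```python
def impute_from_numident(record, numident, fields=("date_of_birth", "gender", "race")):
    """Fill fields that are missing on `record` from the lookup entry for its SSN.

    Returns (filled_record, imputed_fields) so that imputed values stay
    distinguishable from values reported on the source file.
    """
    filled = dict(record)
    imputed = []
    ref = numident.get(record.get("ssn"))
    if ref is None:
        return filled, imputed  # SSN not found on the lookup file
    for field in fields:
        if filled.get(field) in (None, "") and ref.get(field) not in (None, ""):
            filled[field] = ref[field]
            imputed.append(field)
    return filled, imputed


# Hypothetical data, for illustration only.
numident = {"123-45-6789": {"date_of_birth": "1950-02-01", "gender": "F", "race": "W"}}
tax_record = {"ssn": "123-45-6789", "name": "JANE DOE", "gender": "", "race": None}

filled, imputed = impute_from_numident(tax_record, numident)
print(filled, imputed)
```

Existing values are never overwritten, so conflicts between a source file and the lookup file remain visible for separate resolution.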

Major Analytic Issues with StARS Processing, cont.

Changing information states
- Distinct problem from point in time data collection
- Information states change over time/over databases
- Address information ages over time and varies over databases

    SAM SMITH
    BOX 2 RURAL ROUTE 37
    WESTPORT, VA 32784
    (Dated 10/14/98 from Medicare)

    SAM SMITH
    486 MAIN STREET
    FAIRFIELD, VA 33412
    (From TY97 IRS file, filed sometime in 1998)

- Mortality information ages over time and varies over databases
- One database provides information about the other, provided that matching can be performed
- Data processing requires complex, and substantively important, decision logic at each step
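
A minimal sketch of the kind of decision logic the Sam Smith example calls for. It assumes each address carries an earliest and latest possible reference date, and it treats "filed sometime in 1998" as the range January 1 to December 31, 1998; that range, the `DatedAddress` type, and the "TY96" record are all assumptions made for illustration. An address is selected only when it is unambiguously the most recent.

```python
from datetime import date
from typing import NamedTuple


class DatedAddress(NamedTuple):
    source: str
    address: str
    earliest: date  # earliest date the address could refer to
    latest: date    # latest date the address could refer to


def pick_most_recent(candidates):
    """Return the unambiguously newest address, or None if the dates overlap."""
    if not candidates:
        return None
    best = max(candidates, key=lambda c: c.earliest)
    for other in candidates:
        if other is not best and other.latest >= best.earliest:
            return None  # another record might be as recent or more recent
    return best


medicare = DatedAddress("Medicare", "BOX 2 RURAL ROUTE 37, WESTPORT, VA 32784",
                        date(1998, 10, 14), date(1998, 10, 14))
irs_ty97 = DatedAddress("IRS TY97", "486 MAIN STREET, FAIRFIELD, VA 33412",
                        date(1998, 1, 1), date(1998, 12, 31))
# Hypothetical older filing, added only to show the unambiguous case.
irs_ty96 = DatedAddress("IRS TY96", "486 MAIN STREET, FAIRFIELD, VA 33412",
                        date(1997, 1, 1), date(1997, 12, 31))

print(pick_most_recent([medicare, irs_ty97]))  # dates overlap
print(pick_most_recent([medicare, irs_ty96]))  # Medicare is clearly newer
```

With the two records from the slide, the date ranges overlap, so the sketch declines to choose; some further rule or evidence would be needed.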

References

Bryant, C. (1995). Comparing the LUCA address list to local records. Paper presented at the 1995 State Data Center Meeting, San Francisco, CA, April 4, 1995.

Bye, B. (1998). Race and ethnicity modeling with SSA Numident Data: Interim report: File development and tabulations. Unpublished document available from the U.S. Bureau of the Census.

Czajka, J. (1999). Can we count on administrative records in future U.S. Censuses? Presentation at the Bureau of the Census, December 15, 1999.

Huang, E., and Kim, J. (2000). One Percent Sample Study Report (SRD-DRAFT). Unpublished document available from the U.S. Bureau of the Census, February 10, 2000.

Judson, D.H., and Popoff, C.L. (2000). Research Use of Administrative Records. Unpublished document.

Judson, Dean H. (2000). The Statistical Administrative Records System: System Design, Successes, and Challenges. Unpublished document.

Kim, Myoung Ouk, and Sater, Douglas (2000). Defining the Medicare Data Universe for the U.S. Census Bureau's Population Estimates Program. Paper presented at the Southern Demographic Association meetings, New Orleans, LA, August 29, 2000.

Sater, D. (1995). Differences in Location of Households and Tax Filing Units. Paper presented at the 1995 meeting of the Population Association of America, San Francisco, CA, April 6, 1995.

Wand, Yair, and Wang, Richard Y. (1996). Anchoring data quality dimensions in ontological foundations. Communications of the ACM, 39: 86-95.

Zanutto, E. (1996). Estimating a population roster from an incomplete census using mailback questionnaires, administrative records, and sampled nonresponse followup. Presentation to the U.S. Bureau of the Census, August 6, 1996.

Zanutto, E., and Zaslavsky, A. (1999). Using Administrative Records to Impute for Nonresponse. Paper presented at the International Conference on Survey Nonresponse, Portland, OR, October 29, 1999.