A GI Science Perspective on Geocoding: Accuracy, Repeatability and Implications for Geospatial Privacy
Paul A Zandbergen
Department of Geography, University of New Mexico

Geocoding as an Example of Applied GI Science
Why geocoding?
- Arguably the most successful application of GISc
- In very widespread use
- Fundamental step in many types of spatial analysis
- Often considered as relatively easy and largely error-free
Big questions:
- How good is current geocoding?
- What are the minimum quality expectations for geocoding?
- What are the effects of errors in geocoding on spatial analysis?
- What are the (unintended) consequences of widespread, high-quality geocoding?
- How can geocoding be improved?

Types of Geocoding
- County
- ZIP code (ZIP and ZIP+2/4)
- Street
- Parcel
- Address point
- Routes
Types of geocoding will vary by country and region, but in the United States street geocoding is by far the most widely used, both for research and commercial applications.

What's Behind the Geocoding Process?
- Probabilistic Record Linkage
- Standardization
- Soundex
- Well-established algorithms, dating back to early efforts by US Census (DIME, TIGER)
- Ongoing research into modifications, including Markov-chain models
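
Soundex is the easiest of these building blocks to show in a few lines. The sketch below follows the standard American Soundex rules (first letter kept, consonants mapped to digits, adjacent equal codes collapsed, H and W ignored, padded to four characters). The function name and the street-name usage are illustrative assumptions; any particular geocoder may use a different variant.

```python
# Letter-to-digit table for American Soundex; vowels, H, W and Y have no code.
CODES = {}
for digit, letters in enumerate(("BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"), start=1):
    for ch in letters:
        CODES[ch] = str(digit)


def soundex(name: str) -> str:
    """Return the four-character Soundex code for a name (empty string if no letters)."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result = letters[0]
    prev = CODES.get(letters[0], "")  # the first letter's code also suppresses repeats
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate two letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code after it is kept
    return (result + "000")[:4]


# A misspelled street name still lands on the same code as the correct one.
print(soundex("Maple"), soundex("Mapel"))
```

Matching on the code rather than the raw string is what lets a record with "Mapel St" find "Maple St" in the reference data, at the cost of some false matches between genuinely different names.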

How Does Street Geocoding Work?
[Figure: street segment for 747 Main St, with address ranges 701-799 on one side and 700-798 on the other; labels: End Offset (%), Side Offset]
1. Find the zone (ZIP, City, etc.)
2. Match the street (by Name, Type, Dir, etc.)
3. Match the segment with the proper range
4. Linear interpolation along segment
5. Apply offsets
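
Steps 3 to 5 can be sketched as follows, using the 747 Main St example from the slide. This is a minimal illustration, not any vendor's algorithm: the `Segment` type, the planar coordinates in meters, the parity rule for choosing a side, and the way the end offset squeezes the range inward are all assumptions made for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical street segment with planar coordinates and address ranges."""
    x1: float
    y1: float
    x2: float
    y2: float
    left_from: int
    left_to: int
    right_from: int
    right_to: int


def locate(seg, house_number, side_offset=0.0, end_offset_pct=0.0):
    """Return (x, y, side) for a house number, or None if no range matches."""
    # Step 3: pick the side whose range has the same parity and contains the number.
    for side, lo, hi in (("left", seg.left_from, seg.left_to),
                         ("right", seg.right_from, seg.right_to)):
        same_parity = (house_number - lo) % 2 == 0
        in_range = min(lo, hi) <= house_number <= max(lo, hi)
        if same_parity and in_range:
            break
    else:
        return None
    # Step 4: linear interpolation along the segment.
    frac = 0.5 if hi == lo else (house_number - lo) / (hi - lo)
    # Step 5a: end offset (percent of length) keeps points away from the segment ends.
    inset = end_offset_pct / 100.0
    frac = inset + frac * (1.0 - 2.0 * inset)
    dx, dy = seg.x2 - seg.x1, seg.y2 - seg.y1
    x, y = seg.x1 + frac * dx, seg.y1 + frac * dy
    # Step 5b: side offset moves the point perpendicular to the street.
    length = math.hypot(dx, dy)
    if length > 0.0:
        sign = 1.0 if side == "left" else -1.0
        x += sign * side_offset * (-dy / length)
        y += sign * side_offset * (dx / length)
    return x, y, side


main_st = Segment(0.0, 0.0, 100.0, 0.0, 701, 799, 700, 798)
print(locate(main_st, 747, side_offset=10.0))
```

For 747 in the range 701-799 the interpolation fraction is (747 - 701) / (799 - 701) = 46/98, so the point falls a little under halfway along the segment. The method assumes house numbers are evenly spaced along the block, which is the main source of the positional error discussed later.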

Typical Errors
- Spelling issues
- Incomplete street address
- Prefix, suffix, direction conflicts
- Apartment and unit numbers
- Ambiguous street names
- Outdated/incomplete street reference data
Well documented types of errors

Framework for Geocoding Quality
1. Completeness: the percentage of records that could be reliably matched
2. Positional accuracy: the difference between the geocoded location and the true location
3. Repeatability/robustness: agreement between results from repeated geocoding
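
The three measures can each be computed from a table of geocoding results. The sketch below assumes results are dictionaries keyed by record ID, with `None` for unmatched records and planar coordinates in meters; the function names, the toy data and the tolerance-based definition of "agreement" are illustrative assumptions rather than definitions from the talk.

```python
import math
from statistics import median


def match_rate(results):
    """Completeness: percentage of records that received a location."""
    matched = sum(1 for loc in results.values() if loc is not None)
    return 100.0 * matched / len(results)


def positional_errors(results, truth):
    """Positional accuracy: distance (m) from each matched location to its true location."""
    return [math.dist(loc, truth[key]) for key, loc in results.items() if loc is not None]


def repeat_agreement(run_a, run_b, tolerance_m):
    """Repeatability: percentage of records where two runs agree.

    Two runs agree on a record if both fail to match it, or both match it
    within tolerance_m of each other (an assumed definition).
    """
    agree = 0
    for key, a in run_a.items():
        b = run_b[key]
        if a is None and b is None:
            agree += 1
        elif a is not None and b is not None and math.dist(a, b) <= tolerance_m:
            agree += 1
    return 100.0 * agree / len(run_a)


truth = {"a": (0, 0), "b": (100, 0), "c": (0, 100), "d": (50, 50)}
run_1 = {"a": (30, 40), "b": (100, 10), "c": None, "d": (50, 50)}
run_2 = {"a": (30, 40), "b": (160, 10), "c": None, "d": (53, 54)}

print(match_rate(run_1))
print(median(positional_errors(run_1, truth)))
print(repeat_agreement(run_1, run_2, tolerance_m=10.0))
```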

What do we know?
Completeness
- Surprisingly little research
- Most research makes ad-hoc decisions on what is acceptable
Positional accuracy
- Errors can be substantial, but distribution not well characterized
- Larger error in rural areas
Repeatability/robustness
- Limited research
- Effect of different geocoding algorithms appears limited
- Quality of street reference data the determining factor
Potential trade-offs: e.g. increase match rate while sacrificing positional accuracy

Research Case Studies
Data sets
- School children and schools in Orange County, Florida
- Sex offenders in Florida
- Banks and grocery stores in Florida
- Fishing and boat license holders in Florida
Analyses
- Positional accuracy
- Repeatability
- Effects of positional error on spatial analysis
- Comparison of alternatives

Maple Street NE

Left range: 401-453 Right range: 400-460

441 Maple St NE

GoogleMaps: 441 Maple St NE

MapQuest: 441 Maple St NE

Error Measurement Tool Fishbone Tool

Results get messy sometimes

The Squeeze Effect

Typical Error Distribution
[Figure: cumulative frequency (%) versus positional error (m), 0 to 1,000 m; N = 104,865 school children in Orange County, FL]
Zandbergen, P.A. 2007. Influence of geocoding quality on environmental exposure assessment of children living near high traffic roads. BMC Public Health, 7:37.

Typical Error Estimates
Statistic          Value (m)
Min                ~0
Max                32,356
Mean               66
SD                 435
Median             41
90th percentile    100
95th percentile    137
99th percentile    373
N                  104,865 (records)
[Figure: log-normalized Q-Q plot, expected normal versus observed value]
Based on the 90th percentile, typical street geocoding does not meet the accuracy standards for a 1:100,000 scale map based on National Map Accuracy Standards!
Zandbergen, P.A. 2007. Influence of geocoding quality on environmental exposure assessment of children living near high traffic roads. BMC Public Health, 7:37.
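
The comparison with the National Map Accuracy Standards is simple arithmetic. For maps at 1:20,000 or smaller scales, the standard allows no more than 10 percent of tested points to be off by more than 1/50 inch at map scale, so the 90th percentile error is the relevant statistic. The sketch below works this out for 1:100,000 against the 90th percentile of 100 m reported on this slide; the function name is an assumption for illustration.

```python
INCH_IN_METERS = 0.0254


def nmas_horizontal_tolerance_m(scale_denominator):
    """Ground distance allowed for 90% of points under the 1947 NMAS.

    1/30 inch on the map for scales larger than 1:20,000,
    1/50 inch for scales of 1:20,000 or smaller.
    """
    map_inches = 1.0 / 30.0 if scale_denominator < 20000 else 1.0 / 50.0
    return map_inches * scale_denominator * INCH_IN_METERS


tolerance = nmas_horizontal_tolerance_m(100000)  # about 50.8 m on the ground
p90_error = 100.0                                # 90th percentile error from the slide (m)
print(round(tolerance, 1), p90_error <= tolerance)
```

With a tolerance of about 50.8 m and a 90th percentile error of 100 m, street geocoding misses the 1:100,000 standard by roughly a factor of two.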

Quality of Street Reference Data

Use of Offsets
- Small offsets are commonly used
- Very minor effect on positional accuracy
- Optimum ~25 meters

Effect of Density and Parcel Size
[Figure: cumulative number of locations (%) versus positional error (feet), 0 to 1,000 ft, for four categories: Urban Residential - Parcel < 1 acre; Urban Residential - Parcel > 1 acre; Rural Residential - Parcel < 1 acre; Rural Residential - Parcel > 1 acre]
Zandbergen, P.A. Spatial variability in the positional accuracy of street geocoding. International Journal of Geographical Information Science (under review).

Repeatability
How do geocoding results (match rate and positional accuracy) vary by:
- Geocoding algorithms
- Street reference data
Quality of street reference data is the most significant factor
Local street centerlines and/or E-911 data are typically superior
How do commercial firms compare to results from GIS analysts?
- Again, quality of street reference data dominates, which for most commercial firms is good (unless they use TIGER 2000 data)
- Substantial variability among providers
- No relationship between cost and quality

Volusia County, FL
- 4 land use types
- 3 street datasets
Zandbergen, P.A. Repeatability of street geocoding. Computers and Geosciences (under review).

Example: Sex Offender Residency Restrictions
Zandbergen, P.A. and T.C. Hart. 2006. Reducing housing options for convicted sex offenders: Investigating the impact of residency restriction laws using GIS. Justice Research and Policy, 8:1-24.

Errors in Classification
Rows: street geocoding of schools and offenders. Columns: parcel geocoding of schools and offenders.
Street \ Parcel    In     Out    Total
In                 55     5      60
Out                103    460    563
Total              158    465    623
Off-diagonal cells: false positives and false negatives
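
The size of the disagreement between the two methods follows directly from the four interior counts on this slide (55, 5, 103 and 460, which sum to the stated total of 623). The sketch below computes only the overall agreement, which does not depend on which method is rows or columns; which off-diagonal cell counts as a false positive depends on which method is treated as the reference, so that labeling is left out. The function and parameter names are assumptions for illustration.

```python
def agreement_summary(both_in, both_out, disagree_1, disagree_2):
    """Return (total, share agreeing, share disagreeing) for a 2x2 classification table."""
    total = both_in + both_out + disagree_1 + disagree_2
    agree = (both_in + both_out) / total
    disagree = (disagree_1 + disagree_2) / total
    return total, agree, disagree


total, agree, disagree = agreement_summary(both_in=55, both_out=460,
                                           disagree_1=5, disagree_2=103)
print(total, f"{agree:.1%}", f"{disagree:.1%}")
```

Roughly one record in six is classified differently depending on whether street or parcel geocoding is used, and the disagreement is strongly lopsided (103 versus 5).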

Mapping Residency Restrictions: Parcel-level
Zandbergen, P.A. and T.C. Hart. 2006. Reducing housing options for convicted sex offenders: Investigating the impact of residency restriction laws using GIS. Justice Research and Policy, 8:1-24.

Example: Schools and School Children in Proximity to High Traffic Roads

Error and Bias in Analysis Results
Zandbergen, P.A. 2007. Influence of geocoding quality on environmental exposure assessment of children living near high traffic roads. BMC Public Health, 7:37.

Alternatives to Street Geocoding
Street geocoding is dominant in the US
Techniques used in other jurisdictions:
- Postal code (Canada)
- Address points (Australia and UK)
Two alternatives for the US:
- Parcel boundaries
- Address points
Both produce exceptional positional accuracy, but application currently limited by:
- Data availability / consistency
- Unproven match rates

Match rate results
[Figure: match rates (%) for different databases for Bay County, FL, comparing Address, Parcel and Roads geocoding across six data sets: Commercial Banks, Daycares, Elevators, Fishing Licenses, Grocery Stores, Sex Offenders]
Zandbergen, P.A. A comparison of address point, parcel and street geocoding techniques. Computers, Environment and Urban Systems (under review).

Error Propagation Modeling
Not very well developed, for something as simple and common as street geocoding
Monte-Carlo simulation of variability in:
- Match rate
- Positional accuracy
Determine effects of error/bias on:
- Clustering
- Proximity analysis
- Neighborhood assignment
Contribute to standards and procedures for geocoding
In development.
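
One possible shape for such a simulation, applied to proximity analysis, is sketched below. It is not the model under development in this research: the log-normal error distance, the uniformly random error direction, the independent per-record match probability and all names are assumptions chosen to keep the example short. Coordinates are planar, in meters.

```python
import math
import random


def simulate_counts(points, target, buffer_m, match_rate,
                    median_error_m, sigma, runs, seed=0):
    """Monte Carlo counts of geocoded points falling within buffer_m of target.

    Each run drops records at random (match rate) and displaces the rest by a
    log-normally distributed distance in a random direction (positional error).
    """
    rng = random.Random(seed)
    mu = math.log(median_error_m)  # the median of a log-normal is exp(mu)
    counts = []
    for _ in range(runs):
        inside = 0
        for x, y in points:
            if rng.random() >= match_rate:
                continue  # record failed to geocode in this run
            r = rng.lognormvariate(mu, sigma)
            theta = rng.uniform(0.0, 2.0 * math.pi)
            moved = (x + r * math.cos(theta), y + r * math.sin(theta))
            if math.dist(moved, target) <= buffer_m:
                inside += 1
        counts.append(inside)
    return counts


# Hypothetical residences on a 100 m grid, with a facility at the origin.
homes = [(float(x), float(y)) for x in range(-500, 501, 100) for y in range(-500, 501, 100)]
true_count = sum(math.dist(p, (0.0, 0.0)) <= 300.0 for p in homes)
counts = simulate_counts(homes, (0.0, 0.0), 300.0, match_rate=0.9,
                         median_error_m=41.0, sigma=1.0, runs=200)
print(true_count, sum(counts) / len(counts), min(counts), max(counts))
```

Comparing the spread of simulated counts with the true count gives a first impression of both the bias (from unmatched records) and the variability (from positional error) in a proximity result.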

Geospatial Privacy
- Collection of individual-level data is growing rapidly in the public and private sectors
- Analysis at the individual level is very attractive for researchers (overcomes MAUP and ecological fallacy issues)
- Spatial identifiers have not received the same level of concern as individual identifiers (like name)
- As geocoding becomes easier, cheaper and more accurate, so does the ability to reverse geocode spatial data
- Geospatial privacy has been recognized in the literature, but limited formal requirements or guidelines exist that recognize the increasing availability and capabilities of geospatial tools

Reverse Geocoding

Geographic Masking
- Traditional approach to protecting privacy is geographic masking, typically a random perturbation
- Masking may alter data in undesirable ways; the effects of masking on spatial analysis have not received a lot of attention
- How much masking is needed to adequately protect geospatial privacy is poorly understood
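
A random perturbation mask can be sketched in a few lines. The version below moves each point a random distance in a random direction, up to a maximum radius; the optional minimum radius (so no point stays too close to its true location) is an added assumption, not something stated on the slide, as are the function name and the planar coordinates in meters.

```python
import math
import random


def mask_point(x, y, max_radius_m, rng, min_radius_m=0.0):
    """Displace (x, y) to a random location between min_radius_m and max_radius_m away."""
    # Taking the square root spreads points uniformly over the area of the ring,
    # instead of concentrating them near the original location.
    u = rng.random()
    r = math.sqrt(min_radius_m ** 2 + u * (max_radius_m ** 2 - min_radius_m ** 2))
    theta = rng.uniform(0.0, 2.0 * math.pi)
    return x + r * math.cos(theta), y + r * math.sin(theta)


rng = random.Random(42)
original = (1000.0, 2000.0)
masked = mask_point(*original, max_radius_m=200.0, rng=rng, min_radius_m=50.0)
print(masked, math.dist(original, masked))
```

The radius is the lever behind both open questions on this slide: a larger radius protects privacy better but distorts cluster and proximity results more.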

Tracking Technologies
- Old-fashioned radio telemetry
- Traditional GPS
- GPS-enabled cellphones
- Hybrid GPS / Wi-Fi
Widespread adoption of (real-time) tracking is imminent

GPS-enabled Cellphones

Real-time Tracking

Google Earth Plug-ins

Real-time Online Tracking

Future Developments
- Address point geocoding will grow in importance and become the standard in the US
- Individual-level data will become a more widespread unit of analysis
- Tracking technologies will start to make a major impact
- Geospatial privacy protection needs to catch up to current trends

Elements of my research agenda
Ongoing:
- Geocoding quality (completeness, positional accuracy, repeatability)
- Error propagation modeling
- Reverse geocoding and geographic masking
- Reliability of tracking technologies (i.e. cellphones vs. WiFi vs. GPS) and implications for geocoding
Planned for:
- Spatial-temporal geocoding

Spatial-temporal Geocoding
Where people spend their time, not just where their mailing address is.
Short-term:
- Where people live, work, shop etc.
- How people spend their day/week
Long-term:
- Migration over years/decades
- How people spend their life
Applications: transportation, emergency management, urban planning, environmental health
How do we do this technically, practically and ethically?