Towards a Modern Approach to Privacy-Aware Government Data Releases

Micah Altman, David O'Brien & Alexandra Wood
MIT Libraries; Berkman Center for Internet & Society
Open Data: Addressing Privacy, Security, and Civil Rights Challenges
19th Annual BCLT/BTLJ Symposium, April 2015

Disclaimer
These opinions are our own. They are not the opinions of MIT, Brookings, Berkman, any of the project funders, nor (with the exception of co-authored previously published work) our collaborators.

Collaborators
The Privacy Tools for Research Data Project <privacytools.seas.harvard.edu>
Research support from the Sloan Foundation; National Science Foundation (Award #1237235); Microsoft Corporation

Related Work
Vadhan, S., et al. 2011. Re: Advance Notice of Proposed Rulemaking: Human Subjects Research Protections.
Altman, M., D. O'Brien, S. Vadhan, A. Wood. 2014. Big Data Study: Request for Information.
O'Brien, et al. 2015. When Is Information Purely Public? (Mar. 27, 2015). Berkman Center Research Publication No. 2015-7.
Wood, et al. 2014. Long-Term Longitudinal Studies (July 22, 2014). Berkman Center Research Publication No. 2014-12.
Preprints and reprints available from: informatics.mit.edu

Goals
1. Examine critical use cases
2. Develop a framework for systematically analyzing privacy in releases of data
3. Produce a guide for selecting among new legal and technical tools for privacy protection

Use Cases for Government Data Releases
  Freedom of Information Act/Privacy Act
  Traditional Public and Vital Records
  Official Statistics
  Open Government/E-Government Initiatives

Public Release of Workplace Injury Records

Benefits from Public Data Availability
  Transparency as a democratic principle
  Accountability of institutions
  Economic and social welfare benefits
  Data for research and scientific progress

Scope of Information Made Public
  All collected data not protected by FOIA, the Privacy Act, or OSHA reporting regulations
  Redaction of names, addresses, dates of birth, and gender
  Information to be released includes job title, date and time of incident, and descriptions of injury or illness and where and how it occurred

[Figure: OSHA rulemaking mockup of proposed web display of injury/illness reports]

Unaddressed Challenges and Risks
Re-identification Risks
  Individuals can be identified despite redaction of directly identifying fields or attributes
  Robust de-identification of microdata is a very difficult problem, and free-form text fields are especially challenging
Information Sensitivity
  OSHA identifies privacy concern cases as injuries or illnesses related to sexual assault, mental health, or infectious diseases
  There are other situations in which details regarding an injury or illness may be sensitive, such as those related to drug or alcohol abuse, that are not included
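
The re-identification risk described above can be illustrated with a minimal sketch. It assumes a handful of hypothetical injury records from which names, addresses, dates of birth, and gender have already been redacted, and counts how many records remain unique on the fields that would still be released. The records, field names, and the choice of quasi-identifiers are all illustrative assumptions, not OSHA data.

```python
from collections import Counter

# Hypothetical redacted injury records: direct identifiers are already
# removed, but job title, incident date, and site remain.
records = [
    {"job_title": "welder", "incident_date": "2014-03-02", "site": "Plant A"},
    {"job_title": "welder", "incident_date": "2014-03-02", "site": "Plant A"},
    {"job_title": "crane operator", "incident_date": "2014-03-05", "site": "Plant A"},
    {"job_title": "nurse", "incident_date": "2014-04-11", "site": "Clinic B"},
]

# Assumed quasi-identifiers: fields an outsider could plausibly know.
QUASI_IDENTIFIERS = ("job_title", "incident_date", "site")


def group_sizes(rows, keys):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(row[k] for k in keys) for row in rows)


def unique_records(rows, keys):
    """Return the records whose quasi-identifier combination appears exactly once."""
    sizes = group_sizes(rows, keys)
    return [row for row in rows if sizes[tuple(row[k] for k in keys)] == 1]


singled_out = unique_records(records, QUASI_IDENTIFIERS)
smallest_group = min(group_sizes(records, QUASI_IDENTIFIERS).values())
print(len(singled_out), smallest_group)
```

In this toy data, two of the four records are unique on the released fields even though every direct identifier is gone. A count like this is only a rough screening step; it says nothing about free-form text fields, which the slide flags as especially challenging.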

Unaddressed Challenges and Risks
Review, Reporting, and Accountability
  Lack of review mechanisms, such as systematic redactions of sensitive information before release
  Lack of accountability for harm arising from misuse of disclosed data

Framework for Modern Privacy Analysis

Observation 1
Privacy is not a simple function of the presence or absence of specific fields, attributes, or keywords in a released set of data. Other factors, including what one can learn or infer about individuals from a data release as a whole or when linked with other information, may lead to harm.
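
A minimal sketch of the linkage concern in Observation 1, assuming a hypothetical released table with direct identifiers removed and a hypothetical piece of outside knowledge (for example, what a coworker or a local news story might supply). All records, names, and key fields here are invented for illustration.

```python
# Hypothetical released records (direct identifiers removed).
released = [
    {"job_title": "crane operator", "incident_date": "2014-03-05", "injury": "back injury"},
    {"job_title": "welder", "incident_date": "2014-03-02", "injury": "burn"},
    {"job_title": "welder", "incident_date": "2014-03-02", "injury": "fracture"},
]

# Hypothetical auxiliary information held by someone outside the agency.
auxiliary = [
    {"name": "J. Doe", "job_title": "crane operator", "incident_date": "2014-03-05"},
]

# Assumed fields that appear in both sources.
LINK_KEYS = ("job_title", "incident_date")


def link(released_rows, auxiliary_rows, keys):
    """Pair each auxiliary record with the released records that match on all keys."""
    matches = {}
    for aux in auxiliary_rows:
        key = tuple(aux[k] for k in keys)
        matches[aux["name"]] = [
            row for row in released_rows if tuple(row[k] for k in keys) == key
        ]
    return matches


linked = link(released, auxiliary, LINK_KEYS)
for name, rows in linked.items():
    print(name, [row["injury"] for row in rows])
```

No field in the released table is a name, yet a single match on job title and date attaches an injury description to a named person. The harm comes from the release combined with outside information, not from any one field.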

Observation 2
Redaction, pseudonymization, coarsening, and hashing are often neither adequate nor appropriate practices, and releasing less information is not always a better approach to privacy. Simple redaction of information that has been identified as sensitive is often not a guarantee of privacy protection and may also reduce the usefulness of the information. In addition, the act of redacting certain fields of a record may reveal the fact that a record contains sensitive information.
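
One way to see why hashing alone can fall short is a small sketch under stated assumptions: identifiers are hypothetical four-digit employee numbers, and the release replaces each one with its unsalted SHA-256 digest. Because only 10,000 values are possible, anyone can hash them all and look up the match. The identifier format and function names are assumptions made for illustration.

```python
import hashlib


def pseudonymize(employee_id: str) -> str:
    """Replace an identifier with its unsalted SHA-256 digest."""
    return hashlib.sha256(employee_id.encode("utf-8")).hexdigest()


def reverse_by_enumeration(digest: str, candidates):
    """Recover the identifier by hashing every candidate until one matches."""
    for candidate in candidates:
        if pseudonymize(candidate) == digest:
            return candidate
    return None


# Assumption: identifiers are four-digit numbers, so the space is tiny.
all_possible_ids = (f"{n:04d}" for n in range(10_000))

released_value = pseudonymize("0427")  # what the published file would contain
recovered = reverse_by_enumeration(released_value, all_possible_ids)
print(recovered)
```

The digest looks opaque, but the small input space makes it reversible by brute force. The same reasoning applies to any identifier drawn from a small or guessable set.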

Observation 3
Naïve use of any data sharing model, including a more advanced model, is unlikely to provide adequate protection. Thoughtful analysis with expert consultation is necessary in order to evaluate the sensitivity of the data collected, to quantify the associated re-identification risks, and to design useful and safe release mechanisms.

Framework for Privacy Analysis
  Benefits from data availability
  Scope of information made available
  Re-identification (learning) risks
  Information sensitivity (harm in context)
  Information transformation (aggregation, redaction)
  Post-disclosure control mechanisms: review, reporting, and information accountability

Privacy Interventions at Any Stage

Data Sharing Models

Types and Targeting Interventions
Intervention types: Procedural, Economic, Informational, Legal Mechanisms, Technical Mechanisms
Targeted stages: Acceptance, Retention, Transformation, Access, Post-Access

Where do proposed interventions fit?
[Table: proposed interventions by stage (Acceptance, Retention, Transformation, Access, Post-Access) and type (Procedural, Economic, Informational, Legal Mechanisms, Technical Mechanisms). Entries: informed consent; right to examine; right to correct; PBD #2, 3, 7; PBD #4, 7; PBD #6, 7, 8; property rights assignment; safe harbor; fees; fines; restrictions on use; breach reporting; tethering; individual right of action; good practice; encryption-based; data blurring; formal policies; PDS]

Technical Approaches: Statistical & Computational
  Contingency tables
  Synthetic data
  Data visualizations
  Interactive mechanisms
  Multiparty computations
  Functional and homomorphic encryption
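
As a rough illustration of the first approach listed, the sketch below builds a contingency table from hypothetical (employer, injury type) records and flags cells with small counts for review before release. The data, the threshold of 3, and the idea of flagging small cells are assumptions added for illustration; the slides do not prescribe a specific rule, and aggregation by itself is not a privacy guarantee.

```python
from collections import Counter

# Hypothetical microdata: (employer, injury type) pairs.
incidents = [
    ("Employer A", "fracture"),
    ("Employer A", "fracture"),
    ("Employer A", "fracture"),
    ("Employer A", "burn"),
    ("Employer B", "fracture"),
    ("Employer B", "burn"),
    ("Employer B", "burn"),
    ("Employer B", "burn"),
]

MIN_CELL_COUNT = 3  # assumed review threshold, not taken from the slides


def contingency_table(pairs):
    """Count records for each (row, column) combination."""
    return Counter(pairs)


def small_cells(table, threshold):
    """List cells whose count is positive but below the threshold."""
    return sorted(cell for cell, count in table.items() if 0 < count < threshold)


table = contingency_table(incidents)
flagged = small_cells(table, MIN_CELL_COUNT)
print(dict(table))
print(flagged)
```

Cells with a count of one describe a single person, so they are natural candidates for the expert review that Observation 3 calls for.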

Technical Approaches: Information Security
  Access controls (including tiered access models)
  Secure data enclaves
  Personal data stores
  Audit systems
  Information accountability/operational policy
  Risk assessments

Legal & Regulatory Approaches
  Notice and consent
  Data sharing agreements
  Transparency and audit requirements
  Data minimization requirements
  Accountability for misuse, including civil and criminal penalties and private rights of action

Observation 4
Current approaches to evaluating risk and data utility and to selecting appropriate controls are largely ad hoc and inconsistent across organizations and sectors.

Risk and Harm
Identifiability (learning potential)
  Direct identifiers
  Quasi-identifiers (personal, externally readily observable characteristics)
  Indirect linkages
  Statistical reidentification risk
  Individual learning risk
  Social learning risks
Information Sensitivity
  Types of harms (e.g., loss of insurability, loss of employability, market discrimination, criminal liability, psychological harm, loss of reputation, emotional harm, and loss of dignity (dignitary harm); social harms to a vulnerable group (e.g., stereotyping), price discrimination against vulnerable groups, market failures; chilling of speech and action; potential for political discrimination; potential blackmail and other abuses)
  Expected magnitude of harm, if identification occurs (e.g., minimal, moderate, severe)
  Number of people exposed to harm

Selecting Controls: Risk & Harm Factors

Information Factors
Data Structure
  Logical structure (e.g., single relation, multiple relational, network/graph, semi-structured, geospatial, aggregate table)
  Unit of observation
  Attribute measurement type (e.g., continuous/discrete; ratio/interval/ordinal/nominal scale; associated schema/ontology)
  Performance characteristics (e.g., dimensionality/number of measures, number of observations/volume, sparseness, heterogeneity/variety, frequency of updates/velocity)
  Quality characteristics (e.g., measurement error, metadata, completeness)
Analysis Type
  Form of output (e.g., summary scalars, summary table, model parameters, data extract, static data publication, static visualization, dynamic visualization, statistical/model diagnostics)
  Analysis methodology (e.g., contingency tables/counting queries, summary statistics/function estimation, regression models/glm, general model-based statistical estimation/mle/mcmc, bootstraps/randomization/data partitioning, data mining/heuristics/custom algorithms)
  Analysis goal (e.g., rule-based, theory formation, existence proof, verification, descriptive inference, forecasting, causal inference, mechanistic inference)
  Utility/loss/quality measure (e.g., entropy, mean squared error, realism, validity of descriptive/predictive/causal statistical inference)
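
To make the utility/loss measure concrete, here is a minimal sketch that coarsens a numeric attribute into bins and reports the mean squared error against the original values. The attribute (days away from work), the values, and the bin widths are hypothetical; mean squared error is only one of the measures listed above.

```python
def coarsen(value, bin_width):
    """Replace a value with the midpoint of the bin that contains it."""
    lower = (value // bin_width) * bin_width
    return lower + bin_width / 2


def mean_squared_error(original, released):
    """Average squared difference between original and released values."""
    return sum((o - r) ** 2 for o, r in zip(original, released)) / len(original)


# Hypothetical days-away-from-work values for five injury records.
days_away = [1, 4, 7, 12, 30]

released_fine = [coarsen(d, 2) for d in days_away]     # narrow bins
released_coarse = [coarsen(d, 10) for d in days_away]  # wide bins

mse_fine = mean_squared_error(days_away, released_fine)
mse_coarse = mean_squared_error(days_away, released_coarse)
print(mse_fine, mse_coarse)
```

Wider bins hide more detail but distort the data more, which is the trade-off a utility measure is meant to expose when controls are being selected.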

Stakeholder Factors
Disclosure Scenarios
  Source of threat (e.g., natural, unintentional, intentional)
  Areas of vulnerability (e.g., data, software, logistical, physical, social engineering)
  Attacker objectives, background knowledge, and capability (e.g., nosy neighbor, business competitor, muckraking journalist, panopticon, intrusive employer/insurer)
  Breach criteria/disclosure concept
Stakeholders
  Stakeholder types (e.g., consumer, producer, funder, host institution, researcher, regulator, subject, citizen, journal)
  Stakeholder capacities/resources (e.g., technical expertise, infrastructural capacity, budget, staffing resources)
  Trust relationships
  Incentives and payoffs
  Stakeholder range of actions in each lifecycle stage

Selecting Controls: OSHA Example
Tiered access model with embedded review, audit, and accountability mechanisms
  Public access to contingency tables and data visualizations, for a quick review and comparison of different employers
  Interactive query access via a privacy-aware model server, for enabling access to more fine-grained information
  Restricted access to raw data via a secure data enclave, subject to data use agreement, for vetted researchers
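
The tiered model above can be sketched as a small access-decision function with an audit trail. The three tiers and their products follow the slide; the class, the field names, the rule that the interactive tier is open to anyone, and the in-memory audit log are assumptions made to keep the example short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requester:
    name: str
    vetted_researcher: bool = False
    signed_data_use_agreement: bool = False


# Products per tier follow the slide; the rule encoding is an assumption.
PUBLIC_PRODUCTS = {"contingency_table", "data_visualization"}
INTERACTIVE_PRODUCTS = {"model_server_query"}
RESTRICTED_PRODUCTS = {"raw_data"}

audit_log = []  # every decision is recorded to support later review


def is_allowed(requester: Requester, product: str) -> bool:
    """Decide whether a requester may access a product, and log the decision."""
    if product in PUBLIC_PRODUCTS or product in INTERACTIVE_PRODUCTS:
        allowed = True
    elif product in RESTRICTED_PRODUCTS:
        # Raw data requires both vetting and a signed data use agreement.
        allowed = requester.vetted_researcher and requester.signed_data_use_agreement
    else:
        allowed = False  # unknown products are denied by default
    audit_log.append((requester.name, product, allowed))
    return allowed


member_of_public = Requester("anonymous visitor")
researcher = Requester(
    "vetted researcher", vetted_researcher=True, signed_data_use_agreement=True
)

decisions = [
    is_allowed(member_of_public, "raw_data"),
    is_allowed(researcher, "raw_data"),
]
print(decisions, audit_log)
```

Logging denials as well as grants is a deliberate choice here: the audit record is what would let a reviewer later trace who asked for what, which is the accountability piece of the model.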

References
Salil Vadhan, et al., Comments to the Department of Health and Human Services and the Food and Drug Administration, Re: Advance Notice of Proposed Rulemaking: Human Subjects Research Protections, Docket No. HHS-OPHS-2011-0005 (Oct. 26, 2011), available at http://privacytools.seas.harvard.edu/files/commonruleanprm.pdf.
Micah Altman, David O'Brien, & Alexandra Wood, Comments to the Occupational Safety and Health Administration, Re: Proposed Rule: Improve Tracking of Workplace Injuries and Illnesses, OSHA-2013-0023-1207 (March 10, 2014), available at http://www.regulations.gov/#%21documentdetail;d=osha-2013-0023-1207.
Micah Altman, David O'Brien, Salil Vadhan, & Alexandra Wood, Comments to the White House Office of Science and Technology Policy, Re: Big Data Study; Request for Information (March 31, 2014), available at http://privacytools.seas.harvard.edu/files/whitehousebigdataresponse1.pdf.
David O'Brien, et al., Integrating Approaches to Privacy Across the Research Lifecycle: When Is Information Purely Public?, Berkman Center Research Publication No. 2015-7 (March 27, 2015), available at http://ssrn.com/abstract=2586158 or http://dx.doi.org/10.2139/ssrn.2586158.
Alexandra Wood, et al., Integrating Approaches to Privacy Across the Research Lifecycle: Long-Term Longitudinal Studies, Berkman Center Research Publication No. 2014-12 (July 22, 2014), available at http://ssrn.com/abstract=2469848 or http://dx.doi.org/10.2139/ssrn.2469848.

Questions
E-mail: Micah Altman, escience@mit.edu
Web: privacytools.seas.harvard.edu