Ethics of Data Science

Lawrence Hunter, Ph.D.
Director, Computational Bioscience Program
University of Colorado School of Medicine
Larry.Hunter@ucdenver.edu
http://compbio.ucdenver.edu/hunter

Data Science is everywhere

What is Data Science?
- Machine learning: data-driven model selection (through a large space of possible models)
- "I think data-scientist is a sexed up term for a statistician." -Nate Silver
- "Our data science team brings together three things: statistics, programming, and product knowledge." -Brad Schumitsch, Amazon/Twitch

Why this could be good
- Algorithms for tasks that were not previously amenable to automation (e.g. image analysis)
- Advantages over humans doing similar tasks:
  - Inexpensive, scalable, fast
  - Consistent and verifiable
  - More accurate*
  - More fair*, less subject to social biases
* Maybe. Sometimes.

Why this could be bad
- Data are (about) people, can cause harm
- Algorithmic outcomes often not explainable
- A lot of data incidentally produced by daily life:
  - Social media
  - Ubiquitous cameras, microphones, location tracking
  - Medical treatment
- Important new uses:
  - Legal: surveillance, predictive policing, sentencing, fraud detection, military applications
  - Economic: school admissions, hiring/promotion, loans, insurance, accounting controls, advertising
  - Medical: health insurance, diagnosis, decision making

Some ethical concerns
- Preserving privacy
  - Methods for handling sensitive data
  - Uses of data science that undermine privacy
- Avoiding bias
  - Data selection and unintentional red-lining
  - Re-inscription of existing biases
- Mitigating malicious attacks
  - Intentional subversion of machine learning systems
  - Hazards of learning from the open internet

Anonymized data isn't always
- In 1997, Latanya Sweeney identified the medical records of the Governor of Massachusetts (Weld).
- Massachusetts released hospital records "anonymized" by removing names, addresses and SSNs
- Voter records have the name, address, ZIP code, birth date, and sex of every voter
- Sweeney used ZIP code, birth date and gender to uniquely identify Weld's records
- 87% of the US population is uniquely identified by ZIP code, birth date and gender
- Similar re-identifications with Netflix data (using IMDb) and search logs
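The linkage step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: all records, names, and field names (`zip`, `birth_date`, `sex`, `diagnosis`) are invented, and the helper `link_records` is hypothetical, not Sweeney's actual procedure.

```python
# Toy linkage attack: join "anonymized" hospital rows to a public voter list
# on the quasi-identifiers (ZIP code, birth date, sex). All data are invented.
from collections import defaultdict

QUASI_IDENTIFIERS = ("zip", "birth_date", "sex")


def quasi_key(record):
    """Tuple of quasi-identifier values used as the join key."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def link_records(hospital_rows, voter_rows):
    """Return (name, diagnosis) pairs where exactly one voter matches a hospital row."""
    voters_by_key = defaultdict(list)
    for voter in voter_rows:
        voters_by_key[quasi_key(voter)].append(voter["name"])

    matches = []
    for row in hospital_rows:
        candidates = voters_by_key.get(quasi_key(row), [])
        if len(candidates) == 1:  # a unique match re-identifies the row
            matches.append((candidates[0], row["diagnosis"]))
    return matches


hospital_rows = [
    {"zip": "02138", "birth_date": "1950-01-01", "sex": "M", "diagnosis": "condition A"},
    {"zip": "02139", "birth_date": "1980-05-05", "sex": "F", "diagnosis": "condition B"},
]
voter_rows = [
    {"name": "Voter One", "zip": "02138", "birth_date": "1950-01-01", "sex": "M"},
    {"name": "Voter Two", "zip": "02139", "birth_date": "1980-05-05", "sex": "F"},
    {"name": "Voter Three", "zip": "02139", "birth_date": "1980-05-05", "sex": "F"},
]

reidentified = link_records(hospital_rows, voter_rows)
print(reidentified)
```

In this toy data the first hospital row matches a single voter and is re-identified, while the second matches two voters and stays ambiguous. That is why the uniqueness of the ZIP code, birth date and gender combination matters.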

Privacy Security
- Some data cannot be anonymized
  - Genome sequences are inherently identifying
  - Even a few hundred well-picked SNPs
- Often, people's desires about their data involve questions of trust
  - Willingness to share medical data with academic researchers, but not pharmaceutical companies
- Privacy is not a binary value
  - Different sorts of exposure to different sorts of people evoke different responses

Privacy and technology
- Privacy preserving technologies:
  - K-anonymity
  - Ignorant processing
- Privacy invading technologies:
  - Identifying people and their locations by cellphone metadata
  - Descrambling pixelated images: "Defeating Image Obfuscation with Deep Learning", McPherson et al., 2016
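As a rough illustration of k-anonymity: a table is k-anonymous if every combination of quasi-identifier values is shared by at least k rows. The sketch below measures k on invented rows before and after a coarsening step. The `generalize` rule (keep three ZIP digits and only the birth year) is an assumption chosen for the example, not a recommended policy, and the field names are hypothetical.

```python
from collections import Counter


def k_anonymity(rows, quasi_identifiers):
    """Smallest group size when rows are grouped by their quasi-identifier values."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values())  # raises ValueError on an empty table


def generalize(row):
    """Hypothetical coarsening: keep 3 ZIP digits and only the birth year."""
    return {**row, "zip": row["zip"][:3] + "**", "birth_date": row["birth_date"][:4]}


# Invented example rows.
rows = [
    {"zip": "80045", "birth_date": "1970-03-02", "sex": "F"},
    {"zip": "80046", "birth_date": "1970-11-20", "sex": "F"},
    {"zip": "80301", "birth_date": "1985-07-14", "sex": "M"},
    {"zip": "80302", "birth_date": "1985-01-30", "sex": "M"},
]
fields = ("zip", "birth_date", "sex")

k_before = k_anonymity(rows, fields)
k_after = k_anonymity([generalize(r) for r in rows], fields)
print(k_before, k_after)
```

Here every raw row is unique (k = 1), and the coarsened table puts each row in a group of two (k = 2). Generalization trades detail for larger groups, and k-anonymity on its own does not protect against every attack.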

Data Sharing
- Data sharing can be of great scientific value
- Often, data generators control (no sharing)
- New models emerging requiring more sharing
  - Genomics / sequences
  - Large NIH grants
  - Clinical trials?
- Participants are surprised it doesn't happen

Data Science and Bias
- "Objective" algorithms are thought to be free of the biases that plague people.
- Algorithms, especially ones that learn, can inadvertently re-inscribe those biases
- Algorithms are opaque, hard to interrogate
- Increasingly widespread

Discrimination and its proxies
- It is illegal, and generally perceived as wrong, to make choices based on race, gender, religion, national origin, etc.
- However, proxies for these are everywhere:
  - ZIP codes
  - Names (gender, race, national origin)
  - Purchase histories (including movies or TV shows)
- Machine learning that uses a biased historical record plus any proxy is likely to re-inscribe bias
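A minimal sketch of how a proxy can re-inscribe bias, using entirely invented data: the "model" below never sees the group label, only the ZIP code, yet its decisions differ sharply by group because ZIP code correlates with group in the biased history. The functions `fit_zip_rule` and `approval_rate_by_group` are hypothetical helpers for this example, not a real fairness toolkit.

```python
from collections import defaultdict

# Invented history: (zip_code, group, approved). Group is never given to the rule.
history = [
    ("11111", "A", True), ("11111", "A", True), ("11111", "A", True), ("11111", "B", True),
    ("22222", "B", False), ("22222", "B", False), ("22222", "B", True), ("22222", "A", False),
]


def fit_zip_rule(records):
    """Learn a per-ZIP decision: approve if most past applicants there were approved."""
    counts = defaultdict(lambda: [0, 0])  # zip -> [approved, total]
    for zip_code, _group, approved in records:
        counts[zip_code][0] += int(approved)
        counts[zip_code][1] += 1
    return {z: approved / total > 0.5 for z, (approved, total) in counts.items()}


def approval_rate_by_group(records, rule):
    """Fraction of each group that the ZIP-only rule would approve."""
    totals = defaultdict(lambda: [0, 0])  # group -> [approved, total]
    for zip_code, group, _approved in records:
        totals[group][0] += int(rule[zip_code])
        totals[group][1] += 1
    return {g: approved / total for g, (approved, total) in totals.items()}


rule = fit_zip_rule(history)
rates = approval_rate_by_group(history, rule)
print(rule, rates)
```

On this toy history the ZIP-only rule approves 75% of group A and 25% of group B. Dropping the protected attribute from the inputs did not remove the disparity, because the proxy carries it.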

Discrimination in Online Ad Delivery
- Sweeney observed in 2013 that black-identifying names were much more likely than white-identifying names to generate ads that included the word "arrest" (60 per cent versus 48 per cent).
- Google uses a learning algorithm to place the ads that are most often clicked on.
- Likely to be a reflection of people clicking on those ads more for black names

Adversarial environments
- Since there is a lot riding on algorithms, people have an interest in manipulating them.
- Many effective strategies

Beware the open internet
- Tay was a chatbot designed last year by Microsoft to interact with people over Twitter
- Built by "mining relevant public data" and combining that with input from editorial staff, "including improvisational comedians."
- The bot is supposed to learn and improve as it interacts with users
- Within 24 hours of being unveiled, it was pulled after making many racist, sexist, etc. statements
- "As it learns, some of its responses are inappropriate and indicative of the types of interactions some people are having with it. We're making some adjustments to Tay."

1. Acknowledge that data are people and can do harm
2. Recognize that privacy is more than a binary value
3. Guard against the re-identification of your data
4. Practice ethical data sharing
5. Consider the strengths and limitations of your data; big does not automatically mean better
6. Debate the tough, ethical choices
7. Develop a code of conduct for your organization, research community, or industry
8. Design your data and systems for auditability
9. Engage with the broader consequences of data and analysis practices
10. Know when to break these rules

Let's discuss