Why AI Goes Wrong And How To Avoid It
Brandon Purcell
June 18, 2018
© 2018 Forrester. Reproduction prohibited.
We probably don't need to worry about this in the near future. Source: https://twitter.com/jackyalcine/status/615329515909156865
But this is happening today. Google's response: "There is still clearly a lot of work to do with automatic image labeling, and we're looking at how we can prevent these types of mistakes from happening in the future." Source: https://twitter.com/jackyalcine/status/615329515909156865
Companies are learning that the road to hell is paved with good intentions.
And they are paying for it in three ways:
- Reputational erosion
- Revenue loss
- Regulatory fines
Reputational risk: the erosion of brand equity
Headlines: "Microsoft Deletes 'Teen Girl' AI After It Became A Hitler-Loving Sex Robot Within 24 Hours"; "Amazon Prime And The Racist Algorithms"
"I would definitely stop doing business with any company altogether if I found out that they discriminate against anyone! 27 year old female consumer 7
Ethical failures erode shareholder value. Source: Google Finance
Biased AI could result in severe regulatory penalties. GDPR Article 9(1): "Processing of personal data revealing racial or ethnic origin, political opinions, religious or philosophical beliefs, or trade union membership, and the processing of genetic data, biometric data for the purpose of uniquely identifying a natural person, data concerning health or data concerning a natural person's sex life or sexual orientation shall be prohibited." For breaches of key points of GDPR, such as the basic principles for processing data, obtaining consent, and requirements on international transfers, fines can reach the higher of 4% of annual global revenue or €20,000,000.
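As a quick illustration of how that fine ceiling scales, a minimal sketch (the revenue figure is hypothetical, not from the deck):

```python
# Hypothetical illustration of the GDPR fine ceiling described above:
# the higher of 4% of annual global revenue or a flat EUR 20,000,000.

def gdpr_max_fine(annual_global_revenue_eur: float) -> float:
    """Return the maximum GDPR fine for the given annual global revenue."""
    return max(0.04 * annual_global_revenue_eur, 20_000_000)

# Example: a firm with EUR 2 billion in annual revenue (hypothetical figure).
print(f"{gdpr_max_fine(2_000_000_000):,.0f}")  # 80,000,000 - the 4% prong wins
```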
Why is this happening?
Models can learn three types of bias. Source: https://www.forrester.com/report/the+ethics+of+ai+how+to+avoid+harmful+bias+and+discrimination/-/e-res130023
Algorithmic bias. Source: https://www.forrester.com/report/the+ethics+of+ai+how+to+avoid+harmful+bias+and+discrimination/-/e-res130023
A model is only as good as the data used to train it.
A crash course in machine learning

                      Supervised learning                  Unsupervised learning
Purpose               To predict / classify                To explore / understand
Training data         Labelled (knows the "answer")        Not labelled (no "right answer")
Accuracy              Measurable                           Qualitatively evaluated
Marketing use cases   Predict which customers are likely   Behavioral customer segmentation
                      to respond / churn / buy
The birds and the bees of model-making: supervised machine learning
Unlabeled data → labeled data → training data → machine learning algorithm → classification model (checked against validation data) → final output: newly classified data
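A minimal sketch of that pipeline in scikit-learn; the dataset and classifier are illustrative choices, not from the deck:

```python
# Minimal supervised-learning pipeline, mirroring the flow above.
# Dataset and classifier are illustrative choices, not from the deck.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)  # labeled data

# Split the labeled data into training and validation sets.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LogisticRegression(max_iter=5000)  # machine learning algorithm
model.fit(X_train, y_train)                # -> classification model

# Accuracy is measurable because the validation data carries the "answer".
print("Validation accuracy:", accuracy_score(y_val, model.predict(X_val)))

# Final output: classify new, unlabeled observations.
new_predictions = model.predict(X_val[:5])
```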
FaceApp demonstrates the problem of algorithmic bias.
Bad training data created a racist filter: its "hotness" filter lightened users' skin.
Algorithmic bias is caused by unrepresentative training data. [Diagram: a skewed training-data sample vs. the total population]
Training data should be IID: independent and identically distributed. [Diagram: an IID sample that mirrors the total population - much better!]
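A quick way to sanity-check representativeness is to compare group proportions in the training sample against the population it was drawn from; a minimal sketch, with a hypothetical demographic column and figures:

```python
# Compare group proportions in the training data vs. the population they
# are drawn from. The column name and counts are hypothetical illustrations.
import pandas as pd

population = pd.DataFrame({"skin_tone": ["light"] * 700 + ["dark"] * 300})
training = population.sample(n=200, random_state=0)  # ideally an IID sample

comparison = pd.concat(
    {
        "population": population["skin_tone"].value_counts(normalize=True),
        "training": training["skin_tone"].value_counts(normalize=True),
    },
    axis=1,
)
print(comparison)  # large gaps between the columns signal unrepresentative data
```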
What happens when historical biases are captured in the data?
Human bias. Source: https://www.forrester.com/report/the+ethics+of+ai+how+to+avoid+harmful+bias+and+discrimination/-/e-res130023
Even with good training data, models can pick up on human biases. They can be sexist: Google's Word2Vec model for natural language processing completes "man is to woman as computer programmer is to..." with "homemaker".
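That analogy can be reproduced with word-vector arithmetic; a sketch using gensim and pretrained Word2Vec vectors (the exact top result depends on the embedding used - Bolukbasi et al., 2016, reported "homemaker" for this query):

```python
# Word-vector analogy: man : woman :: computer_programmer : ?
# Requires gensim; downloads the pretrained Google News embedding on first run.
import gensim.downloader as api

vectors = api.load("word2vec-google-news-300")  # pretrained Word2Vec vectors

# vector("computer_programmer") - vector("man") + vector("woman")
result = vectors.most_similar(
    positive=["computer_programmer", "woman"], negative=["man"], topn=3
)
print(result)  # the learned gender stereotype surfaces in the nearest neighbors
```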
Amazon Prime same-day delivery shows the problem of human bias
- Rolled out in 2015 to compete with the instant-gratification factor of brick & mortar retailers
- 27 metropolitan areas; postal codes covering 77 million people
- Excludes predominantly black postal codes in 6 major cities: Atlanta, Boston, Chicago, Dallas, New York, and Washington, D.C.
A tale of two cities: the blue-shaded areas got same-day delivery. [Maps] Source: https://www.bloomberg.com/graphics/2016-amazon-same-day/
Amazon did not intend to exclude predominantly black postal codes. "Demographics play no role in it. Zero." - Craig Berman, Amazon's VP, Global Communications
- Model based on concentration of Prime members
- Inherited historical human bias in the form of redlining and de facto segregation
- Included variables that are a proxy for race
Inherited human bias perpetuates that bias in a vicious cycle: data with human bias → insights → models with human bias → discriminatory action → more biased data
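A toy simulation of that feedback loop (all numbers are hypothetical): a model trained on biased outcomes takes actions that generate even more biased data for the next round of training.

```python
# Toy simulation of the bias feedback loop above; all numbers are hypothetical.
# A "model" learns each group's historical serve rate, favors the higher one,
# and its own actions become the next round's training data.

serve_rate = {"group_a": 0.60, "group_b": 0.40}  # historically biased data

for round_num in range(5):
    favored = max(serve_rate, key=serve_rate.get)
    for group, rate in serve_rate.items():
        # Favored group gets served more; the other group's rate is depressed.
        serve_rate[group] = rate * (1.05 if group == favored else 0.95)
    print(round_num, {g: round(r, 3) for g, r in serve_rate.items()})
# The gap widens every round: the model's biased action becomes its own evidence.
```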
The perpetuation of human bias in the criminal justice system: COMPAS - Correctional Offender Management Profiling for Alternative Sanctions
It can have devastating consequences:
- Black defendants were almost twice as likely as white ones to be falsely labelled future criminals
- Whites were more likely to be mislabeled as low risk
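The disparity described above is a gap in false positive rates across groups; a minimal sketch of how to measure it (the column names and data are hypothetical, shaped to mirror the COMPAS finding):

```python
# Measure false positive rates by group. Data and column names are
# hypothetical; the pattern mirrors the COMPAS analysis described above.
import pandas as pd

df = pd.DataFrame({
    "group":               ["black"] * 4 + ["white"] * 4,
    "predicted_high_risk": [1, 1, 1, 0,     1, 1, 0, 0],
    "reoffended":          [0, 0, 1, 0,     1, 0, 0, 0],
})

# False positive rate: labeled high risk among those who did NOT reoffend.
non_reoffenders = df[df["reoffended"] == 0]
fpr_by_group = non_reoffenders.groupby("group")["predicted_high_risk"].mean()
print(fpr_by_group)  # black 0.67 vs. white 0.33 - a twofold disparate impact
```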
Combatting human bias requires a deep understanding of the problem and the data
- Are you including variables that are proxies for race, age, or other protected classes? (See the proxy check sketched below.)
- Can you exclude these variables?
- Or can you modify the training data to reflect a more just outcome?
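One simple proxy check is to measure how well each candidate feature predicts the protected attribute; a sketch, with hypothetical column names and data:

```python
# Flag features that act as proxies for a protected attribute by measuring
# their association with it. Column names and data are hypothetical.
import pandas as pd
from sklearn.metrics import normalized_mutual_info_score

df = pd.DataFrame({
    "postal_code": ["02121", "02121", "02116", "02116"],
    "device_type": ["ios", "android", "ios", "android"],
    "race":        ["black", "black", "white", "white"],  # protected class
})

for feature in ["postal_code", "device_type"]:
    score = normalized_mutual_info_score(df["race"], df[feature])
    print(f"{feature}: {score:.2f}")  # near 1.0 = strong proxy; consider excluding
```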
Code the change you want to see in the world.
Useful (intentional) bias. Source: https://www.forrester.com/report/the+ethics+of+ai+how+to+avoid+harmful+bias+and+discrimination/-/e-res130023
But sometimes it is OK to exploit differences between customers. Who should you market these items to?
Models help you identify and take advantage of different preferences and behaviors. Good luck selling Waldo's sweater to Charlie Brown!
When is it OK to treat different customers differently, and when isn't it?
Define roles and responsibilities for ensuring the ethics of algorithms
- Defining what is ethical needs to be an executive-level conversation
- Business units should be responsible for overseeing ethical deployment and measurement
- Data scientists should be the first line of defense against algorithmic bias
Embrace diversity by soliciting a diverse array of viewpoints
- Employ diverse perspectives at the data scientist, LoB, and executive levels
- Listen to the Voice of the Customer for their opinions
- Consult experts in algorithmic bias: the Algorithmic Justice League (Joy Buolamwini), the University of Massachusetts at Amherst (Themis), and the IEEE Global Initiative for Ethical Considerations in Artificial Intelligence and Autonomous Systems
Most importantly, make your models FAIR.
Thank you
Brandon Purcell
bpurcell@forrester.com