Focus Group on Artificial Intelligence for Health

Marcel Salathé (EPFL, Switzerland), Thomas Wiegand (Fraunhofer HHI, Germany), Markus Wenzel (Fraunhofer HHI, Germany) and Ramesh Kishnamurthy (WHO)

E-mail: tsbfgai4h@itu.int
Web: https://www.itu.int/go/fgai4h

Abstract

Artificial Intelligence (AI), the phenomenon of machines being able to solve problems that require human intelligence, has in the past decade seen an enormous rise of interest due to significant advances in effectiveness and use. The health sector, one of the most important sectors for societies and economies worldwide, is particularly interesting for AI applications, given the ongoing digitalisation of all types of health data and health information. The potential for AI assistance in the health domain and for advancing the field of digital health is immense, because AI can support medical and public health decision making at reduced costs, everywhere. However, due to the complexity of AI algorithms, it is difficult to distinguish good from bad AI-based solutions and to understand their strengths and weaknesses, which is crucial for clarifying responsibilities and for building trust.

For this reason, the International Telecommunication Union (ITU) has established a new Focus Group on "Artificial Intelligence for Health" (FG-AI4H) in partnership with the World Health Organization (WHO). The governance and delivery of health and care services are usually the responsibility of a government (even when provided through private providers and health insurance systems) and thus under the responsibility of WHO/ITU member states. FG-AI4H will identify opportunities for international standardization of AI for Health-relevant data, information, algorithms, and processes, which will foster the application of AI to health issues on a global scale. In particular, it will establish a standardized assessment framework with open benchmarks for the evaluation of AI-based methods for health, such as AI-based diagnosis, triage or treatment decisions.

1 Introduction

"The enjoyment of the highest attainable standard of health" is a basic human right (WHO Constitution, 1946). WHO's high priority is universal health coverage: ensuring that all people can access the health services they need, without facing financial hardship. To keep itself accountable, WHO has set three strategic targets: 1 billion more people benefitting from universal health coverage; 1 billion more people better protected from health emergencies; and 1 billion more people enjoying better health and well-being. AI, combined with other digital technologies, will be a vital tool in achieving all three of these targets. In recognition of the growing importance of digital health technologies, including AI, the WHO Member States unanimously adopted the resolution on Digital Health during their Seventy-first World Health Assembly on 26 May 2018 in Geneva, Switzerland.

Good health for everyone has for centuries been a key goal of most governments, and public health breakthroughs such as vaccination are generally credited with having saved, and continuing to save, billions of lives. In many countries today, the healthcare industry is the largest and/or fastest growing industry, increasingly often accounting for more than ten percent of the gross domestic product. It is thus not surprising that the healthcare sector becomes a key area of application when a technology reaches new levels of performance, as is the case with modern AI. Given the size of the health sector, the potential economic opportunities are immense. The potential for leveraging new technology for the common good by improving public health can also be enormous. It is therefore prudent to look at the potential of AI in helping solve health-related issues.
This short paper describes the current applications of AI in the health domain and discusses challenges and how to address these in order to unlock the full potential of the technology.

Ref: FG-AI4H-A-006 (https://itu.int/en/itu-t/focusgroups/ai4h/documents/fg-ai4h_whitepaper.pdf)

2 Artificial intelligence

The term artificial intelligence is not new. As an academic field, it dates back to at least the mid-20th century, and has since gone through multiple cycles of substantial progress, followed by inflated expectation, and then disappointment. A combination of new machine learning algorithms, increased computational power, and an explosion in the availability of very large data sets ("big data"), as a consequence of the digitalisation of health information, has led to recent stunning advances, with demonstrations of machines achieving human-level competence at solving clearly defined tasks across many domains.

The current cycle is primarily driven by the extremely impressive progress recently made by deep learning, a branch of machine learning that very effectively uses artificial neural networks to address harder problems than ever before. Applications of deep learning have achieved human or superhuman performance in many fields such as image recognition and natural language processing. Importantly, the neural network parameters are tuned in an automated process of iterative training. In many cases, no expert-level knowledge is used in the training process, other than direct input and output parameters (e.g. sets of pixels and their associated labels), giving rise to so-called end-to-end learning. In other words, the networks learn to go directly from one end (the input) to the other end (the output) without requiring any domain-specific expertise in between. The resulting network structures are generally very large, with oftentimes billions of parameters, and of such complexity that it is impossible to describe in simple terms how they work, which has led to new challenges concerning their explainability and interpretability.
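To make the idea of iterative training from input/output pairs concrete, the following Python sketch fits a two-parameter linear model by gradient descent. It is a deliberately tiny stand-in and not a neural network: the data, the learning rate, the number of steps, and the function name train are all illustrative assumptions. What it shares with end-to-end learning is that the parameters are adjusted automatically using nothing but example inputs and their associated outputs.

```python
# Toy illustration of iterative training (not a deep network):
# a linear model y = w * x + b is fitted by gradient descent.
# Data, learning rate, and step count are illustrative assumptions.

def train(examples, learning_rate=0.05, steps=2000):
    """Fit y = w * x + b using only (input, output) pairs."""
    w, b = 0.0, 0.0
    n = len(examples)
    for _ in range(steps):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in examples) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in examples) / n
        # Move each parameter a small step against its gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Example pairs generated from y = 2x + 1; no rule is given to the model.
examples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
w, b = train(examples)
print(round(w, 2), round(b, 2))
```

A deep network follows the same loop in principle, but with far more parameters and a non-linear model, which is where the interpretability challenges described above arise.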
3 AI for health

The recent digitalisation of all types of health data and information, and the fact that computers are increasingly able to interpret images and text as accurately as humans [1,2], open up countless avenues for AI applications in health. Much of the recent work on AI in health has thus gone into applications that revolve around image interpretation and natural language understanding.

In the field of medical image analysis, one of the most publicized studies was by Esteva et al. [3], who trained a deep neural network on clinical images to classify skin lesions and compared its classifications to those made by board-certified dermatologists, finding that the network had reached human accuracy levels. A survey [4] published in 2017 reviewed over 300 papers using deep learning in medical image analysis, typically for detection, segmentation, or classification tasks. The reviewed papers covered the analysis of X-ray, CT, MRI, digital pathology, cardiac, abdominal, musculoskeletal, foetal, dermatological and retinal images.

In language understanding, the areas of biomedical text mining, electronic health record analysis, sentiment analysis on internet-derived data, and medical decision support systems have shown promising results [5]. Furthermore, AI methods can automatically interpret laboratory results (ranging from standard blood testing to recent advances in high-throughput genomics and proteomics) and time series (e.g. electrocardiogram, temperature, oxygen saturation, blood pressure).

A large part of the world's population has access to devices that can utilize compute-intense AI-powered applications, considering the ubiquity of computers and smartphones connected via the internet to powerful computing clusters. For example, relatively accurate detection of skin lesions using a state-of-the-art camera-equipped mobile phone is technologically feasible, and medical chatbots that can answer basic medical questions are already on the market. Given the speed at which AI-based algorithms can be developed, improved, and deployed, the technology has the potential for first-class medical decision making that is accessible worldwide and affordable to the entire global population [6].

While this progress is exciting, the potential of AI for health also faces a number of challenges. In particular, deep learning models are famously hard to interpret and explain, which may substantially hinder their acceptance when facing critical or even vital decisions. Thus, interpretability, explainability, and proven robustness (e.g. to outliers and to adversarial attacks) are crucial aspects that have to be considered for trustworthiness.

Moreover, health data are sensitive and subject to privacy laws. Therefore, access to sufficient training data is a major limiting factor for the predictive performance of models on data previously unseen. This problem is complicated further because most modern AI applications are based on supervised learning and rely on data that are labelled. In the health domain, labels can typically be given only by qualified specialists, in contrast, e.g., to simple object recognition, where photographs can be labelled by legions of laypersons. In addition, machine learning approaches must take into account the biases [7] that both text and image-based medical data most likely contain.

In machine learning, algorithms and training data have to be considered in combination. The algorithms cannot extrapolate, but can only learn patterns that are present in the training data. The training data therefore need to be of high quality, sufficiently large to learn the myriad of parameters of the data-hungry algorithms, and should in theory cover all possible instances including outliers.

4 Focus Group on "Artificial Intelligence for Health" (FG-AI4H)

The Focus Group on "Artificial Intelligence for Health" (FG-AI4H), established by ITU in partnership with WHO, aims to meet these challenges by identifying opportunities for international standardization. The FG-AI4H works on the premise that the broad adoption of AI in health would benefit from a standardized and transparent evaluation of the AI methods.
It should be noted that the Focus Group neither intends to specify the AI for health algorithms themselves as an ITU Recommendation, nor to standardize medical data formats, nor to establish performance criteria of hardware running the AI algorithms. The Focus Group on Artificial Intelligence for Health will work towards a standardized assessment of AI-based solutions for health, which will assure their quality, foster their adoption in practice and have a strong positive impact on global health.

AI can support medical diagnostics in clinical settings and public health decision-making by mapping from input data to output variables. Exemplary input data are images, text, time series, SNOMED CT, or HPO codes. Exemplary output variables are ICD or ICHI codes, triage tags, or other labels, depending on the use case. For instance, an AI algorithm could map the input data to a diagnosis (represented by an ICD code), or it could make a treatment decision (represented by an ICHI code) via a diagnosis.

Figure 1: Mapping input data to diagnosis and treatment decision

Hence, benchmarking could be conducted without the need to disclose or standardize the AI algorithms themselves. Instead, standardized input data sets could be created with corresponding confirmed standardized diagnosis or decision codes or other variables per patient. The data could be split into public training and undisclosed test sets. Performance metrics for comparisons could be created, which would reflect the quality of the mapping (accuracy, reproducibility, robustness, absence of bias, explainability, interpretability etc.) as well as timing aspects and other costs.

For the establishment of a benchmarking framework, it is necessary to first identify potential health problems to which AI interventions can be applied and assessed. The targeted problems and possible solutions should be scalable. Structured medical data need to be collected and open benchmarks have to be developed for the identified use cases and solutions. Subsequently the benchmarking system itself has to be implemented.

5 Benchmarking pipeline

Here, we outline a proposal for a benchmarking pipeline that will be applicable to many different scenarios. At the core of the evaluation framework is an undisclosed test set on which models will be evaluated. The pipeline is summarized in Figure 2.

Figure 2: A benchmarking pipeline
NOTE: Private data: one's own data for training purposes; Undisclosed data: test data not available to algorithm developers.

The benchmarking pipeline consists of the following steps:

1) FG-AI4H enables creation of public data repositories wherever possible
Most modern approaches to building AI models involve training on existing data sets. FG-AI4H will work to enable the creation of publicly available high-quality (accurate, reliable, verifiable) data sets to foster the creation of a diverse ecosystem of actors who want to participate in the benchmarking process.

2) Participants build AI models based on public data and other (undisclosed) data sources
Participants will train their models based on a clear problem definition, which is crucial for the success of a benchmark. This definition needs to include the quantitative measure according to which the benchmark will be assessed.

3) Models are submitted to a benchmarking platform like crowdAI, which checks the eligibility of the model
Models will be submitted to agreed-upon benchmarking platforms (such as www.crowdai.org). The eligibility of the models must be defined on a case-by-case basis, but should include minimum requirements such as a maximum run time and a maximum memory requirement.

4) Eligible models are executed and evaluated on undisclosed test data, managed by FG-AI4H
The benchmarking platform executes the model on the undisclosed test set. The creation and governance of this undisclosed test set will be managed by subgroups of FG-AI4H. This undisclosed test set represents the gold standard data set for the benchmark.

5) Participant receives evaluation, model has been benchmarked
After having executed the model on the undisclosed data set, the benchmarking platform returns the evaluation results to the participant, allowing for further developments to improve the model.

6) Central leaderboard allows comparison of model performances
The benchmarking platform allows for the comparison of the models' performance on a central leaderboard, or using a pass/fail scoring. The relevant subgroups of FG-AI4H can define baselines against which models can be assessed.

It should be noted that this pipeline is meant to provide a broad overview only. Developing the details, and communicating and documenting those clearly, is one of the key achievable goals of FG-AI4H and its working groups. A detailed methodology for the aforementioned pipeline will subsequently be published as a separate reference document.

6 Conclusion

Artificial Intelligence for Health (AI4H) offers new ways to address the shortage of medical professionals, which is becoming more serious due to demographic changes and population growth. The technology has the potential to significantly improve and support medical diagnostics and treatment decision processes based on digital data. However, AI4H is rarely deployed in practice at a global scale so far due to legal, business, technical, or other constraints. The Focus Group on Artificial Intelligence for Health will work towards a standardized assessment of AI-based solutions for health, which will assure their quality, foster their adoption in practice and have a strong positive impact on global health.
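As an illustration of the benchmarking pipeline outlined in Section 5, the following Python sketch executes submitted models on an undisclosed test set, checks a run-time limit, scores accuracy against a baseline, and ranks the results. It is a minimal sketch under stated assumptions, not the FG-AI4H methodology: the names benchmark, leaderboard and Result, the toy records, the placeholder labels (which are not real ICD codes), the use of accuracy as the only metric, and the limits are all hypothetical.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

# A "model" is any callable mapping one input record to a predicted label.
Model = Callable[[Dict[str, float]], str]


@dataclass
class Result:
    participant: str
    accuracy: float
    runtime_seconds: float
    eligible: bool  # stayed within the run-time limit
    passed: bool    # eligible and at least as accurate as the baseline


def benchmark(participant: str,
              model: Model,
              undisclosed_test_set: Sequence[Tuple[Dict[str, float], str]],
              max_runtime_seconds: float,
              baseline_accuracy: float) -> Result:
    """Run a submitted model on the undisclosed test set and score it."""
    start = time.perf_counter()
    predictions = [model(record) for record, _ in undisclosed_test_set]
    runtime = time.perf_counter() - start

    correct = sum(pred == label
                  for pred, (_, label) in zip(predictions, undisclosed_test_set))
    accuracy = correct / len(undisclosed_test_set)

    # Here the limit is checked after the run; a real platform would
    # more likely enforce it with a timeout and a memory cap.
    eligible = runtime <= max_runtime_seconds
    passed = eligible and accuracy >= baseline_accuracy
    return Result(participant, accuracy, runtime, eligible, passed)


def leaderboard(results: List[Result]) -> List[Result]:
    """Rank eligible submissions by accuracy, best first."""
    return sorted((r for r in results if r.eligible),
                  key=lambda r: r.accuracy, reverse=True)


# Toy undisclosed test set; labels are placeholders, not real ICD codes.
test_set = [
    ({"temperature_c": 39.2}, "CODE_FEVER"),
    ({"temperature_c": 36.8}, "CODE_NORMAL"),
    ({"temperature_c": 38.5}, "CODE_FEVER"),
    ({"temperature_c": 37.0}, "CODE_NORMAL"),
]


def threshold_model(record: Dict[str, float]) -> str:
    return "CODE_FEVER" if record["temperature_c"] >= 38.0 else "CODE_NORMAL"


def constant_model(record: Dict[str, float]) -> str:
    return "CODE_NORMAL"


results = [
    benchmark("team_a", threshold_model, test_set, 5.0, 0.75),
    benchmark("team_b", constant_model, test_set, 5.0, 0.75),
]
ranking = leaderboard(results)
for entry in ranking:
    print(entry.participant, entry.accuracy, "pass" if entry.passed else "fail")
```

Because the platform only calls the model and compares its outputs with the confirmed labels, the model's internals never need to be disclosed, which matches the premise of Section 4.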
The FG-AI4H was founded in July 2018 by the International Telecommunication Union (ITU) in partnership with the World Health Organization (WHO), the United Nations specialized agencies for ICTs and Health, respectively. Participation in FG-AI4H is open to all researchers, engineers, practitioners, entrepreneurs and policy makers.

FG-AI4H will identify common health-specific domains (e.g. general diagnosis, specialty diagnosis [e.g. dermatology], health natural language processing, general clinical encounter note data extraction and coding, Rx coding, lab coding, etc.). For each domain, it will work for the sourcing of test data, select current gold standard test success rates (e.g. how does a professional score on this test data), set benchmark rates for AI systems (to be acceptable for decision support, to be acceptable for autonomous operation), and define acceptable fail modes (e.g. alert human operator if below a given confidence threshold).

In addition to medical, scientific and technical aspects, policy, regulatory, cultural, business and other practical aspects must be considered in relationship to standardization efforts. Therefore, the FG-AI4H will establish liaisons at an early stage with selected standards bodies, forums, consortia, regulators, health professionals, core research as well as patient organizations, engineering teams, entrepreneurs and policy makers. Registries for reporting serious adverse events and guidance documents for national administrations will support the safe and appropriate use of AI in healthcare.
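The fail mode mentioned above (alerting a human operator when confidence falls below a given threshold) can be sketched in a few lines of Python. This is an illustrative assumption about how such a rule might look, not a specification: the function name decide, the action strings, the label names, and the threshold of 0.9 are all hypothetical.

```python
from typing import Dict, Tuple


def decide(probabilities: Dict[str, float],
           confidence_threshold: float) -> Tuple[str, str]:
    """Return (action, label): accept the top label or defer to a human."""
    # Pick the label with the highest predicted probability.
    label, confidence = max(probabilities.items(), key=lambda item: item[1])
    if confidence < confidence_threshold:
        # Acceptable fail mode: hand the case to a human operator.
        return ("alert_human_operator", label)
    return ("accept", label)


uncertain = decide({"lesion_benign": 0.55, "lesion_malignant": 0.45}, 0.9)
confident = decide({"lesion_benign": 0.97, "lesion_malignant": 0.03}, 0.9)
print(uncertain)
print(confident)
```

In this sketch a stricter threshold would correspond to the higher bar suggested for autonomous operation, and a looser one to decision support; the appropriate values would have to come from the benchmark rates set for each domain.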

References

1. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification. 2015. He K, Zhang X, Ren S, Sun J. Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), 1026-1034. (doi:10.1109/iccv.2015.123)
2. Google's neural machine translation system: Bridging the gap between human and machine translation. 2016. Wu Y, Schuster M, Chen Z, Le QV, Norouzi M, et al. arXiv:1609.08144.
3. Dermatologist-level classification of skin cancer with deep neural networks. 2017. Esteva A, Kuprel B, Novoa RA, Ko J, Swetter SM, Blau HM, Thrun S. Nature 542, 115-118. (doi:10.1038/nature21056)
4. A survey on deep learning in medical image analysis. 2017. Litjens G, Kooi T, Bejnordi BE, Setio AAA, Ciompi F, et al. Medical Image Analysis 42, 60-88.
5. Opportunities and obstacles for deep learning in biology and medicine. 2018. Ching T, Himmelstein DS, Beaulieu-Jones BK, Kalinin AA, Do BT, et al. J. R. Soc. Interface. (doi:10.1098/rsif.2017.0387)
6. Dynamic Clinical Algorithms: Digital Technology Can Transform Health Care Decision-Making. 2018. Bell D, Gachuhi N, Assefi N. Am. J. Trop. Med. Hyg. 98(1), 9-14. (doi:10.4269/ajtmh.17-0477)
7. Semantics derived automatically from language corpora contain human-like biases. 2017. Caliskan A, Bryson JJ, Narayanan A. Science 356(6334), 183-186.