Scientific challenges of digital medicine: sensors, data and algorithms
Nicolas Vayatis (ENS Cachan)
A time for mathematics and computer science: Why now?
The future of healthcare will be
- Digital
- Translational
- Technology-driven
- Distributed
- Quantified
- Optimized
- Customized
- Assessed in real time
- Driven through well-being indicators
It's all about sensors, data and machine learning
Ingredients and perspectives for the future of healthcare
- Monitoring
- Follow-up
- Sensors
- Databases
- Machine learning algorithms
- Interfaces for man-data interactions
- Recommendations
- Training through simulation
- Scientific discovery
Working assumptions
- Human phenotype in situ is terra incognita
- Human behavior presents high diversity across individuals, but individual behavior is stable over time
- Women and men are machines like any other
Outline of the talk
1. Clinicians and mathematicians: the SmartCheck project
2. The power and the limitations of machine learning techniques
3. The key steps for the future of healthcare
1. Clinicians and mathematicians
SmartCheck pilot project: digitizing neurological tests
- Ongoing clinical trials at HIA Val-de-Grâce (Val-de-Grâce hospital)
- Operational protocol compatible with the ergonomic constraints of the consultation
Objectives of our research
To characterize phenotype from low-level sensor measurements, taking advantage of cohort size and knowledge of clinical context
Descriptive
- Quantification for instant and longitudinal assessment of the patient
- Positioning the patient in the cohort (norms)
Predictive
- Evolution of a pathology
- Risk indexes, e.g. risk of a fall within six months
- Prescription of therapeutic routes
Assessment, scientific discovery and knowledge management
- Recommendations of appropriate therapy
- Explore and compare patterns over a large database
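One simple way to read "positioning the patient in the cohort (norms)" is to express a measurement as a z-score and a percentile rank relative to the cohort. The sketch below is only an illustration under that assumption; the function name `position_in_cohort` and the sway-path values are hypothetical and do not come from the SmartCheck project.

```python
from statistics import mean, stdev

def position_in_cohort(value, cohort):
    """Return (z_score, percentile) of one measurement relative to a cohort."""
    mu = mean(cohort)
    sigma = stdev(cohort)  # sample standard deviation; needs at least 2 subjects
    z = (value - mu) / sigma
    # Percentile rank: share of cohort values at or below the patient's value.
    percentile = 100.0 * sum(v <= value for v in cohort) / len(cohort)
    return z, percentile

# Hypothetical sway-path lengths (cm) for a tiny illustrative cohort.
cohort_sway_cm = [40.0, 45.0, 50.0, 55.0, 60.0]
z, pct = position_in_cohort(60.0, cohort_sway_cm)
```

With real cohorts of about 100 subjects, robust statistics (median, quantiles) may be preferable to the mean and standard deviation used here.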
Posturography data: Romberg test and statokinesigrams
Statokinesigram = trajectory of the center of pressure over 30 s
Romberg test
- Comparison of two curves: eyes open / eyes closed
- Poorly reproducible
Clinical research context
- Only then are curves recorded and analyzed
- Cohorts of the order of 100 subjects
- Summary of each curve with a few features: sway area, sway path, AP/ML amplitudes + frequency bands, cf. Baratto et al. (Motor Control, 2002)
Small in-lab samples of complex functional data with strong expert priors about relevant summaries
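A few of the summary features named above can be sketched directly from a sampled center-of-pressure trajectory. This is a minimal illustration, not the feature definitions of Baratto et al.: it assumes x is the medio-lateral (ML) axis and y the antero-posterior (AP) axis, the function name `sway_features` is hypothetical, and sway area and frequency bands are left out.

```python
import math

def sway_features(cop_xy, duration_s):
    """Summarize a center-of-pressure trajectory given as (x, y) points."""
    xs = [p[0] for p in cop_xy]
    ys = [p[1] for p in cop_xy]
    # Sway path: total length of the polyline joining consecutive samples.
    path = sum(math.dist(a, b) for a, b in zip(cop_xy, cop_xy[1:]))
    return {
        "sway_path": path,
        "ml_amplitude": max(xs) - min(xs),  # assumed ML axis = x
        "ap_amplitude": max(ys) - min(ys),  # assumed AP axis = y
        "mean_velocity": path / duration_s,
    }

# Toy trajectory (cm) standing in for a 30 s recording.
trajectory = [(0.0, 0.0), (3.0, 4.0), (3.0, 0.0), (0.0, 0.0)]
features = sway_features(trajectory, duration_s=30.0)
```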
Posturography sensors: WiiBB vs. ATMI
WiiBB
- Cheap
- Hackable
ATMI
- Very expensive
- Opaque signal preprocessing
Muscular frequency during postural stabilization is about 12 Hz, with a sway path of many centimeters
Scientific and technical challenges
Main goals for clinical research:
- Re-invent the clinical assessment after the Romberg test
- Identify low-level behavioral correlates of classical nosology
- Infer the origin of the disorder (motor, muscular, vestibular, sensory, ...)
- Assess the risk of a future fall
Main modeling and algorithmic issues:
- Perform automatic feature selection in a high-dimensional setup
- Calibrate high-dimensional predictive models
- Recover interpretable dimensions
Main technical steps:
- Collect data massively (thousands of statokinesigrams): standardize protocols
- Secure clean signals
- Extract raw data and solve the random sampling issue
- Digitalize clinical feedback through expert annotation: need software interface
Requires team spirit and funding (engineering and IT support) to open a new field of scientific investigation
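If the "random sampling issue" refers to samples arriving at unevenly spaced timestamps (an assumption; the slides do not say), one basic remedy is to interpolate the raw signal onto a uniform time grid before computing features. The sketch below uses plain linear interpolation and assumes strictly increasing timestamps; the function name `resample_uniform` is hypothetical and this is not necessarily the method used in the project.

```python
def resample_uniform(times, values, rate_hz):
    """Linearly interpolate irregular samples onto a uniform time grid."""
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need at least two (time, value) pairs")
    step = 1.0 / rate_hz
    # Number of grid points that fit between the first and last timestamp.
    n = int((times[-1] - times[0]) / step + 1e-9) + 1
    out_t, out_v = [], []
    j = 0  # index of the left end of the current interpolation interval
    for i in range(n):
        t = times[0] + i * step
        while j < len(times) - 2 and times[j + 1] < t:
            j += 1
        t0, t1 = times[j], times[j + 1]
        w = (t - t0) / (t1 - t0)
        out_t.append(t)
        out_v.append(values[j] + w * (values[j + 1] - values[j]))
    return out_t, out_v

# Irregular timestamps (s) resampled at 2 Hz.
grid_t, grid_v = resample_uniform([0.0, 0.3, 1.0], [0.0, 3.0, 10.0], rate_hz=2.0)
```

Linear interpolation is the simplest choice; smoother schemes may be more appropriate when the signal content approaches the sampling rate.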
Scaling up: implementing the double loop
[Diagram: Acquisition, Transmission, Protocol, HMI, Network, Expertise, Processing]
SmartCheck team: a blend of expertise
- Coordinators: Pierre-Paul Vidal (CNRS), Nicolas Vayatis (ENS Cachan)
- Clinicians: Damien Ricard (SSA), Catherine de Waele (Pitié), Alain Yelnik (F. Widal)
- Mathematicians: Julien Audiffren (ENS Cachan), Laurent Oudre (Paris 13)
- IT: Nikos Promponas (ENS Cachan), Robert Marino (IDF INNOV)
Ongoing joint projects Cognac G - CMLA
- Dynamic posture of neurological patients: analysis of gait signals (accelerometric and piezoelectric sensors)
- Pathological ocular movements of babies: eye-tracking signals
- Emotion and sensori-motor loops during training through HMI: multiple sensor signals
- Stress and body-machine interaction in the cockpit: multiple sensor signals
- Overall activity of patients in recovery in their habitat: multiple sensor signals (motion, gestures, eye tracking, ...)
- And more
2. The power and the limitations of machine learning techniques
Machine learning is in your hand!
- User preferences
- Language translation
- Display
- News
- Ads
- Songs
- Anti-spam
- Gaming
Embedded machine learning in domestic uses
- Search engines
- Recommender systems
- ZIP code recognition
- Fraud detection
Machine learning
- Makes it smart
- Emulates human operator
- Organizes massive data sets
- Leads to reliable predictions
Complexity of a learning task: handwritten digit recognition vs. cat detection
- Digits: MNIST data set, page by Yann LeCun (NYU and Facebook AI Research), Corinna Cortes (Google Labs) and Chris Burges (Microsoft Research)
- Cats: see work by Fleuret and Geman (JMLR, 2008)
- Does machine learning match human performance? Digits: yes. Cats: no.
- It's all about context! Digits: easy. Cats: difficult.
Machine learning applications for experts
- Knowledge management
- Scientific discovery
- Aid to monitoring and decision making
Application example: anomaly detection on aircraft engines
Application example: a dedicated search engine for vibration engineers
[Figure: a query signal and retrieved results at ranks 1, 2, 3 and 8]
Application example: summary
A dedicated search engine
- A query is a data view (e.g. a portion of a signal) in a given context (e.g. acceleration)
- Results are a sorted list of engines which have presented similar characteristics in the past (similar data views)
Ingredients
- Human expertise for defining the query and analyzing the result
- Low-level data processing for the reduction of signal to a few meaningful characteristics and definition of a distance between signals
- High-level processing: production of ordered list and categories of engines
Benefits
- Computation of an anomaly score
- Fast exploration of the database
- Replicable
- Tool for training junior engineers
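The ingredients above (reduce each signal to a few characteristics, define a distance, return a sorted list, compute an anomaly score) can be sketched in a few lines. Everything specific here is an assumption made for illustration: the three summary characteristics, the Euclidean distance, the k-nearest-neighbor anomaly score, and the names `summarize`, `rank_engines` and `anomaly_score`. The slides do not describe the actual system at this level of detail.

```python
import math
from statistics import mean, pstdev

def summarize(signal):
    """Reduce a signal to a few characteristics: mean, spread, peak-to-peak."""
    return (mean(signal), pstdev(signal), max(signal) - min(signal))

def rank_engines(query, database):
    """Return engine ids sorted from most to least similar, plus distances."""
    q = summarize(query)
    distances = {eid: math.dist(q, summarize(sig)) for eid, sig in database.items()}
    return sorted(distances, key=distances.get), distances

def anomaly_score(query, database, k=2):
    """Mean distance to the k nearest stored signals; larger means more unusual."""
    order, distances = rank_engines(query, database)
    return mean(distances[eid] for eid in order[:k])

# Hypothetical database of past signals, keyed by engine id.
database = {
    "engine_a": [0.0, 1.0, 0.0, -1.0],
    "engine_b": [0.0, 2.0, 0.0, -2.0],
    "engine_c": [5.0, 9.0, 5.0, 1.0],
}
order, _ = rank_engines([0.0, 1.1, 0.0, -1.1], database)
```

In this sketch a query far from everything in the database gets a high anomaly score, while the ranked list gives the expert concrete past cases to inspect.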
3. The key steps for the future of healthcare
The material: data and their context
Data should be
- Big enough
- Contextualized
- Normalized
- Annotated by experts
My experience: mine your own data
The tools: science for back-up?
October 2013
- 1.4 million papers published yearly in scholarly journals, with main quality criteria: novelty, technicality, improvement on past results
- Models which can be tuned in many different ways give researchers more scope to perceive a pattern where none exists. According to some estimates, three-quarters of published scientific papers in the field of machine learning are bunk because of this overfitting.
- Negative results in 14% of published papers, down from 30% in 1990: this induces biases in the assessment of truth, with an increase of the relative rate of false positives
Science, as we know it, is over
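The overfitting point can be illustrated with a toy experiment: when a model family can be tuned in enough ways, some member fits pure noise perfectly. The sketch below is an illustrative construction, not taken from the talk. It enumerates every possible labeling rule over 8 inputs, picks the one that best matches random training labels, and then scores it on fresh random labels.

```python
import itertools
import random

def accuracy(predictions, labels):
    """Fraction of positions where prediction and label agree."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)

rng = random.Random(0)  # fixed seed so the run is repeatable
n_inputs = 8
# Labels are pure noise: there is no pattern to find.
train_labels = [rng.randint(0, 1) for _ in range(n_inputs)]
holdout_labels = [rng.randint(0, 1) for _ in range(n_inputs)]

# "Tuning in many different ways": all 2**8 lookup rules are candidates.
candidates = itertools.product([0, 1], repeat=n_inputs)
best_rule = max(candidates, key=lambda rule: accuracy(rule, train_labels))

train_accuracy = accuracy(best_rule, train_labels)      # perfect by construction
holdout_accuracy = accuracy(best_rule, holdout_labels)  # chance level on average
```

The selected rule always reaches a training accuracy of 1.0, since one candidate equals the training labels exactly, while its accuracy on independent labels is no better than a coin flip on average.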
That kind of science: robust algorithms founding open and reproducible research
- Algorithms: robust, simple, customized, open and validated
- <http://www.ipol.im>: a pioneer initiative for online numerical experimentation in the field of image processing
- Extension to online processing of physiological signals
Also requires
- Collaborative spirit for interdisciplinary actions
- A new type of clinicians and scientists, and novel practices for valorization of research
- A new type of entrepreneurs and products
- Less fragmentation in medical devices and software
- A new type of synergies between private and public sectors to allow data valorization while preserving privacy and protecting individual freedoms
Drivers for the future of healthcare: food for thought
- Today: profitable, privacy preservation, employment, waste of public expenses
- Tomorrow?: efficient, ethics, dignity, rationalization of public expenses
The future of (clinical) research, the future of society
Today / Tomorrow?
Thank you!
Nicolas Vayatis
vayatis@cmla.ens-cachan.fr