ARGUING THE SAFETY OF MACHINE LEARNING FOR HIGHLY AUTOMATED DRIVING USING ASSURANCE CASES
LYDIA GAUERHOF, BOSCH CORPORATE RESEARCH
14.12.2017

Arguing Safety of Machine Learning for Highly Automated Driving: Agenda
- Goals and Motivation
- Application Context and Systems Engineering
- Assurance Cases for Machine Learning
- Outlook

Goals and Motivation: Highly Automated Driving
- From hands-on to hands-off automated driving
- Increasing level of automation, from assistance functions to fully autonomous driving
- Systems will operate in a crowded, uncontrolled environment
- The move from fail-silent to fail-operational systems requires a change in the technical approach to achieving safety
- Shift in perceived risk: even accidents that would happen more often with human drivers may be considered unacceptable in the future

Goals and Motivation: What is Machine Learning?
- Machine learning: the ability to learn without being explicitly programmed
- Example: Convolutional Neural Networks (CNNs)
  - Features are learned by presenting the network with data and ground truth, and adjusting the weights in the network
  - Hidden layers distinguish hierarchical features in the inputs
  - The output layer presents the probability of an input belonging to a particular class
  - The network is trained until an approximation of the target function is reached
- Deep neural networks are huge (more than 300 million parameters across millions of units)
- Testing at unit level is like testing programs at transistor level
Image source: www.cityscapes-dataset.com
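
The statement that the output layer presents class probabilities can be illustrated with a minimal sketch. It assumes a softmax output layer, which the slide does not name, and the class names and raw scores are hypothetical values chosen for illustration.

```python
import math

def softmax(logits):
    """Convert raw output-layer scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical classes and raw scores (not taken from the slides).
classes = ["car", "truck", "person"]
logits = [2.0, 0.5, -1.0]

probs = softmax(logits)
best = max(range(len(classes)), key=lambda i: probs[i])
print(f"I'm {probs[best]:.0%} sure that's a {classes[best]}")
```

The printed confidence is a property of the model's output, not a calibrated measure of correctness, which is one reason such outputs need further argument in an assurance case.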

Goals and Motivation: Machine Learning for Autonomous Systems
Opportunities:
- By making sense of unstructured data, machine learning is particularly suited to open-context systems such as highly automated driving (HAD)
- Has the potential to exceed the performance of human drivers
- Can enable automated driving functions that enhance safety, e.g. person, lane and vehicle detection systems, trajectory planning
Risks:
- Unlike standard software, machine learning algorithms contain inherent uncertainties in their results ("I'm 80% sure that's a car")
- They can also be unpredictable, learning the wrong features
- No development standards or best practices exist (yet!) for determining whether deep-learning-based methods are safe
Key question: What methods can be deployed to ensure and argue that machine learning functions meet their performance requirements?
Image source: www.cityscapes-dataset.com

Goals and Motivation: Assurance (Safety) Cases
- Assurance Case (ISO/IEC 15026): a reasoned, auditable artefact that supports the contention that its top-level claim (or set of claims) is satisfied, including systematic argumentation and its underlying evidence and explicit assumptions that support the claim(s)
- System release procedures must ensure that sufficient evidence is systematically captured during development and validation to argue a tolerable residual risk
- Goal Structuring Notation (GSN): a graphical notation for structuring an assurance case, linking argumentation rationale, context, assumptions and evidence
GSN elements:
- Goal <Goal Identifier>: presents a claim forming part of the argument
- Strategy <Strategy Identifier>: describes the nature of the inference between a goal and its supporting goals
- Sub-goal <Goal Identifier>: if all sub-goals are true, this is sufficient to establish the claim that the higher-level goal is true; a sub-goal may be left undeveloped
- Context <Context Identifier>: reference to contextual information or statement
- Evidence (solution) <Solution Identifier>: reference to an evidence item or items
- Assumption <Assumption Identifier> (marked A): intentionally unsubstantiated statement
- Justification <Justification Identifier> (marked J): statement of rationale

Application Context and Systems Engineering: Managing Complexity
- Abstract, divide and conquer
- Identification of critical equivalence classes in input data
- Consideration of well-known relations and physical effects
- Specification of required functional, performance and safety properties
- Validation through driving tests alone would require millions of test kilometres to provide a statistical argument for safety!
[Figure: a highly automated driving function in an open context, framed by assumptions and guarantees; validation through scenario-based validation, driving tests and field data evaluation; statistical evaluation of results (from simulation, HIL, prototype)]
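
The claim about test kilometres can be made concrete with a rough calculation. The sketch below assumes that safety-relevant events follow a Poisson process with a constant rate per kilometre and that no such event is observed during testing; the target rate and confidence level are hypothetical values, not figures from the slides.

```python
import math

def required_failure_free_km(max_rate_per_km, confidence):
    """Distance that must be driven without any event to claim, with the
    given confidence, that the event rate is below max_rate_per_km.

    Derived from P(no event in d km) = exp(-rate * d) <= 1 - confidence.
    """
    return -math.log(1.0 - confidence) / max_rate_per_km

# Hypothetical target: at most one event per 100 million km, 95% confidence.
km = required_failure_free_km(max_rate_per_km=1e-8, confidence=0.95)
print(f"{km:.2e} failure-free kilometres required")
```

Under these assumptions the required distance is on the order of hundreds of millions of kilometres, and it has to be repeated after every relevant change to the function, which motivates the complementary methods listed on the slide.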

Application Context and Systems Engineering: Systems Engineering and Machine Learning
Demonstrating the safety of machine learning techniques requires:
- An understanding of their context within the wider system
- A precise definition of the expected behaviour, including non-functional constraints
- Explicitly stated assumptions regarding the system context and environment in which they will be used
- An understanding of the impact of failures and insufficiencies, including the consideration of additional mitigation measures

Application Context and Systems Engineering: Example System Context for Machine Learning
Example requirement on a CNN for object detection:
- Locate objects of class "person" from a distance of 100 m, with a lateral accuracy of ±20 cm, a false negative rate of 1% and a false positive rate of 5%.
Example assumptions:
- Braking distance and speed are sufficient to react when detecting persons, for example, 100 m ahead on the planned trajectory of the vehicle.
- Alternative sensing methods can be used to reduce the overall false negative and false positive rates of the system to an acceptable level.
Example context information:
- Distance and accuracy must be mapped to dimensions in the image frames presented to the CNN (i.e. size of objects in pixels, etc.)
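
The example requirement and context information can be sketched in code. This is an illustration under stated assumptions: the confusion counts are invented, the false positive rate is computed per evaluated candidate (the slide does not define the counting unit), and the mapping to pixels assumes an ideal pinhole camera with a hypothetical focal length of 2000 px and a hypothetical person height of 1.7 m.

```python
def error_rates(tp, fp, tn, fn):
    """Return (false negative rate, false positive rate) from confusion counts."""
    fnr = fn / (fn + tp)
    fpr = fp / (fp + tn)
    return fnr, fpr

def meets_requirement(fnr, fpr, max_fnr=0.01, max_fpr=0.05):
    """Check the rates against the limits from the example requirement."""
    return fnr <= max_fnr and fpr <= max_fpr

def size_in_pixels(size_m, distance_m, focal_length_px):
    """Pinhole camera model: image size = focal length * object size / distance."""
    return focal_length_px * size_m / distance_m

# Hypothetical evaluation counts for the class "person".
fnr, fpr = error_rates(tp=995, fp=40, tn=960, fn=5)
print(fnr, fpr, meets_requirement(fnr, fpr))

# Hypothetical camera: how large are a person and the lateral tolerance at 100 m?
person_px = size_in_pixels(1.7, 100.0, focal_length_px=2000.0)
tolerance_px = size_in_pixels(0.2, 100.0, focal_length_px=2000.0)
print(round(person_px), round(tolerance_px))
```

With these assumed camera parameters, the ±20 cm lateral accuracy corresponds to only a few pixels at 100 m, which shows why the requirement has to be translated into image dimensions before it can be tested on the CNN.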

Application Context and Systems Engineering: Framing the Context of the Assurance Case for ML
A contract-based design approach (assumptions and guarantees) to systems engineering is useful to frame the context within which the safety of the machine learning function can be argued: argue that the function meets its safety guarantees under all conditions where the assumptions hold.
GSN structure:
- G1 (goal): The residual risk associated with functional insufficiencies in the object detection and classification function is acceptable.
- A1 (assumption): Assumptions on the operational profile of the system.
- A2 (assumption): Assumptions on the inputs to the machine learning function.
- A3 (assumption): Assumptions on the performance potential of machine learning.
- C1 (context): Definition of functional and performance requirements on the object classification function.
- S1 (strategy): Argument over causes of functional insufficiencies in machine learning.
- C2 (context): Causes of functional insufficiencies in machine learning.
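
The contract-based framing can be sketched as a small data structure: guarantees are only claimed while all assumptions hold. The field names, thresholds and the use of a false negative rate as a stand-in guarantee are hypothetical; the slides do not prescribe any of them.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Context = Dict[str, float]
Check = Callable[[Context], bool]

@dataclass
class Contract:
    """Assume/guarantee contract for a component."""
    assumptions: Dict[str, Check]
    guarantees: Dict[str, Check]

    def evaluate(self, ctx: Context) -> Tuple[str, List[str]]:
        # Guarantees are only meaningful while every assumption holds.
        violated = [n for n, check in self.assumptions.items() if not check(ctx)]
        if violated:
            return "outside contract", violated
        failed = [n for n, check in self.guarantees.items() if not check(ctx)]
        if failed:
            return "guarantee violated", failed
        return "ok", []

# Hypothetical placeholders for A1, A2 and a measurable proxy for the guarantee.
detector_contract = Contract(
    assumptions={
        "A1 operational profile": lambda c: c["speed_kmh"] <= 130,
        "A2 inputs": lambda c: c["image_brightness"] >= 0.2,
    },
    guarantees={
        "FNR guarantee": lambda c: c["measured_fnr"] <= 0.01,
    },
)

in_scope = detector_contract.evaluate(
    {"speed_kmh": 100, "image_brightness": 0.6, "measured_fnr": 0.008})
too_dark = detector_contract.evaluate(
    {"speed_kmh": 100, "image_brightness": 0.05, "measured_fnr": 0.2})
print(in_scope, too_dark)
```

In the second call the poor detection rate is not reported as a violated guarantee, because the input assumption is already broken; handling such situations becomes a system-level concern rather than a claim about the ML function.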

Causes of Functional Insufficiencies: Assurance Case Structure
Argue that the causes of insufficiencies are adequately addressed.
GSN structure:
- G1 (goal): The residual risk associated with functional insufficiencies in the object detection and classification function is acceptable.
- A1 (assumption): Assumptions on the operational profile of the system.
- A2 (assumption): Assumptions on the inputs to the machine learning function.
- A3 (assumption): Assumptions on the performance potential of machine learning.
- C1 (context): Definition of functional and performance requirements on the object classification function.
- S1 (strategy): Argument over causes of functional insufficiencies in machine learning.
- C2 (context): Causes of functional insufficiencies in machine learning.
Sub-goals under S1:
- G2: The operating context is well defined and reflected in training data.
- G3: The function is robust against distributional shift in the environment.
- G4: The function exhibits a uniform behaviour over critical classes of situations.
- G5: The function is robust against differences between its training and execution platforms.
- G6: The function is robust against changes in its system context.
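
The goal structure on this slide can be held in a simple tree so that goals still lacking support are easy to list. This is a sketch, not a GSN tool: the `Node` class, the `undeveloped_goals` helper and the solution identifier `Sn1` are hypothetical, and the goal texts are abbreviated from the slide.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    ident: str
    kind: str  # "goal", "strategy" or "solution"
    text: str
    children: List["Node"] = field(default_factory=list)

def undeveloped_goals(node: Node) -> List[str]:
    """Return identifiers of goals with no supporting strategy, goal or solution."""
    found = []
    if node.kind == "goal" and not node.children:
        found.append(node.ident)
    for child in node.children:
        found.extend(undeveloped_goals(child))
    return found

sub_goals = [
    Node("G2", "goal", "Operating context well defined and reflected in training data"),
    Node("G3", "goal", "Robust against distributional shift in the environment"),
    Node("G4", "goal", "Uniform behaviour over critical classes of situations"),
    Node("G5", "goal", "Robust against training/execution platform differences"),
    Node("G6", "goal", "Robust against changes in system context"),
]
s1 = Node("S1", "strategy", "Argument over causes of functional insufficiencies", sub_goals)
g1 = Node("G1", "goal", "Residual risk of functional insufficiencies is acceptable", [s1])

# Attach one hypothetical evidence item to G2.
sub_goals[0].children.append(
    Node("Sn1", "solution", "Structured specification of operating context"))

print(undeveloped_goals(g1))
```

Listing undeveloped goals in this way gives a quick view of where evidence is still missing before the top-level claim G1 can be considered supported.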

Causes of Functional Insufficiencies: Operating Context and Training Data
Problem: the function is not trained for the target scope, leading to inadequate performance.
Potential causes:
- Poor specification of the operating context (e.g. regional specifics)
- Implicit assumptions on the environment (e.g. behaviour/dimensions of pedestrians)
- Emergent properties from interactions between the ML function and the environment
- Unsuitable amount or coverage of training data, leading to under-fitting or over-fitting
Potential sources of supporting evidence:
- Structured specification of the operating context that is continuously adapted based on experiences during validation and in the field
- Field observations to confirm that the target environment matches the specification
- On-line monitoring of the environment against the target profile
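
On-line monitoring of the environment against the target profile could, for example, compare the observed mix of scene conditions with the specified one. The sketch below uses the total variation distance between two categorical distributions; the condition labels, the target profile and the alert threshold are hypothetical, and the slides do not name a specific metric.

```python
from collections import Counter

def distribution(labels):
    """Relative frequency of each label in a list of observations."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

def total_variation(p, q):
    """Total variation distance between two categorical distributions (0 to 1)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

# Hypothetical target profile from the operating context specification.
target = {"day": 0.7, "night": 0.2, "rain": 0.1}

# Hypothetical field observations, including a condition missing from the profile.
observed = distribution(["day"] * 40 + ["night"] * 20 + ["rain"] * 30 + ["snow"] * 10)

distance = total_variation(target, observed)
drift_detected = distance > 0.1  # hypothetical alert threshold
print(round(distance, 3), drift_detected)
```

A condition such as "snow" that appears in the field but not in the target profile is a direct hint that the specification of the operating context needs to be revisited.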

Arguing Safety of Machine Learning for Highly Automated Driving: Additional Sources of Evidence (Research Topics)
Adversarial perturbations:
- Create cases where the perception doesn't work
- Argue why they are not relevant for our application (e.g. they do not occur in the real world), or improve perception accordingly
Saliency:
- What does the network use to make the decision?
- Saliency rudimentarily reconstructs why a perception output was produced
- Allows debugging, leads to increased understanding and trust in the approach
AI-generated synthetic data:
- AI can be used to create synthetic data, e.g. via style transfer
- Could be used to create additional training and test data; for example, we could transfer images to rainy weather conditions
[Figure labels: input, prediction: car; perturbation added, prediction: truck; training, test; input data generation f^-1 from a scene description ("BMW at location x with orientation y, person at location"); style input, input, output]
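
Adversarial perturbations and saliency can both be illustrated with input gradients. The sketch below swaps in a tiny logistic model for the CNN and uses the fast gradient sign method (FGSM) as one known way to construct a perturbation; the slides name neither, and the weights, input and step size are hypothetical.

```python
import math

def predict(weights, bias, x):
    """Probability of class 1 for a logistic model (stand-in for a network)."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def loss_gradient_wrt_input(weights, bias, x, label):
    """Gradient of the cross-entropy loss with respect to the input: (p - y) * w."""
    p = predict(weights, bias, x)
    return [(p - label) * w for w in weights]

def saliency(gradient):
    """Gradient magnitude per input feature: which inputs drive the decision?"""
    return [abs(g) for g in gradient]

def fgsm(x, gradient, epsilon):
    """Step each input by epsilon in the direction that increases the loss."""
    return [xi + epsilon * ((g > 0) - (g < 0)) for xi, g in zip(x, gradient)]

# Hypothetical model and input; label 1 is the correct class.
weights, bias = [2.0, -1.0, 0.1], 0.0
x, label = [0.3, 0.2, 0.5], 1

grad = loss_gradient_wrt_input(weights, bias, x, label)
x_adv = fgsm(x, grad, epsilon=0.2)

print("original prediction:", predict(weights, bias, x) > 0.5)
print("perturbed prediction:", predict(weights, bias, x_adv) > 0.5)
print("saliency:", saliency(grad))
```

In this toy case a bounded change to each input flips the decision, and the saliency values show that the first input feature dominates it. For a real CNN the gradients would come from automatic differentiation rather than a closed-form expression.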

Arguing Safety of Machine Learning for Highly Automated Driving: Challenges
The use of machine learning algorithms for HAD introduces significant challenges, especially in arguing the residual risk associated with functional insufficiencies of the system.
- Against which quality criteria should ML functions be measured?
- What combination of measures provides a convincing argument for safety?
- How can the necessary evidence be generated in an efficient manner?
- What other system measures (redundancy, plausibility checks, etc.) are needed?

Arguing Safety of Machine Learning for Highly Automated Driving: Outlook
Next steps:
- Considerable research is required to understand the causes of functional insufficiencies of ML and to identify suitable countermeasures and validation approaches
- First steps: arguing the adequacy of simple perception functions within an overall system context
- Industry consensus will be required to agree on appropriate quality criteria and measures, which could form the basis for future standards

Thank you! Any questions?