FAIL OPERATIONAL E/E SYSTEM CONCEPT FOR FUTURE APPLICATION IN ADAS AND AUTONOMOUS DRIVING
Fail Safe, Fail Operational, Fault Tolerance, ISO 26262
Hermann Kränzle, TÜV NORD Systems
OUR FUNCTIONAL SAFETY CERTIFIED PROGRAM
For: product, process, management, personnel
For: vehicles, industry, industrial internet, IT
FAIL OPERATIONAL VS. FAIL SAFE
Theory
Fail-operational systems continue to operate when one of their control systems fails.
Fail-safe systems become safe when they cannot operate.
Fault-tolerant systems avoid service failure when faults are introduced to the system.
Fail-secure systems maintain maximum security when they cannot operate.
In the context of ISO 26262 (ISO/DIS:2016)
1 Scope: ISO 26262 addresses possible hazards caused by malfunctioning behavior of safety-related E/E systems, including interaction of these systems.
3.64 functional safety: absence of unreasonable risk (3.175) due to hazards (3.72) caused by malfunctioning behavior (3.87) of E/E systems (3.37)
Mentioned in: 3.40 emergency operation, 3.130 safe state, and Part 11
FAIL OPERATIONAL VS. FAIL SAFE
[Figure: degree of automation, levels 0 to 5. Level 0: driver only; levels 1 to 2: ADAS; levels 3 to 5: ADS.]
MOST SYSTEMS ARE FAIL SAFE
[Figure: timeline over time t. Normal operation -> fault occurs -> fault detection (within the diagnostic test interval) -> fault reaction time -> safe state. The safe state has to be reached within the fault tolerance time, before the possible hazard.]
Deactivating or degrading the function leads to a safe state, including the warning concept.
Examples of deactivation: adaptive cruise control, power train, battery charging
Examples of degradation: EP steering, braking (so far)
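The timing relation on this slide can be checked numerically. The following is a minimal Python sketch, assuming the worst case in which the fault occurs immediately after a diagnostic test, so that detection takes a full diagnostic test interval. The function name and the millisecond values are illustrative assumptions and are not taken from the slides.

```python
def reaches_safe_state_in_time(diagnostic_test_interval_ms: float,
                               fault_reaction_time_ms: float,
                               fault_tolerance_time_ms: float) -> bool:
    """Return True if the safe state is reached before the hazard can occur.

    Assumption (hypothetical worst case): the fault appears right after a
    diagnostic test, so detection needs one full diagnostic test interval.
    """
    worst_case_ms = diagnostic_test_interval_ms + fault_reaction_time_ms
    return worst_case_ms <= fault_tolerance_time_ms


if __name__ == "__main__":
    # Illustrative values only.
    print(reaches_safe_state_in_time(10, 20, 50))
    print(reaches_safe_state_in_time(40, 20, 50))
```

A fail-safe design has to keep this inequality true for every safety goal. A fail-operational design additionally has to keep the function available after the reaction.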
EFFECT OF FAIL SOS-GOALS
Influence of system attributes such as availability, reliability, safety and security
Interference and dependence of safety, fail-operational and security requirements (or goals)
Non-transparency (state, interconnection and behavior of the system)
Sensitivity: interference of results in case of unexpected input change
Instability: the smallest disturbance causes unknown, unwanted behavior of the system
Internal dynamics: continuous change of the system's state by the system itself, without any external influence
HOW? WHAT ARE THE CHALLENGES?
To understand possible system designs we have to take a step back to the item definition, the HARA and the functional safety concept.
The item definition is not limited to the E/E of the vehicle. -> Vehicle system
Assumptions to be present (beyond ISO 26262):
Presence of the driver
Safe place
Safe state scenario
Degradation categories (e.g. automated lane change is no longer allowed)
Harmonization of the behavior of all ADS (as in airborne applications) -> PAS
EFFECT OF FAIL SOS-GOALS
Sensor information (nearly raw) provided by: the infrastructure, other vehicles, the vehicle itself
Pre-processed status and emergency information (X-to-Car)
Static and dynamic databases: static maps; processed and consolidated dynamic information (e.g. traffic information system)
Absolute positioning information: satellite and terrestrial (GPS, WIFI, 5G, ...)
Static (infrastructure) environment
INFLUENCE ON THE SYSTEM DESIGN
[Figure: vehicle E/E architecture. Blocks: Car Sensing [ECUs], Firewall (twice), Steering [ECUs], ADS [I_ECU], ADS [M_ECU], Radar [ECUs], VMS [ECUs], PTrain [ECUs].]
INFLUENCE ON THE SYSTEM DESIGN
[Figure: input sources. Vehicle sensors: steering, PT, radar. Position: GPS, WIFI, 5G. Static information (DB): maps. Car-to-car: position, sensor status, emergency.]
Systematic aspect: full performance, or estimate the degradation category, depending on the missing or incorrect input information.
Systematic aspect and hardware design aspect: full performance, or estimate the degradation category, depending on the performance of the electronic subsystems or components.
SYSTEM DESIGN (FAIL SAFE VS. FAIL OPERATIONAL)
[Figure: two S -> A chains with binary states 0 and 1. Safety goal (SG): "Motor not operating is safe!", the other state is marked "unsafe state!". Fail-operational goal (FOG): "Motor operating is safe!".]
SYSTEM DESIGN (FAIL SAFE VS. FAIL OPERATIONAL)
[Figure: two redundant architectures, labelled "2 out of 2 [2oo2]" and "1 out of 2 [1oo2]". Each has sensors S1 and S2 feeding Subsystem 1 and Subsystem 2, whose outputs drive the actuator A.]
Outputs combined by OR: in case of error -> reconfiguration
Subsystems with supervision, outputs combined by AND: in case of error -> deactivation
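The trade-off between the OR and the AND combination can be illustrated with a small calculation. This is a minimal sketch under assumptions that are not in the slides: two statistically independent channels, each with a probability `p_silent` of failing inactive and a probability `p_active` of failing active. The function names and the numbers are hypothetical.

```python
def or_combined(p_silent: float, p_active: float) -> tuple[float, float]:
    """Two independent channels whose outputs are OR-ed.

    Returns (probability of losing the function,
             probability of unwanted actuation).
    """
    loss_of_function = p_silent ** 2                # both channels must be silent
    unwanted_actuation = 1 - (1 - p_active) ** 2    # one active fault is enough
    return loss_of_function, unwanted_actuation


def and_combined(p_silent: float, p_active: float) -> tuple[float, float]:
    """Two independent channels whose outputs are AND-ed."""
    loss_of_function = 1 - (1 - p_silent) ** 2      # one silent channel is enough
    unwanted_actuation = p_active ** 2              # both channels must fail active
    return loss_of_function, unwanted_actuation


if __name__ == "__main__":
    # Illustrative per-mission probabilities, not measured data.
    print("OR :", or_combined(0.01, 0.001))
    print("AND:", and_combined(0.01, 0.001))
```

With these illustrative numbers the OR combination is far less likely to lose the function, while the AND combination is far less likely to actuate without demand. Availability and safety pull the architecture in opposite directions.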
SYSTEM DESIGN (FAIL SAFE VS. FAIL OPERATIONAL)
[Figure: two-channel circuit example. Labels: µC 1, µC 2, CAN, PWM, DSP, ASIC, INT, WD reset, OFF_DIAG, EN_L_1, EN_L_2, T1, T2, T3, t0, t1, S1, S2, ST1a, ST1b, ST2a, ST2b, L1, L2, DIC, OR gates, A. Annotations: "Unsafe state" and "the only safe state".]
SOME WORDING, SIMPLIFIED FOR UNDERSTANDING
SPFM [Single-Point Fault Metric]: the safe portion of the <first fault> (similar idea: IEC 61508 -> SFF [Safe Failure Fraction])
LFM [Latent Fault Metric]: the safe portion of the <latent/multi-point fault>
multiple-point fault (1.77)[3.96]: individual fault that, in combination with other independent faults, leads to a multiple-point failure
PMHF [Probabilistic Metric for random Hardware Failures]: failure rate of the underlying safety goal (dangerous failure). Similar idea: IEC 61508: PFH, or in railway: THR
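The two architectural metrics can be computed from a failure-rate budget. The sketch below uses the usual ISO 26262-5 form, SPFM = 1 - sum(lambda_SPF + lambda_RF) / sum(lambda) and LFM = 1 - sum(lambda_MPF,latent) / sum(lambda - lambda_SPF - lambda_RF). The element names and FIT values are invented for illustration only.

```python
from typing import NamedTuple, Sequence


class Element(NamedTuple):
    """Failure-rate budget of one hardware element, in FIT (hypothetical data)."""
    name: str
    total_fit: float    # all safety-related failure modes
    spf_rf_fit: float   # single-point plus residual faults
    latent_fit: float   # latent multiple-point faults


def spfm(elements: Sequence[Element]) -> float:
    """Single-point fault metric over all elements of the item."""
    total = sum(e.total_fit for e in elements)
    return 1 - sum(e.spf_rf_fit for e in elements) / total


def lfm(elements: Sequence[Element]) -> float:
    """Latent fault metric over all elements of the item."""
    remaining = sum(e.total_fit - e.spf_rf_fit for e in elements)
    return 1 - sum(e.latent_fit for e in elements) / remaining


# Invented example budget, not from the slides.
EXAMPLE = [
    Element("sensor", total_fit=100.0, spf_rf_fit=1.0, latent_fit=5.0),
    Element("controller", total_fit=50.0, spf_rf_fit=0.5, latent_fit=2.0),
]

if __name__ == "__main__":
    print(f"SPFM = {spfm(EXAMPLE):.4f}, LFM = {lfm(EXAMPLE):.4f}")
```

Adding a redundant channel increases the total failure rate in the denominators. This is one reason why the metrics have to be recalculated after every architectural change.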
IN THE CASE OF BOTH?
The system architecture is a projection of the metrics and of the techniques for separating the lambda values.
The safety mechanism has a different goal: detecting a fault and reacting.
Normally, fail-safe and fail-operational goals are in contradiction.
More components -> less availability/reliability
For fail operational we need full or nearly full functionality in the case of a fault.
SPFM could be manageable, but what about LFM? PMHF?
2oo2 vs. 1oo2 system
What would be the starting point?
POSSIBILITIES
IF WE TALK ABOUT ASIL C/D WE NEED A STARTING POINT
[Figure: maximum system. Sensors S1a, S1b, S2a, S2b feed SubSubsystems 1a, 1b, 2a, 2b. Gates: two AND gates and one [X] OR gate. Annotations: "in case of error -> deactivation" (twice), "in case of error -> reconfiguration", Diagnostics (SMR), preventing latency (SML).]
We start with the maximum system.
In this case software is quite important.
We finally start with a 2 x ASIL C/D system -> each one gets half of the PMHF portion in the case of a fail-safe goal, and each has to satisfy the SPFM/LFM.
For the fail-operational goal we use complex voting, but we have a system with a lot of components which can fail in a safe way, which is not good for our fail-operational requirements.
INFLUENCE ON THE SYSTEM DESIGN
We start with a maximum system configuration that would work.
Identify critical mechanisms.
Remove, simplify or restructure the subsystems step by step, by analyzing the fail-safe FTA vs. the fail-operational FTA (the FTA shall contain the software mechanisms).
Recalculate the SPFM/LFM/PMHF.
Reliability/safety optimization within each of the subsystems.
Operations research can be considered, e.g. as a combinatorial optimization problem.
THE MODEL: SOME ELEMENTS
[formula missing] a binary system function in disjunctive normal form, where [formula missing] and the components' failure modes with [formula missing].
Let [formula missing] be a transformation function for failure modes which can be detected by a safety mechanism or are safe due to architectural constraints.
Furthermore, it can be shown that probability/stochastic distributions can be applied directly in the system function in disjunctive normal form.
THE MODEL: AN OPTIMIZATION PROBLEM
Minimize the costs [formula missing]
under the further conditions [formulas missing], which
represent the minimum requirements for the SPF and LF metrics according to the underlying ASIL,
comply with the PMHF criteria according to the underlying ASIL,
and, where applicable, additional constraints.
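Since the formulas of the model are not available here, the following is only a hedged illustration of the idea of a cost minimization under metric constraints. It is not the author's model. It enumerates one design option per subsystem by brute force. The option names, costs and FIT values are invented. The PMHF is approximated by the sum of single-point and residual failure rates, which is a deliberate simplification that ignores dual-point contributions. The default limits are the commonly cited ASIL D targets (SPFM >= 99 %, LFM >= 90 %, PMHF < 1e-8 per hour).

```python
from itertools import product
from typing import NamedTuple, Optional, Sequence

FIT = 1e-9  # 1 FIT = one failure per 1e9 operating hours


class Option(NamedTuple):
    """One candidate design for a subsystem (all values hypothetical)."""
    name: str
    cost: float
    total_fit: float
    spf_rf_fit: float
    latent_fit: float


def metrics(choice: Sequence[Option]) -> tuple[float, float, float]:
    """Return (SPFM, LFM, approximate PMHF per hour) for one configuration."""
    total = sum(o.total_fit for o in choice)
    spf_rf = sum(o.spf_rf_fit for o in choice)
    latent = sum(o.latent_fit for o in choice)
    # Simplification: PMHF is taken as the single-point/residual rate only.
    return 1 - spf_rf / total, 1 - latent / (total - spf_rf), spf_rf * FIT


def cheapest_compliant(subsystems: Sequence[Sequence[Option]],
                       spfm_min: float = 0.99,
                       lfm_min: float = 0.90,
                       pmhf_max: float = 1e-8
                       ) -> Optional[tuple[float, tuple[str, ...]]]:
    """Brute-force search: cheapest configuration that meets all three limits."""
    best = None
    for choice in product(*subsystems):
        spfm, lfm, pmhf = metrics(choice)
        if spfm >= spfm_min and lfm >= lfm_min and pmhf < pmhf_max:
            cost = sum(o.cost for o in choice)
            if best is None or cost < best[0]:
                best = (cost, tuple(o.name for o in choice))
    return best


# Invented candidate designs for two subsystems.
SENSING = [Option("single sensor", 10, 200, 10, 30),
           Option("dual sensor", 18, 400, 2, 20)]
LOGIC = [Option("plain MCU", 15, 500, 20, 80),
         Option("lockstep MCU", 25, 600, 3, 30)]

if __name__ == "__main__":
    print(cheapest_compliant([SENSING, LOGIC]))
```

Exhaustive enumeration only works for a handful of subsystems. For larger design spaces a combinatorial optimization method would be needed. Fail-operational constraints such as a minimum availability could be added as further conditions in the same way.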
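NOW WE TRY TO SIMPLIFY THE SYSTEM
[Figure: 2oo3 system. Sensors S1, S2, S3 feed Subsystem 1, Subsystem 2 and Subsystem 3, whose outputs go to a voter (2oo3). Annotations: Diagnostics (SMR), preventing latency (SML).]
DEPENDING ON THE APPLICATION, WE CAN DO FURTHER REDUCTION
[Figure: reduced system. Sensors S1, S2a, S2b feed the Main System, Subsystem 2a and Subsystem 2b (degraded function), whose outputs go to a voter.]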
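The voter in the 2oo3 arrangement can be sketched in a few lines. This is a minimal illustration, assuming the three subsystems deliver discrete, directly comparable outputs. The function name and return convention are hypothetical. A real voter would also need tolerance bands for analog values and its own diagnostics.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def vote_2oo3(outputs: Sequence[Hashable]
              ) -> tuple[Optional[Hashable], list[int]]:
    """Majority vote over exactly three channel outputs.

    Returns (voted value, indices of the deviating channels).
    The voted value is None if no two channels agree.
    """
    if len(outputs) != 3:
        raise ValueError("a 2oo3 voter needs exactly three channel outputs")
    value, count = Counter(outputs).most_common(1)[0]
    if count < 2:
        # No majority: every channel is suspect, no valid output.
        return None, [0, 1, 2]
    deviating = [i for i, out in enumerate(outputs) if out != value]
    return value, deviating


if __name__ == "__main__":
    print(vote_2oo3(["brake", "brake", "coast"]))
```

The list of deviating channels is what the diagnostics need in order to flag a faulty subsystem before a second fault occurs, so that the first fault does not stay latent.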
THANK YOU