My 36 Years in System Safety: Looking Backward, Looking Forward

My 36 Years in System Safety: Looking Backward, Looking Forward
Nancy Leveson
[Cartoon: "System safety engineer" (Gary Larson, The Far Side)]

Topics
  How I Got Started
  Looking Backward
  Looking Forward

How I Got Started

[Figure: timeline from 1940 to 2020. Hazard analysis techniques shown: FMEA, FTA, ETA, HAZOP, Bow Tie (CCA), FTA + ETA. Note on the techniques: "Assumes accidents caused by component failures". Changes shown along the timeline: introduction of computer control, exponential increases in complexity, new technology, changes in human roles.]

Changes in the Past 36 Years
New causes of accidents created by use of software
Role of humans in systems and in accidents has changed
Increased recognition of importance of organizational and social factors in accidents
Faster pace of technological change
  Learning from experience ("fly-fix-fly") no longer as effective
  Introduces unknowns and new paths to accidents
  Less exhaustive testing is possible
Increasing complexity
Decreasing tolerance for single accidents

Reliability Engineering Approach to Safety
Examples: fail-safe, defense in depth
Many accidents occur without any component failure
  Caused by equipment operation outside the parameters and time limits upon which reliability analyses are based
  Caused by interactions of components all operating according to specification
Highly reliable components are not necessarily safe
Reliability is NOT equal to safety in complex systems

What Failed?
Navy aircraft were ferrying missiles from one location to another.
One pilot executed a planned test by aiming at the aircraft in front and firing a dummy missile.
Nobody involved knew that the software was designed to substitute a different missile if the one commanded to fire was not in a good position.
In this case, there was an antenna between the dummy missile and the target, so the software decided to fire a live missile located in a different (better) position instead.
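
The substitution behavior described above can be illustrated with a minimal sketch. Everything here is hypothetical (the `Missile` fields, the station names, and the selection rule are assumptions made for illustration, not the actual weapons software); the point is only that code which does exactly what it was designed to do can still produce an unsafe result.

```python
from dataclasses import dataclass

@dataclass
class Missile:
    station: str       # hypothetical mounting position on the aircraft
    live: bool         # True for a live round, False for a dummy
    clear_path: bool   # False if something (e.g., an antenna) blocks the shot

def select_missile(commanded, loaded):
    """Return the missile to fire, substituting one if the commanded missile is blocked."""
    if commanded.clear_path:
        return commanded
    # Assumed design rule: fall back to any other missile with a clear path.
    for candidate in loaded:
        if candidate is not commanded and candidate.clear_path:
            return candidate
    return commanded

# The pilot commands the dummy, but an antenna blocks it.
dummy = Missile("inboard", live=False, clear_path=False)
live = Missile("outboard", live=True, clear_path=True)
fired = select_missile(dummy, [dummy, live])
print("Live missile fired:", fired.live)
```

No component in this sketch fails: the selection logic meets its own specification, and the hazard comes from the interaction between that logic and the pilot's test plan.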

Preventing Component or Functional Failures Is NOT Enough
[Figure: Venn diagram of two overlapping sets, "Scenarios involving failures" and "Unsafe scenarios", with regions labeled A, C, B]
  Unreliable but not unsafe (FMEA)
  Unreliable and unsafe (FTA, ETA, HAZOP, FMECA, ...)
  Unsafe but not unreliable (???)


Why Our Efforts are Often Not Cost-Effective (1)
Efforts superficial, isolated, or misdirected
  Often isolated from engineering design
  Spend too much time and effort on assurance, not on building safety in from the beginning
  Focusing on making arguments that systems are safe rather than making them safe
  Safety/Assurance cases: subject to confirmation bias
  Traditional system safety tries to prove the system is unsafe (looks for paths to hazards), not that it is safe
Safety must be built in; it cannot be assured in or argued in

Why Our Efforts are Often Not Cost-Effective (2)
Safety efforts start too late
  80-90% of safety-critical decisions made in early system concept formation (C.O. Miller)
  Cannot add safety to an unsafe design
  Most of our techniques require a relatively complete design to work
Focus efforts only on technical components of systems
  Ignore or only superficially handle:
    Management decision making
    Operator error (and operations in general)
    Safety culture
Focus on development and often ignore operations

Why Our Efforts are Often Not Cost-Effective (3)
Using inappropriate analysis techniques for systems built today
  Need new, more powerful safety engineering approaches to deal with complexity and new causes of accidents
Inadequate risk assessment
  Applying probabilistic risk analysis to events that are not random
    Software errors are design errors, not random failures
    Human error is not random (slips vs. mistakes)
    Component interaction accidents (system design errors) are not random
  End up either leaving things out or making up numbers
  Need better ways to assess and communicate risk

Why Our Efforts are Often Not Cost-Effective (4)
Limited learning from events
  Oversimplification of accident causation
  Blame is the enemy of safety
    Focus on "who" and not "why"
  Root cause seduction
    Believing in a root cause appeals to our desire for control
    Leads to a sophisticated "whack-a-mole" game
    Fix symptoms but not the process that led to those symptoms
    In continual fire-fighting mode
    Having the same accident over and over

[Cartoon caption: "It's still hungry and I've been stuffing worms into it all day."]

Summary
Doing things that require great effort and resources but demonstrably do not work
Don't seem to notice
Almost no evaluations of old techniques

The Problem is Complexity
How do we traditionally deal with complexity?
  1. Analytic Reduction
  2. Statistics
[Recommended reading: Peter Checkland, Systems Thinking, Systems Practice, John Wiley, 1981]

Analytic Reduction
Divide system into distinct parts for analysis
  Physical aspects: separate physical or functional components
  Behavior: events over time
Examine parts separately and later combine analysis results
Assumes such separation does not distort the phenomenon
  Each component or subsystem operates independently
  Components act the same when examined singly as when playing their part in the whole
  Events not subject to feedback loops and non-linear interactions

Statistics
Treat system as a structureless mass with interchangeable parts
Use Law of Large Numbers to describe behavior in terms of averages
Assumes components are sufficiently regular and random in their behavior that they can be studied statistically

Traditional Approach to Safety
Uses Analytic Reduction and Statistics
Divide system into components
  Assume accidents are caused by component failure
  Identify chains of directly related physical or logical (functional) component failures that can lead to a loss
  Evaluate reliability of components separately and later combine analysis results into a system reliability value
Note: assumes randomness in the failure events so that probabilities for a loss can be derived
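
As a concrete instance of "combine analysis results into a system reliability value", the sketch below shows the textbook series and parallel combinations. It assumes independent, random component failures, which is exactly the assumption the slide flags; the function names and the numeric values are illustrative and not taken from the talk.

```python
import math

def series_reliability(component_reliabilities):
    """All components must work: multiply the individual reliabilities."""
    return math.prod(component_reliabilities)

def parallel_reliability(component_reliabilities):
    """Redundant components: the system fails only if every component fails."""
    return 1.0 - math.prod(1.0 - r for r in component_reliabilities)

# Illustrative numbers only.
r_series = series_reliability([0.99, 0.95, 0.90])
r_parallel = parallel_reliability([0.90, 0.90])
print(f"series: {r_series:.5f}, parallel: {r_parallel:.5f}")
```

A calculation like this says nothing about losses caused by components that all work as specified but interact unsafely, since those scenarios never appear as a failure probability.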

Accidents are Treated as Chains of Failure Events
Forms the basis for most safety engineering and reliability engineering analysis:
  FTA, PRA, FMEA/FMECA, Event Trees, FHA, etc.
and design (concentrate on dealing with component failure):
  Redundancy and barriers (to prevent failure propagation)
  High component integrity and overdesign
  Fail-safe design
  (humans) Operational procedures, checklists, training, ...

[Figure from Gerald Weinberg, An Introduction to General Systems Thinking]

Applying Systems Thinking to Safety (STAMP)
Accidents involve a complex, dynamic process
  Not simply chains of failure events
  Arise in interactions among humans, machines and the environment
Treat safety as a dynamic control problem
  Safety requires enforcing a set of constraints on system behavior
  Accidents occur when interactions among system components violate those constraints
  Safety becomes a control problem rather than just a reliability problem

Examples of Safety Constraints
Power must never be on when access door open
Two aircraft must not violate minimum separation
Aircraft must maintain sufficient lift to remain airborne
Public health system must prevent exposure of public to contaminated water and food products
Pressure in a deep water well must be controlled
Runway incursions and operations on wrong runways or taxiways must be prevented
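
A constraint of this kind can be written as a predicate over system state that a controller checks before acting. The sketch below uses the first example (power and access door); the `State` fields and function names are hypothetical, chosen only to illustrate the idea of enforcing a constraint rather than counting failures.

```python
from dataclasses import dataclass

@dataclass
class State:
    power_on: bool
    door_open: bool

def violates_interlock(state):
    """Constraint: power must never be on when the access door is open."""
    return state.power_on and state.door_open

def request_power_on(state):
    """Apply the control action only if the resulting state satisfies the constraint."""
    proposed = State(power_on=True, door_open=state.door_open)
    if violates_interlock(proposed):
        return state  # reject the action and leave the system unchanged
    return proposed

print(request_power_on(State(power_on=False, door_open=True)))
print(request_power_on(State(power_on=False, door_open=False)))
```

The same pattern would need a matching check on the door-opening action, since the constraint can be violated from either side.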

Emergent properties (arise from complex interactions)
[Figure: a Process whose components interact in direct and indirect ways]
The whole is greater than the sum of its parts
Safety and security are examples of emergent properties

[Figure: control loop. A Controller sends Control Actions to a Process and receives Feedback from it.]
Controller: controlling emergent properties (e.g., enforcing safety constraints)
  Individual component behavior
  Component interactions
Process: process components interact in direct and indirect ways

[Figure: the same control loop (Controller, Control Actions, Feedback, Process), annotated with the example "Air Traffic Control: Throughput"]
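
The controller, control action, feedback, and process in the figures can be sketched as a small simulation. The pressure process, thresholds, and step sizes below are invented for illustration (loosely echoing the well-pressure constraint mentioned earlier); this is a toy model under those assumptions, not a STAMP tool.

```python
MAX_PRESSURE = 100.0    # hypothetical safety constraint: never exceed this
VENT_THRESHOLD = 90.0   # hypothetical point at which the controller vents

class Process:
    def __init__(self, pressure):
        self.pressure = pressure

    def step(self, vent):
        # Pressure rises on its own and drops when the vent action is applied.
        self.pressure += -15.0 if vent else 5.0
        return self.pressure  # value reported back as feedback

class Controller:
    def decide(self, measured_pressure):
        """Control action: vent when the measured pressure nears the limit."""
        return measured_pressure >= VENT_THRESHOLD

def run(steps, start=80.0, feedback_enabled=True):
    process, controller = Process(start), Controller()
    feedback = process.pressure
    history = []
    for _ in range(steps):
        vent = controller.decide(feedback)
        reading = process.step(vent)
        if feedback_enabled:
            feedback = reading  # otherwise the controller keeps a stale value
        history.append(reading)
    return history

print("max with feedback:", max(run(50)))
print("max without feedback:", max(run(50, feedback_enabled=False)))
```

With feedback the constraint holds; with feedback cut off, the controller keeps acting on its stale reading and the constraint is violated even though neither the controller nor the process has "failed".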

Example Control Structure

Safety Treated as a Dynamic Control Problem
Goal: design an effective control structure that eliminates or reduces adverse events
  Need clear definition of expectations, responsibilities, authority, and accountability at all levels of the safety control structure
  Need appropriate feedback
Entire control structure must together enforce the system safety property (constraints)
  Physical design (inherent safety)
  Operations
  Management
  Social interactions and culture

A Broad View of Control
Component failures and unsafe interactions may be controlled
  through design (e.g., redundancy, interlocks, fail-safe design)
  or through process
    Manufacturing processes and procedures
    Maintenance processes
    Operations
  or through social controls (e.g., regulatory, insurance, legal, culture, or individual self-interest)
For humans, change the context in which they are operating
Human error is a symptom of a system that needs to be redesigned.

STAMP (System-Theoretic Accident Model and Processes)
A new, more powerful accident causality model
Based on systems theory, not reliability theory
Treats accidents as a dynamic control problem (vs. a failure problem)
Includes
  Entire socio-technical system (not just technical part)
  Component interaction accidents
  Software and system design errors
  Human errors

Paradigm Change
Does not imply that what was done previously is wrong and the new approach is correct
Einstein: Progress in science (moving from one paradigm to another) is like climbing a mountain
  As we move further up, we can see farther than from lower points

Paradigm Change (2)
New perspective does not invalidate the old one, but extends and enriches our appreciation of the valleys below
Value of a new paradigm often depends on its ability to accommodate successes and empirical observations made in the old paradigm
New paradigms offer a broader, richer perspective for interpreting previous answers

Resist trying to integrate systems thinking with analytic reduction
Trying to shoehorn new technology and new levels of complexity into old methods does not work
Trying to merge systems thinking into the old models and techniques will not work

Recent Progress
Large companies are starting training programs in STPA for their employees
DoD training program in using STPA for security
Cited as an example in ISO 26262 draft (out in 2018)
Recent successes in applying to workplace safety and engineering management
Lots of evaluations and comparisons with traditional techniques, all with STPA finding things that the traditional techniques do not
Lots of new applications

Adding Coordination to STPA: Col. Kip Johnson (9/2016)
[Figure credit: Leveson 2012]

Some Important Research Problems
Applying STAMP to other properties besides safety and security
How to integrate into a large company (training, facilitators, how to implement a paradigm change into industries?)
More help with generating causal scenarios from UCAs
Generating UCAs (Thomas method complete but harder to teach and maybe do, use to check completeness?)
Risk assessment without resorting to unknown and unknowable probabilities
Use in operations
  Controlling unplanned and unsafe changes
Human factors in STPA and CAST

Lessons I've Learned over 36 Years
"It is important that students bring a certain ragamuffin barefoot irreverence to their studies. They are here not to worship what is known, but to question it." (Jacob Bronowski, The Ascent of Man)
The starting point is to question our assumptions.
  "It's never what we don't know that stops us. It's what we do know that just ain't so."
If you want to make important contributions, work on important problems
  Pick the problem first, not the solution
  Understanding a problem is the first step to solving it
  Don't play follow the leader or jump on bandwagons
Work on problems you care about
  From an anonymous proposal review: "Nancy is passionate about safety, which is her greatest strength and her greatest weakness"

Summary of Where We Need to Go in System Safety
Expand our accident causation models
Create new, more powerful and inclusive hazard analysis techniques
Use new system design techniques
  Safety-guided design
  Integrate System Safety more into system engineering
Improve accident analysis and learning from events
Improve control of safety during operations
Improve management decision-making and safety culture
