
Sunday, June 20, 2021

A More Precise Definition for ANSI/UL 4600 Safety Performance Indicators (SPIs)

In the context of autonomous vehicles, Safety Performance Indicators (SPIs) are defined by Chapter 16 of ANSI/UL 4600 as performance metrics that are specifically related to safety (4600 at 16.1.1.6.1).

Woman in Taxi looking at phone.  Photo by Uriel Mont from Pexels

This is a fairly general definition that is intended to encompass both leading metrics (e.g., number of failed detections of pedestrians for a single sensor channel) and lagging metrics (e.g., number of collisions in real world operation).  

However, it is so general that there can be a tendency to label metrics that are not actually related to safety as SPIs when, more properly, they are KPIs. As an example, ride quality smoothness when cornering is a Key Performance Indicator (KPI) that is highly desirable for passenger comfort. But it might have little or nothing to do with the crash rate for a particular vehicle. (The two might be correlated -- sloppy control might be associated with crashes -- but they might not be.)

So we've come up with a more precise definition of SPI (with special thanks to Dr. Aaron Kane for long discussions and for crystallizing the concept).

An SPI is a metric supported by evidence that uses a
threshold comparison to condition a claim in a safety case.

Let's break that down:

  • SPI - Safety Performance Indicator - a {metric, threshold} pair that measures some aspect of safety in an autonomous vehicle.
  • Metric - a value, typically related to one or more of product performance, design quality, process quality, or adherence to operational procedures. Metrics are often rates over time or operational exposure (e.g., incidents per million km, maintenance mistakes per thousand repairs) but can also be tied to a particular version (e.g., significant defects per thousand lines of code; unit test coverage; peer review effectiveness).
  • Evidence - the metric values are derived from measurement rather than theoretical calculations or other non-measurement sources
  • Threshold - a metric on its own is not an SPI because context within the safety case matters. For example, a raw count of false negative detections on a sensor is not an SPI because it misses the part about how good the sensor has to be to provide acceptable safety when fused with other sensor data in a particular vehicle's operational context. ("We have 1% false negatives on camera #1. Is that good enough? Well, it depends...") There is no limit to the complexity of the threshold, which might be, for example, whether a very complicated state space is either inside or outside a safety envelope.  But in the end the answer is some sort of comparison between the metric and the threshold that results in "true" or "false."  (Analogous multi-valued operations and outputs are OK if you are using multi-valued logic in your safety case.) We call the state of an SPI output being "false" an SPI Violation.
  • Condition a claim - each SPI is associated with a claim in a safety case. If the SPI is true, the claim is supported by the SPI. If the SPI is false, the associated claim has been falsified. (SPIs based on time series data could be true for a long time before going false, so in many cases this is a time- and state-dependent outcome.)
  • Safety case - Per ANSI/UL 4600 a safety case is "a structured argument, supported by a body of evidence, that provides a compelling, comprehensible and valid case that a system is safe for a given application in a given environment." In the context of that standard, anything that is related to safety is in the safety case. If it's not in the safety case, it is by definition not related to safety.
A direct conclusion of the above is that if a metric does not have a threshold, or does not condition a claim in a safety case, then it can't be an SPI.
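
To make the {metric, threshold} pairing and the claim-conditioning idea concrete, here is a minimal illustrative sketch in Python. To be clear, none of this comes from ANSI/UL 4600 itself; the class, the camera example, and the numbers are all invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Spi:
    """A Safety Performance Indicator: a {metric, threshold} pair whose
    comparison conditions a claim in the safety case (illustrative sketch)."""
    claim: str                                   # the safety case claim this SPI conditions
    metric: Callable[[], float]                  # evidence: a measured value, not a prediction
    threshold: float                             # the "how good does it have to be" context
    comparison: Callable[[float, float], bool]   # threshold comparison with a true/false outcome

    def evaluate(self) -> bool:
        """True means the claim remains supported; False is an SPI
        violation, meaning the associated claim has been falsified."""
        return self.comparison(self.metric(), self.threshold)

def measured_false_negative_rate() -> float:
    # Stand-in for a real measurement pipeline; a fixed value for illustration.
    return 0.004

camera_fn_spi = Spi(
    claim="Camera #1 false negatives are low enough for safe sensor fusion",
    metric=measured_false_negative_rate,
    threshold=0.01,                              # hypothetical limit from the safety argument
    comparison=lambda value, limit: value < limit,
)

if not camera_fn_spi.evaluate():
    print(f"SPI violation: revisit claim -> {camera_fn_spi.claim}")
```

The evaluate() call produces exactly the true/false outcome described above: true means the claim remains supported, and false is an SPI violation that should send you back to the safety case.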

Less formally, the point of an SPI is that you've built up a safety case, but there is always the chance you missed something in the safety case argument (forgot a relevant reason why a claim might not be true), or made an assumption that isn't as true as you thought it was in the real world, or otherwise have some sort of a problem with your safety case. An SPI violation amounts to: "Well, you thought you had everything covered and this thing (claim) was always true. And yet, here we are with the claim being false when we encountered a particular unforeseen situation in validation or real world operation. Better update your safety argument!"

In other words, an SPI is a measurement you take so that if your safety case is invalidated in the real world, you'll detect the problem and can fix the safety case.

An important point of all this is that not every metric is an SPI. "SPI" is a very specific term; the rest are all KPIs.

KPIs can be very useful, for example in measuring progress toward a functional system. But they are not SPIs unless they meet the definition given above.

NOTES:

The ideas in this posting are due in large part to the efforts of Dr. Aaron Kane, who should be cited as a co-author of this work.

(1) Aviation uses the term SPI for metrics related to the operational phase and Safety Management System (SMS) activities. The definition given here is rooted in ANSI/UL 4600 and is a superset of the aviation usage, encompassing technical metrics and design cycle metrics as well as operational metrics.

(2) In this formulation an SPI is not quite the same as a safety monitor. Some SPI violations might also trigger a vehicle system shutdown, but for many SPI violations there might not be anything actionable at the individual vehicle level. Indeed, some SPI violations might only be detectable at the fleet level in retrospect. For example, if you have a budget of 1 incident of a particular type per 100 million km, an individual vehicle having such an incident does not necessarily mean the safety case has been invalidated. Rather, you need to look across the fleet data history to see whether that incident just happens to be the budgeted one in 100 million based on operational exposure, or is part of a trend of too many such incidents.
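
As a sketch of what that fleet-level look-back might involve, the snippet below models budgeted incidents as a Poisson arrival process and asks whether the observed fleet count is still plausible under the budgeted rate. The exposure, incident count, and 5% alarm level are invented for illustration and are not from ANSI/UL 4600.

```python
import math

def poisson_prob_at_least(k: int, mu: float) -> float:
    """P(X >= k) for a Poisson(mu) count of incidents."""
    return 1.0 - sum(math.exp(-mu) * mu**i / math.factorial(i) for i in range(k))

# Hypothetical budget: 1 incident of this type per 100 million km.
budget_rate = 1.0 / 100e6            # incidents per km

# Hypothetical fleet history: 3 incidents over 120 million km of exposure.
fleet_exposure_km = 120e6
observed_incidents = 3

expected = budget_rate * fleet_exposure_km   # 1.2 incidents expected under budget
p = poisson_prob_at_least(observed_incidents, expected)

# If seeing this many incidents would be very unlikely under the budgeted
# rate, treat it as a trend (SPI violation) rather than budgeted bad luck.
ALARM_LEVEL = 0.05                   # illustrative choice, not from the standard
if p < ALARM_LEVEL:
    print(f"SPI violation: {observed_incidents} incidents vs {expected:.1f} budgeted (p={p:.3f})")
else:
    print(f"Within budget so far (p={p:.3f} of seeing >= {observed_incidents} incidents)")
```

With these made-up numbers, 3 incidents in 120 million km is still statistically consistent with a 1-per-100-million-km budget, which is exactly why a single vehicle's incident does not by itself invalidate the safety case.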

(3) We pronounce "SPI" as "S-P-I" rather than "spy" after a very confusing conversation in which we realized we needed to explain to a government official that we were not actually proposing that the CIA become involved with validating autonomous vehicle safety.


Saturday, February 24, 2018

SCAV 2017 Keynote: Challenges in Autonomous Vehicle Validation


[Embedded slides: Challenges in Autonomous Vehicle Testing and Validation, Philip Koopman]


Challenges in Autonomous Vehicle Validation
Keynote Presentation Abstract
Philip Koopman
Carnegie Mellon University; Edge Case Research LLC
ECE Dept. HH A-308, 5000 Forbes Ave., Pittsburgh, PA, USA
koopman@cmu.edu

Developers of autonomous systems face distinct challenges in conforming to established methods of validating safety. It is well known that testing alone is insufficient to assure safety, because testing long enough to establish ultra-dependability is generally impractical. That’s why software safety standards emphasize high quality development processes. Testing then validates process execution rather than directly validating dependability.

Two significant challenges arise in applying traditional safety processes to autonomous vehicles. First, simply gathering a complete set of system requirements is difficult because of the sheer number of combinations of possible scenarios and faults. Second, autonomy systems commonly use machine learning (ML) in a way that makes the requirements and design of the system opaque. After training, usually we know what an ML component will do for an input it has seen, but generally not what it will do for at least some other inputs until we try them. Both of these issues make it difficult to trace requirements and designs to testing as is required for executing a safety validation process. In other words, we’re building systems that can’t be validated due to incomplete or even unknown requirements and designs.

Adaptation makes the problem even worse by making the system that must be validated a moving target. In the general case, it is impractical to validate all the possible adaptation states of an autonomy system using traditional safety design processes.

An approach that can help with the requirements, design, and adaptation problems is basing a safety argument not on correctness of the autonomy functionality itself, but rather on conformance to a set of safety envelopes. Each safety envelope describes a boundary within the operational state space of the autonomy system.

A system operating within a “safe” envelope knows that it’s safe and can operate with full autonomy. A system operating within an “unsafe” envelope knows that it’s unsafe, and must invoke a failsafe action. Multiple partial specifications can be used as an envelope set, with the intersection of safe envelopes permitting full autonomy, and the union of unsafe envelopes provoking validated, and potentially complex, failsafe responses.

Envelope mechanisms can be implemented using traditional software engineering techniques, reducing the problems with requirements, design, and adaptation that would otherwise impede safety validation. Rather than attempting to prove that autonomy will always work correctly (which is still a valuable goal to improve availability), the envelope approach measures the behavior of one or more autonomous components to determine if the result is safe. While this is not necessarily an easy thing to do, there is reason to believe that checking autonomy behaviors for safety is easier than implementing perfect, optimized autonomy actions. This envelope approach might be used to detect faults during development and to trigger failsafes in fleet vehicles.

Inevitably there will be tension between simplicity of the envelope definitions and permissiveness, with more permissive envelope definitions likely being more complex. Operating in the gap areas between “safe” and “unsafe” requires human supervision, because the autonomy system can’t be sure it is safe.
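
Pulling the envelope-set logic of the preceding paragraphs together, here is a rough Python sketch. The mode names, toy state variables, and example envelopes are invented for illustration; real envelopes would be validated partial specifications over the actual operational state space.

```python
from enum import Enum, auto
from typing import Callable, List

class Mode(Enum):
    FULL_AUTONOMY = auto()       # inside the intersection of all "safe" envelopes
    FAILSAFE = auto()            # inside the union of the "unsafe" envelopes
    HUMAN_SUPERVISION = auto()   # gap area: cannot be shown safe or unsafe

State = dict  # placeholder for the operational state space

def arbitrate(state: State,
              safe_envelopes: List[Callable[[State], bool]],
              unsafe_envelopes: List[Callable[[State], bool]]) -> Mode:
    """Envelope-set arbitration: any unsafe envelope provokes the failsafe;
    only the intersection of safe envelopes permits full autonomy."""
    if any(env(state) for env in unsafe_envelopes):
        return Mode.FAILSAFE                 # union of unsafe envelopes
    if all(env(state) for env in safe_envelopes):
        return Mode.FULL_AUTONOMY            # intersection of safe envelopes
    return Mode.HUMAN_SUPERVISION            # gap between safe and unsafe

# Illustrative partial specifications over a toy state:
safe = [lambda s: s["speed_mps"] < 15, lambda s: s["clear_ahead_m"] > 30]
unsafe = [lambda s: s["clear_ahead_m"] < 5]

print(arbitrate({"speed_mps": 10, "clear_ahead_m": 50}, safe, unsafe))  # FULL_AUTONOMY
print(arbitrate({"speed_mps": 10, "clear_ahead_m": 20}, safe, unsafe))  # HUMAN_SUPERVISION
print(arbitrate({"speed_mps": 10, "clear_ahead_m": 2}, safe, unsafe))   # FAILSAFE
```

Checking the unsafe envelopes first is a deliberately conservative design choice: if a state ever satisfied both kinds of partial specification, the failsafe response would win.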

One way to look at the progression from partial to full autonomy is that, over time, systems can increase permissiveness by defining and growing “safe” envelopes, shrinking “unsafe” envelopes, and eliminating any gap areas.

ACM Reference format:
P. Koopman, 2017. Challenges in Autonomous Vehicle Validation. In
Proceedings of 1st International Workshop on Safe Control of Connected
and Autonomous Vehicles, Pittsburgh, Pennsylvania, USA, April 2017
(SCAV 2017), 1 page.

SCAV'17, April 21, 2017, Pittsburgh, PA, USA
ACM 978-1-4503-4976-5/17/04.
http://dx.doi.org/10.1145/3055378.3055379

(Originally posted 4/23/2017)