Tuesday, January 15, 2019

How Road Testing Self-Driving Cars Gets More Dangerous as the Technology Improves

Safe road testing of autonomous vehicle technology assumes that human "safety drivers" will be able to prevent mishaps. But humans are notoriously bad at supervising autonomy. Ensuring that road testing is safe requires designing the test platform to have high "supervisability." In other words, it must be easy for a human to stay in the loop and compensate for autonomy errors, even when the autonomy gets pretty good and the supervisor's job gets pretty boring. This excerpt from a draft paper explains the concept and why it matters.

Figure 1.

An essential observation regarding self-driving car road testing is that it relies upon imperfect human responses to provide safety. There is some non-zero probability that the supervisor (a "safety driver") will not react in a timely fashion, and some additional probability that the supervisor will react incorrectly. Either of these outcomes could result in an incident or mishap. Such a non-zero probability of unsuccessful failure mitigation means it is necessarily the case that the frequency of autonomy failures will influence on-road safety outcomes.

However, lower autonomy failure rates are not necessarily better. The types and frequencies of autonomy failures will affect the supervisability of the system. Therefore, the field failure rate and types of failures must be compatible with the measures being taken to ensure supervisor engagement. In other words, the failure profile must be "appropriate" rather than simply low.
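To make this concrete, here is a rough back-of-the-envelope sketch (not the math from the full paper; the function name and every number in it are hypothetical) showing why the unmitigated failure rate depends on both the autonomy failure rate and the reliability of supervisor mitigation:

```python
# Minimal illustration (hypothetical values only): a road testing mishap
# requires both an autonomy failure and a failed supervisor mitigation, so
# the expected mishap rate is the product of the two factors.

def unmitigated_failure_rate(autonomy_failures_per_hour, p_supervisor_miss):
    """Expected dangerous, unmitigated failures per hour of road testing."""
    return autonomy_failures_per_hour * p_supervisor_miss

# A lower autonomy failure rate does not automatically give a safer test
# platform if supervisor mitigation degrades at the same time:
print(unmitigated_failure_rate(0.10, 0.01))  # frequent failures, alert supervisor   -> about 0.001/hr
print(unmitigated_failure_rate(0.01, 0.50))  # rarer failures, disengaged supervisor -> about 0.005/hr
```

In this toy example the second platform has one tenth the autonomy failure rate but five times the expected mishap rate, which is exactly the sense in which the failure profile must be "appropriate" rather than merely low.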

Non-Linear Autonomy/Human Interactions

A significant difficulty in reasoning about the effect of autonomy failure on safety is that there is a non-linear response of human attentiveness to autonomy failure. We propose that there are five different regions of supervisability of autonomy failures, with two different hypothetical scenarios based on comparatively lower and higher supervisability trends illustrated in the figures.

1. Autonomy fails frequently in a dangerous way. In essence this is autonomy which is not really working. A supervisor faced with an AV test platform that is trying to run off the road every few seconds should terminate the testing and demand more development. We assume that such a system would never be operated on public roads in the first place, making a public risk assessment unnecessary. (Debugging of highly immature autonomy on public roads seems like a bad idea, and presents a high risk of mishaps.)

2. Autonomy fails moderately frequently but works or is benign most of the time. In this case the supervisor is more likely to remain attentive since an autonomy failure in the next few seconds or minutes is likely. The risk in this scenario is probably dominated by the ability of the supervisor to plan and execute adequate fault responses, and eventual supervisor fatigue.

3. Autonomy fails infrequently. In this case there is a real risk that the supervisor will lose focus during testing, and fail to respond in time or respond incorrectly due to loss of situational awareness. This is perhaps the most difficult situation for on-road testing, because the autonomy could be failing frequently enough to present an unacceptably high risk, but so infrequently that the supervisor is relatively ineffective at mitigation. This dangerous situation corresponds to the “valley of degraded supervision” in Figure 1.

4. Autonomy fails very infrequently, with high diagnostic coverage. At a high level of maturity, the autonomy might fail so infrequently that it is almost safe enough, and even a relatively disengaged driver can deal with failures well enough to result in a system that is overall acceptably safe. High coverage failure detection that prompts the driver to take over in the event of a failure might help improve the effectiveness of such a system. The ultimate safety of such a system will likely depend upon its ability to detect a risky situation with sufficient advance warning for the supervisor to re-engage and take over safely. (This scenario is generally aligned with envisioned production deployment of SAE Level 3 autonomy.)

5. Autonomy essentially never fails. In this case the role of the supervisor is to be there in case the expectation of "never fails" turns out to be incorrect during testing. It is difficult to know how to evaluate the potential effectiveness of a supervisor here, other than to note that the supervisor has the same tasks as in the preceding "very infrequently" case, but is expected never to actually have to perform them.

Perhaps counter-intuitively, the probability of a supervisor failure is likely to increase as the autonomy failure rate decreases from regions 1 to 5 above (from left to right along the horizontal axis of the figures). In other words, the less often autonomy fails, the less reliable supervisor intervention becomes. The most dangerous operational region will be #3, in which the autonomy is failing often enough to present a significantly elevated risk, but not often enough to keep the supervisor alert and engaged. This is a well-understood risk that must be addressed in a road testing safety case.
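As a rough illustration of this non-linear effect (my own toy model, not the curves or math from the full paper), the sketch below makes the supervisor miss probability grow as the interval between autonomy failures grows, with a made-up "supervisability" parameter controlling how quickly vigilance degrades:

```python
# Toy attentiveness model (illustrative only): the probability that the
# supervisor fails to mitigate a given autonomy failure grows as failures
# become rarer, because vigilance decays when nothing happens for long
# stretches. half_interval_hours is a hypothetical supervisability
# parameter: the failure interval at which half of all failures are missed.

def p_supervisor_miss(hours_between_failures, half_interval_hours=10.0, steepness=2.0):
    return 1.0 / (1.0 + (half_interval_hours / hours_between_failures) ** steepness)

# Sweeping from frequent failures (roughly region 2) toward near-perfect
# autonomy (roughly region 5):
for hours in (0.1, 1.0, 10.0, 100.0, 1000.0):
    print(f"{hours:8.1f} hours between failures -> miss probability "
          f"{p_supervisor_miss(hours):.3f}")
```

A higher-supervisability test platform corresponds to a much larger half_interval_hours value in this toy model, i.e., a supervisor who stays effective over much longer uneventful stretches.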

Figure 2 illustrates this effect with hypothetical performance data that results in an overall test platform safety value in accordance with [math in the full paper]. A hypothetical lower supervisability curve results in a region in which the vehicle is less safe than a conventional vehicle driven by a human driver.

Safe testing requires a comparatively higher supervisability curve to ensure that the overall test platform safety is sufficiently high, as shown by Figure 2.

Figure 2.
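To see how a curve like Figure 2 can arise, here is a toy recreation (hypothetical numbers throughout, including the human-driver baseline; the real analysis is the math in the full paper) that combines an autonomy failure rate with the toy supervisor model above for a lower and a higher supervisability setting:

```python
# Toy recreation of the Figure 2 idea (hypothetical numbers only): combine
# the autonomy failure rate with the toy supervisor miss-probability model
# and compare the resulting test platform mishap rate to a hypothetical
# conventional human-driver baseline.

HUMAN_BASELINE_PER_HOUR = 1e-5  # hypothetical dangerous-mistake rate for a human driver

def p_supervisor_miss(hours_between_failures, half_interval_hours, steepness=2.0):
    # Same toy model as in the earlier sketch.
    return 1.0 / (1.0 + (half_interval_hours / hours_between_failures) ** steepness)

def platform_mishap_rate(hours_between_failures, half_interval_hours):
    autonomy_failures_per_hour = 1.0 / hours_between_failures
    return autonomy_failures_per_hour * p_supervisor_miss(
        hours_between_failures, half_interval_hours)

print("hours between failures |   lower superv. |  higher superv.")
for hours in (0.1, 1.0, 10.0, 100.0, 1e3, 1e4, 1e5, 1e6):
    low = platform_mishap_rate(hours, half_interval_hours=10.0)   # lower supervisability
    high = platform_mishap_rate(hours, half_interval_hours=1e6)   # higher supervisability
    flag = "  <-- less safe than baseline" if low > HUMAN_BASELINE_PER_HOUR else ""
    print(f"{hours:22.1f} | {low:15.2e} | {high:15.2e}{flag}")
```

With these made-up parameters the lower supervisability platform remains less safe than the baseline until autonomy failures become extremely rare, with risk peaking in the middle region (the valley of degraded supervision), while the higher supervisability platform stays safer than the baseline across the whole sweep. The real curves will of course depend on measured supervisor performance, not toy formulas.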


Because autonomy capabilities are generally expected to mature over time, the safety argument must be revisited periodically during test and development campaigns as the autonomy failure rate decreases from region 2 to 3 above. An intuitive – but dangerously incorrect – approach would be to assume that the requirements for test supervision can be relaxed as autonomy becomes more mature. Rather, it seems likely that the rigor of ensuring supervisors are vigilant and continually trained to maintain their ability to react effectively needs to be increased as autonomy technology transitions from immature to moderately mature. This effect only diminishes when the AV technology starts approximating the road safety of a conventional human driver all on its own (regions 4 & 5).

If you are actively doing self-driving car testing on public roads, please contact me for a preprint of the full paper that includes a GSN safety argumentation structure for ensuring road testing safety. I plan to present the full paper at SAE WCX 2019 in April.

-- Phil Koopman, Edge Case Research & Carnegie Mellon University

1 comment:

  1. – First, you are right. Drivers who experience more issues are more attentive to them. Being nervous or scared does that to humans. However, each of those issues can kill someone. And yes, the safer the system, the less attention it demands. Which means you could die by falling asleep, or by being so inattentive that you cannot recover. Which is all nonsense if the time to recover, as in most accidents, is so short that you cannot regain enough situational awareness to do the right thing.
     – How would anyone know which region the system is in at any given time? Especially if you run new and progressively more complex scenarios?
     – Zero mention of simulation. Good or bad. Why?
     – Your solution appears to be: make sure you pay attention at all times. And don't use public roads in region 1, but there is no mention of what to do instead.
     – "There is some non-zero probability that the supervisor (a "safety driver") will not react in a timely fashion, and some additional probability that the supervisor will react incorrectly. Either of these outcomes could result in an incident or mishap. Such a non-zero probability of unsuccessful failure mitigation means it is necessarily the case that the frequency of autonomy failures will influence on-road safety outcomes."
     – In English: NO MATTER WHAT YOU DO, YOU MAY DIE, because you cannot regain proper situational awareness. Which is what happens in most accident scenarios.
     – You say region 1 is too dangerous to allow public driving. How do you graduate to region 2? Wouldn't region 1 always potentially exist for all new scenarios, especially complex and dangerous ones? If so, again, how do you get past region 1? Test tracks? You cannot create most scenarios there, let alone duplicate them over and over to train the AI. Simulation? Nope, never mentioned.

    Am I missing something?

    It is a myth that the use of public shadow driving is the best or only way to develop these systems. It is impossible to drive the one trillion miles, or spend over $300B, to stumble and re-stumble on all the scenarios necessary to complete the effort. Many of these are accident scenarios that no one will want you to run once, let alone thousands of times. You also cannot survive the thousands of needless casualties you would create trying. And finally, as the public, the press, and soon governments are figuring out, handover cannot be made safe for most complex scenarios by any monitoring and notification system, because such systems cannot provide the time needed to regain proper situational awareness and do the right thing the right way.

    The solution is to use aerospace/DoD/FAA simulation technology, safety and engineering practices for 99.9% of this.

    It is not enough to tell people something is dangerous. You should tell them it is dangerous and untenable and then help them avoid those needless deaths and be successful.

    SAE Autonomous Vehicle Engineering Magazine
    End Public Shadow Driving
    https://www.nxtbook.com/nxtbooks/sae/ave_201901/index.php





