Wednesday, April 3, 2019

Nondeterministic Behavior and Legibility in Autonomous Vehicle Validation

Nondeterministic Behavior and Legibility:
How do you know your autonomous vehicle passed the test for the right reason? What if it just got lucky, or is gaming the test?

The nature of the algorithms used by autonomy systems creates problems for modelling and testing that go beyond those of typical safety-critical software. Some autonomy algorithms, such as randomized path planning, are inherently non-deterministic. Others can be brittle, failing dramatically with subtle variations in data, such as perception false negatives induced by adversarial attacks (Szegedy et al. 2013) or false negatives induced by slight image degradation due to haze or defocus (Pezzementi et al. 2018).

A related issue is over-fitting to the test, in which an autonomy system learns how to beat a fixed test rather than how to handle the situation the test represents. By analogy, this is the pitfall of a student cheating by memorizing the correct answers. A proposed way to deal with this risk is to randomly vary aspects of the test cases.

In such a fuzzing or variable testing approach it is important to randomly vary all relevant aspects of a problem. For example, varying geometries for traffic situations can be helpful, but probably does not address potential over-fitting for perception algorithms that perform object classification.
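As a rough illustration of varying more than geometry, the sketch below draws both geometric and perception-related parameters for each test case from one seeded random source. Every field name, value range, and appearance label here is a hypothetical placeholder chosen for illustration, not something taken from the paper or from any real test framework.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Scenario:
    # All fields are hypothetical illustration parameters.
    pedestrian_offset_m: float   # geometry: lateral offset of the pedestrian from lane center
    vehicle_speed_mps: float     # geometry/dynamics: vehicle speed at scenario start
    haze_level: float            # perception: 0.0 (clear) to 1.0 (dense haze)
    defocus_sigma_px: float      # perception: camera blur, in pixels
    pedestrian_appearance: str   # perception: what the classifier has to recognize


# Hypothetical appearance variants; a real suite would define its own.
APPEARANCES = ("dark_coat", "hi_vis_vest", "carrying_umbrella", "pushing_stroller")


def sample_scenario(rng: random.Random) -> Scenario:
    """Draw one scenario, varying geometry AND perception conditions together."""
    return Scenario(
        pedestrian_offset_m=rng.uniform(-3.0, 3.0),
        vehicle_speed_mps=rng.uniform(5.0, 20.0),
        haze_level=rng.uniform(0.0, 1.0),
        defocus_sigma_px=rng.uniform(0.0, 3.0),
        pedestrian_appearance=rng.choice(APPEARANCES),
    )


def generate_suite(seed: int, n: int) -> List[Scenario]:
    """Generate n scenarios; the seed makes a failing case reproducible later."""
    rng = random.Random(seed)
    return [sample_scenario(rng) for _ in range(n)]
```

Recording the seed alongside the results keeps a randomized suite repeatable, so a failure found in one run can be replayed exactly. Using a fresh seed for each test campaign is what discourages memorizing a fixed set of cases.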

The use of potentially non-deterministic test scenarios combined with non-deterministic system behaviors and opaque system designs means it is difficult to know whether a system has passed a test, because there is no single correct answer. Rather, there must be some algorithmic way to determine whether a particular system response is acceptable or not, making that test oracle algorithm safety critical.
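To make the idea of an algorithmic oracle concrete, here is a minimal sketch that judges a recorded trajectory against acceptance criteria instead of comparing it to one expected output. The `State` record, the two criteria (minimum clearance to a pedestrian and a deceleration limit), and the threshold values are all assumptions made up for this example. A real oracle would need far richer criteria and would itself need to be validated.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class State:
    # Hypothetical trajectory sample: time (s), position (m), speed (m/s).
    t: float
    x: float
    y: float
    speed: float


def oracle(
    trajectory: List[State],
    pedestrian_xy: Tuple[float, float],
    min_clearance_m: float = 1.0,   # assumed threshold, for illustration only
    max_decel_mps2: float = 6.0,    # assumed threshold, for illustration only
) -> Tuple[bool, List[str]]:
    """Return (acceptable, reasons). Any trajectory meeting all criteria passes."""
    if not trajectory:
        return False, ["empty trajectory"]

    reasons: List[str] = []
    px, py = pedestrian_xy

    # Criterion 1: never come closer to the pedestrian than the clearance limit.
    clearance = min(math.hypot(s.x - px, s.y - py) for s in trajectory)
    if clearance < min_clearance_m:
        reasons.append(f"clearance {clearance:.2f} m below {min_clearance_m} m")

    # Criterion 2: no braking harder than the deceleration limit.
    for a, b in zip(trajectory, trajectory[1:]):
        dt = b.t - a.t
        if dt <= 0:
            reasons.append("timestamps not strictly increasing")
            break
        decel = (a.speed - b.speed) / dt
        if decel > max_decel_mps2:
            reasons.append(f"deceleration {decel:.2f} m/s^2 above {max_decel_mps2}")
            break

    return (not reasons, reasons)
```

With these criteria, a trajectory that steers around the pedestrian and one that brakes gently to a stop are both acceptable, even though they share no waypoints. This also shows why the oracle becomes safety critical: a threshold that is too lenient silently turns unsafe behavior into a passing result.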

Moreover, it is possible that a system has passed a particular test by chance. For example, a pedestrian might be avoided due to a properly functioning detection and avoidance algorithm. But a pedestrian might also be avoided merely because a random path planner by chance picked a path that did not intersect the pedestrian, or responded to a completely unrelated aspect of the environment that caused it to pick a fortuitously safe path. Similarly, a pedestrian might be detected in one image, but undetected in another that differs in ways that are essentially imperceptible to a human.
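A small worked calculation shows how much a single pass can hide. It assumes that repeated randomized trials are statistically independent, which is a simplification and not a claim from the paper. The function names are hypothetical. The first function gives the probability that a system that only passes by luck survives a whole series of trials. The second gives the standard zero-failure upper bound on the per-trial failure probability, found by solving (1 - p)^n = 1 - confidence for p.

```python
def chance_of_lucky_streak(p_lucky_pass: float, n_trials: int) -> float:
    """Probability of passing all n independent trials purely by luck."""
    if not 0.0 <= p_lucky_pass <= 1.0:
        raise ValueError("p_lucky_pass must be in [0, 1]")
    if n_trials < 1:
        raise ValueError("n_trials must be at least 1")
    return p_lucky_pass ** n_trials


def failure_rate_upper_bound(n_passes: int, confidence: float = 0.95) -> float:
    """Upper bound on per-trial failure probability after n passes, 0 failures.

    Solves (1 - p)^n = 1 - confidence for p. Assumes independent trials.
    """
    if n_passes < 1:
        raise ValueError("n_passes must be at least 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be in (0, 1)")
    return 1.0 - (1.0 - confidence) ** (1.0 / n_passes)
```

Under these assumptions, one passed trial is compatible with a per-trial failure probability as high as 0.95 at 95% confidence, while 300 consecutive passes bring the bound down to roughly 0.01 (the familiar "rule of three," 3/n). Even so, a tight bound only says the system rarely fails on this distribution of tests. It does not show that each pass happened for the right reason, which is the harder question raised above.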

It is unclear whether resolving this issue requires solving the difficult problem of explainable AI (Gunning 2018). At a minimum, a credible safety argument will need to explain how test plans that rely on less than a statistically valid amount of real-world exposure data can avoid these pitfalls. It seems likely that a credible argument will also have to establish that each type of test has been passed due to safe operation of the system rather than simply by chance (Koopman & Wagner 2018).

(This is an excerpt of our SSS 2019 paper: Koopman, P., Kane, A. & Black, J., "Credible Autonomy Safety Argumentation," Safety-Critical Systems Symposium, Bristol UK, Feb. 2019.)
  • Gunning, D. (2018), Explainable Artificial Intelligence (XAI), Defense Advanced Research Projects Agency, (accessed October 27, 2018).
  • Koopman, P. & Wagner, M., (2018) "Toward a Framework for Highly Automated Vehicle Safety Validation," SAE World Congress, 2018. SAE-2018-01-1071.
  • Pezzementi, Z., Tabor, T., Yim, S., Chang, J., Drozd, B., Guttendorf, D., Wagner, M., Koopman, P., "Putting image manipulations in context: robustness testing for safe perception," IEEE International Symposium on Safety, Security, and Rescue Robotics (SSRR), Aug. 2018.
  • Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., Fergus, R. (2013) "Intriguing properties of neural networks," arXiv preprint arXiv:1312.6199.


  1. Call me old-fashioned, but when non-deterministic behavior is mentioned in the context of safety-critical systems, it is like asking how Russian Roulette might be made safe, short of removing the bullet. If there is one principle ingrained in me through many years of designing, evaluating, and validating safety-critical railroad signal components and systems, it is that complexity is the enemy of safety.

    Relying on non-deterministic algorithms to enhance safety is oxymoronic, precisely because it makes analysis of system behavior far more difficult. It makes testing far more difficult as well.

    It is one thing to automate a railroad or people mover system on a dedicated right-of-way where environmental factors can be controlled to some extent, and the system has a known safe state. It is quite another to fully automate automobiles for which the safe state may change from time to time while in use, and the environment is essentially uncontrolled. (Does the vehicle try to continue moving in a snowstorm when the road is icy and snow-covered and visibility is limited, risking an accident, or does it come to a stop and allow the passengers to freeze to death?)

    Although there seems to be a great deal of optimism that HAVs will improve safety, and expectations are high that they will be available in just a few more years, I believe it will be many years yet before that optimism will be justified. Baby steps... baby steps. The industry has already had a few high-visibility accidents with limited automation, and will have more. Perhaps as reality sets in, expectations will dampen a bit and caution will be taken a little more seriously.

    The recent crashes of the 737 MAX airliners have awoken many observers to the fact that even very experienced and safety-conscious organizations can blow it under the pressure to put products out too fast and too cheaply. These accidents are going to cost Boeing billions of dollars in liability and lost business, not to mention the damage to its reputation. Companies like Uber and Waymo are engaging in a very dangerous business, and I truly hope they are working to establish rigorous safety cultures that will prevent them from going through the kind of experience that Boeing is.

  2. There is no doubt that making non-deterministic systems safe is a challenge. That's why I'm devoting a significant part of my time to working on UL 4600. Setting up a strong safety culture is a required part of succeeding at building these types of systems.

