Safe Autonomy: January 2023

Monday, January 23, 2023

Correct use of terms: regression, bug, glitch, testing, and beta

I just saw another misuse of critical system terminology today by someone who is working on scholarly publications and who picked up the habit by working with an autonomous vehicle company. The misuse and abuse of terminology to desensitize people to life critical system defects gets worse all the time. Perhaps it is time to refresh vocabulary for those who have only heard the misuses and don't realize they sound dumb as a box of rocks when they simply repeat what they hear others saying at work. (Caution -- pet peeve meets yelling at clouds here, because the tech industry has invested a decade degrading the meaning of these terms for PR value. If that bothers you just move to the next posting...).

Here are some key terms in play. (And it's not just me -- I point to Wikipedia entries for each.)

Regression:

This is not the general term for "bug". It is a very specific kind of defect in which a previously operational feature stops working. Even further back, it meant specifically a previous bug fix that stopped working in a new version. If it's a "defect" call it a "defect." Or maybe a "requirements defect" if it is a discovery of behavior you did not previously realize needed to be in the requirements. (https://en.wikipedia.org/wiki/Software_regression)

Bug:

While originally humorous slang, it now does more harm than good to use this term. Use the word "defect" instead. (https://en.wikipedia.org/wiki/Bug_(engineering))

Glitch:

This is a defect too, and even more dangerous to use because it minimizes the issue in a safety critical system. ("It was just a glitch -- let's see if it kills anyone else before we fix it.") The correct use is a transient defect, typically one that is difficult to reproduce. (https://en.wikipedia.org/wiki/Glitch) Whether it is less serious than a permanent defect depends on the body count.

Testing:

Executing a system according to a plan intended to validate engineered behavior by comparing it against expected behavior. Driving a car around to see what happens is not really testing -- it is just messing around. (Messing around can have value to discover requirements, but it is not properly called "testing"). (https://en.wikipedia.org/wiki/System_testing)

Beta:

A specific type of testing carried out by sophisticated early adopters for a product that is believed to be fully functional (if there are any significant defects, the Beta tester is explicitly warned about each and every one in detail). It is not a legal CYA word for "it doesn't really work, but go ahead and have a go anyway" which is more properly called an engineering prototype, and which should not be sold to the general public via retail channels as if it were a real product. (https://en.wikipedia.org/wiki/Software_testing#Beta_testing)

Sunday, January 8, 2023

The case for AVs being 10 to 100 times safer than human drivers

There is a case to be made that at-scale AV deployments should be at least ten times safer than human drivers, and perhaps even safer than that. The rationale for this large margin is leaving room for the effects of uncertainty via incorporating a safety factor of some sort.[1]

Consider all the variables and uncertainty discussed in this chapter. We have seen significant variability in fatality and injury rates for baseline human drivers depending on geographic area, road type, vehicle type, road user types, driver experience, and even passenger age. All those statistics can change year by year as well.

Additionally, even if one were to create a precise model for acceptable risk for a particular AV’s operational profile within its ODD, there are additional factors that might require an increase:

· Human biases to both want an AV safer than their own driving and to over-estimate their own driving ability as discussed in a previous section. In short, drivers want an AV driving their vehicle to be better than they think they are rather than better than they actually are.

· Risk of brand tarnish from AV crashes which are treated as more newsworthy than human-driven vehicle crashes of comparable severity. Like it or not, AV crashes are going to be covered by news outlets as a consequence of the same media exposure that created interest in and funding for AV developers. Even if AVs are exactly as safe as human drivers in every respect, each highly publicized crash will call AV safety into question and degrade public trust in the technology.

· Risk of liability exposure to the degree that AV crashes are treated as being caused by product defects rather than human driver error. For better or worse (mostly for worse), a great many traffic fatalities are attributed to “driver error” rather than to equipment failure or unsafe infrastructure design. Insurance tends to cover the costs. Even if a judicial system is invoked for drunk driving or the like, the consequences tend to be limited to the participants of a single mishap, and personal insurance coverage limits cap the practical size of monetary awards in many cases. However, the stakes might be much higher for an AV if it is determined that the AV is systematically prone to crashes in certain conditions or is overall less safe than a human driver. A product defect legal action could affect an entire fleet of AVs and expose a deep-pockets operator or manufacturer to having to pay a large sum. Being seen to be dramatically safer than human drivers could help both mitigate this risk and provide a better argument for responsible AV developer behavior.

· The risk of not knowing how safe the vehicle is. The reality is that it will be challenging to predict how safe an AV is when it is deployed. What if the safety expectation is too optimistic? Human-driven vehicle fatalities in particular are so rare that it is not practicable to get enough road experience to validate fatality rates before deployment. Simulation and other measures can be used to estimate safety but will not provide certainty. The next chapter talks about this in more detail.
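To give a rough sense of scale for the road-experience point in the last bullet, here is a minimal sketch of how many fatality-free miles it would take to support a claimed fatality rate. It assumes fatalities follow a simple Poisson model and uses an illustrative round-number baseline of one fatality per 100 million miles. Both the model and the baseline figure are assumptions made for this example, not numbers taken from the text, and the function name is hypothetical.

```python
import math


def miles_for_zero_event_bound(rate_per_mile: float, confidence: float = 0.95) -> float:
    """Fatality-free miles needed so that the one-sided upper confidence
    bound on the fatality rate (Poisson model, zero observed events)
    is at or below rate_per_mile."""
    if rate_per_mile <= 0.0:
        raise ValueError("rate_per_mile must be positive")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    # With zero events in N miles, P(0 events) = exp(-rate * N).
    # Solve exp(-rate * N) = 1 - confidence for N.
    return -math.log(1.0 - confidence) / rate_per_mile


# ASSUMPTION: illustrative baseline only, not a figure from the text.
HUMAN_BASELINE_PER_MILE = 1.0 / 100_000_000

for factor in (1, 10, 100):
    miles = miles_for_zero_event_bound(HUMAN_BASELINE_PER_MILE / factor)
    print(f"{factor:>3}x safer than baseline: about {miles:.2e} fatality-free miles")
```

Under these assumptions the required mileage grows in direct proportion to the claimed safety factor, and a single fatality during the campaign would push the requirement higher still.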

Taken together, there is an argument to be made that AVs should be safer than human drivers by about a factor of 10 (being a nice round order of magnitude number) to leave engineering margin for the above considerations. A similar argument could be made for this margin to be an even higher factor of 100, especially due to the likelihood of a high degree of uncertainty regarding safety prediction accuracy while the technology is still maturing.

The factor of 100 is not to say that the AV must be guaranteed to be 100 times safer. Rather, it means that the AV design team should do their best to build an AV that is expected to be 100 times safer plus or minus some significant uncertainty. The cumulative effect of uncertainties in safety prediction, inevitable fluctuations in operational exposure to risky driving conditions, and so on might easily cost a factor of 10 in safety.[2] That will in turn reduce achieved safety to “only” a factor of 10 better than a baseline human driver. That second factor of 10[3] is intended to help deal with the human aspect of expectations being not just a little better than the safety of human drivers, but a lot better, the risk of getting unlucky with an early first crash, and so on.
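The arithmetic in this paragraph can be sketched in a few lines. The function and variable names are hypothetical, and treating the uncertainty loss as a single divisor is a simplification made only for illustration.

```python
def achieved_safety_factor(design_target: float, uncertainty_loss: float) -> float:
    """Safety factor left over after uncertainty erodes the design target.

    Both arguments are multiples relative to a baseline human driver.
    """
    if design_target <= 0.0 or uncertainty_loss <= 0.0:
        raise ValueError("factors must be positive")
    return design_target / uncertainty_loss


design_target = 100.0     # what the design team aims for
uncertainty_loss = 10.0   # factor the text says might easily be lost
remaining = achieved_safety_factor(design_target, uncertainty_loss)
print(f"Remaining margin over baseline: {remaining:g}x")
```

The remaining factor of 10 is the margin the text reserves for human expectations and for the risk of an unlucky early crash.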

Waiting to deploy until vehicles are thought to be 100 times safer than humans is not a message investors and design teams are likely to want to hear. It is, however, a conservative way to think about safety that leaves room for the messiness of the real-world engineering needed to deploy AVs. Any AV deployed will have a safety factor over (or under) Positive Risk Balance (PRB).

The question is whether the design team will manage their PRB safety factor proactively. Or not.

[1] Safety factors and derating are ubiquitous in non-software engineering. It is common to see safety factors of 2 for well understood areas of engineering, but values can vary. A long-term challenge for software safety is understanding how to make software twice as “strong” for some useful meaning of the word “strong.” Over-simplifying, with mechanical structures, doubling the amount of steel should make it support twice the load. But with software, adding twice the number of lines of code just doubles the number of defects, potentially making the system less reliable instead of more reliable unless special techniques are applied very carefully. And even then, predicted improvement can be controversial.
See: https://en.wikipedia.org/wiki/Factor_of_safety
https://en.wikipedia.org/wiki/Derating
and https://en.wikipedia.org/wiki/N-version_programming

[2] For better or worse – but given the optimism ingrained in most engineers, probably not for better.

[3] Some good news here – by the time you have a safety factor of 10 or more, nuances such as driver age and geofence zip codes start being small compared to the safety factor. If someone says they have a safety factor of 10, it is OK not to sweat the small stuff.

The Tesla Autopilot Crashes Just Keep Coming

Picture from a Tesla AP-related crash just before impact into a disabled vehicle:

(Video on Twitter; source links are at the end of this post.) Tesla autopilot crashes are still happening when drivers (apparently) succumb to automation complacency. It seems they've just stopped being news.

The above picture is from a Tesla camera a fraction of a second before impact. (Somehow it seems there was no injury.) The Tesla was said to have initiated AEB (and disabled AP) about two seconds before impact. The video shows clear sightline to the disabled vehicle for at least 5 seconds, but the driver apparently did not react.

Tesla fans can blame the driver all they want -- but that won't stop the next similar crash from happening. Pontificating about personal responsibility and that the driver should have known better won't change things either. And we're far, far past the point where "education" is going to move the needle on this issue.

It's time to get serious about:
- Requiring effective driver monitoring
- Addressing the very real #autonowashing problem that has so many users of these features thinking their cars really drive themselves.
- Requiring vehicle automation features to account for reasonably foreseeable misuse (you might fix the vehicle, or you might fix the driver, or more likely fix both, but casting blame accomplishes nothing)

The deeper issue here is pretending that autopilot-type systems involve humans who think they are driving. The car is driving and the humans are along for the ride, no matter what disclaimers are in the driver manual and/or warnings -- unless the vehicle designers can show they have a driver monitoring system and engagement model that provide real-world results.

The reality is that these are not "driver assistance" systems. They are automated vehicles with a highly problematic approach to safety. This goes for all companies. Tesla is simply the most egregious due to its poor driver monitoring quality and the scale of its deployed fleet. As human-supervised automated driving gets more functionality, the safety problem will just keep getting worse.

Source on twitter: https://twitter.com/greentheonly/status/1607475055713214464?ref_src=twsrc%5Etfw from Dec 26th: contains video of impact. No injury apparent to the person in the video, but it was a very close thing. Also a screenshot of the vehicle log showing AEB engaged 2 seconds before impact. https://twitter.com/greentheonly/status/1609271955383029763?ref_src=twsrc%5Etfw
For those saying "but Teslas are safer overall" that statement does not stem from any credible data I've ever seen: https://safeautonomy.blogspot.com/2022/12/take-tesla-safety-claims-with-about.html