Safe Autonomy: November 2018

Zachary Pezzementi and Trenton Tabor have done some great work on perception systems in general, and on how image degradation affects them. I'd previously posted information about their paper, but now there is a webinar available here:
Webinar home page with details & links: http://ieeeagra.com/events/webinar-november-4-2018/

This includes pointers to the slides, a recorded webinar, and the papers.

My robustness testing team at NREC worked with them on the perception stress testing parts, so here are quick links to the material covering that work:

Paper: robustness testing for safe perception
Webinar slides 40-50: http://ieeeagra.com/wp-content/uploads/2018/11/Webinar46.pdf
Video time 44:43 - 54:43: http://ieeeagra.com/wp-content/uploads/2018/11/Webinar46_300k.mp4

Summary: Uber's reports indicate that they are taking the improvement of their safety culture seriously. Their new approach to public road testing seems reasonable in light of current practices. Whether they can achieve an appropriate level of system safety and software quality for the final production vehicles remains an open question -- just as it does for the self-driving car industry in general.

Uber ATG has released a set of materials regarding their in-development self-driving car technology and testing, including a NHTSA-style safety report, as well as reports of a safety review in light of the tragic death in Tempe AZ earlier this year. (See: https://www.uber.com/info/atg/safety/)

Generally I have refrained from critical analysis of other companies' safety reports because this is still something everyone is sorting out. Anyone putting out a safety report is automatically in the top 10% for transparency (because about 90% of the companies haven't even released one yet). So by this metric Uber looks good. In fact, their report has a lot more detail than we've been seeing in general, so kudos to them for improving transparency. The other companies that haven't published a report at all should get with the program.

But, Uber has also had a fatality as a result of their on-road test program. If any company should be put under increased scrutiny for safety it should be them. I fully acknowledge that many of the critique points apply to other companies as well, so this is not about whether they are ahead or behind, but rather how they stand on their own merits. (And, if anyone at Uber thinks I got something wrong, please let me know.)

Overall

It seems that Uber's development and deployment plan is generally what we're seeing from other companies. They plan to operate on public roads to build a library of surprises and teach their system how to handle each one they encounter. They plan to have safety drivers (Mission Specialists) intervene when the vehicle encounters something it can't handle. As a result of the fatal mishap they plan to improve safety culture, improve safety drivers, and do more pre-testing simulation. There is every reason to believe that at least some other companies were already doing those things, so this generally puts Uber on a par with where all companies doing road testing should be.

Clearly the tragic death in Tempe got Uber's attention, as it should have. Let's hope that other companies pay attention to the lessons learned before there is another fatality.

Doing the math, there should be no fatalities in any reasonable pre-deployment road test program. That's because there simply won't be enough miles accumulated in road testing with a small fleet to reach a level at which an average human driver would be likely to have experienced a fatal accident. (It is not zero risk, just as everyday driving is not risk free. But a fatality should be unlikely.)
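To make "doing the math" concrete, here is a rough back-of-the-envelope sketch. The human-driver fatality rate (about 1.2 per 100 million miles) is an approximate US-wide figure I am assuming for illustration, and the fleet mileage totals are hypothetical; neither number comes from the Uber reports. The sketch treats fatal crashes as a Poisson process with a constant per-mile rate, which is a simplification.

```python
import math

# Assumption: roughly 1.2 fatalities per 100 million vehicle miles, an
# approximate US-wide human-driver figure (not taken from the Uber reports).
HUMAN_FATALITIES_PER_MILE = 1.2 / 100_000_000


def expected_fatalities(test_miles, rate_per_mile=HUMAN_FATALITIES_PER_MILE):
    """Expected number of fatal crashes over test_miles at a constant per-mile rate."""
    return test_miles * rate_per_mile


def prob_at_least_one(test_miles, rate_per_mile=HUMAN_FATALITIES_PER_MILE):
    """Poisson probability of one or more fatal crashes over test_miles."""
    return 1.0 - math.exp(-expected_fatalities(test_miles, rate_per_mile))


if __name__ == "__main__":
    # Hypothetical cumulative test mileage for a small fleet.
    for miles in (1_000_000, 5_000_000, 10_000_000):
        print(f"{miles:>12,} miles: "
              f"expected fatalities {expected_fatalities(miles):.3f}, "
              f"P(at least one) {prob_at_least_one(miles):.1%}")
```

Under these assumptions, a test program of a few million miles that is merely as safe as an average human driver would have only a few percent chance of a fatality. That is unlikely but not zero, which is the point above.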

The Good

This is perhaps the most thorough set of safety reports yet. We've been seeing a trend that more recent reports often include areas not touched on by earlier reports. I hope this results in a competitive dynamic in which each company wants to raise the bar for safety transparency. We'll see how this turns out. Uber is certainly doing their part.
The materials place significant emphasis on improving safety culture, including excellent recommendations of good practices from the external report. Safety culture is essential. I'm glad to see this.
There are detailed discussions about Mission Specialist roles, responsibilities, and training. This is important. Supervising autonomy is a difficult, demanding role, and gets more difficult as the autonomy gets better. Again, I'm glad to see this.
There is quite a bit about hardware quality for both computer hardware and vehicle hardware. It is hard to tell how far down the ISO 26262 hardware safety path they are for life critical functions such as disengaging autonomy for Mission Specialist takeover. They mention some external technical safety reviews, but none recently. This is a good start, but more work is required here. They say they plan more external reviews, which is good.
They state concrete goals for each of their five safety principles. This is also good.

Jury Still Out on Fully Autonomous Operation System Safety:

The section on system safety is for the most part aspirational. Even the section that is not forward looking is mostly about plans, not current capabilities. This is consistent with currently using Mission Specialists to ensure testing safety. In other words, assuming the Mission Specialist can avoid mishaps, and the vehicle always responds to human driver takeover, this isn't a problem yet. So we'll have to wait to see how this turns out.
The external review document concentrated on safety culture and road testing supervision. That would be consistent with a conclusion that the fatality root causes were poor safety culture and ineffective road testing supervision. (Certainly it would be no surprise if this hypothetical analysis were true, but we'll have to wait for the final NTSB report to know for sure.)
In general, we have no idea how they plan to prove that their vehicles are safe to deploy other than by driving around until they feel they have sufficiently infrequent road failures. Simulation will improve quality before they do road testing, but they are counting on road testing to find the problems. To be clear, road testing and that type of simulation can help, but I don't believe they are enough. (This is the same as for many other developers, so this is not a particular criticism of Uber.)
Uber says they are working on a safety case, perhaps using a GSN-based approach. This is an excellent idea. But I don't see anything that looks like a formal safety case in these documents. Hopefully we'll get to see something like that down the road.
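For readers who haven't seen GSN (Goal Structuring Notation), the basic idea is that a top-level safety claim (a Goal) is broken down via argument Strategies into sub-goals, each of which eventually has to be backed by evidence (a Solution). Here is a minimal sketch of that structure. The node layout is my own simplification of GSN, and every claim in the example tree is hypothetical, not taken from any Uber document.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    kind: str   # "Goal", "Strategy", or "Solution" (a simplified subset of GSN)
    text: str
    children: List["Node"] = field(default_factory=list)


def undeveloped_goals(node: Node) -> List[str]:
    """Return the text of every Goal that has nothing supporting it."""
    found = []
    if node.kind == "Goal" and not node.children:
        found.append(node.text)
    for child in node.children:
        found.extend(undeveloped_goals(child))
    return found


# Hypothetical claims for illustration only.
safety_case = Node("Goal", "Vehicle is acceptably safe for its intended operation", [
    Node("Strategy", "Argue over each identified hazard", [
        Node("Goal", "Perception failures are mitigated", [
            Node("Solution", "Perception robustness test results"),
        ]),
        Node("Goal", "Takeover by the safety driver always succeeds"),  # no evidence yet
    ]),
])

if __name__ == "__main__":
    for text in undeveloped_goals(safety_case):
        print("Undeveloped goal:", text)
```

The value of writing a safety case this way is that unsupported claims become visible: any goal without an argument or evidence under it is something still to be done before deployment.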

Software Quality and Software Safety:

The software development process shown on page 47 of the report emphasizes fixing bugs found in testing. I don't know of any application domain in which that alone will actually get you acceptably safe life-critical software. Again, for now their primary safety strategy for testing is Mission Specialists, so this is an issue for the future. Maybe we'll find out if they are doing more in a later edition of this report.
The information on software quality and the software development process is a bit skimpy in general. It is difficult to tell whether that is a reflection of their process or they just didn't want to talk about it. For example, there is a box in their process diagrams on page 47 that says "peer review" with no description as to which of the many well-known review techniques they are using, whether they review everything, etc. There are no boxes in their software process for requirements, architecture, and design. There isn't an SQA function described (Software Quality Assurance, which deals with development process quality monitoring). For Agile fans, there are plenty of boxes missing from whichever methodology you like. The point is that this is an incomplete software process model compared to what I'd expect to see for life-critical software. The question is whether the pieces are there and not drawn. Again, there is no other industry in which the approach shown would be sufficient or acceptable for creating life-critical software. It is possible there is more to their process than they are revealing, or they have some plan to address this before they remove their Mission Specialists from the vehicles.
Perception (having the system recognize objects, obstacles, and so on) is notoriously difficult to get right, and is probably the hardest of the many difficult problems in making self-driving cars safe. They talk about how they use perception, but not how they plan to validate it, other than via road testing and possibly some aspects of simulating scenarios observed during road testing. But then again, other developers don't say much about this either.
It's easy to believe that at least some other organizations are following similar software approaches and will face the same challenges. Again, because they currently have safety drivers these are forward-looking issues that are not the primary concerns for Uber road testing safety in the near term.
It is worth noting that they plan to have a secondary fail-over computer that they say will be developed at least taking into account software quality and safety standards such as ISO 26262 and MISRA C. (Safety Report Page 35.) But they don't seem to say if this is what they are doing for their basic control software that controls normal operation. Again, perhaps there is more to this they haven't revealed.

Is It Enough?

Overall the reports seem to put them on a par with other developers in terms of road testing safety. Whether they operate safely on public roads will largely depend upon maintaining their safety culture and Mission Specialist proficiency. I'd suggest an independent monitor within the organization to make sure that happens.

What I'd Like to See

There are a number of things I'd like to see from Uber to help in regaining public trust. (These same recommendations go for all the other companies doing public road testing.)

Uber should issue periodic report cards in which they tell us about adopting the recommendations in their various reports and their safety plans in general. Are they staying on track? Did the safety culture initiative really work? Are they following their Mission Specialist procedures?
I'd like to see metrics that track the effectiveness of Mission Specialists. Nobody is perfect, but I'd be happier having data about how often they get distracted to see whether the break schedule and monitoring are working as intended. This should be something all companies do, since in the end they are putting the public at risk. The effectiveness of Mission Specialists who have been assigned a difficult job is their stated way to mitigate that risk -- but we have no insight as to whether that approach is really working until a crash is in the news.
They have promised safety metrics that are better than disengagements and miles driven. That's a great idea. We'll have to see how that turns out. (They sponsored a RAND report on this topic that was recently released. That will have to be the topic of another post.)
We should track whether they establish their external safety advisory board -- and whether it has appropriate autonomy and software safety technical expertise as well as areas such as safety culture and human/machine interface.
They should also have an independent internal monitor making sure their safety-relevant operational and design processes are being followed. This seems in line with their plans.
They need a much stronger story about how they plan to ensure software safety and system safety when they remove their Mission Specialists from the vehicle downstream. Hopefully they'll make public a high level version of the safety case and have it externally evaluated.
I hope that they work with PennDOT to comply with PA AV testing safety guidelines before resuming operation in Pittsburgh, where I live. From the materials I've seen that should be straightforward, but they should still do it. As of right now, they'd only be the second company to do so.

Dr. Philip Koopman is a faculty member at Carnegie Mellon University. He is an internationally recognized expert in the area of self-driving car safety. He is also Co-Founder of Edge Case Research, which provides products and services relating to autonomy safety.
koopman@cmu.edu

Safe Autonomy

Monday, November 26, 2018

FiveAI Report on Autonomous Vehicle Safety Certification

Monday, November 19, 2018

Webinar on Robustness Testing of Perception

Wednesday, November 7, 2018

Potential Autonomous Vehicle Safety Improvement: Less Hype, More Data (OESA 2018)

Monday, November 5, 2018

Uber ATG Safety Report